Medium Data API: Extract Structured JSON in 2026
Learn how to extract structured Medium data via API using AlterLab's Extract API to get JSON fields like title, author, date, tags, and URL with zero parsing.
TL;DR
To get structured Medium data via API, define a JSON schema for the fields you need (title, author, published_date, tags, url) and POST it to AlterLab's Extract API endpoint. The service returns validated JSON in a single request, handling anti‑bot measures and delivering typed output without any HTML parsing.
Why use Medium data?
Medium hosts a vast repository of technical articles, making it a valuable source for several engineering workflows. Teams building large language models often scrape public tech blogs to diversify training data with real‑world explanations and code snippets. Product analysts use Medium feeds to monitor competitor announcements, emerging frameworks, and developer sentiment for strategic planning. Data engineers also create pipelines that enrich internal knowledge bases with curated external content, improving search relevance and recommendation quality.
What data can you extract?
All article metadata visible on a public Medium page is accessible through structured extraction. The most commonly requested fields for tech‑focused pipelines include:
- title: The headline of the article as displayed.
- author: The display name of the writer or publication.
- published_date: The ISO‑8601 timestamp when the story was posted.
- tags: Topic tags attached by the author (e.g., "Python", "AI", "Startup").
- url: The canonical URL of the article, useful for deduplication and linking.
These fields are sufficient for indexing, citation tracking, and trend analysis without needing to process full‑text HTML.
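The fields listed above map naturally onto a small typed record. The sketch below is one possible shape, not part of any SDK: the `MediumArticle` class and the `parse_article` and `dedupe_by_url` helpers are hypothetical names introduced here, and the sample payload is made up for illustration. It assumes the payload is already a Python dict with the field names used in this article.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MediumArticle:
    """One extracted record; field names mirror the list above."""
    title: str
    author: str
    published_date: datetime
    url: str
    tags: tuple[str, ...] = ()


def parse_article(payload: dict) -> MediumArticle:
    """Convert a raw JSON payload (already a dict) into a typed record."""
    # Swap a trailing "Z" for an explicit offset so fromisoformat
    # also accepts the timestamp on Python versions before 3.11.
    published = datetime.fromisoformat(payload["published_date"].replace("Z", "+00:00"))
    return MediumArticle(
        title=payload["title"].strip(),
        author=payload["author"].strip(),
        published_date=published,
        url=payload["url"],
        tags=tuple(payload.get("tags", [])),  # tags are optional
    )


def dedupe_by_url(articles):
    """Keep the first record seen for each canonical URL."""
    seen = {}
    for article in articles:
        seen.setdefault(article.url, article)
    return list(seen.values())


# Made-up sample payload, used only to exercise the helpers.
sample = {
    "title": " Introduction to LLMs in 2026 ",
    "author": "Jane Doe",
    "published_date": "2026-02-14T08:30:00Z",
    "url": "https://medium.com/@example/introduction-to-llms-2026",
}
records = dedupe_by_url([parse_article(sample), parse_article(sample)])
print(len(records), records[0].title)
```

Keying deduplication on the canonical URL rather than the title keeps two different stories with the same headline apart.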
The extraction approach
Attempting to pull Medium data with raw HTTP requests and HTML parsers leads to brittle pipelines. Medium’s page structure changes frequently, its class names are obfuscated, and anti‑bot mechanisms challenge simple scrapers. Maintaining selectors, handling pagination, and dealing with intermittent blocks consume engineering effort that could be spent on downstream analysis. A data API abstracts these concerns: you specify the schema you want, and the service retrieves the page, applies AI‑guided extraction, validates the output, and returns clean JSON. This approach treats the web as a database, letting you focus on what the data means rather than how to get it.
Quick start with AlterLab Extract API
AlterLab’s Extract API accepts a target URL and a JSON schema, then returns the matched data. Below is a minimal Python example that pulls the title, author, and published date from a sample Medium post. See the Extract API docs for full parameter details.
import alterlab

client = alterlab.Client("YOUR_API_KEY")

schema = {
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "description": "The title field"
        },
        "author": {
            "type": "string",
            "description": "The author field"
        },
        "published_date": {
            "type": "string",
            "description": "The published date field"
        }
    }
}

result = client.extract(
    url="https://medium.com/@example/introduction-to-llms-2026",
    schema=schema,
)
print(result.data)
The equivalent cURL request looks like this:
curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://medium.com/@example/introduction-to-llms-2026",
    "schema": {"properties": {"title": {"type": "string"}, "author": {"type": "string"}, "published_date": {"type": "string"}}}
  }'
Both snippets produce a JSON payload similar to:
{
    "title": "Introduction to LLMs in 2026",
    "author": "Jane Doe",
    "published_date": "2026-02-14T08:30:00Z",
    "url": "https://medium.com/@example/introduction-to-llms-2026"
}
Define your schema
The schema parameter drives the entire extraction process. You declare each desired field with a type (string, number, boolean, array, or object) and an optional description that helps the underlying model locate the correct element on the page. AlterLab validates the returned data against this schema, so every field listed under required is present and every returned value conforms to its declared type. If a required field cannot be found, the API returns an error rather than a guess, preventing silent data corruption. For the Medium use case, a typical schema might look like:
{
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "published_date": {"type": "string", "format": "date-time"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "url": {"type": "string", "format": "uri"}
    },
    "required": ["title", "author", "published_date", "url"]
}
By supplying this schema to the extract endpoint, you receive a typed JSON object ready for direct insertion into a data warehouse or feature store.
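Even with server-side validation, a pipeline may want a cheap local check before a record reaches the warehouse. The sketch below uses only the standard library and covers just the parts of the schema shown in this article (required fields, basic types, date-time strings, array items). It is not a full JSON Schema validator; a dedicated library such as jsonschema would be the more complete choice. The `check_payload` helper and the sample payloads are hypothetical and introduced here for illustration.

```python
from datetime import datetime

# The article schema from this section, as a Python dict.
ARTICLE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "published_date": {"type": "string", "format": "date-time"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "url": {"type": "string", "format": "uri"},
    },
    "required": ["title", "author", "published_date", "url"],
}

_PYTHON_TYPES = {"string": str, "number": (int, float), "boolean": bool,
                 "array": list, "object": dict}


def _matches(value, type_name):
    """True if value fits the named schema type (or the type is unknown)."""
    expected = _PYTHON_TYPES.get(type_name)
    if expected is None:
        return True
    if type_name == "number" and isinstance(value, bool):
        return False  # bool is a subclass of int in Python
    return isinstance(value, expected)


def check_payload(payload, schema):
    """Return a list of problems; an empty list means the payload passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in payload:
            continue
        value = payload[name]
        if not _matches(value, spec.get("type")):
            problems.append(f"{name}: expected {spec['type']}")
            continue
        if spec.get("format") == "date-time" and isinstance(value, str):
            try:
                datetime.fromisoformat(value.replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"{name}: not an ISO 8601 timestamp")
        if spec.get("type") == "array" and "items" in spec:
            item_type = spec["items"].get("type")
            if not all(_matches(item, item_type) for item in value):
                problems.append(f"{name}: every item should be {item_type}")
    return problems


# Made-up payloads: one complete, one deliberately broken.
good = {
    "title": "Introduction to LLMs in 2026",
    "author": "Jane Doe",
    "published_date": "2026-02-14T08:30:00Z",
    "tags": ["Python", "AI"],
    "url": "https://medium.com/@example/introduction-to-llms-2026",
}
bad = {"title": "Untitled", "published_date": "yesterday", "tags": ["AI", 7]}

print(check_payload(good, ARTICLE_SCHEMA))
for problem in check_payload(bad, ARTICLE_SCHEMA):
    print(problem)
```

Returning a list of problems instead of raising on the first one makes it easier to log every issue with a record in a single pass.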
Handle pagination and scale
When extracting dozens or thousands of Medium articles, efficiency matters. AlterLab supports high‑volume workloads through asynchronous job submission and built‑in rate‑limit handling. You can batch many extract requests into a single API call using the jobs endpoint, or parallelize calls with asyncio in Python. The following example demonstrates fetching a list of article URLs concurrently:
import asyncio
import alterlab

async def extract_one(client, url, schema):
    return await client.extract(url=url, schema=schema)

async def main():
    client = alterlab.AsyncClient("YOUR_API_KEY")
    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "author": {"type": "string"},
            "published_date": {"type": "string"},
            "tags": {"type": "array", "items": {"type": "string"}},
            "url": {"type": "string"}
        }
    }
    # Placeholder article URLs (not tag listing pages, which hold many stories each).
    # In practice, generate this list from a sitemap or search API.
    urls = [
        "https://medium.com/@example/introduction-to-llms-2026",
        "https://medium.com/@example/second-article",
        "https://medium.com/@example/third-article"
    ]
    tasks = [extract_one(client, u, schema) for u in urls]
    results = await asyncio.gather(*tasks)
    for r in results:
        print(r.data)

asyncio.run(main())
This pattern extends to thousands of URLs, provided you cap the number of in‑flight requests to stay within AlterLab’s concurrency limits. For cost estimates, visit the pricing page; you pay only for successful extractions, with volume discounts available at higher tiers.
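On its own, asyncio.gather starts every request at once, so a long URL list can exceed whatever concurrency limit applies to your account. One common way to cap in-flight requests is an asyncio.Semaphore. The sketch below runs without any SDK: `fake_extract` is a hypothetical stand-in for a real extract call, `extract_all` and the `max_in_flight` parameter are names introduced here, and the URLs are placeholders.

```python
import asyncio


async def fake_extract(url):
    """Hypothetical stand-in for an SDK call; it only simulates latency."""
    await asyncio.sleep(0.01)
    if "broken" in url:
        raise RuntimeError(f"extraction failed for {url}")
    return {"url": url}


async def extract_all(urls, extract, max_in_flight=5):
    """Run extract(url) for every URL with at most max_in_flight active at once."""
    urls = list(urls)  # iterated twice below, so materialize it first
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(url):
        async with semaphore:  # waits here while the cap is reached
            return await extract(url)

    # return_exceptions=True collects failures as values instead of
    # raising on the first one, so the other results are not lost.
    outcomes = await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)
    successes = [o for o in outcomes if not isinstance(o, Exception)]
    failures = [(u, o) for u, o in zip(urls, outcomes) if isinstance(o, Exception)]
    return successes, failures


# Placeholder URLs: ten that succeed and one that the stub rejects.
urls = [f"https://medium.com/@example/post-{i}" for i in range(10)]
urls.append("https://medium.com/@example/broken-post")

ok, failed = asyncio.run(extract_all(urls, fake_extract, max_in_flight=3))
print(len(ok), len(failed))  # 10 1
```

In a real pipeline, the `extract` argument would wrap the actual client call, and the failed URLs could be fed into a retry queue.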
Key takeaways
- Structured data extraction replaces fragile HTML parsing with a schema‑driven, AI‑powered API.
- Medium’s public article metadata (title, author, date, tags, URL) maps cleanly to JSON fields.
- AlterLab’s Extract API handles anti‑bot measures, validation, and scaling so you can focus on analytics.
- Start with a simple schema, test on a single URL, then expand to batch or async workflows for production pipelines.
- Always review Medium’s robots.txt and Terms of Service before scraping public data.
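The robots.txt review in the last takeaway can be partly automated with Python's standard urllib.robotparser module. The rules in the sketch below are hypothetical and exist only to show the mechanics; they are not Medium's actual robots.txt, which you would need to fetch and read yourself. The `build_checker` helper and the `my-pipeline-bot` user agent are also made-up names. Note that robots.txt says nothing about Terms of Service, which still need a separate read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real Medium file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /m/
Disallow: /me/
Allow: /
"""


def build_checker(robots_txt, user_agent):
    """Return a function that reports whether user_agent may fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text already in memory
    return lambda url: parser.can_fetch(user_agent, url)


allowed = build_checker(SAMPLE_ROBOTS_TXT, "my-pipeline-bot")

print(allowed("https://medium.com/@example/introduction-to-llms-2026"))
print(allowed("https://medium.com/m/signin"))
```

Filtering the URL list through a check like this before submitting it keeps disallowed paths out of the pipeline.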