How to Scrape Medium Data: Complete Guide for 2026
A practical guide to scraping publicly accessible tech data from Medium using Python and Node.js with AlterLab's web scraping API in 2026.
TL;DR: To scrape Medium data in 2026, use AlterLab's API with automatic anti-bot handling. Start with T1/T2 tiers for public pages, escalate to T3/T4 for protected content, and extract structured data via Cortex for typed JSON output. Always respect robots.txt and rate limits.
This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.
Why collect tech data from Medium?
Medium hosts valuable public technical content useful for:
- Tech trend analysis: Monitor emerging frameworks, libraries, and architectural patterns in engineering blogs
- Competitive intelligence: Track how companies discuss product launches, API changes, or infrastructure shifts
- Content aggregation: Build curated feeds of high-quality technical articles for internal knowledge sharing or newsletters
Technical challenges
Medium implements standard anti-bot measures including rate limiting based on IP reputation, header validation (User-Agent, Accept), and occasional JavaScript challenges for suspicious traffic. Raw HTTP requests often receive 429 or 403 responses. AlterLab's Smart Rendering API mitigates these through:
- Automatic proxy rotation from a large residential pool
- Dynamic header management mimicking real browsers
- Tier escalation from T1 (curl) to T4 (headless browser) as needed
- Built-in retry logic with exponential backoff
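The retry behaviour in the last bullet can also be illustrated on the client side. The sketch below is a generic exponential-backoff wrapper with jitter, not AlterLab's internal logic; `fetch`, `RetryableError`, and the delay parameters are hypothetical names chosen for illustration.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical error for responses worth retrying (e.g. 429 or 5xx)."""


def backoff_delays(max_retries=4, base=1.0, cap=30.0):
    """Return the un-jittered delay before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def fetch_with_backoff(fetch, url, max_retries=4, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fetch(url), retrying on RetryableError with exponential backoff and jitter."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except RetryableError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the last error
            # Full jitter: sleep a random amount up to the exponential delay.
            sleep(random.uniform(0, delays[attempt]))
```

With the defaults, the delay ceiling doubles from 1 second to 8 seconds across four retries; the jitter spreads retries out so that many clients do not retry in lockstep.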
Quick start with AlterLab API
See the Getting started guide for SDK installation. Below are examples for scraping a public Medium tech article.
Python example:
import alterlab
client = alterlab.Client("YOUR_API_KEY")
response = client.scrape("https://medium.com/@example/understanding-react-19-compiler-abc123")
print(response.text[:500])  # First 500 chars of HTML
Node.js example:
import { AlterLab } from "@alterlab/sdk";
const client = new AlterLab({ apiKey: "YOUR_API_KEY" });
const response = await client.scrape("https://medium.com/@example/understanding-react-19-compiler-abc123");
console.log(response.text.slice(0, 500));
cURL example:
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://medium.com/@example/understanding-react-19-compiler-abc123"}'
Extracting structured data
For consistent data extraction, target these common CSS selectors on Medium article pages:
- Title: h1[data-testid="storyTitle"] or h1.graf--title
- Author: a[data-testid="authorName"] or a[data-action="show-user-card"]
- Publication date: time[datetime] (ISO 8601 format in the datetime attribute)
- Reading time: span[data-testid="readingTime"]
- Claps: button[data-testid="clapButton"] (note: requires interaction for real count; static count may be in adjacent text)
- Tags: a[data-action="show-tag"] within the tag container
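Several entries in the list above pair a primary selector with a fallback. One way to handle that, sketched here under the assumption that you have some function mapping a CSS selector to an element or `None`, is a small helper that tries each selector in order. `first_match` and the constant names are hypothetical, not part of any SDK.

```python
def first_match(select_one, selectors):
    """Return the first non-None result of select_one(selector), else None.

    select_one is any callable that maps a CSS selector to an element or None,
    for example BeautifulSoup's soup.select_one.
    """
    for selector in selectors:
        element = select_one(selector)
        if element is not None:
            return element
    return None


# Primary and fallback selectors taken from the list above.
TITLE_SELECTORS = ['h1[data-testid="storyTitle"]', "h1.graf--title"]
AUTHOR_SELECTORS = ['a[data-testid="authorName"]', 'a[data-action="show-user-card"]']
```

With BeautifulSoup this would be called as `first_match(soup.select_one, TITLE_SELECTORS)`. A `None` result means neither selector matched, which is a useful signal that the page markup may have changed.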
Example Python extraction:
import alterlab
from bs4 import BeautifulSoup
client = alterlab.Client("YOUR_API_KEY")
html = client.scrape("https://medium.com/example/page").text
soup = BeautifulSoup(html, 'html.parser')
tags = [tag.get_text(strip=True) for tag in soup.select('a[data-action="show-tag"]')]
print(f"Tags: {tags}")
Structured JSON extraction with Cortex
AlterLab's Cortex AI extracts typed JSON directly from pages without CSS selectors. Define a schema for Medium article metadata:
import alterlab
client = alterlab.Client("YOUR_API_KEY")
result = client.extract(
    url="https://medium.com/@example/understanding-react-19-compiler-abc123",
    schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "author": {"type": "string"},
            "published_date": {"type": "string", "format": "date-time"},
            "reading_time_minutes": {"type": "integer"},
            "tags": {"type": "array", "items": {"type": "string"}}
        },
        "required": ["title", "author"]
    }
)
print(result.data)
# Output: {"title": "Understanding React 19 Compiler", "author": "Jane Dev", ...}
Cortex handles JavaScript rendering and anti-bot challenges automatically, returning validated JSON matching your schema.
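Even when the service validates its output, a lightweight client-side check can catch schema drift before records reach storage. The following is a minimal sketch in plain Python that covers only the required keys and basic types of the schema above. `check_article` is a hypothetical helper, not part of the AlterLab SDK, and a full validator such as the `jsonschema` package would cover far more of the JSON Schema spec.

```python
# Maps the JSON Schema type names used above to Python types.
JSON_TYPES = {"string": str, "integer": int, "array": list, "object": dict}

ARTICLE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "published_date": {"type": "string", "format": "date-time"},
        "reading_time_minutes": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "author"],
}


def check_article(data, schema=ARTICLE_SCHEMA):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = [f"missing required field: {key}"
                for key in schema["required"] if key not in data]
    for key, spec in schema["properties"].items():
        if key in data and not isinstance(data[key], JSON_TYPES[spec["type"]]):
            problems.append(f"wrong type for {key}: expected {spec['type']}")
    return problems
```

Calling `check_article(result.data)` on each response and logging any non-empty result gives an early warning when a field goes missing or changes type.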
Cost breakdown
AlterLab's pricing scales with technical difficulty. For Medium:
- T1/T2: Rarely sufficient due to header/JS checks
- T3: Typical for Medium's anti-bot level (stealth mode with proxy rotation)
- T4: Needed if heavy client-side rendering obstructs content
See AlterLab pricing for full details. Note: AlterLab auto-escalates tiers — start at T1 and the API promotes automatically if a lower tier fails. You only pay for the tier that succeeds.
| Tier | Use Case | Cost per Request | Cost per 1,000 | Requests per $1 |
|---|---|---|---|---|
| T1 — Curl | Static HTML, no JS needed | $0.0002 | $0.20 | 5,000 |
| T2 — HTTP | Standard pages with headers | $0.0003 | $0.30 | 3,333 |
| T3 — Stealth | Protected pages, anti-bot active | $0.002 | $2.00 | 500 |
| T4 — Browser | Full JS rendering required | $0.004 | $4.00 | 250 |
| T5 — CAPTCHA | CAPTCHA solving + JS rendering | $0.02 | $20.00 | 50 |
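The per-request prices in the table make budgeting a matter of simple arithmetic. The sketch below estimates spend for a mix of tiers. The prices are copied from the table, while `estimate_cost` and the example traffic mix are assumptions made for illustration.

```python
from decimal import Decimal

# Per-request prices in USD, copied from the table above.
PRICE_PER_REQUEST = {
    "T1": Decimal("0.0002"),
    "T2": Decimal("0.0003"),
    "T3": Decimal("0.002"),
    "T4": Decimal("0.004"),
    "T5": Decimal("0.02"),
}


def estimate_cost(requests_by_tier):
    """Sum price * count over tiers; Decimal avoids float rounding noise."""
    return sum(
        (PRICE_PER_REQUEST[tier] * count for tier, count in requests_by_tier.items()),
        Decimal("0"),
    )


# Hypothetical month: 8,000 pages resolved at T3 and 2,000 at T4.
monthly = estimate_cost({"T3": 8000, "T4": 2000})
print(f"${monthly:.2f}")  # 8000 * 0.002 + 2000 * 0.004 = 24.00
```

Because only the tier that succeeds is billed, the mix you pass in should reflect the tiers your requests actually resolve at, not the tier they start from.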
Best practices
- Rate limiting: Start with 1 request/second; adjust based on response headers (AlterLab includes X-RateLimit-Remaining)
- Robots.txt compliance: Check https://medium.com/robots.txt (disallows /api/ and /login/, but allows /@username/ paths)
- Dynamic content: Use Cortex for JS-dependent data instead of manual scrolling/waiting
- Error handling: Implement retries for 429/5xx; the AlterLab SDK auto-retries transient failures
- Data freshness: For time-sensitive data, pair with AlterLab's scheduling (cron expressions) or webhooks
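The robots.txt check can be automated with Python's standard library. This sketch parses rules that have already been downloaded and asks whether a given path may be fetched. The sample rules only mirror the paths named in the list above and are not a copy of Medium's live file; `build_parser` and the `my-bot` user agent are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules mirroring the paths named above; not Medium's live file.
SAMPLE_RULES = """\
User-agent: *
Disallow: /api/
Disallow: /login/
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_RULES)
print(parser.can_fetch("my-bot", "https://medium.com/@example/some-article"))  # True
print(parser.can_fetch("my-bot", "https://medium.com/api/users"))              # False
```

In a real pipeline you would fetch the live file on a schedule, rebuild the parser, and skip any URL for which `can_fetch` returns `False`.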
Scaling up
For large-scale Medium data collection:
- Batch requests: Use AlterLab's /batch endpoint (up to 100 URLs/request) to reduce overhead
- Scheduling: Set up recurring scrapes via AlterLab's dashboard API for weekly trend analysis
- Responsible scaling:
  - Monitor success rates per domain; pause if >5% failure rate
  - Use AlterLab's usage alerts to avoid unexpected costs
  - Store raw HTML minimally; extract only needed fields to reduce storage
  - Consider sampling: scrape 10% of articles daily instead of 100%
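Three of the points above (batches of up to 100 URLs, pausing above a 5% failure rate, and sampling 10% of articles) reduce to small helpers. This is a sketch in plain Python that makes no calls to the AlterLab API; the function names and the seed parameter are hypothetical.

```python
import random


def chunked(urls, size=100):
    """Split urls into consecutive lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def should_pause(successes, failures, threshold=0.05):
    """True when the observed failure rate exceeds the threshold (5% by default)."""
    total = successes + failures
    return total > 0 and failures / total > threshold


def daily_sample(urls, fraction=0.10, seed=None):
    """Pick a random fraction of urls; pass a seed for a reproducible sample."""
    k = max(1, round(len(urls) * fraction)) if urls else 0
    return random.Random(seed).sample(urls, k)
```

For 250 article URLs, `chunked` yields batches of 100, 100, and 50, and `daily_sample` selects 25 of them. Tracking successes and failures per domain and calling `should_pause` after each batch gives a simple circuit breaker.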
Key takeaways
- Medium's public tech content is scrapeable with proper anti-bot handling via AlterLab's tiered system
- Always verify data accessibility through robots.txt and ToS before scraping
- Use Cortex for reliable structured output instead of fragile CSS selectors
- Budget for T3/T4 tiers ($0.002-$0.004/request) for consistent Medium access
- Implement rate limiting and monitoring to maintain sustainable scraping practices