How to Scrape Medium: Complete Guide for 2026
Learn how to scrape Medium articles, author data, and engagement metrics with Python. Includes working code examples, anti-bot bypass, and scaling strategies.
April 9, 2026
Why scrape Medium?
Medium hosts millions of technical articles, opinion pieces, and industry analysis. Engineers scrape it for three primary use cases.
Content research and trend analysis. Track which topics gain traction over time. Measure engagement patterns across tags like machine-learning, devops, or cybersecurity. Build datasets that show how technical discourse shifts quarter over quarter.
Author and publication monitoring. Follow specific writers or publications for competitive intelligence. Track posting frequency, topic evolution, and audience response. Useful for content teams planning editorial calendars or recruiters identifying subject matter experts.
Training data for NLP models. Medium articles provide clean, well-formatted text suitable for language model fine-tuning, sentiment analysis, and topic classification. The platform's consistent structure makes extraction straightforward once you handle the anti-bot layer.
Anti-bot challenges on medium.com
Medium serves its content through a React-based single-page application. The initial HTML response contains minimal article content. JavaScript must execute before the actual text, images, and engagement metrics appear in the DOM.
Their anti-bot stack includes:
- JavaScript challenges that verify the client can execute code before serving content
- Browser fingerprinting that checks for headless browser signatures, missing fonts, and inconsistent navigator properties
- Rate limiting on repeated requests from the same IP range
- Cookie-based session validation that tracks request patterns over time
Running a basic requests.get() against medium.com returns a nearly empty HTML shell. You need a real browser environment with proper TLS fingerprints, consistent viewport dimensions, and realistic timing between interactions. Managing this infrastructure yourself means maintaining browser instances, rotating residential proxies, and updating evasion scripts whenever Medium changes their detection logic.
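To see the problem concretely, here is a small heuristic sketch for detecting the "empty shell" response described above: an unrendered SPA page has almost no visible text once you strip scripts and markup. The `looks_like_spa_shell` helper is hypothetical, not part of any SDK, and the 500-character threshold is an arbitrary assumption you should tune.

```python
import re

def looks_like_spa_shell(html: str, min_text_chars: int = 500) -> bool:
    """Return True if the HTML looks like an unrendered SPA shell."""
    # Strip script/style blocks first, then all remaining tags.
    stripped = re.sub(r"(?s)<(script|style)[^>]*>.*?</\1>", "", html)
    text = re.sub(r"<[^>]+>", " ", stripped)
    visible = re.sub(r"\s+", " ", text).strip()
    return len(visible) < min_text_chars

# Typical shape of what a plain HTTP fetch of a React SPA returns:
shell = ("<html><head><script>window.__STATE__={}</script></head>"
         "<body><div id='root'></div></body></html>")
print(looks_like_spa_shell(shell))  # True: almost no visible text
```

A check like this is useful as a guard in a pipeline: if a response looks like a shell, escalate to a browser-rendering tier instead of parsing garbage.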
AlterLab handles all of this through its anti-bot bypass API. You send a URL, get back fully rendered HTML or structured JSON. No browser management, no proxy rotation, no fingerprint tuning.
Quick start with AlterLab API
Install the Python SDK and make your first request. The getting started guide covers account setup and API key generation.
import alterlab

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
    "https://medium.com/@example/your-article-slug-abc123",
    formats=["json"],
    min_tier=3
)
print(response.json)

The same request with cURL:

curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://medium.com/@example/your-article-slug-abc123",
    "formats": ["json"],
    "min_tier": 3
  }'

The min_tier=3 parameter tells the system to skip basic HTTP tiers and go straight to headless browser rendering. Medium requires JavaScript execution, so tiers 1 and 2 will return incomplete content. Setting min_tier=3 saves you a retry cycle.
The response includes the fully rendered page. With formats=["json"], you get a parsed structure instead of raw HTML. This matters because Medium's DOM is deeply nested and class names change frequently.
Try scraping Medium with AlterLab
Extracting structured data
Medium articles follow a consistent internal structure. Here are the selectors and extraction patterns for the data points engineers typically need.
Article content and metadata
import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
    "https://medium.com/@author/article-title-xyz789",
    min_tier=3
)

soup = BeautifulSoup(response.text, "html.parser")
title = soup.find("h1", class_="pw-post-title")
subtitle = soup.find("h2", class_="pw-subtitle-paragraph")
author = soup.find("div", class_="pw-post-author-name")
publish_date = soup.find("time")
content_sections = soup.find_all("p")

article_data = {
    "title": title.text.strip() if title else None,
    "subtitle": subtitle.text.strip() if subtitle else None,
    "author": author.text.strip() if author else None,
    "published": publish_date["datetime"] if publish_date else None,
    "body": "\n".join(p.text for p in content_sections)
}

Engagement metrics
Claps, responses, and reading time render client-side. You need to wait for the page to fully hydrate before extracting these values.
import alterlab

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
    "https://medium.com/@author/article-title-xyz789",
    min_tier=3,
    wait_for=".js-clapCount"
)

claps = response.css(".js-clapCount").text
responses = response.css(".js-responsesCount").text
reading_time = response.css(".postMetaInline-readingTime").text

The wait_for parameter ensures the scraper pauses until the engagement counters render. Without it, you will capture placeholder elements before the JavaScript populates actual numbers.
Author profile data
import alterlab

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
    "https://medium.com/@username",
    min_tier=3,
    formats=["json"]
)

profile = response.json
author = profile.get("author", {})
author_name = author.get("name")
follower_count = author.get("stats", {}).get("followersCount")
article_count = author.get("stats", {}).get("postsCount")
bio = author.get("bio")

When you request JSON format, AlterLab attempts to extract structured entities from the page. For author profiles, this includes name, bio, follower counts, and publication history. The exact fields depend on what Medium exposes in the rendered DOM at request time.
Common pitfalls
Rate limiting and request patterns
Medium throttles aggressive scraping. Sending 50 requests per minute from a single IP triggers temporary blocks. Space your requests out. If you are pulling article lists or search results, add 2-3 seconds between requests. AlterLab's proxy rotation handles IP-level distribution, but you should still implement reasonable delays in your own code to avoid triggering application-level rate limits.
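One way to implement that spacing is a small pacing generator with a randomized gap, so requests do not arrive on a perfectly regular beat. `polite_delays` is a hypothetical helper for this sketch; swap in your own scrape call where the comment indicates.

```python
import random
import time

def polite_delays(urls, low=2.0, high=3.0, sleep=time.sleep):
    """Yield each URL after a randomized delay between `low` and `high` seconds."""
    for i, url in enumerate(urls):
        if i > 0:  # no delay before the first request
            sleep(random.uniform(low, high))
        yield url

# The `sleep` parameter is injectable so the pacing logic is testable
# without actually waiting.
waits = []
for url in polite_delays(["u1", "u2", "u3"], sleep=waits.append):
    pass  # call client.scrape(url, min_tier=3) here
print(len(waits))  # two gaps for three URLs
```

The jitter matters: fixed 2.0-second intervals are themselves a bot signature, while a uniform 2-3 second spread looks closer to human pacing.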
Dynamic content and lazy loading
Medium loads images, embedded tweets, and code snippets asynchronously. The initial render may not include all content. Use the wait_for parameter to target specific elements that load late in the page lifecycle. For articles with embedded gists or code blocks, wait_for=".gist" ensures those render before capture.
Session and cookie handling
Some Medium content requires a logged-in session. Member-only articles display a paywall overlay that blocks the full text. AlterLab can scrape publicly visible content without authentication. If you need access to member-only articles, you must provide session cookies:
import alterlab

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
    "https://medium.com/@author/member-only-article-abc",
    min_tier=3,
    cookies=[
        {"name": "uid", "value": "YOUR_SESSION_COOKIE", "domain": ".medium.com"}
    ]
)

Note that sharing or automating credential acquisition violates Medium's Terms of Service. Only use cookies from accounts you own and have explicit permission to use.
Class name volatility
Medium updates their frontend regularly. CSS class names like pw-post-title or js-clapCount may change between deployments. Build your extraction logic to handle missing fields gracefully. Log when selectors fail so you can update them before your pipeline accumulates bad data.
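A defensive-extraction sketch for that failure mode: treat every selector hit as optional, and log misses so a broken selector surfaces in your logs rather than as silent nulls in your dataset. Here `node` stands for whatever your parser returns (for example a BeautifulSoup Tag), or None when the selector no longer matches; `FakeTag` is a stand-in used only to keep the sketch self-contained.

```python
import logging

logger = logging.getLogger("medium_scraper")

def safe_text(node, field_name):
    """Return stripped text from a parser node, or None, logging the miss."""
    if node is None:
        logger.warning("selector miss: %s", field_name)
        return None
    return node.text.strip()

class FakeTag:  # stand-in for a BeautifulSoup Tag in this sketch
    text = "  Hello Medium  "

print(safe_text(FakeTag(), "title"))   # "Hello Medium"
print(safe_text(None, "clap_count"))   # None, and a warning is logged
```

Counting these warnings per run gives you an early signal that a Medium deployment changed class names, before a whole batch of records comes back empty.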
Scaling up
When you move from scraping a dozen articles to thousands, three things matter: cost control, scheduling, and error handling.
Batch processing
Structure your scraper to handle URL lists efficiently. Process articles in parallel where possible, but respect Medium's infrastructure by capping concurrent requests.
import alterlab
import asyncio

client = alterlab.Client("YOUR_API_KEY")

urls = [
    "https://medium.com/@author/article-one-abc",
    "https://medium.com/@author/article-two-def",
    "https://medium.com/@author/article-three-ghi",
]

async def scrape_batch(url_list):
    tasks = [client.scrape_async(url, min_tier=3) for url in url_list]
    results = await asyncio.gather(*tasks)
    return results

articles = asyncio.run(scrape_batch(urls))
for article in articles:
    print(article.status_code, article.url)

Scheduling recurring scrapes
If you monitor specific authors or tags, set up recurring scrapes instead of running manual jobs. AlterLab's scheduling system uses cron expressions.
import alterlab

client = alterlab.Client("YOUR_API_KEY")
schedule = client.schedules.create(
    url="https://medium.com/tag/machine-learning",
    cron="0 9 * * 1",
    formats=["json"],
    min_tier=3,
    webhook_url="https://your-server.com/webhook/medium-articles"
)
print(f"Schedule created: {schedule.id}")

This runs every Monday at 9 AM UTC, scrapes the machine-learning tag page, and pushes results to your webhook endpoint. You get fresh data without maintaining a cron daemon or retry logic.
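On the receiving side, a webhook endpoint can be very small. Here is a standard-library sketch of a handler that accepts a POSTed JSON payload and hands it off; the payload shape is an assumption (check AlterLab's webhook documentation), and a production endpoint would also verify a delivery signature and respond quickly.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # stand-in for your real processing pipeline

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        received.append(payload)      # hand off to your pipeline here
        self.send_response(200)       # acknowledge promptly
        self.end_headers()

    def log_message(self, *args):     # silence default request logging
        pass

def build_server(port=0):
    """Bind the webhook server; port=0 picks a free port."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)

# build_server().serve_forever()  # run behind TLS/a reverse proxy in practice
```

Acknowledging with a 200 before doing heavy processing is the usual pattern: most webhook senders retry on timeouts, so slow handlers cause duplicate deliveries.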
Cost management
Medium pages require headless browser rendering, which costs more per request than scraping static HTML. Each tier 3 scrape consumes more of your balance than a tier 1 request. Monitor your usage dashboard and set spend limits on your API keys to prevent unexpected charges.
For high-volume operations, consider caching results. If you scrape the same article URL multiple times, store the output locally and only re-scrape when you need updated engagement metrics. This reduces redundant requests and keeps costs predictable. Review AlterLab pricing to understand per-request costs at each tier and plan your budget accordingly.
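The caching idea above can be sketched as a small on-disk cache keyed by a hash of the URL, with a TTL controlling when a re-scrape is allowed. The `fetch` callable stands in for your client.scrape call; the function name and JSON-file layout are choices made for this sketch, not an AlterLab feature.

```python
import hashlib
import json
import time
from pathlib import Path

def cached_scrape(url, fetch, cache_dir="cache", ttl_seconds=86400):
    """Return cached output for `url` if fresh, otherwise fetch and store it."""
    cache = Path(cache_dir)
    cache.mkdir(exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()
    path = cache / f"{key}.json"
    if path.exists():
        entry = json.loads(path.read_text())
        if time.time() - entry["fetched_at"] < ttl_seconds:
            return entry["data"]  # fresh enough: skip the paid request
    data = fetch(url)
    path.write_text(json.dumps({"fetched_at": time.time(), "data": data}))
    return data

calls = []
def fake_fetch(url):
    calls.append(url)
    return {"title": "cached article"}

cached_scrape("https://medium.com/@a/x", fake_fetch)
cached_scrape("https://medium.com/@a/x", fake_fetch)
print(len(calls))  # 1: the second call is served from cache
```

A short TTL (hours) suits engagement metrics that change daily; article bodies rarely change, so a much longer TTL is usually safe for content.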
Error handling and retries
Network failures, temporary blocks, and page structure changes will cause individual requests to fail. Wrap your scraping calls in retry logic with exponential backoff.
import alterlab
import time

client = alterlab.Client("YOUR_API_KEY")

def scrape_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.scrape(url, min_tier=3)
            if response.status_code == 200:
                return response
            time.sleep(2 ** attempt)
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
    return None

Log failed URLs separately so you can investigate whether failures stem from selector changes, temporary outages, or permanent page removals.
Key takeaways
Medium requires headless browser rendering due to its React-based architecture and anti-bot protections. Setting min_tier=3 ensures you skip tiers that cannot execute JavaScript.
Use formats=["json"] to get structured data instead of parsing volatile HTML class names. Add wait_for selectors to capture lazy-loaded engagement metrics. Space out requests to avoid application-level rate limits, and cache results to reduce redundant scrapes.
For recurring monitoring, use scheduled scrapes with webhooks instead of maintaining your own cron infrastructure. Set spend limits on API keys to control costs at scale.