
Medium Data API: Extract Structured JSON in 2026

Learn how to extract structured Medium data via API using AlterLab's Extract API to get JSON fields like title, author, date, tags, and URL with zero parsing.

Herald Blog Service · June 27, 2026

TL;DR

To get structured Medium data via API, define a JSON schema for the fields you need (title, author, published_date, tags, url) and POST it to AlterLab's Extract API endpoint. The service returns validated JSON in a single request, handling anti‑bot measures and delivering typed output without any HTML parsing.

Why use Medium data?

Medium hosts a vast repository of technical articles, making it a valuable source for several engineering workflows. Teams building large language models often scrape public tech blogs to diversify training data with real‑world explanations and code snippets. Product analysts use Medium feeds to monitor competitor announcements, emerging frameworks, and developer sentiment for strategic planning. Data engineers also create pipelines that enrich internal knowledge bases with curated external content, improving search relevance and recommendation quality.

What data can you extract?

All article metadata visible on a public Medium page is accessible through structured extraction. The most commonly requested fields for tech‑focused pipelines include:

title: The headline of the article as displayed.
author: The display name of the writer or publication.
published_date: The ISO‑8601 timestamp when the story was posted.
tags: Topic tags attached by the author (e.g., "Python", "AI", "Startup").
url: The canonical URL of the article, useful for deduplication and linking.

These fields are sufficient for indexing, citation tracking, and trend analysis without needing to process full‑text HTML.
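To make the field list concrete, here is a minimal sketch of how a pipeline might hold these five fields as a typed record and deduplicate on the URL. The names `MediumArticle`, `canonical_url`, `parse_article`, and `dedupe` are illustrative and not part of any AlterLab SDK, and stripping the query string is one assumed normalization rule rather than a documented one.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class MediumArticle:
    """Typed view of the five metadata fields listed above (hypothetical class)."""
    title: str
    author: str
    published_date: datetime
    url: str
    tags: tuple = ()


def canonical_url(url: str) -> str:
    """Drop query string, fragment, and trailing slash so variants of one link compare equal."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"), "", ""))


def parse_article(payload: dict) -> MediumArticle:
    """Convert a raw JSON payload (already decoded to a dict) into a MediumArticle."""
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it first.
    published = datetime.fromisoformat(payload["published_date"].replace("Z", "+00:00"))
    return MediumArticle(
        title=payload["title"],
        author=payload["author"],
        published_date=published,
        url=canonical_url(payload["url"]),
        tags=tuple(payload.get("tags", [])),
    )


def dedupe(articles):
    """Keep the first article seen for each canonical URL."""
    seen = {}
    for article in articles:
        seen.setdefault(article.url, article)
    return list(seen.values())
```

Keeping `tags` optional reflects that not every story carries topic tags, while the other four fields are treated as always present.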

The extraction approach

Attempting to pull Medium data with raw HTTP requests and HTML parsers leads to brittle pipelines. Medium’s page structure changes frequently, its class names are obfuscated, and anti‑bot mechanisms challenge simple scrapers. Maintaining selectors, handling pagination, and dealing with intermittent blocks consumes engineering effort that could be spent on downstream analysis. A data API abstracts these concerns: you specify the schema you want, the service retrieves the page, applies AI‑guided extraction, validates the output, and returns clean JSON. This approach treats the web as a database, letting you focus on what data means rather than how to get it.

Quick start with AlterLab Extract API

AlterLab’s Extract API accepts a target URL and a JSON schema, then returns the matched data. Below is a minimal Python example that pulls the title, author, and published date from a sample Medium post. See the Extract API docs for full parameter details.

Python

import alterlab

client = alterlab.Client("YOUR_API_KEY")

schema = {
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "The title field"
    },
    "author": {
      "type": "string",
      "description": "The author field"
    },
    "published_date": {
      "type": "string",
      "description": "The published date field"
    }
  }
}

result = client.extract(
    url="https://medium.com/@example/introduction-to-llms-2026",
    schema=schema,
)
print(result.data)

The equivalent cURL request looks like this:

Bash

curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://medium.com/@example/introduction-to-llms-2026",
    "schema": {"type": "object", "properties": {"title": {"type": "string"}, "author": {"type": "string"}, "published_date": {"type": "string"}}}
  }'

Both snippets produce a JSON payload similar to:

JSON

{
  "title": "Introduction to LLMs in 2026",
  "author": "Jane Doe",
  "published_date": "2026-02-14T08:30:00Z"
}
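If installing the SDK is not an option, the cURL request above can be mirrored with Python's standard library. The sketch below only builds the request; the endpoint and `X-API-Key` header are taken from the cURL example, while the helper name `build_extract_request` is made up for illustration. The exact shape of the response is not documented in this article beyond the sample payload, so the sending step is left commented out.

```python
import json
import urllib.request

# Endpoint and header name are copied from the cURL example above.
EXTRACT_ENDPOINT = "https://api.alterlab.io/v1/extract"


def build_extract_request(api_key: str, url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that the cURL command issues."""
    body = json.dumps({"url": url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_ENDPOINT,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "published_date": {"type": "string"},
    },
}
request = build_extract_request(
    "YOUR_API_KEY",
    "https://medium.com/@example/introduction-to-llms-2026",
    schema,
)

# Sending needs a valid key and network access, so it is shown but not executed:
# with urllib.request.urlopen(request, timeout=30) as response:
#     article = json.loads(response.read())
```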

Define your schema

The schema parameter drives the entire extraction process. You declare each desired field with a type (string, number, boolean, array) and an optional description that helps the underlying model locate the correct element on the page. AlterLab validates the returned data against this schema, guaranteeing that every required property exists and that each returned value conforms to its declared type. If a required field cannot be found, the API returns an error rather than a guess, preventing silent data corruption. For the Medium use case, a typical schema might look like:

JSON

{
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "author": {"type": "string"},
    "published_date": {"type": "string", "format": "date-time"},
    "tags": {"type": "array", "items": {"type": "string"}},
    "url": {"type": "string", "format": "uri"}
  },
  "required": ["title", "author", "published_date", "url"]
}

By supplying this schema to the extract endpoint, you receive a typed JSON object ready for direct insertion into a data warehouse or feature store.
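Even with server-side validation, some pipelines re-check payloads before loading them. The following is a minimal local check for the schema above, written with the standard library only. It is a simplified sketch, not AlterLab's validator and not a full JSON Schema implementation: it covers `required`, the basic `type` names, array item types, and the `date-time` format. The function name `check_payload` is hypothetical.

```python
from datetime import datetime

# Maps JSON Schema type names to Python types (only the subset used here).
JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool, "array": list, "object": dict}

SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "published_date": {"type": "string", "format": "date-time"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "url": {"type": "string", "format": "uri"},
    },
    "required": ["title", "author", "published_date", "url"],
}


def check_payload(payload: dict, schema: dict) -> list:
    """Return a list of problems found; an empty list means the payload passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required field: {name}")
    for name, rules in schema.get("properties", {}).items():
        if name not in payload:
            continue  # optional field that was not returned
        value = payload[name]
        expected = JSON_TYPES[rules["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for "number".
        is_bool_as_number = rules["type"] == "number" and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_number:
            problems.append(f"{name}: expected {rules['type']}")
            continue
        if rules.get("format") == "date-time":
            try:
                datetime.fromisoformat(value.replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"{name}: not an ISO-8601 timestamp")
        if rules["type"] == "array" and "items" in rules:
            item_type = JSON_TYPES[rules["items"]["type"]]
            if not all(isinstance(item, item_type) for item in value):
                problems.append(f"{name}: items must be {rules['items']['type']}")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log every issue with a record before deciding whether to drop it.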

Handle pagination and scale

When extracting dozens or thousands of Medium articles, efficiency matters. AlterLab supports high‑volume workloads through asynchronous job submission and built‑in rate‑limit handling. You can batch many extract requests into a single API call using the jobs endpoint, or parallelize calls with asyncio in Python. The following example demonstrates fetching a list of article URLs concurrently:

Python

import asyncio
import alterlab

async def extract_one(client, url, schema):
    return await client.extract(url=url, schema=schema)

async def main():
    client = alterlab.AsyncClient("YOUR_API_KEY")
    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "author": {"type": "string"},
            "published_date": {"type": "string"},
            "tags": {"type": "array", "items": {"type": "string"}},
            "url": {"type": "string"}
        }
    }
    urls = [
        "https://medium.com/@example/introduction-to-llms-2026",
        "https://medium.com/@example/second-article",
        "https://medium.com/@example/third-article"
    ]  # Placeholder article URLs; in practice, generate this list from a sitemap or search API
    tasks = [extract_one(client, u, schema) for u in urls]
    results = await asyncio.gather(*tasks)
    for r in results:
        print(r.data)

asyncio.run(main())

This pattern scales to thousands of URLs while respecting AlterLab’s concurrency limits. For cost estimates, visit the pricing page; you pay only for successful extractions, with volume discounts available at higher tiers.
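A plain `asyncio.gather` starts every request at once, so for large URL lists it can help to cap the number of calls in flight on the client side as well. The sketch below shows one way to do that with `asyncio.Semaphore`. It uses a stand-in coroutine, `fake_extract`, instead of a real client so that it runs without network access; the cap of 5 is an arbitrary assumption, not a documented AlterLab limit.

```python
import asyncio


async def fake_extract(url: str) -> dict:
    """Stand-in for an Extract API call (hypothetical; replace with a real client call)."""
    await asyncio.sleep(0.01)
    return {"url": url}


async def extract_all(urls, extract, max_concurrency=5):
    """Run extract(url) for every URL with at most max_concurrency calls in flight.

    Returns one (url, data, error) tuple per URL, in input order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            try:
                return url, await extract(url), None
            except Exception as exc:  # record the failure and keep the batch going
                return url, None, exc

    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://medium.com/@example/post-{i}" for i in range(20)]  # placeholder URLs
results = asyncio.run(extract_all(urls, fake_extract, max_concurrency=5))
succeeded = [data for _, data, error in results if error is None]
print(f"{len(succeeded)} of {len(urls)} extractions succeeded")
```

Capturing exceptions per URL means one failed article does not discard the results of the rest of the batch, and the failed URLs can be retried later.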

Key takeaways

Structured data extraction replaces fragile HTML parsing with a schema‑driven, AI‑powered API.
Medium’s public article metadata (title, author, date, tags, URL) maps cleanly to JSON fields.
AlterLab’s Extract API handles anti‑bot measures, validation, and scaling so you can focus on analytics.
Start with a simple schema, test on a single URL, then expand to batch or async workflows for production pipelines.
Always review Medium’s robots.txt and Terms of Service before scraping public data.
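The robots.txt review in the last takeaway can be partly automated with Python's built-in `urllib.robotparser`. The sketch below parses an inline rule set so that it runs offline. Those rules are invented for illustration and are not Medium's actual robots.txt, and the helper name `is_allowed` is hypothetical. A check like this does not replace reading the Terms of Service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Medium's actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler", "https://medium.com/@example/post"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler", "https://medium.com/private/draft"))

# To check the live file instead, RobotFileParser can fetch it over the network:
# parser = RobotFileParser()
# parser.set_url("https://medium.com/robots.txt")
# parser.read()
```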

99.2% Extraction Accuracy

1.4s Avg Response Time

100% Typed JSON Output

Frequently Asked Questions

Medium offers limited official APIs focused on user actions and publishing; they do not provide unrestricted access to article metadata for third-party pipelines. AlterLab fills this gap by enabling structured JSON extraction from publicly available Medium pages while respecting robots.txt and rate limits.

You can extract any publicly visible field such as title, author, published date, tags, and URL by defining a JSON schema. AlterLab returns typed, validated JSON that matches your schema, eliminating the need for custom parsers.

AlterLab uses a pay‑as‑you‑go model with no minimums; you pay only for successful extractions. Credits never expire, and detailed pricing is available on the pricing page.
