SEC EDGAR Data API: Extract Structured JSON in 2026

Get structured JSON from SEC EDGAR via AlterLab’s API. Extract title, identifier, date_published and more with schema validation.

TL;DR

Send a POST request to the Extract API with an SEC EDGAR page URL and a JSON schema for title, identifier, date_published, category and description, and receive validated JSON in return. This approach avoids fragile HTML parsing and keeps per-request cost predictable.

Why use SEC EDGAR data?

  • AI training pipelines that need clean, government‑issued filings
  • Financial analytics that track 10‑K and 8‑K filings across companies
  • Competitive intelligence that monitors filing frequency and topics

What data can you extract?

SEC EDGAR publishes only public filings. Typical fields include:

  • title: The document headline
  • identifier: CIK or accession number
  • date_published: Filing date in ISO format
  • category: Document type such as "10-K" or "8-K"
  • description: Brief summary of the filing’s content

All of these are openly available; no login or paywall is required.
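
As a rough sketch of how these fields might be held in a pipeline, the snippet below maps an extracted JSON object onto a typed record and classifies the identifier. The `FilingRecord` class, the helper names, and the sample values are illustrative assumptions and not part of AlterLab's client; the accession-number pattern follows EDGAR's usual 10-2-6 digit layout.

```python
import re
from dataclasses import dataclass
from datetime import date

# EDGAR accession numbers look like 0001234567-24-000001:
# a 10-digit filer or agent ID, a 2-digit year, and a 6-digit sequence.
ACCESSION_RE = re.compile(r"[0-9]{10}-[0-9]{2}-[0-9]{6}")
# A CIK is a numeric ID of up to 10 digits (often zero-padded to 10).
CIK_RE = re.compile(r"[0-9]{1,10}")


@dataclass(frozen=True)
class FilingRecord:
    """Hypothetical container for the five fields listed above."""
    title: str
    identifier: str       # CIK or accession number
    date_published: date  # parsed from an ISO 8601 date string
    category: str         # form type such as "10-K"
    description: str


def identifier_kind(identifier: str) -> str:
    """Classify an identifier as 'accession', 'cik', or 'unknown'."""
    if ACCESSION_RE.fullmatch(identifier):
        return "accession"
    if CIK_RE.fullmatch(identifier):
        return "cik"
    return "unknown"


def parse_record(payload: dict) -> FilingRecord:
    """Convert an extracted JSON object into a typed record."""
    return FilingRecord(
        title=payload["title"],
        identifier=payload["identifier"],
        date_published=date.fromisoformat(payload["date_published"]),
        category=payload["category"],
        description=payload["description"],
    )


# Made-up sample values, for illustration only.
sample = {
    "title": "Example Corp annual report",
    "identifier": "0001234567-24-000001",
    "date_published": "2024-02-15",
    "category": "10-K",
    "description": "Annual report for the fiscal year.",
}
record = parse_record(sample)
print(record.category, identifier_kind(record.identifier))
```

Parsing the date on arrival means a malformed value fails early with a `ValueError` rather than surfacing later in the pipeline.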

The extraction approach

Scraping SEC EDGAR pages with raw HTTP requests and HTML parsing breaks whenever the site updates its layout or adds anti‑bot checks. A data API abstracts that complexity. AlterLab’s Extract API handles:

  • Automatic request routing and proxy rotation
  • HTML‑to‑JSON conversion that respects robots.txt
  • Schema validation that guarantees field types

The result is a predictable, typed JSON payload you can store directly in your pipeline.

Quick start with AlterLab Extract API

First install the client library or use curl. See our Getting started guide for full setup details.

Python example

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

# Field descriptions mirror the field list above
schema = {
  "type": "object",
  "properties": {
    "title": {"type": "string", "description": "The document headline"},
    "identifier": {"type": "string", "description": "CIK or accession number"},
    "date_published": {"type": "string", "description": "Filing date in ISO format"},
    "category": {"type": "string", "description": "Document type such as 10-K or 8-K"},
    "description": {"type": "string", "description": "Brief summary of the filing's content"}
  }
}

result = client.extract(
    url="https://sec.gov/example-page",
    schema=schema,
)
print(result.data)

cURL example

Bash
curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://sec.gov/example-page",
    "schema": {"type": "object", "properties": {"title": {"type": "string"}, "identifier": {"type": "string"}, "date_published": {"type": "string"}}}
  }'

Both examples return a JSON object shaped by the schema you sent, which removes most of the usual post‑processing.
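
If you would rather not install a client library, the cURL call can be mirrored with Python's standard library. The sketch below only builds the request, reusing the endpoint, header name, and body shape from the cURL example; `build_extract_request` is a hypothetical helper name, and the response format is not assumed here.

```python
import json
import urllib.request

# Endpoint and header name are taken from the cURL example above.
API_URL = "https://api.alterlab.io/v1/extract"


def build_extract_request(api_key: str, page_url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    body = json.dumps({"url": page_url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "identifier": {"type": "string"},
        "date_published": {"type": "string"},
    },
}
request = build_extract_request("YOUR_KEY", "https://sec.gov/example-page", schema)
print(request.get_method(), request.full_url)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request, timeout=30) as response:
#     payload = json.load(response)
```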

Define your schema

The schema parameter describes the shape of the output. Use standard JSON Schema syntax; AlterLab validates the extracted data against it and returns only fields that conform. Your downstream code can then rely on title being a string. Note that "type": "string" alone does not constrain date_published to ISO 8601: the schemas shown here accept any string in that field, so parse or validate the date in your own code if the format matters.
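
A small client-side check can act as a second line of defence on whatever comes back. The sketch below is a deliberately tiny validator written with the standard library, not a full JSON Schema implementation and not AlterLab's own validation. It assumes you add a `required` list and a `"format": "date"` hint to your schema; whether the API itself enforces `format` is not something this example relies on.

```python
from datetime import date

# Maps the JSON Schema type names used here to Python types.
JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict, "array": list}

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "identifier": {"type": "string"},
        "date_published": {"type": "string", "format": "date"},
    },
    "required": ["title", "identifier", "date_published"],
}


def check(payload: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing field: {name}")
    for name, rules in schema["properties"].items():
        if name not in payload:
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly for integers.
        wrong_bool = isinstance(value, bool) and rules["type"] != "boolean"
        if wrong_bool or not isinstance(value, JSON_TYPES[rules["type"]]):
            problems.append(f"{name}: expected {rules['type']}")
            continue
        if rules.get("format") == "date":
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"{name}: not an ISO 8601 date")
    return problems


good = {"title": "Example filing", "identifier": "320193", "date_published": "2024-02-15"}
bad = {"title": "Example filing", "date_published": "02/15/2024"}
print(check(good, schema))
print(check(bad, schema))
```

For production use, a maintained JSON Schema validator is a better fit than a hand-rolled check; the point here is only that type and date format are separate constraints.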

Handle pagination and scale

For a single filing the request is quick, but high‑volume pipelines need batching. Use the /v1/batch endpoint to queue multiple URLs, then poll for completion. Responses include a job ID you can use with webhooks to trigger downstream processing.
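
The polling step can be sketched independently of the API. The helper below repeatedly calls a status function you supply and backs off exponentially between calls. The `status` key and the `completed` and `failed` values are hypothetical, since the batch response format is not described here; adapt them to the actual job payload.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until it reports a terminal state, doubling the wait each time."""
    delay = initial_delay
    for _ in range(max_attempts):
        status = fetch_status()
        # Hypothetical terminal states; replace with the real job fields.
        if status.get("status") in ("completed", "failed"):
            return status
        sleep(delay)
        delay = min(delay * 2, max_delay)
    raise TimeoutError("job did not finish within max_attempts polls")


# Stand-in for a real status call: reports "running" twice, then "completed".
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed", "results": []},
])
waits = []  # records the delays instead of actually sleeping
final = poll_until_done(lambda: next(responses), sleep=waits.append)
print(final["status"], waits)
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies. If webhooks are available for your job, they avoid polling altogether.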

Cost scales with request complexity. Review AlterLab pricing to estimate expense before committing. The minimum cost per request is $0.001 and the maximum is $0.50. When you register a BYOK (bring your own key) key, the orchestration fee is a flat $0.0003; otherwise the platform rate applies.
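
Because each request is bounded on both sides, a job's total cost can be bracketed before you run it. The sketch below uses only the figures quoted above. It treats the BYOK orchestration fee as a separate per-request line item, which is an assumption, since how the fee interacts with the floor and ceiling is not stated here.

```python
from decimal import Decimal

MIN_COST = Decimal("0.001")   # per-request floor quoted above
MAX_COST = Decimal("0.50")    # per-request ceiling quoted above
BYOK_FEE = Decimal("0.0003")  # flat orchestration fee with a BYOK key


def cost_bounds(requests: int) -> tuple[Decimal, Decimal]:
    """Return the lowest and highest possible total for a number of requests."""
    return requests * MIN_COST, requests * MAX_COST


def byok_fees(requests: int) -> Decimal:
    """Total orchestration fees, assuming one flat fee per request."""
    return requests * BYOK_FEE


low, high = cost_bounds(10_000)
print(f"10,000 requests: between ${low:.2f} and ${high:.2f}")
print(f"BYOK orchestration fees: ${byok_fees(10_000):.2f}")
```

`Decimal` avoids the rounding drift that binary floats would introduce when small per-request amounts are multiplied across large batches.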

Key takeaways

  • SEC EDGAR provides only public data; always respect robots.txt.
  • Use a schema to get typed JSON without manual parsing.
  • AlterLab’s Extract API manages anti‑bot bypass, cost estimation and scaling.
  • Batch and async workflows let you process hundreds of filings per minute.

AlterLab reports 99.2% extraction accuracy, a 1.4s average response time, and 100% typed JSON output.

Batch/async usage example

Python
import asyncio

import alterlab

client = alterlab.Client("YOUR_API_KEY")

urls = [
    "https://sec.gov/filing1",
    "https://sec.gov/filing2",
    "https://sec.gov/filing3"
]

# One schema shared by every request
schema = {
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "identifier": {"type": "string"},
    "date_published": {"type": "string"}
  }
}

async def extract_one(url):
    # Submit one URL and wait for its structured result
    return await client.extract_async(url=url, schema=schema)

async def main():
    # await is only valid inside a coroutine, so gather runs here
    results = await asyncio.gather(*(extract_one(u) for u in urls))
    for r in results:
        print(r.data)

asyncio.run(main())

This pattern fires many requests in parallel and collects the results, in input order, once all of them have finished, which suits large‑scale data pipelines.
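
Unbounded parallelism can overrun rate limits, so it is common to cap the number of requests in flight. The sketch below shows one way to do that with `asyncio.Semaphore`. `fake_extract` is a stand-in so the example runs on its own; in practice you would pass your own coroutine that wraps the client call.

```python
import asyncio


async def fake_extract(url: str) -> dict:
    """Stand-in for a real extraction call; replace with your client."""
    await asyncio.sleep(0.01)
    return {"url": url, "title": f"Filing at {url}"}


async def extract_all(urls, extract=fake_extract, max_concurrency: int = 5):
    """Run extract() over urls with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return await extract(url)

    # return_exceptions=True returns failures in place of results
    # instead of raising the first exception to the caller.
    return await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)


urls = [f"https://sec.gov/filing{i}" for i in range(1, 4)]
results = asyncio.run(extract_all(urls, max_concurrency=2))
for item in results:
    if isinstance(item, Exception):
        print("failed:", item)
    else:
        print(item["title"])
```

Results come back in the same order as the input URLs, so a failed entry can be matched to its URL by position and retried.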

Frequently Asked Questions

The SEC provides public RSS feeds, bulk data, and official JSON APIs on data.sec.gov for filer submissions and XBRL financial data. Those endpoints return fixed structures; services like AlterLab cover other EDGAR pages by offering compliant extraction with typed output in a schema you define.

You can extract publicly listed fields such as title, identifier, date_published, category and description using a schema that enforces typed JSON.

Pricing is pay‑as‑you‑go on AlterLab; the cost of each request is clamped between $0.001 and $0.50. There is no minimum spend, and unused balance does not expire.