
Hacker News Data API: Extract Structured JSON in 2026
Extract structured Hacker News data via API using AlterLab's Extract AI. Get typed JSON output for title, author, date and more—no HTML parsing needed.
This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.
TL;DR
To get structured Hacker News data via API, use AlterLab's Extract endpoint with a JSON schema defining your desired fields (title, author, published_date, tags, url). Pass the schema and target URL to receive validated, typed JSON output, which removes the need for fragile HTML parsing. After setup, the extraction itself is a single client call.
Why use Hacker News data?
Hacker News provides real-time insights into tech trends, making it valuable for:
- AI training datasets: Collecting technical article titles and discussions for natural language processing models
- Competitive intelligence: Monitoring emerging technologies and startup announcements mentioned in threads
- Content aggregation: Building tech news feeds or trend analysis tools for developer communities
What data can you extract?
From public Hacker News pages, you can extract these structured fields:
- title: The headline of the story or discussion
- author: The username of the submitter
- published_date: Timestamp when the item was posted
- tags: Associated categories or keywords (if visible in the snippet)
- url: Direct link to the external article or internal discussion
All fields are publicly visible on the news.ycombinator.com homepage and item pages. AlterLab's AI identifies and extracts them based on your schema definition.
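As a rough illustration of how these fields might be carried through downstream code, the sketch below maps an extracted dictionary onto a typed record. The `Story` class and `story_from_dict` helper are hypothetical names, not part of AlterLab's SDK, and the sketch assumes `published_date` arrives as an ISO 8601 string and `tags` as a list of strings.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Story:
    """Hypothetical container for the five fields listed above."""
    title: Optional[str] = None
    author: Optional[str] = None
    published_date: Optional[datetime] = None
    tags: list[str] = field(default_factory=list)
    url: Optional[str] = None


def story_from_dict(raw: dict) -> Story:
    """Build a Story from extracted JSON, tolerating missing or null keys."""
    published = raw.get("published_date")
    if published:
        # Assumption: timestamps are ISO 8601. A trailing "Z" is rewritten
        # because older Python versions reject it in fromisoformat().
        published = datetime.fromisoformat(published.replace("Z", "+00:00"))
    return Story(
        title=raw.get("title"),
        author=raw.get("author"),
        published_date=published or None,
        tags=list(raw.get("tags") or []),
        url=raw.get("url"),
    )


story = story_from_dict({
    "title": "Example Tech Article",
    "author": "techblogger",
    "published_date": "2026-03-15T14:30:00Z",
    "tags": ["programming", "ai"],
    "url": "https://example.com/tech-article",
})
print(story.title, story.published_date.year)
```

Keeping every field optional mirrors the fact that a given page may not expose all five values.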
The extraction approach
Raw HTTP requests combined with hand-written HTML parsing tend to be brittle on Hacker News due to:
- Dynamic content loaded via JavaScript
- Frequent frontend updates breaking CSS selectors
- Anti-bot measures requiring session handling
A data API approach solves these by:
- Handling JavaScript rendering and anti-bot challenges automatically
- Returning structured data matching your schema instead of raw HTML
- Providing built-in retry logic and rate limit management
- Eliminating the need for maintenance-heavy parsing code
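To make the "built-in retry logic" point concrete, here is a minimal sketch of the kind of retry-with-backoff wrapper a hand-rolled scraper would otherwise have to maintain. It is not AlterLab's implementation; `with_retries` and `flaky_fetch` are hypothetical names, and the delays are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Demo: a fetch that fails twice before succeeding.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>...</html>"


waits = []  # record the requested delays instead of actually sleeping
page = with_retries(flaky_fetch, sleep=waits.append)
print(page, waits)
```

Passing `sleep` as a parameter keeps the demo instant; real code would leave the default `time.sleep` in place.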
Quick start with AlterLab Extract API
First, install the AlterLab Python client and follow the Getting started guide. Then extract data with minimal code:
import alterlab

client = alterlab.Client("YOUR_API_KEY")

# JSON schema describing the fields to extract
schema = {
    "type": "object",
    "properties": {
        "title": {
            "type": "string",
            "description": "Headline of the story or discussion"
        },
        "author": {
            "type": "string",
            "description": "Username of the submitter"
        },
        "published_date": {
            "type": "string",
            "description": "ISO 8601 timestamp when the item was posted"
        },
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Associated categories or keywords, if visible"
        },
        "url": {
            "type": "string",
            "description": "Link to the external article or internal discussion"
        }
    }
}

result = client.extract(
    url="https://news.ycombinator.com/item?id=40000000",
    schema=schema,
)
print(result.data)

The equivalent cURL request:
curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com/item?id=40000000",
    "schema": {"type": "object", "properties": {"title": {"type": "string"}, "author": {"type": "string"}, "published_date": {"type": "string"}}}
  }'

Both examples return structured JSON. With the full five-field schema from the Python example, the output looks like this (the shorter cURL schema requests only the first three fields):
{
    "title": "Example Tech Article",
    "author": "techblogger",
    "published_date": "2026-03-15T14:30:00Z",
    "tags": ["programming", "ai"],
    "url": "https://example.com/tech-article"
}

Define your schema
The schema parameter drives AlterLab's extraction accuracy. Key principles:
- Type safety: Define string, number, boolean, or array types for each field
- Description hints: Help the AI understand context (e.g., "ISO 8601 timestamp")
- Required fields: Omit the "required" array to allow partial extraction when data is missing
- Nested objects: Extract complex structures like comment threads using object types
AlterLab validates output against your schema, returning only matching fields. If the AI cannot find a field, it returns null for that key—never inventing data.
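The principles above can be sketched with a nested schema and a small local check that reports which fields came back null or with an unexpected type. The comment-thread schema and the `find_problems` helper are illustrative assumptions, not AlterLab features; the check covers only the handful of JSON Schema keywords used here.

```python
# Hypothetical schema: a story with a nested list of comments.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Headline of the story"},
        "published_date": {"type": "string", "description": "ISO 8601 timestamp"},
        "comments": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "author": {"type": "string"},
                    "text": {"type": "string"},
                },
            },
        },
    },
}

JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def find_problems(data, schema, path="$"):
    """Return paths whose values are null/missing or do not match the declared type."""
    if data is None:
        return [f"{path}: null"]
    kind = schema["type"]
    # bool is a subclass of int in Python, so reject it explicitly for numbers.
    if not isinstance(data, JSON_TYPES[kind]) or (kind == "number" and isinstance(data, bool)):
        return [f"{path}: expected {kind}"]
    problems = []
    if kind == "object":
        for key, sub in schema.get("properties", {}).items():
            problems += find_problems(data.get(key), sub, f"{path}.{key}")
    elif kind == "array" and "items" in schema:
        for i, item in enumerate(data):
            problems += find_problems(item, schema["items"], f"{path}[{i}]")
    return problems


sample = {
    "title": "Example Tech Article",
    "published_date": None,
    "comments": [
        {"author": "techblogger", "text": "Nice write-up"},
        {"author": None, "text": 42},
    ],
}
for problem in find_problems(sample, schema):
    print(problem)
```

A check like this is mainly useful for deciding which null fields are acceptable gaps and which should trigger a re-run or a better description hint.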
Handle pagination and scale
For extracting multiple Hacker News pages:
- Batch processing: Use async requests with alterlab.extract_batch() for concurrent processing
- Rate limiting: AlterLab automatically respects Hacker News's crawl-delay; adjust via the max_concurrency parameter
- Error handling: Check the result.success flag and result.error for failed extractions
- Cost optimization: See AlterLab pricing for volume discounts; pay only for successful extractions
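Before wiring up a batch job, it can help to separate URL generation from result handling. The sketch below builds the paginated URLs in the `news?p=N` form used in this guide and splits results by their `success` flag. `page_urls`, `split_results`, and `FakeResult` are hypothetical helpers; `FakeResult` only mimics the `success`, `data`, and `error` attributes mentioned above so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Optional

BASE_URL = "https://news.ycombinator.com"


def page_urls(pages: int) -> list[str]:
    """Front page first, then /news?p=N for pages 2 through `pages`."""
    return [
        BASE_URL if page == 1 else f"{BASE_URL}/news?p={page}"
        for page in range(1, pages + 1)
    ]


@dataclass
class FakeResult:
    """Stand-in for an extraction result (assumed attributes only)."""
    success: bool
    data: Optional[dict] = None
    error: Optional[str] = None


def split_results(urls, results):
    """Pair each URL with its result and separate successes from failures."""
    ok, failed = [], []
    for url, result in zip(urls, results):
        if result.success:
            ok.append(result.data)
        else:
            failed.append((url, result.error))
    return ok, failed


urls = page_urls(3)
results = [
    FakeResult(True, data={"title": "A"}),
    FakeResult(False, error="timeout"),
    FakeResult(True, data={"title": "C"}),
]
ok, failed = split_results(urls, results)
print(ok)
print(failed)
```

Keeping the failed URLs alongside their error messages makes it straightforward to retry only the pages that did not extract cleanly.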
Example async batch job:
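import asyncio

import alterlab

client = alterlab.Client("YOUR_API_KEY")

urls = [
    "https://news.ycombinator.com",
    "https://news.ycombinator.com/news?p=2",
    "https://news.ycombinator.com/news?p=3"
]

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"}
    }
}

async def extract_all():
    # The quick start uses client.extract as a blocking call, so each request
    # runs in a worker thread; asyncio.gather needs awaitables, not results.
    tasks = [
        asyncio.to_thread(client.extract, url=url, schema=schema)
        for url in urls
    ]
    results = await asyncio.gather(*tasks)
    for url, result in zip(urls, results):
        if result.success:
            print(result.data)
        else:
            # Report failures instead of silently dropping them
            print(f"{url} failed: {result.error}")

asyncio.run(extract_all())

Key takeaways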
- AlterLab's Extract API converts public web pages into typed JSON without parsing fragility
- Define your exact data needs via JSON schema for validated, consistent output
- The service handles JavaScript, anti-bot measures, and rate limiting automatically
- Start with a single endpoint call; scale to batches using async patterns
- Always verify compliance with robots.txt and Terms of Service before extraction
Begin extracting structured Hacker News data today—visit the Extract API docs for full reference.