
Hacker News Data API: Extract Structured JSON in 2026

Extract structured Hacker News data via API using AlterLab's Extract AI. Get typed JSON output for title, author, date and more—no HTML parsing needed.

Herald Blog Service, June 27, 2026

This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.

TL;DR

To get structured Hacker News data via API, use AlterLab's Extract endpoint with a JSON schema defining your desired fields (title, author, published_date, tags, url). Pass the schema and target URL to receive validated, typed JSON output, which eliminates fragile HTML parsing. After setup, the extraction itself is a single client.extract() call.

Why use Hacker News data?

Hacker News provides real-time insights into tech trends, making it valuable for:

AI training datasets: Collecting technical article titles and discussions for natural language processing models
Competitive intelligence: Monitoring emerging technologies and startup announcements mentioned in threads
Content aggregation: Building tech news feeds or trend analysis tools for developer communities

What data can you extract?

From public Hacker News pages, you can extract these structured fields:

title: The headline of the story or discussion
author: The username of the submitter
published_date: Timestamp when the item was posted
tags: Associated categories or keywords (if visible in the snippet)
url: Direct link to the external article or internal discussion

All fields are publicly visible on the news.ycombinator.com homepage and item pages. AlterLab's AI identifies and extracts them based on your schema definition.

The extraction approach

Hacker News serves mostly static, server-rendered HTML, but raw HTTP requests combined with hand-written parsing are still brittle because of:

Table-based markup that ties selectors to the page layout
Markup changes that silently break CSS selectors
Request throttling that requires backoff and retry handling

A data API approach solves these by:

Handling JavaScript rendering and anti-bot challenges automatically
Returning structured data matching your schema instead of raw HTML
Providing built-in retry logic and rate limit management
Eliminating the need for maintenance-heavy parsing code

Quick start with AlterLab Extract API

First, install the AlterLab Python client and follow the Getting started guide. Then extract data with minimal code:

Python

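import alterlab

client = alterlab.Client("YOUR_API_KEY")

# JSON Schema describing the fields to extract from the page.
schema = {
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "Headline of the story or discussion"
    },
    "author": {
      "type": "string",
      "description": "Username of the submitter"
    },
    "published_date": {
      "type": "string",
      "description": "ISO 8601 timestamp of when the item was posted"
    },
    "tags": {
      "type": "array",
      "items": {"type": "string"},
      "description": "Categories or keywords, if any are visible"
    },
    "url": {
      "type": "string",
      "description": "Link to the external article or internal discussion"
    }
  }
}

# Send the target URL and schema; the extracted fields come back on result.data.
result = client.extract(
    url="https://news.ycombinator.com/item?id=40000000",
    schema=schema,
)
print(result.data)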

The equivalent cURL request:

Bash

curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com/item?id=40000000",
    "schema": {"type": "object", "properties": {
      "title": {"type": "string"}, "author": {"type": "string"}, "published_date": {"type": "string"},
      "tags": {"type": "array", "items": {"type": "string"}}, "url": {"type": "string"}
    }}
  }'

Both examples return structured JSON like:

JSON

{
  "title": "Example Tech Article",
  "author": "techblogger",
  "published_date": "2026-03-15T14:30:00Z",
  "tags": ["programming", "ai"],
  "url": "https://example.com/tech-article"
}
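
Once the JSON arrives, it can help to load it into a typed structure before passing it downstream. The following is a minimal sketch using only the standard library. The Story dataclass and parse_story helper are illustrative names, not part of the AlterLab SDK, and the field names simply mirror the example response above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Story:
    """Hypothetical container mirroring the example response fields."""
    title: Optional[str] = None
    author: Optional[str] = None
    published_date: Optional[datetime] = None
    tags: list[str] = field(default_factory=list)
    url: Optional[str] = None


def parse_story(data: dict) -> Story:
    """Convert an extracted JSON object into a Story, tolerating missing keys."""
    raw_date = data.get("published_date")
    published = None
    if raw_date:
        # Swap a trailing "Z" for an explicit offset so fromisoformat()
        # also accepts the value on Python versions before 3.11.
        published = datetime.fromisoformat(raw_date.replace("Z", "+00:00"))
    return Story(
        title=data.get("title"),
        author=data.get("author"),
        published_date=published,
        tags=list(data.get("tags") or []),
        url=data.get("url"),
    )


sample = {
    "title": "Example Tech Article",
    "author": "techblogger",
    "published_date": "2026-03-15T14:30:00Z",
    "tags": ["programming", "ai"],
    "url": "https://example.com/tech-article",
}
story = parse_story(sample)
print(story.published_date.year, story.tags)
```

Missing or null fields become None (or an empty list for tags), so partial extractions do not raise.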

Define your schema

The schema parameter drives AlterLab's extraction accuracy. Key principles:

Type safety: Define string, number, boolean, or array types for each field
Description hints: Help the AI understand context (e.g., "ISO 8601 timestamp")
Required fields: Omit "required" array to allow partial extraction when data is missing
Nested objects: Extract complex structures like comment threads using object types

AlterLab validates output against your schema, returning only matching fields. If the AI cannot find a field, it returns null for that key—never inventing data.
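
To make these principles concrete, here is a sketch of a schema that uses an array type, description hints, and nested objects for comments, together with a tiny checker that accepts null for any field. The comments field, its sub-fields, and the check_types helper are assumptions made for illustration. The checker is a simplified stand-in that handles only the five types named above; it is not AlterLab's validator or a full JSON Schema implementation.

```python
# Map JSON Schema type names to the Python types that json.loads() produces.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}

schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Headline of the story"},
        "published_date": {"type": "string", "description": "ISO 8601 timestamp"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "comments": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "author": {"type": "string"},
                    "text": {"type": "string"},
                },
            },
        },
    },
}


def check_types(value, node, path="$"):
    """Return a list of type mismatches; None (null) is accepted everywhere."""
    if value is None:
        return []
    kind = node["type"]
    # bool is a subclass of int in Python, so reject it for "number" explicitly.
    wrong_type = not isinstance(value, JSON_TYPES[kind])
    if wrong_type or (kind == "number" and isinstance(value, bool)):
        return [f"{path}: expected {kind}"]
    errors = []
    if kind == "object":
        for key, child in node.get("properties", {}).items():
            errors += check_types(value.get(key), child, f"{path}.{key}")
    elif kind == "array" and "items" in node:
        for i, item in enumerate(value):
            errors += check_types(item, node["items"], f"{path}[{i}]")
    return errors


good = {"title": "Example", "published_date": None, "tags": ["ai"],
        "comments": [{"author": "pg", "text": "Nice."}]}
bad = {"title": 42, "tags": "ai", "comments": [{"author": 7}]}

print(check_types(good, schema))
print(check_types(bad, schema))
```

Running a check like this on each response is a cheap way to catch schema drift before the data reaches a pipeline.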

Handle pagination and scale

For extracting multiple Hacker News pages:

Batch processing: Use async requests with alterlab.extract_batch() for concurrent processing
Rate limiting: AlterLab automatically respects Hacker News's crawl-delay; adjust via max_concurrency parameter
Error handling: Check result.success flag and result.error for failed extractions
Cost optimization: See AlterLab pricing for volume discounts—pay only for successful extractions

Example async batch job:

Python

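import asyncio

import alterlab

client = alterlab.Client("YOUR_API_KEY")

urls = [
    "https://news.ycombinator.com",
    "https://news.ycombinator.com/news?p=2",
    "https://news.ycombinator.com/news?p=3",
]

schema = {
  "type": "object",
  "properties": {
    "title": {"type": "string"},
    "url": {"type": "string"}
  }
}

async def extract_all():
    # client.extract() is called synchronously in the quick start, so each
    # call runs in a worker thread here and gather() awaits them together.
    tasks = [
        asyncio.to_thread(client.extract, url=url, schema=schema)
        for url in urls
    ]
    results = await asyncio.gather(*tasks)
    for url, result in zip(urls, results):
        if result.success:
            print(result.data)
        else:
            # Report failures instead of dropping them silently.
            print(f"Extraction failed for {url}: {result.error}")

asyncio.run(extract_all())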
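
When the page list grows, it is usually worth capping how many requests run at once. The sketch below generates listing URLs and limits concurrency with asyncio.Semaphore. Here fake_extract is a placeholder standing in for a real extraction call, and page_urls, extract_many, and the limit of 2 are illustrative choices rather than AlterLab settings.

```python
import asyncio


def page_urls(pages: int) -> list[str]:
    """Build listing URLs for the first `pages` pages of Hacker News."""
    return [f"https://news.ycombinator.com/news?p={p}" for p in range(1, pages + 1)]


async def fake_extract(url: str) -> dict:
    """Placeholder for a real extraction call; returns a canned result."""
    await asyncio.sleep(0.01)
    return {"url": url, "success": True}


async def extract_many(urls, limit=2):
    """Run fake_extract over urls with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(url):
        nonlocal in_flight, peak
        async with semaphore:
            # Track concurrency so the cap can be observed.
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_extract(url)
            finally:
                in_flight -= 1

    # gather() preserves input order, so results line up with urls.
    results = await asyncio.gather(*(worker(u) for u in urls))
    return results, peak


results, peak = asyncio.run(extract_many(page_urls(5), limit=2))
print(len(results), "results, peak concurrency", peak)
```

Swapping fake_extract for a real call keeps the same structure, and the semaphore bounds the load regardless of how many URLs are queued.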

Key takeaways

AlterLab's Extract API converts public web pages into typed JSON without parsing fragility
Define your exact data needs via JSON schema for validated, consistent output
The service handles JavaScript, anti-bot measures, and rate limiting automatically
Start with a single endpoint call; scale to batches using async patterns
Always verify compliance with robots.txt and Terms of Service before extraction

Begin extracting structured Hacker News data today—visit the Extract API docs for full reference.

Frequently Asked Questions

Hacker News offers an official Firebase-backed API for basic story data, but it does not return fields such as full article content or page metadata. AlterLab fills this gap by extracting structured JSON from public pages using AI, respecting robots.txt and rate limits.

You can extract publicly available fields including title, author, published_date, tags, and URL from Hacker News pages. Define your desired schema and AlterLab returns validated, typed JSON—no CSS selectors or parsing required.

AlterLab uses pay-as-you-go pricing with no minimums or expiration. Costs scale with extraction volume; see /pricing for details. You pay only for successful extractions, making it efficient for data pipelines.
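
For comparison, the Firebase-backed Hacker News API mentioned in the first answer returns each item as JSON with keys such as by, time, title, and url. The sketch below maps one such item onto the field names used in this guide. The helper names item_to_record and fetch_item are hypothetical, and the sample item is made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone

HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def fetch_item(item_id: int) -> dict:
    """Fetch one item from the Hacker News API (performs a network request)."""
    url = HN_ITEM_URL.format(item_id=item_id)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def item_to_record(item: dict) -> dict:
    """Map an API item onto the title/author/published_date/url fields."""
    posted = item.get("time")
    return {
        "title": item.get("title"),
        "author": item.get("by"),
        # "time" is a Unix timestamp in seconds; convert it to ISO 8601 in UTC.
        "published_date": (
            datetime.fromtimestamp(posted, tz=timezone.utc).isoformat()
            if posted is not None else None
        ),
        # Items without an external link fall back to the discussion page.
        "url": item.get("url")
        or f"https://news.ycombinator.com/item?id={item.get('id')}",
    }


# Made-up item for illustration; call fetch_item(<id>) to use live data.
sample_item = {"id": 1, "by": "example_user", "time": 0,
               "title": "Example", "type": "story"}
print(item_to_record(sample_item))
```

This route covers the basic story fields only; anything that lives on the rendered page or the linked article still needs an extraction step.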
