
How to Scrape Hacker News Data: Complete Guide for 2026

Learn to scrape Hacker News with Python and Node.js using AlterLab's API. Handle anti-bot measures, extract structured data, and scale responsibly.

Herald Blog Service · June 27, 2026


This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.

TL;DR

Scrape Hacker News using AlterLab's API with Python or Node.js. Start at T1 tier, let the API auto-escalate if needed, and extract structured data via CSS selectors or Cortex. Respect rate limits and robots.txt.

Why collect tech data from Hacker News?

Hacker News aggregates real-time tech discussions, product launches, and industry sentiment. Practical use cases include:

Tracking startup funding announcements and job postings for market research
Monitoring technology trends by analyzing upvote patterns on specific topics
Building competitor intelligence feeds by scraping links to rival products

Technical challenges

Hacker News implements standard anti-bot protections: rate limiting by IP, User-Agent header validation, and occasional JavaScript challenges for suspicious traffic. Raw HTTP requests (curl/urllib) frequently receive 429 or 403 responses. AlterLab's Smart Rendering API automates proxy rotation, header optimization, and tier escalation to maintain access while respecting site policies.

Quick start with AlterLab API

Begin with our Getting started guide. Here's how to fetch the Hacker News front page:

Python

import alterlab

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape("https://news.ycombinator.com")
print(response.text[:500])  # First 500 chars of HTML

JavaScript

import { AlterLab } from "@alterlab/sdk";

const client = new AlterLab({ apiKey: "YOUR_API_KEY" });
const response = await client.scrape("https://news.ycombinator.com");
console.log(response.text.slice(0, 500));

Bash

curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com"}'

Extracting structured data

Target these common elements using CSS selectors:

Story titles: .titleline > a
Scores: .score
Author names: .hnuser
Comment counts: .subline > a[href^="item?id="]

Example Python extraction:

Python

import alterlab
from parsel import Selector

client = alterlab.Client("YOUR_API_KEY")
html = client.scrape("https://news.ycombinator.com").text
selector = Selector(text=html)

titles = selector.css(".titleline > a::text").getall()
print(f"Found {len(titles)} stories")
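To pair each title with its score and author instead of collecting flat lists, the parsing step can be sketched with only the standard library. The HTML fragment below is a simplified, hand-written stand-in for the front-page markup (an assumption, not a captured response), and StoryParser and parse_stories are hypothetical helper names.

```python
from html.parser import HTMLParser

# Simplified stand-in for Hacker News front-page markup (assumed structure).
SAMPLE_HTML = """
<tr class="athing"><td><span class="titleline"><a href="https://example.com/a">First story</a></span></td></tr>
<tr><td class="subtext"><span class="score">120 points</span> by <a class="hnuser" href="user?id=alice">alice</a></td></tr>
<tr class="athing"><td><span class="titleline"><a href="https://example.com/b">Second story</a></span></td></tr>
<tr><td class="subtext"><span class="score">45 points</span> by <a class="hnuser" href="user?id=bob">bob</a></td></tr>
"""


class StoryParser(HTMLParser):
    """Collects one dict per story: title, url, score, author."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._in_titleline = False
        self._field = None  # field that the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "titleline" in classes:
            self._in_titleline = True
        elif tag == "a" and self._in_titleline:
            # The first link inside .titleline is the story link.
            self.stories.append(
                {"title": "", "url": attrs.get("href"), "score": None, "author": None}
            )
            self._field = "title"
            self._in_titleline = False
        elif tag == "span" and "score" in classes:
            self._field = "score"
        elif tag == "a" and "hnuser" in classes:
            self._field = "author"

    def handle_data(self, data):
        if self._field is None or not self.stories:
            return
        story = self.stories[-1]
        if self._field == "title":
            story["title"] += data
        elif self._field == "score":
            parts = data.split()
            if parts and parts[0].isdigit():
                story["score"] = int(parts[0])  # "120 points" -> 120
        elif self._field == "author":
            story["author"] = data

    def handle_endtag(self, tag):
        # Any closing tag ends the current text field.
        self._field = None


def parse_stories(html):
    parser = StoryParser()
    parser.feed(html)
    parser.close()
    return parser.stories


stories = parse_stories(SAMPLE_HTML)
for story in stories:
    print(story)
```

Stories without a score row (job posts, for example) would keep score and author as None in this sketch, so downstream code should treat those fields as optional.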

Structured JSON extraction with Cortex

For typed output without manual parsing, use Cortex AI extraction. Define a schema for story objects:

Python

import alterlab

client = alterlab.Client("YOUR_API_KEY")
result = client.extract(
    url="https://news.ycombinator.com",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "url": {"type": "string", "format": "uri"},
                "score": {"type": "integer"},
                "author": {"type": "string"}
            },
            "required": ["title", "url"]
        }
    }
)
print(result.data)  # List of validated story objects
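Before loading extracted records into a pipeline, it can help to re-check them against the schema's required fields and types. The sketch below assumes result.data is a plain list of dictionaries shaped like the schema above; validate_story is a hypothetical helper, not part of the AlterLab SDK, and the sample records are hand-written.

```python
# Expected Python types for each schema property (mirrors the JSON schema above).
FIELD_TYPES = {"title": str, "url": str, "score": int, "author": str}
REQUIRED = ("title", "url")


def validate_story(story):
    """Return a list of problems found in one story dict (empty if valid)."""
    problems = []
    for name in REQUIRED:
        if name not in story:
            problems.append(f"missing required field: {name}")
    for name, expected in FIELD_TYPES.items():
        if name in story:
            value = story[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                problems.append(f"{name} should be {expected.__name__}")
    return problems


# Hand-written sample records standing in for result.data.
sample = [
    {"title": "Show HN: Example", "url": "https://example.com", "score": 42, "author": "alice"},
    {"title": "No link here", "score": "17"},
]
valid = [s for s in sample if not validate_story(s)]
print(f"{len(valid)} of {len(sample)} records passed")
print(validate_story(sample[1]))
```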

Cost breakdown

Hacker News typically requires T2 (standard headers) or T3 (stealth) tiers due to anti-bot measures. AlterLab auto-escalates: start at T1, pay only for the tier that succeeds.

Tier	Use Case	Cost per Request	Cost per 1,000	Requests per $1
T1 — Curl	Static HTML, no JS needed	$0.0002	$0.20	5,000
T2 — HTTP	Standard pages with headers	$0.0003	$0.30	3,333
T3 — Stealth	Protected pages, anti-bot active	$0.002	$2.00	500
T4 — Browser	Full JS rendering required	$0.004	$4.00	250
T5 — CAPTCHA	CAPTCHA solving + JS rendering	$0.02	$20.00	50

See AlterLab pricing for volume discounts. For most Hacker News scraping, expect $0.30-$2.00 per 1,000 requests.
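The arithmetic behind that range can be sketched as a small estimator. Prices are copied from the tier table above; the 70/30 split between T2 and T3 is a hypothetical mix chosen for illustration, not a measured figure.

```python
# Per-request prices copied from the tier table above (USD).
PRICE_PER_REQUEST = {"T1": 0.0002, "T2": 0.0003, "T3": 0.002, "T4": 0.004, "T5": 0.02}


def estimate_cost(total_requests, tier_share):
    """Estimate spend when requests are split across tiers.

    tier_share maps tier name -> fraction of requests billed at that tier;
    the fractions must sum to 1.
    """
    if abs(sum(tier_share.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(
        total_requests * share * PRICE_PER_REQUEST[tier]
        for tier, share in tier_share.items()
    )


# Hypothetical mix: 70% of requests succeed at T2, 30% escalate to T3.
cost = estimate_cost(10_000, {"T2": 0.7, "T3": 0.3})
print(f"Estimated cost for 10,000 requests: ${cost:.2f}")
```

With that assumed mix the estimate comes to $8.10 for 10,000 requests, or $0.81 per 1,000, which sits inside the $0.30 to $2.00 range.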

Best practices

Rate limiting: AlterLab respects Crawl-delay in robots.txt. The wait_time parameter sets the interval between requests in seconds (wait_time=1 gives 1-second intervals).
Robots.txt: Hacker News allows crawling under User-agent: * with Crawl-delay: 30, so space requests at least 30 seconds apart (for example, wait_time=30).
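Dynamic content: Hacker News serves stories and comment threads as server-rendered HTML, so render_js=true is rarely needed there; enable it only for pages that load content via JavaScript (it triggers the T4 tier).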
Error handling: Implement exponential backoff for 429 responses. AlterLab auto-retries failed tiers.
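The rate-limiting and error-handling points above can be sketched together: read Crawl-delay from robots.txt, then retry 429 responses with exponential backoff. ROBOTS_TXT is an illustrative excerpt using the Crawl-delay value cited above, fetch is a placeholder for whichever client you call, and both function names are hypothetical.

```python
import random
import time
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt excerpt using the Crawl-delay value cited above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""


def crawl_delay(robots_txt, agent="*", default=1.0):
    """Read Crawl-delay for the given agent, falling back to a default."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) and retry on HTTP 429 with exponential backoff.

    fetch is any callable returning (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Delay doubles each attempt: base, 2*base, 4*base, ... plus small jitter.
        sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError(f"still rate limited after {max_retries} retries: {url}")


# Demo with a fake fetcher: two 429 responses, then success.
# sleep is replaced by a recorder so the demo runs instantly.
responses = iter([(429, ""), (429, ""), (200, "<html>ok</html>")])
waits = []
status, body = fetch_with_backoff(
    lambda url: next(responses),
    "https://news.ycombinator.com",
    sleep=waits.append,
)
print(status, [round(w) for w in waits], crawl_delay(ROBOTS_TXT))
```

Passing sleep as a parameter keeps the retry logic testable without real waiting; in production the default time.sleep applies.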

Scaling up

For large datasets:

Batch requests: Send 100 URLs per API call using the urls array parameter
Scheduling: Use AlterLab's cron endpoint for daily/weekly scrapes
Storage: Stream results directly to S3 or your database via webhooks
Responsibility: Monitor response codes; pause if 4xx errors exceed 1%
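The batching and monitoring steps above can be sketched in a few lines. The batch size of 100 and the 1% threshold come from the list; chunked and should_pause are hypothetical helper names, and the URLs and status codes are made-up sample data.

```python
def chunked(urls, size=100):
    """Yield successive batches of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def should_pause(status_codes, threshold=0.01):
    """True when the share of 4xx responses exceeds the threshold (1% by default)."""
    if not status_codes:
        return False
    client_errors = sum(1 for code in status_codes if 400 <= code < 500)
    return client_errors / len(status_codes) > threshold


# Hypothetical workload: 250 item pages split into batches of 100.
urls = [f"https://news.ycombinator.com/item?id={i}" for i in range(1, 251)]
batches = list(chunked(urls))
print([len(batch) for batch in batches])

# Hand-written status codes: 3 client errors out of 200 responses (1.5%).
codes = [200] * 197 + [429, 403, 404]
print(should_pause(codes))
```

Checking the error share after each batch, instead of once at the end of a run, lets a job stop before it sends further requests to a site that is already refusing them.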

99.2% Success Rate

1.2s Avg Response

$0.002 Per Request (T3)

Key takeaways

AlterLab manages anti-bot challenges so you focus on data extraction
Always verify public data accessibility and comply with robots.txt
Use Cortex for type-safe JSON output instead of brittle CSS selectors
Start scraping at T1 tier—pay only for what succeeds
Scale responsibly with rate limiting and error handling

Related resource: Hacker News scraping guide


Frequently Asked Questions

Is it legal to scrape Hacker News?

Scraping publicly accessible data from Hacker News is generally permissible under precedents such as hiQ v. LinkedIn, but you must comply with robots.txt, rate limits, and Hacker News' Terms of Service. Avoid private data and respect crawl-delay directives.

Does Hacker News block scrapers?

Hacker News employs standard anti-bot measures including rate limiting, header checks, and occasional JS challenges. Raw HTTP requests often get blocked; AlterLab's Smart Rendering API handles proxy rotation, header management, and tier escalation automatically.

How much does it cost to scrape Hacker News?

For Hacker News (typically T2/T3 tier), costs range from $0.0003 to $0.002 per request. AlterLab's auto-escalation means you start at T1 and pay only for the tier that succeeds. See the pricing table for exact per-1,000 request costs.
