
How to Scrape Hacker News Data: Complete Guide for 2026
Learn to scrape Hacker News with Python and Node.js using AlterLab's API. Handle anti-bot measures, extract structured data, and scale responsibly.
This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.
TL;DR
Scrape Hacker News using AlterLab's API with Python or Node.js. Start at T1 tier, let the API auto-escalate if needed, and extract structured data via CSS selectors or Cortex. Respect rate limits and robots.txt.
Why collect tech data from Hacker News?
Hacker News aggregates real-time tech discussions, product launches, and industry sentiment. Practical use cases include:
- Tracking startup funding announcements and job postings for market research
- Monitoring technology trends by analyzing upvote patterns on specific topics
- Building competitor intelligence feeds by scraping links to rival products
Technical challenges
Hacker News implements standard anti-bot protections: rate limiting by IP, User-Agent header validation, and occasional JavaScript challenges for suspicious traffic. Raw HTTP requests (curl/urllib) frequently receive 429 or 403 responses. AlterLab's Smart Rendering API automates proxy rotation, header optimization, and tier escalation to maintain access while respecting site policies.
Quick start with AlterLab API
Begin with our Getting started guide. Here's how to fetch the Hacker News front page:
Python:
import alterlab

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape("https://news.ycombinator.com")
print(response.text[:500])  # First 500 chars of HTML

Node.js:
import { AlterLab } from "@alterlab/sdk";

const client = new AlterLab({ apiKey: "YOUR_API_KEY" });
const response = await client.scrape("https://news.ycombinator.com");
console.log(response.text.slice(0, 500));

cURL:
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"url": "https://news.ycombinator.com"}'

Extracting structured data
Target these common elements using CSS selectors:
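- Story titles: .titleline > a
- Scores: .score
- Author names: .hnuser
- Comment counts: .subline > a:last-child (the last link in each story's subtext row; the .age span holds only the timestamp link, so it cannot be used for comment counts)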
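To see how these selectors map onto one record per story, here is a minimal offline sketch using only Python's standard library. The `SAMPLE_HTML` string is a hand-written approximation of the front-page markup, not a captured response, and `StoryParser` is a hypothetical helper name. In a real run the HTML would come from the scrape response instead.

```python
from html.parser import HTMLParser

# Hand-written sample that approximates Hacker News front-page markup (assumption).
SAMPLE_HTML = """
<table>
<tr class="athing"><td class="title"><span class="titleline">
  <a href="https://example.com/a">First story</a></span></td></tr>
<tr><td class="subtext"><span class="subline">
  <span class="score">120 points</span> by
  <a class="hnuser" href="user?id=alice">alice</a>
  <span class="age"><a href="item?id=1">2 hours ago</a></span> |
  <a href="item?id=1">45&nbsp;comments</a></span></td></tr>
<tr class="athing"><td class="title"><span class="titleline">
  <a href="https://example.com/b">Second story</a></span></td></tr>
<tr><td class="subtext"><span class="subline">
  <span class="score">7 points</span> by
  <a class="hnuser" href="user?id=bob">bob</a></span></td></tr>
</table>
"""


class StoryParser(HTMLParser):
    """Collects one dict per story from .titleline, .score and .hnuser."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._expect_title = False  # True right after <span class="titleline">
        self._field = None          # Field currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "titleline" in classes:
            self._expect_title = True
        elif tag == "a" and self._expect_title:
            # The first link inside .titleline is the story link.
            self.stories.append({"title": "", "url": attrs.get("href"),
                                 "score": None, "author": None})
            self._expect_title = False
            self._field = "title"
        elif tag == "span" and "score" in classes and self.stories:
            self._field = "score"
        elif tag == "a" and "hnuser" in classes and self.stories:
            self._field = "author"

    def handle_data(self, data):
        if self._field is None:
            return
        story = self.stories[-1]
        if self._field == "score":
            story["score"] = int(data.split()[0])  # "120 points" -> 120
        elif self._field == "title":
            story["title"] += data
        else:
            story["author"] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None  # Stop capturing when the element closes


parser = StoryParser()
parser.feed(SAMPLE_HTML)
stories = parser.stories
for story in stories:
    print(story)
```

Score and author attach to the most recent title, so a row without a score (a job post, for instance) simply keeps `None` in those fields.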
Example Python extraction:
import alterlab
from parsel import Selector

client = alterlab.Client("YOUR_API_KEY")
html = client.scrape("https://news.ycombinator.com").text
selector = Selector(text=html)
titles = selector.css(".titleline > a::text").getall()
print(f"Found {len(titles)} stories")

Structured JSON extraction with Cortex
For typed output without manual parsing, use Cortex AI extraction. Define a schema for story objects:
import alterlab

client = alterlab.Client("YOUR_API_KEY")
result = client.extract(
    url="https://news.ycombinator.com",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "url": {"type": "string", "format": "uri"},
                "score": {"type": "integer"},
                "author": {"type": "string"}
            },
            "required": ["title", "url"]
        }
    }
)
print(result.data)  # List of validated story objects

Cost breakdown
Hacker News typically requires T2 (standard headers) or T3 (stealth) tiers due to anti-bot measures. AlterLab auto-escalates: start at T1, pay only for the tier that succeeds.
| Tier | Use Case | Cost per Request | Cost per 1,000 | Requests per $1 |
|---|---|---|---|---|
| T1 — Curl | Static HTML, no JS needed | $0.0002 | $0.20 | 5,000 |
| T2 — HTTP | Standard pages with headers | $0.0003 | $0.30 | 3,333 |
| T3 — Stealth | Protected pages, anti-bot active | $0.002 | $2.00 | 500 |
| T4 — Browser | Full JS rendering required | $0.004 | $4.00 | 250 |
| T5 — CAPTCHA | CAPTCHA solving + JS rendering | $0.02 | $20.00 | 50 |
See AlterLab pricing for volume discounts. For most Hacker News scraping, expect $0.30-$2.00 per 1,000 requests.
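As a rough budgeting aid, the per-request prices in the table can be turned into a small estimator. This is only a sketch: `estimate_cost` is a hypothetical helper, and the 80/20 split between T2 and T3 is an assumed mix for illustration, not measured data.

```python
from decimal import Decimal

# Per-request prices in USD, copied from the tier table above.
PRICE_PER_REQUEST = {
    "T1": Decimal("0.0002"),
    "T2": Decimal("0.0003"),
    "T3": Decimal("0.002"),
    "T4": Decimal("0.004"),
    "T5": Decimal("0.02"),
}


def estimate_cost(requests_by_tier):
    """Return the total USD cost for a mapping of tier -> successful request count."""
    return sum(
        (PRICE_PER_REQUEST[tier] * count for tier, count in requests_by_tier.items()),
        Decimal("0"),
    )


# Assumed mix: 10,000 requests, 8,000 resolved at T2 and 2,000 at T3.
mix = {"T2": 8_000, "T3": 2_000}
total = estimate_cost(mix)
print(f"Estimated cost: ${total:.2f}")  # prints "Estimated cost: $6.40"
```

`Decimal` is used instead of floats so that fractions of a cent add up exactly across large request counts.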
Best practices
- Rate limiting: AlterLab respects Crawl-delay in robots.txt. Add the wait_time=1 parameter for 1-second intervals between requests.
- Robots.txt: Hacker News allows scraping with User-agent: * and Crawl-delay: 30. Adjust frequency accordingly.
- Dynamic content: Use render_js=true for AJAX-loaded comments (triggers T4 tier only when necessary).
- Error handling: Implement exponential backoff for 429 responses. AlterLab auto-retries failed tiers.
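The Crawl-delay and backoff advice above can be sketched with the standard library alone. Everything named here is an assumption for illustration: `crawl_delay_seconds` and `fetch_with_backoff` are hypothetical helpers, `SAMPLE_ROBOTS_TXT` is a short stand-in rather than the live file, and `fetch` is any callable returning a `(status, body)` pair, not an AlterLab SDK method.

```python
import time
from urllib.robotparser import RobotFileParser

# Stand-in robots.txt modeled on the rules described above; read the live file in practice.
SAMPLE_ROBOTS_TXT = """\
User-Agent: *
Disallow: /vote?
Crawl-delay: 30
"""


def crawl_delay_seconds(robots_txt, user_agent="*", default=1.0):
    """Return the Crawl-delay for user_agent, or default if none is declared."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def fetch_with_backoff(fetch, url, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on a 429, wait base_delay * 1, 2, 4, ... seconds and retry."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    return status, body  # Still rate limited after the final retry


# Fake fetcher for demonstration: rate limited twice, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "<html>ok</html>")])
waits = []  # Records the requested waits instead of actually sleeping
status, body = fetch_with_backoff(
    lambda url: next(responses),
    "https://news.ycombinator.com",
    sleep=waits.append,
)
delay = crawl_delay_seconds(SAMPLE_ROBOTS_TXT)
print(status, waits, delay)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting. In production you would leave the default `time.sleep` and pause for `delay` seconds between successful requests as well.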
Scaling up
For large datasets:
- Batch requests: Send 100 URLs per API call using the urls array parameter
- Scheduling: Use AlterLab's cron endpoint for daily/weekly scrapes
- Storage: Stream results directly to S3 or your database via webhooks
- Responsibility: Monitor response codes; pause if 4xx errors exceed 1%
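Two of these points, batching and the 1% error threshold, reduce to a few lines of client-side logic. The sketch below uses hypothetical helper names (`batches`, `should_pause`) and an illustrative URL list. It does not call the AlterLab API, so how each batch is submitted is left out.

```python
def batches(urls, size=100):
    """Yield consecutive slices of urls with at most size items each."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def should_pause(status_codes, threshold=0.01):
    """Return True when the share of 4xx responses exceeds threshold."""
    if not status_codes:
        return False
    client_errors = sum(1 for code in status_codes if 400 <= code < 500)
    return client_errors / len(status_codes) > threshold


# Illustrative URL list: 250 paginated listing URLs.
urls = [f"https://news.ycombinator.com/news?p={page}" for page in range(1, 251)]
batch_sizes = [len(batch) for batch in batches(urls)]
print(batch_sizes)  # prints [100, 100, 50]

# Hypothetical run: 2 rate-limited responses out of 100 is 2%, above the 1% threshold.
print(should_pause([200] * 98 + [429] * 2))  # prints True
```

Note that 429 counts as a 4xx here, so sustained rate limiting trips the pause on its own.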
Key takeaways
- AlterLab manages anti-bot challenges so you focus on data extraction
- Always verify public data accessibility and comply with robots.txt
- Use Cortex for type-safe JSON output instead of brittle CSS selectors
- Start scraping at T1 tier—pay only for what succeeds
- Scale responsibly with rate limiting and error handling
Related resource: Hacker News scraping guide