Product Hunt Data API: Extract Structured JSON in 2026

Learn how to extract structured JSON data from Product Hunt using AlterLab's Extract API. Get typed product data (title, author, tags) without parsing HTML or handling anti-bot measures.

This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.

TL;DR

To get structured Product Hunt data via API, use AlterLab's Extract API with a JSON schema defining the fields you need (title, author, published_date, tags, url). Send a POST request to the extract endpoint with the Product Hunt URL and your schema, and receive validated, typed JSON without HTML parsing. This approach handles anti-bot measures and delivers ready-to-use data for pipelines.

Why use Product Hunt data?

Product Hunt remains a leading indicator of emerging tech trends. Engineering teams leverage its public data for:

  • AI training: Curating datasets of new product launches to fine-tune models on innovation patterns
  • Analytics: Tracking category-specific launch velocity to identify rising developer tools or AI trends
  • Competitive intelligence: Monitoring competitor product announcements and feature releases in real time

What data can you extract?

From publicly accessible Product Hunt pages, you can extract:

  • title: Product name (string)
  • author: Maker's username (string)
  • published_date: Launch timestamp (string, ISO 8601 format)
  • tags: Topic categories (array of strings, e.g., ["AI", "Developer Tools"])
  • url: Canonical Product Hunt URL (string)

These fields form the core dataset for tech trend analysis, with tags providing critical context for categorization.
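As a rough sketch of how these five fields might be held in a pipeline, the snippet below maps one extracted record onto a small dataclass and parses the ISO 8601 timestamp. The names ProductRecord, parse_iso8601, and record_from_dict are illustrative and not part of any AlterLab SDK; the sample values mirror the example output used in this guide.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProductRecord:
    """One Product Hunt listing, using the five fields listed above."""
    title: str
    author: str
    published_date: datetime
    url: str
    tags: list = field(default_factory=list)


def parse_iso8601(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def record_from_dict(raw: dict) -> ProductRecord:
    # Hypothetical helper: assumes every key except "tags" is present.
    return ProductRecord(
        title=raw["title"],
        author=raw["author"],
        published_date=parse_iso8601(raw["published_date"]),
        url=raw["url"],
        tags=list(raw.get("tags") or []),
    )


sample = {
    "title": "Example Product",
    "author": "jane_dev",
    "published_date": "2026-03-15T08:30:00Z",
    "tags": ["AI", "Developer Tools"],
    "url": "https://producthunt.com/posts/example-product",
}
record = record_from_dict(sample)
print(record.title, record.published_date.isoformat(), record.tags)
```

Parsing the date once at the edge keeps later sorting and windowing (for example, launches per week per tag) free of string handling.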

The extraction approach

Direct HTTP requests to Product Hunt frequently encounter anti-bot measures (rate limits, JavaScript challenges, IP blocking). Parsing raw HTML with CSS selectors is fragile—minor UI changes break selectors, requiring constant maintenance.

AlterLab's Extract API solves this by treating the web as a data source. Instead of parsing HTML, you define what data you want via a JSON schema. The API:

  • Automatically handles rendering, proxies, and CAPTCHA resolution
  • Uses AI to locate the fields your schema describes within the page
  • Returns validated, typed JSON matching your schema
  • Eliminates HTML parsing entirely

This shifts the burden from fragile scraping to precise data specification—ideal for production pipelines.

Quick start with AlterLab Extract API

Begin by installing the AlterLab SDK and making your first extraction request. See the Getting started guide for setup details.

Here's a Python example extracting structured data from a Product Hunt page:

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

schema = {
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "The product title"
    },
    "author": {
      "type": "string",
      "description": "The maker's username"
    },
    "published_date": {
      "type": "string",
      "description": "Launch date in ISO 8601 format"
    },
    "tags": {
      "type": "array",
      "items": {"type": "string"},
      "description": "Topic tags as string array"
    },
    "url": {
      "type": "string",
      "description": "Product Hunt page URL"
    }
  }
}

result = client.extract(
    url="https://producthunt.com/posts/example-product",
    schema=schema,
)
print(result.data)

Output:

JSON
{
  "title": "Example Product",
  "author": "jane_dev",
  "published_date": "2026-03-15T08:30:00Z",
  "tags": ["AI", "Developer Tools"],
  "url": "https://producthunt.com/posts/example-product"
}

For quick testing, use cURL:

Bash
curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://producthunt.com/posts/example-product",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "published_date": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
        "url": {"type": "string"}
      }
    }
  }'
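If you would rather not install the SDK, the same request can be assembled with Python's standard library. The sketch below only builds the request object. The endpoint and the X-API-Key header are taken from the cURL example above; the helper name build_extract_request is made up for illustration, and the shape of the response body is not specified in this guide, so sending is left as a commented-out step.

```python
import json
import urllib.request

# Endpoint and header name copied from the cURL example above.
EXTRACT_ENDPOINT = "https://api.alterlab.io/v1/extract"


def build_extract_request(api_key: str, page_url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    body = json.dumps({"url": page_url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_ENDPOINT,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
request = build_extract_request(
    "YOUR_KEY", "https://producthunt.com/posts/example-product", schema
)
print(request.get_method(), request.full_url)

# To send it (needs a valid key and network access):
# with urllib.request.urlopen(request, timeout=30) as response:
#     payload = json.loads(response.read())
```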

Define your schema

The JSON schema parameter is central to AlterLab's Extract API. It uses JSON Schema Draft 07 to:

  • Validate structure: Ensures output matches your expected object shape
  • Enforce types: Converts extracted strings to booleans, numbers, or arrays as defined
  • Provide descriptions: Improves AI extraction accuracy for ambiguous fields

In the Product Hunt example above:

  • tags is defined as an array of strings to capture multiple categories
  • published_date uses string format (ISO 8601) since AlterLab preserves date strings as-is
  • All fields include descriptions to guide the AI extraction model

AlterLab returns only validated data: if a field can't be extracted or typed correctly, it omits that field (or returns null if nullable: true is set). Every value that is present therefore matches its declared type, but downstream code should still handle fields that are missing or null.
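Since a field may be omitted or come back as null, a small normalisation step before loading records into a pipeline can be useful. The following is a minimal sketch in plain Python: the split between expected and required fields is an assumption about what a downstream consumer needs, and normalise_record is a hypothetical helper, not an SDK function.

```python
from typing import Optional

EXPECTED_FIELDS = ("title", "author", "published_date", "tags", "url")
# Assumption: a record without a title or URL is not worth keeping.
REQUIRED_FIELDS = ("title", "url")


def normalise_record(data: dict) -> Optional[dict]:
    """Return a record with every expected key present, or None if unusable."""
    # Treat a missing key and an explicit null the same way.
    record = {name: data.get(name) for name in EXPECTED_FIELDS}
    if any(record[name] is None for name in REQUIRED_FIELDS):
        return None
    if record["tags"] is None:
        record["tags"] = []  # an empty list is easier to iterate than None
    return record


print(normalise_record({"title": "Example Product", "url": "https://producthunt.com/posts/example-product"}))
print(normalise_record({"title": None, "url": "https://producthunt.com/posts/example-product"}))
```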

Handle pagination and scale

Product Hunt's tech section paginates via ?page=2, ?page=3, etc. For high-volume extraction:

  1. Batching: Process pages in batches of 10-20 rather than submitting every page at once
  2. Rate limiting: AlterLab handles automatic retries with exponential backoff, but respect Product Hunt's public rate limits (aim for <1 req/sec sustained)
  3. Async jobs: Use AlterLab's job API for non-blocking extraction at scale
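The pagination and batching steps above can be sketched without touching the network. This example assumes the ?page=N pattern described in this section and the /tech listing URL used elsewhere in this guide; page_urls and batches are illustrative helper names.

```python
from typing import Iterator, List

# Listing URL used in this guide's examples.
BASE_URL = "https://producthunt.com/tech"


def page_urls(first: int, last: int) -> List[str]:
    """Build the paginated listing URLs for pages first..last (inclusive)."""
    return [f"{BASE_URL}?page={n}" for n in range(first, last + 1)]


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


urls = page_urls(1, 25)
for batch in batches(urls, 10):
    print(len(batch), batch[0])
```

Each batch can then be handed to whichever extraction call you use, with a pause or checkpoint between batches.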

Example async batch processing:

Python
import alterlab
import asyncio

client = alterlab.Client("YOUR_API_KEY")

async def extract_page(page_num):
    url = f"https://producthunt.com/tech?page={page_num}"
    schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "url": {"type": "string"}
            }
        }
    }
    return await client.extract_async(url=url, schema=schema)

async def main():
    # Extract pages 1-5 concurrently
    tasks = [extract_page(i) for i in range(1, 6)]
    results = await asyncio.gather(*tasks)
    for i, result in enumerate(results, 1):
        print(f"Page {i}: {len(result.data)} products extracted")

asyncio.run(main())

This approach processes multiple pages in parallel while AlterLab manages infrastructure complexity. For cost estimation, AlterLab's pricing scales with successful extractions—see pricing for volume discounts.
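To keep concurrent extraction within the request-rate guidance given earlier in this section, one option is to cap the number of calls in flight and space out their start times. The sketch below uses only asyncio. Throttle is a hypothetical helper, and fake_extract stands in for the real extraction call so the example runs offline; the 0.05 second interval is chosen for a fast demo, and a value of 1.0 or more would match the "<1 req/sec" target.

```python
import asyncio
import time


class Throttle:
    """Cap concurrent calls and space call starts by at least `min_interval` seconds."""

    def __init__(self, max_concurrent: int, min_interval: float) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._lock = asyncio.Lock()
        self._min_interval = min_interval
        self._next_start = 0.0

    async def run(self, coro_fn, *args):
        async with self._semaphore:
            # The lock serialises start times; sleeping while holding it is intentional.
            async with self._lock:
                now = time.monotonic()
                wait = self._next_start - now
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_start = max(now, self._next_start) + self._min_interval
            return await coro_fn(*args)


async def fake_extract(page_num: int) -> dict:
    # Stand-in for a real extraction call; records when it started.
    started = time.monotonic()
    await asyncio.sleep(0.01)
    return {"page": page_num, "started": started}


async def main() -> list:
    # Created inside the running loop so it also works on older Python versions.
    throttle = Throttle(max_concurrent=2, min_interval=0.05)
    tasks = [throttle.run(fake_extract, n) for n in range(1, 6)]
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
print([r["page"] for r in results])
```

asyncio.gather preserves input order, so results line up with page numbers even though the calls overlap.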

Key takeaways

  • Structured over raw: Define your data needs via JSON schema to get typed JSON—no HTML parsing required
  • Managed anti-bot handling: AlterLab handles anti-bot measures automatically while you focus on data utility
  • Pipeline-ready output: Validated, typed data flows directly into analytics or ML workflows
  • Cost efficiency: Pay only for successful extractions with no infrastructure overhead

Replace fragile scraping with precise data specification. Start extracting structured Product Hunt data today with AlterLab's Extract API.

Frequently Asked Questions

Product Hunt offers a limited public API for basic post and comment data, but it doesn't provide full product detail extraction or structured JSON output for arbitrary pages. AlterLab fills this gap by enabling schema-based extraction of any publicly accessible Product Hunt page with automated anti-bot handling.

You can extract any publicly visible field including title, author, published_date, tags (as array of strings), and URL by defining a JSON schema. AlterLab validates and types the output to match your schema exactly, delivering ready-to-use data for pipelines.

AlterLab charges per successful extraction request with pay-as-you-go pricing, with no minimums or expiring credits. Costs scale with usage; see pricing for detailed rates based on extraction volume and feature tiers.