Trustpilot Data API: Extract Structured JSON in 2026

Learn how to extract structured Trustpilot review data via AlterLab's data API—get typed JSON output for product_name, rating, review_count and more with zero HTML parsing.

TL;DR

Use AlterLab's Extract API to send a Trustpilot URL and a JSON schema describing the fields you need—such as product_name, rating, review_count, category, and verified_purchase. The API returns typed, validated JSON without any HTML parsing. This guide shows the exact Python and cURL calls, schema design, and scaling tips for production pipelines.

Disclaimer: This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.

Why use Trustpilot data?

Trustpilot hosts millions of public reviews that signal product quality, customer sentiment, and market trends. Engineering teams use this data to:

  • Train sentiment analysis models for product recommendation engines
  • Monitor competitor product launches and rating shifts in near real time
  • Enrich internal analytics pipelines with verified purchase signals and category tags

Because this data is publicly listed on review pages, it can be collected responsibly, within the limits of Trustpilot's terms, to feed downstream AI or business intelligence workflows.

What data can you extract?

Each Trustpilot review page contains structured information that AlterLab can return as typed JSON. The most commonly requested fields are:

  • product_name – the item or service being reviewed (string)
  • rating – the star rating shown (string, e.g., "4.5")
  • review_count – total number of reviews for that product (string)
  • category – the Trustpilot category tree (string)
  • verified_purchase – flag indicating whether the reviewer confirmed purchase (string)

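You are not limited to these fields; any text visible on the page can be captured by adding a property to the schema. The API validates each extracted value against its declared type. Because every field above is declared as a string, values such as rating and review_count still need casting before numeric use, or you can declare them as number in the schema instead.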

The extraction approach

Traditional scraping requires sending raw HTTP requests, parsing fluctuating HTML, handling pagination, and mitigating anti‑bot measures. This approach is fragile: a minor CSS change breaks selectors, and Trustpilot's bot defenses trigger CAPTCHAs or IP blocks.

AlterLab treats the web as a data API. You declare the shape of the data you want with a JSON schema; the platform handles retrieval, JavaScript rendering, anti‑bot evasion, and returns conforming JSON. This shifts engineering effort from fragile parsing to defining the data contract.
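You can keep that data contract in a single place by declaring it as a Python dataclass and deriving the schema from it. The sketch below assumes flat, scalar fields only. The `ReviewSummary` class and the `schema_from_dataclass` helper are hypothetical names and are not part of the AlterLab client.

```python
from dataclasses import dataclass, fields

# Map Python annotations to JSON Schema type names (scalars only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class ReviewSummary:
    # Hypothetical contract mirroring the fields used in this guide.
    product_name: str
    rating: str
    review_count: str
    category: str
    verified_purchase: str


def schema_from_dataclass(cls) -> dict:
    """Build a flat JSON schema ({"type": "object", "properties": ...}) from a dataclass."""
    properties = {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(cls)}
    return {"type": "object", "properties": properties}


schema = schema_from_dataclass(ReviewSummary)
```

Calling `ReviewSummary(**data)` on an extracted dict raises a `TypeError` if a field is missing or unexpected. This surfaces contract drift early instead of letting it reach downstream consumers.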

Quick start with AlterLab Extract API

First install the Python client (or use cURL directly). The following example shows a synchronous call to extract a single Trustpilot product page.

Python
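import alterlab

# Replace with your own API key.
client = alterlab.Client("YOUR_API_KEY")

# Descriptions tell the extractor what each field means, so keep them specific.
schema = {
  "type": "object",
  "properties": {
    "product_name": {
      "type": "string",
      "description": "Name of the product or business being reviewed"
    },
    "rating": {
      "type": "string",
      "description": "Average star rating as displayed, e.g. '4.5'"
    },
    "review_count": {
      "type": "string",
      "description": "Total number of reviews shown on the page"
    },
    "category": {
      "type": "string",
      "description": "Category breadcrumb shown on the page"
    },
    "verified_purchase": {
      "type": "string",
      "description": "Whether the review is marked as verified ('true' or 'false')"
    }
  }
}

# Synchronous call for a single page; result.data holds the extracted fields.
result = client.extract(
    url="https://trustpilot.com/example-page",
    schema=schema,
)
print(result.data)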

The same request expressed as cURL:

Bash
curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://trustpilot.com/example-page",
    "schema": {"type": "object", "properties": {"product_name": {"type": "string"}, "rating": {"type": "string"}, "review_count": {"type": "string"}}}
  }'

Both requests return a JSON object whose keys match the properties in the schema you sent, typed as declared and free of HTML fragments. The cURL example requests only three of the five fields.
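If you prefer not to depend on the client library, the cURL call above translates directly to the Python standard library. This sketch only builds the request. The endpoint and the `X-API-Key` header come from the cURL example. The assumption that the response body is JSON follows from the output shown in this guide. `build_extract_request` is a hypothetical helper name.

```python
import json
import urllib.request

API_URL = "https://api.alterlab.io/v1/extract"  # endpoint from the cURL example


def build_extract_request(api_key: str, url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    body = json.dumps({"url": url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_extract_request(
    "YOUR_API_KEY",
    "https://trustpilot.com/example-page",
    {"type": "object", "properties": {"rating": {"type": "string"}}},
)
# To send it (assuming a JSON response): data = json.load(urllib.request.urlopen(req))
```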

Example output

JSON
{
  "product_name": "Wireless Noise‑Cancelling Headphones",
  "rating": "4.7",
  "review_count": "1284",
  "category": "Electronics > Audio > Headphones",
  "verified_purchase": "true"
}
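Every value in the example output is a string, so a small post-processing step can cast values before analytics or model training. This is a minimal sketch. The `to_typed` helper is hypothetical, and splitting `category` on `>` assumes the breadcrumb format shown in the example output.

```python
# Example output from above (every value is a string).
raw = {
    "product_name": "Wireless Noise-Cancelling Headphones",
    "rating": "4.7",
    "review_count": "1284",
    "category": "Electronics > Audio > Headphones",
    "verified_purchase": "true",
}


def _to_float(value):
    """Parse a float, returning None when the value is missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def _to_int(value):
    """Parse an int, tolerating thousands separators such as "1,284"."""
    try:
        return int(str(value).replace(",", ""))
    except (TypeError, ValueError):
        return None


def to_typed(record: dict) -> dict:
    """Cast string fields to numeric, boolean, and list types; bad values become None."""
    category = record.get("category") or ""
    flag = str(record.get("verified_purchase", "")).strip().lower()
    return {
        "product_name": record.get("product_name"),
        "rating": _to_float(record.get("rating")),
        "review_count": _to_int(record.get("review_count")),
        "category": [part.strip() for part in category.split(">")] if category else [],
        "verified_purchase": {"true": True, "false": False}.get(flag),
    }


typed = to_typed(raw)
```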

Define your schema

The Extract API uses JSON Schema Draft‑07. You supply a top‑level object with a properties map. Each property can include:

  • type (string, number, boolean, array, object)
  • description (optional, for documentation)
  • default (optional, used if extraction fails)

AlterLab validates the extracted output against this schema. If a value cannot be coerced to the declared type, the field is omitted or set to null, depending on your handling preferences. Either way, downstream consumers receive a predictable data shape.
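The following client-side check makes the omit-or-null behavior concrete, and you might run it on results before loading them. It is not AlterLab's validator and not a full Draft-07 implementation. It checks only top-level `type` and falls back to `default` when one is present, as described in the list above. `check_top_level` is a hypothetical helper name.

```python
# Python types accepted for each JSON Schema type name (simplified).
_ACCEPTED = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def check_top_level(data: dict, schema: dict, drop_invalid: bool = True) -> dict:
    """Keep values matching their declared type; otherwise use default, drop, or null."""
    cleaned = {}
    for name, spec in schema.get("properties", {}).items():
        value = data.get(name)
        declared = spec.get("type")
        expected = _ACCEPTED.get(declared, (object,))
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and declared in ("number", "integer")
        if value is not None and isinstance(value, expected) and not bool_as_number:
            cleaned[name] = value
        elif "default" in spec:
            cleaned[name] = spec["default"]
        elif not drop_invalid:
            cleaned[name] = None
    return cleaned
```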

For arrays (e.g., extracting multiple reviews from a listing page), define an array type with an inner object schema:

JSON
"reviews": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "rating": {"type": "string"},
      "title": {"type": "string"},
      "text": {"type": "string"}
    }
  }
}
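On a listing page, the reviews fragment sits inside a top-level object alongside page-level fields. The sketch below assumes a response shaped like that schema. It flattens the response to one row per review, which is a convenient layout for a dataframe or database table. `listing_schema` and `flatten_reviews` are illustrative names.

```python
# Hypothetical listing-page schema: a page-level field plus the reviews array above.
listing_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "reviews": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "rating": {"type": "string"},
                    "title": {"type": "string"},
                    "text": {"type": "string"},
                },
            },
        },
    },
}


def flatten_reviews(page: dict) -> list:
    """Turn one extracted listing page into one row per review."""
    product = page.get("product_name")
    # A missing or null reviews field yields no rows rather than an error.
    return [{"product_name": product, **review} for review in page.get("reviews") or []]
```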

Handle pagination and scale

Trustpilot often paginates reviews across several URLs. To collect large volumes:

  1. Discover page URLs via the site's listing structure or search endpoint.
  2. Batch requests with asynchronous I/O, capping the number of concurrent requests so you stay within rate limits.
  3. Use AlterLab's job endpoint for extremely high volume—submit a list of URLs and poll for completion.
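When the number of pages is not known up front, a simple discovery strategy is to request pages in order and stop at the first one that returns no reviews. This sketch assumes the `?page=N` URL pattern used in the async example in this section. It also assumes a hypothetical `fetch_page` callable that wraps an extract request and returns a dict with a `reviews` list. The `max_pages` cap guards against runaway loops.

```python
from typing import Callable, Dict, List

BASE_URL = "https://trustpilot.com/review/example"  # placeholder from the async example


def page_url(page: int) -> str:
    """Build a paginated URL using the ?page=N pattern."""
    return f"{BASE_URL}?page={page}"


def collect_until_empty(fetch_page: Callable[[str], Dict], max_pages: int = 50) -> List[Dict]:
    """Fetch pages in order, stopping at the first page with no reviews."""
    pages = []
    for n in range(1, max_pages + 1):
        page = fetch_page(page_url(n))
        if not page.get("reviews"):
            break  # assumed to be past the last page
        pages.append(page)
    return pages
```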

The following Python snippet shows async batching with asyncio and the AlterLab client:

Python
import alterlab
import asyncio

client = alterlab.Client("YOUR_API_KEY")

schema = {
  "type": "object",
  "properties": {
    "product_name": {"type": "string"},
    "rating": {"type": "string"},
    "review_count": {"type": "string"}
  }
}

async def extract_one(url, semaphore):
    # Wait for a free slot so only a bounded number of requests run at once.
    async with semaphore:
        try:
            resp = await client.extract_async(url=url, schema=schema)
            return {"url": url, "data": resp.data}
        except Exception as exc:
            # Record the failure instead of aborting the whole batch.
            return {"url": url, "error": str(exc)}

async def main():
    # Maximum concurrent requests; tune this to your plan's rate limit.
    semaphore = asyncio.Semaphore(3)
    urls = [
        f"https://trustpilot.com/review/example?page={i}"
        for i in range(1, 6)
    ]
    tasks = [extract_one(u, semaphore) for u in urls]
    results = await asyncio.gather(*tasks)
    for r in results:
        print(r)

if __name__ == "__main__":
    asyncio.run(main())

AlterLab automatically rotates IPs, solves challenges, and retries transient failures, allowing you to focus on pagination logic rather than low‑level network handling.
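Server-side retries do not cover every failure your own code can see, such as a dropped connection or a client-side timeout. A generic exponential-backoff wrapper is one way to handle those. This is a sketch: `with_retries` is a hypothetical helper, and catching bare `Exception` is a simplification that you would normally narrow to the errors your client actually raises.

```python
import asyncio
import random


async def with_retries(make_call, attempts: int = 3, base_delay: float = 1.0):
    """Await make_call(), retrying with exponential backoff plus jitter.

    make_call is any zero-argument function returning an awaitable, for example
    lambda: client.extract_async(url=url, schema=schema).
    """
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 1, 2, 4, ... plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
```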

When evaluating cost, consult the pricing page. Charges are per successful extraction request; there are no upfront commitments and unused balance carries forward indefinitely.
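Only successful extractions are billed, so tallying outcomes per batch gives a rough spend estimate. In this sketch, `price_per_success` is a placeholder to fill in from the pricing page. A result counts as failed if it carries an `error` key, matching the error dicts produced in the async example. `summarize_batch` is a hypothetical helper name.

```python
def summarize_batch(results: list, price_per_success: float) -> dict:
    """Count successes and failures in a batch and estimate its cost."""
    # Failed entries are recognized by an "error" key, as in the async example.
    failures = [r for r in results if isinstance(r, dict) and "error" in r]
    successes = len(results) - len(failures)
    return {
        "successes": successes,
        "failures": len(failures),
        "estimated_cost": successes * price_per_success,
    }
```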

Key takeaways

  • Treat Trustpilot as a data source, not a scraping target: define a JSON schema and let AlterLab handle retrieval and validation.
  • The Extract API eliminates fragile HTML parsing, delivering typed JSON ready for model training or analytics.
  • Start with a single URL to verify your schema, then scale using async batching or the job endpoint for large‑scale pipelines.
  • Always verify that your collection complies with Trustpilot's robots.txt and Terms of Service. AlterLab provides the technical means, but responsibility for compliance remains with you.

Frequently Asked Questions

Trustpilot offers limited partner APIs for business accounts, but they require approval and often lack granular review-level access. AlterLab provides a self‑service data API that extracts publicly available review data and returns validated JSON based on any schema you define.
You can extract publicly listed review fields such as product_name, rating, review_count, category, and verified_purchase. By defining a JSON schema you receive typed, validated output—no HTML parsing needed.
AlterLab charges per successful extraction request with a pay‑as‑you‑go model; there are no minimums and unused balance never expires. See the pricing page for current rates.