
Trustpilot Data API: Extract Structured JSON in 2026
Learn how to extract structured Trustpilot review data via AlterLab's data API—get typed JSON output for product_name, rating, review_count and more with zero HTML parsing.
TL;DR
Use AlterLab's Extract API to send a Trustpilot URL and a JSON schema describing the fields you need—such as product_name, rating, review_count, category, and verified_purchase. The API returns typed, validated JSON without any HTML parsing. This guide shows the exact Python and cURL calls, schema design, and scaling tips for production pipelines.
Disclaimer: This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.
Why use Trustpilot data?
Trustpilot hosts millions of public reviews that signal product quality, customer sentiment, and market trends. Engineering teams use this data to:
- Train sentiment analysis models for product recommendation engines
- Monitor competitor product launches and rating shifts in near real time
- Enrich internal analytics pipelines with verified purchase signals and category tags
Because the data is publicly listed on product pages, it can be harvested responsibly to feed downstream AI or business intelligence workflows.
What data can you extract?
Each Trustpilot review page contains structured information that AlterLab can return as typed JSON. The most commonly requested fields are:
- product_name – the item or service being reviewed (string)
- rating – the star rating shown (string, e.g., "4.5")
- review_count – total number of reviews for that product (string)
- category – the Trustpilot category tree (string)
- verified_purchase – flag indicating whether the reviewer confirmed purchase (string)
You are not limited to these fields; any visible text can be captured by adjusting the schema. The API validates each extracted value against the declared type, so downstream code receives a predictable shape.
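Because every field in the list above is declared as a string, downstream code usually needs native types before doing arithmetic or filtering. The following is a minimal sketch of that conversion, not part of AlterLab's client: the helper names (`to_float`, `to_int`, `to_bool`, `normalize`) are hypothetical, and the `>` separator for the category path is an assumption based on the example output shown later in this guide.

```python
from typing import Any, Dict, List, Optional


def to_float(value: Any) -> Optional[float]:
    """Parse a rating such as "4.7"; return None if missing or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def to_int(value: Any) -> Optional[int]:
    """Parse a count such as "1284" or "1,284"; return None otherwise."""
    if value is None:
        return None
    digits = str(value).replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def to_bool(value: Any) -> Optional[bool]:
    """Map common textual flags to a boolean; return None if unrecognised."""
    if value is None:
        return None
    lowered = str(value).strip().lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    return None


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Convert the string-typed fields of one extraction result to native types."""
    category = record.get("category")
    # Assumption: category levels are separated by ">" as in the example output.
    path: List[str] = [p.strip() for p in category.split(">")] if category else []
    return {
        "product_name": record.get("product_name"),
        "rating": to_float(record.get("rating")),
        "review_count": to_int(record.get("review_count")),
        "category_path": path,
        "verified_purchase": to_bool(record.get("verified_purchase")),
    }
```

Returning `None` for unparseable values, rather than raising, keeps a single bad field from failing a whole batch; whether that trade-off suits your pipeline is a design choice.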
The extraction approach
Traditional scraping requires sending raw HTTP requests, parsing fluctuating HTML, handling pagination, and mitigating anti‑bot measures. This approach is fragile: a minor CSS change breaks selectors, and Trustpilot's bot defenses trigger CAPTCHAs or IP blocks.
AlterLab treats the web as a data API. You declare the shape of the data you want with a JSON schema; the platform handles retrieval, JavaScript rendering, anti‑bot evasion, and returns conforming JSON. This shifts engineering effort from fragile parsing to defining the data contract.
Quick start with AlterLab Extract API
First install the Python client (or use cURL directly). The following example shows a synchronous call to extract a single Trustpilot product page.
import alterlab

client = alterlab.Client("YOUR_API_KEY")

# JSON Schema describing the fields to pull from the page
schema = {
    "type": "object",
    "properties": {
        "product_name": {
            "type": "string",
            "description": "The product name field"
        },
        "rating": {
            "type": "string",
            "description": "The rating field"
        },
        "review_count": {
            "type": "string",
            "description": "The review count field"
        },
        "category": {
            "type": "string",
            "description": "The category field"
        },
        "verified_purchase": {
            "type": "string",
            "description": "The verified purchase field"
        }
    }
}

# Synchronous call: one URL in, one schema-shaped result out
result = client.extract(
    url="https://trustpilot.com/example-page",
    schema=schema,
)
print(result.data)
The same call expressed as cURL, with the schema trimmed to three fields:
curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://trustpilot.com/example-page",
    "schema": {"type": "object", "properties": {"product_name": {"type": "string"}, "rating": {"type": "string"}, "review_count": {"type": "string"}}}
  }'
Both snippets return a JSON object where each property matches the schema definition, with proper typing and no extra HTML fragments.
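If you would rather not install a client library, the cURL call can be mirrored with Python's standard library. This is a sketch under assumptions: the endpoint and `X-API-Key` header are copied from the cURL example, while the helper names (`build_extract_request`, `extract`) are hypothetical and the response is assumed to be a JSON body.

```python
import json
import urllib.request

# Endpoint taken from the cURL example in this guide.
API_URL = "https://api.alterlab.io/v1/extract"


def build_extract_request(api_key: str, page_url: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that the cURL example issues."""
    body = json.dumps({"url": page_url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract(api_key: str, page_url: str, schema: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the reply (assumed here to be JSON)."""
    request = build_extract_request(api_key, page_url, schema)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect or log before any network call is made.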
Example output
{
  "product_name": "Wireless Noise‑Cancelling Headphones",
  "rating": "4.7",
  "review_count": "1284",
  "category": "Electronics > Audio > Headphones",
  "verified_purchase": "true"
}
Define your schema
The Extract API uses JSON Schema Draft‑07. You supply a top‑level object with a properties map. Each property can include:
- type (string, number, boolean, array, object)
- description (optional, for documentation)
- default (optional, used if extraction fails)
AlterLab validates the model output against this schema. If a value cannot be coerced to the declared type, the field is omitted or set to null depending on your handling preferences. This guarantees that downstream consumers receive predictable data shapes.
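Since a field that fails coercion may be omitted or set to null, it can help to check each result before it enters a pipeline. The sketch below is a deliberately small, standard-library check against the `properties` map, not a full JSON Schema validator and not AlterLab's own validation; `JSON_TYPES`, `matches_type`, and `find_problems` are hypothetical names.

```python
from typing import Any, Dict

# Mapping from JSON Schema type names to the Python types json.loads produces.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def matches_type(value: Any, type_name: Any) -> bool:
    """Return True if value is an instance of the declared JSON type."""
    expected = JSON_TYPES.get(type_name)
    if expected is None:
        return True  # unknown or absent type: nothing to check
    if type_name == "number" and isinstance(value, bool):
        return False  # bool is a subclass of int in Python, so exclude it
    return isinstance(value, expected)


def find_problems(record: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, str]:
    """Report fields that are missing, null, or of the wrong type."""
    problems: Dict[str, str] = {}
    for name, spec in schema.get("properties", {}).items():
        if name not in record:
            problems[name] = "missing"
        elif record[name] is None:
            problems[name] = "null"
        elif not matches_type(record[name], spec.get("type")):
            found = type(record[name]).__name__
            problems[name] = f"expected {spec.get('type')}, got {found}"
    return problems
```

An empty dictionary from `find_problems` means every declared field is present, non-null, and of the declared top-level type; nested array or object contents are not inspected.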
For arrays (e.g., extracting multiple reviews from a listing page), define an array type with an inner object schema:
"reviews": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "rating": {"type": "string"},
      "title": {"type": "string"},
      "text": {"type": "string"}
    }
  }
}
Handle pagination and scale
Trustpilot often paginates reviews across several URLs. To collect large volumes:
- Discover page URLs via the site's listing structure or search endpoint.
- Batch requests using asynchronous IO to stay within rate limits.
- Use AlterLab's job endpoint for extremely high volume—submit a list of URLs and poll for completion.
The following Python snippet shows async batching with asyncio and the AlterLab client:
import asyncio

import alterlab

client = alterlab.Client("YOUR_API_KEY")

schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "rating": {"type": "string"},
        "review_count": {"type": "string"}
    }
}


async def extract_one(url):
    # Return an error record instead of raising, so one failure does not abort the batch
    try:
        resp = await client.extract_async(url=url, schema=schema)
        return resp.data
    except Exception as exc:
        return {"url": url, "error": str(exc)}


async def main():
    # Pages 1 through 5 of the example listing
    urls = [
        f"https://trustpilot.com/review/example?page={i}"
        for i in range(1, 6)
    ]
    tasks = [extract_one(u) for u in urls]
    results = await asyncio.gather(*tasks)
    for r in results:
        print(r)


if __name__ == "__main__":
    asyncio.run(main())
AlterLab automatically rotates IPs, solves challenges, and retries transient failures, allowing you to focus on pagination logic rather than low‑level network handling.
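The batching snippet launches every request at once, which may exceed a rate limit on longer URL lists. One common way to cap concurrency is an `asyncio.Semaphore`, sketched below. The helpers `page_urls` and `gather_limited` are hypothetical, `max_concurrency=3` is a placeholder rather than a documented AlterLab limit, and `fetch` stands for any coroutine function that takes a URL, such as the `extract_one` function above.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


def page_urls(base_url: str, first: int, last: int) -> List[str]:
    """Build paginated URLs, assuming a ?page=N query parameter as in the example."""
    return [f"{base_url}?page={i}" for i in range(first, last + 1)]


async def gather_limited(
    urls: List[str],
    fetch: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 3,
) -> List[Any]:
    """Run fetch(url) for every URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> Any:
        # Each task waits here until one of the concurrency slots is free.
        async with semaphore:
            try:
                return await fetch(url)
            except Exception as exc:
                # Record the failure so the rest of the batch still completes.
                return {"url": url, "error": str(exc)}

    # gather preserves input order, so results line up with urls.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

With the client from the snippet above, a call might look like `asyncio.run(gather_limited(page_urls("https://trustpilot.com/review/example", 1, 5), extract_one))`.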
When evaluating cost, consult the pricing page. Charges are per successful extraction request; there are no upfront commitments and unused balance carries forward indefinitely.
Key takeaways
- Treat Trustpilot as a data source, not a scraping target: define a JSON schema and let AlterLab handle retrieval and validation.
- The Extract API eliminates fragile HTML parsing, delivering typed JSON ready for model training or analytics.
- Start with a single URL to verify your schema, then scale using async batching or the job endpoint for large‑scale pipelines.
- Always verify that your collection complies with Trustpilot's robots.txt and Terms of Service; AlterLab provides the technical means, responsibility remains with you.