
How to Scrape Trustpilot Data: Complete Guide for 2026
Learn how to scrape Trustpilot reviews using Python and AlterLab's API. Covers anti-bot handling, selectors, best practices, and scalable pipelines.
This guide shows how to extract publicly available review data from Trustpilot using Python and AlterLab's scraping API. All examples target pages that do not require authentication.
Disclaimer: This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.
TL;DR
To scrape Trustpilot reviews, send a POST request to AlterLab's /v1/scrape endpoint with the target URL, parse the returned HTML with CSS selectors or XPath to get review text, rating, and date, and handle pagination programmatically. Rate-limit your requests and respect Trustpilot's robots.txt.
Why collect review data from Trustpilot?
- Market research – Aggregate sentiment across competitors to identify product strengths and weaknesses.
- Price monitoring – Correlate review spikes with pricing changes or promotional events.
- Data analysis pipelines – Feed structured review datasets into NLP models for trend detection or recommendation systems.
Technical challenges
Trustpilot loads most review content via JavaScript, employs rate‑limiting per IP, and uses bot‑challenge pages (e.g., Cloudflare Turnstile) to filter automated traffic. Plain requests.get often returns a challenge page or empty HTML. AlterLab's Smart Rendering API runs a headless browser, rotates residential proxies, and automatically solves challenges, delivering the fully rendered public page.
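To make the "challenge page or empty HTML" failure mode concrete, here is a minimal heuristic sketch for spotting it in a response body. The marker strings and the `min_text_chars` threshold are assumptions chosen for illustration, not values documented by Trustpilot, Cloudflare, or AlterLab, and the function names are hypothetical.

```python
import re

# Assumed markers of a bot-challenge page. Challenge markup changes over time,
# so treat this tuple as illustrative rather than exhaustive.
CHALLENGE_MARKERS = ("challenges.cloudflare.com", "cf-turnstile", "just a moment")


def visible_text(html):
    """Roughly strip scripts, styles, and tags to estimate the visible text."""
    without_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    without_tags = re.sub(r"(?s)<[^>]+>", " ", without_scripts)
    return " ".join(without_tags.split())


def looks_blocked(html, min_text_chars=200):
    """Return True when the HTML resembles a challenge page or an empty shell."""
    lowered = html.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return True
    # A JavaScript-only shell has almost no visible text before rendering.
    return len(visible_text(html)) < min_text_chars
```

A check like this can flag a legitimate page that happens to embed a challenge widget, so it works best as a signal for logging or retrying rather than as a hard filter.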
Quick start with AlterLab API
First, install the Python SDK and review the Getting started guide for authentication details.
import alterlab

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
    url="https://www.trustpilot.com/review/example.com",
    params={"render": True, "wait_for": ".review-card"},
)
print(response.text[:500])  # preview the first 500 characters of the returned HTML

The equivalent cURL request:
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.trustpilot.com/review/example.com",
    "render": true,
    "wait_for": ".review-card"
  }'
Extracting structured data
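After obtaining the HTML, use a parsing library such as BeautifulSoup or parsel to pull out the fields you need. The selectors below are the ones this guide assumes for Trustpilot's public review cards; check them against the live markup before relying on them, since class names can change between site releases.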
from parsel import Selector

selector = Selector(text=response.text)
reviews = []
for card in selector.css(".review-card"):
    rating = card.css(".star-rating-stroke::attr(data-rating)").get()
    reviews.append({
        # get(default="") avoids an AttributeError when a field is missing
        "title": card.css(".review-title::text").get(default="").strip(),
        "rating": int(rating) if rating else None,
        "text": card.css(".review-content__text::text").get(default="").strip(),
        "date": card.css(".review-date::attr(data-service-review-date)").get(),
    })
print(reviews[:2])

Key selectors:
- Review container: .review-card
- Title: .review-title::text
- Rating: .star-rating-stroke (data-rating attribute)
- Text: .review-content__text::text
- Date: .review-date (data-service-review-date attribute)
For JSON‑LD structured data, you can also parse <script type="application/ld+json"> blocks that sometimes contain aggregated rating information.
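As a rough illustration of the JSON-LD route, the sketch below collects every `application/ld+json` script block using only the standard library and looks for a schema.org `AggregateRating` node. The sample HTML is invented for the example, and the helper names (`JsonLdCollector`, `find_aggregate_rating`) are hypothetical; whether a given Trustpilot page carries such a node is something to verify on the live page.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def iter_nodes(value):
    """Walk nested JSON-LD and yield every dict found."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def find_aggregate_rating(html):
    """Return the first AggregateRating node in the page, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for document in collector.documents:
        for node in iter_nodes(document):
            if node.get("@type") == "AggregateRating":
                return node
    return None


# Invented sample markup, used only to exercise the helpers above.
SAMPLE_HTML = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example",
 "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.3", "reviewCount": "1280"}}
</script></head><body></body></html>"""

print(find_aggregate_rating(SAMPLE_HTML))
```

Reading JSON-LD tends to be less brittle than class-based selectors, but it usually exposes only summary fields, so it complements the per-review extraction rather than replacing it.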
Best practices
- Rate limiting – Start with 1 request per second; increase gradually while monitoring HTTP 429 responses.
- Robots.txt – Check
https://www.trustpilot.com/robots.txtfor disallowed paths; avoid scraping private user profiles. - Headers – Send a realistic
User‑Acceptheader; AlterLab adds one by default, but you can override if needed. - Error handling – Retry on 5xx or network errors with exponential backoff; treat 429 as a signal to pause.
- Data storage – Write each batch to a newline‑delimited JSON file to enable resumable runs.
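The error-handling and storage practices above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `fetch` callable stands in for whatever client call you use, the two exception classes are hypothetical signals that `fetch` is expected to raise, and the delay values are arbitrary starting points.

```python
import json
import random
import time


class RateLimited(Exception):
    """Hypothetical signal: raised by `fetch` when the server answers HTTP 429."""


class TransientError(Exception):
    """Hypothetical signal: raised by `fetch` on 5xx responses or network failures."""


def fetch_with_retry(fetch, url, max_attempts=5, base_delay=1.0,
                     pause_on_429=60.0, sleep=time.sleep):
    """Call fetch(url), backing off exponentially on transient errors."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except RateLimited:
            # Treat 429 as a signal to pause rather than to retry quickly.
            sleep(pause_on_429)
        except TransientError:
            # 1s, 2s, 4s, ... plus a little jitter to avoid synchronized retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts")


def append_ndjson(path, records):
    """Append one JSON object per line so an interrupted run can resume."""
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because `append_ndjson` opens the file in append mode, a restarted job can count the lines already written and skip the pages that produced them. Passing `sleep` as a parameter keeps the retry logic easy to test without real delays.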
Scaling up
For large‑scale projects, schedule nightly jobs via cron or a workflow orchestrator (e.g., Airflow). Use AlterLab's batch endpoint to send up to 100 URLs per request, which reduces per‑call overhead and latency. Monitor costs as volume grows: see the pricing page for volume‑based rates; typical workloads of 100 k reviews/month fall into the Growth tier.
Example batch request:
urls = [
    f"https://www.trustpilot.com/review/site{i}.com"
    for i in range(1, 21)
]
batch_response = client.batch_scrape(
    urls=[{"url": u, "render": True} for u in urls],
    webhook_url="https://yourapi.example.com/webhook",
)
print(batch_response.id)  # use to fetch results later

Combine the output with a scheduling service to refresh datasets daily, and store results in a data warehouse for downstream analytics.
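Since the batch endpoint is described here as accepting up to 100 URLs per request, a longer URL list has to be split before submission. The sketch below builds paginated review URLs and chunks them; it assumes pages are addressed with a `?page=N` query parameter, which this guide does not confirm, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

MAX_URLS_PER_BATCH = 100  # limit stated in this guide for the batch endpoint


def page_urls(base_url, last_page):
    """Build URLs for pages 1..last_page.

    Assumption: later pages use a ?page=N query parameter. Verify this
    against the live site before relying on it.
    """
    urls = [base_url]
    for number in range(2, last_page + 1):
        urls.append(f"{base_url}?{urlencode({'page': number})}")
    return urls


def chunked(items, size=MAX_URLS_PER_BATCH):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


all_urls = page_urls("https://www.trustpilot.com/review/example.com", 250)
batches = list(chunked(all_urls))
print(len(batches), [len(batch) for batch in batches])
```

Each batch can then be submitted with one batch call like the example request above, ideally with a pause between submissions to stay within the rate limits discussed under best practices.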
Key takeaways
- Trustpilot's public review pages are accessible via AlterLab's Smart Rendering API, which handles JavaScript and bot challenges.
- Use CSS selectors (.review-card, .review-title, etc.) to extract review title, rating, text, and date.
- Apply responsible scraping: respect robots.txt, limit request rates, and handle errors gracefully.
- Scale with batch requests, scheduled jobs, and cost‑effective pricing tiers.
- Always verify that the data you collect is publicly available and compliant with Trustpilot's terms.