How to Scrape Trustpilot Data: Complete Guide for 2026
Tutorials

How to Scrape Trustpilot Data: Complete Guide for 2026

Learn how to scrape Trustpilot reviews using Python and AlterLab's API. Covers anti-bot handling, selectors, best practices, and scalable pipelines.

4 min read
8 views

This guide shows how to extract publicly available review data from Trustpilot using Python and AlterLab's scraping API. All examples target pages that do not require authentication.

Disclaimer: This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.

TL;DR

To scrape Trustpilot reviews, send a GET request to AlterLab's /v1/scrape endpoint with the target URL, parse the returned HTML with CSS selectors or XPath for review text, rating, and date, and handle pagination programmatically. Use rate limiting and respect Trustpilot's robots.txt.

Why collect reviews data from Trustpilot?

  1. Market research – Aggregate sentiment across competitors to identify product strengths and weaknesses.
  2. Price monitoring – Correlate review spikes with pricing changes or promotional events.
  3. Data analysis pipelines – Feed structured review datasets into NLP models for trend detection or recommendation systems.

Technical challenges

Trustpilot loads most review content via JavaScript, employs rate‑limiting per IP, and uses bot‑challenge pages (e.g., Cloudflare Turnstile) to filter automated traffic. Plain requests.get often returns a challenge page or empty HTML. AlterLab's Smart Rendering API runs a headless browser, rotates residential proxies, and automatically solves challenges, delivering the fully rendered public page.

99.2%Success Rate
1.2sAvg Response

Quick start with AlterLab API

First, install the Python SDK and review the Getting started guide for authentication details.

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
    url="https://www.trustpilot.com/review/example.com",
    params={"render": True, "wait_for": ".review-card"}
)
print(response.text[:500])

The equivalent cURL request:

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.trustpilot.com/review/example.com",
    "render": true,
    "wait_for": ".review-card"
  }'

Both examples naturally appears as:

Extracting structured data

After obtaining the HTML, use a parsing library such as BeautifulSoup or parsel to pull the needed fields. Trustpilot's public review cards use stable class names.

Python
from parsel import Selector

selector = Selector(text=response.text)
reviews = []

for card in selector.css(".review-card"):
    reviews.append({
        "title": card.css(".review-title::text").get().strip(),
        "rating": int(card.css(".star-rating-stroke::attr(data-rating)").get()),
        "text": card.css(".review-content__text::text").get().strip(),
        "date": card.css(".review-date::attr(data-service-review-date)").get(),
    })

print(reviews[:2])

Key selectors:

  • Review container: .review-card
  • Title: .review-title::text
  • Rating: .star-rating-stroke (data‑rating attribute)
  • Text: .review-content__text::text
  • Date: .review-date (data-service-review-date attribute)

For JSON‑LD structured data, you can also parse <script type="application/ld+json"> blocks that sometimes contain aggregated rating information.

Best practices

  • Rate limiting – Start with 1 request per second; increase gradually while monitoring HTTP 429 responses.
  • Robots.txt – Check https://www.trustpilot.com/robots.txt for disallowed paths; avoid scraping private user profiles.
  • Headers – Send a realistic User‑Accept header; AlterLab adds one by default, but you can override if needed.
  • Error handling – Retry on 5xx or network errors with exponential backoff; treat 429 as a signal to pause.
  • Data storage – Write each batch to a newline‑delimited JSON file to enable resumable runs.

Scaling up

For large‑scale projects, schedule nightly jobs via cron or a workflow orchestrator (e.g., Airflow). Use AlterLab's batch endpoint to send up to 100 URLs per request, reducing overhead. Monitor costs, reducing per‑call latency. See the pricing page for volume‑based rates; typical workloads of 100 k reviews/month fall into the Growth tier.

Example batch request:

Python
urls = [
    f"https://www.trustpilot.com/review/site{i}.com"
    for i in range(1, 21)
]

batch_response = client.batch_scrape(
    urls=[{"url": u, "render": True} for u in urls],
    webhook_url="https://yourapi.example.com/webhook"
)
print(batch_response.id)  # use to fetch results later

Combine the output with a scheduling service to refresh datasets daily, and store results in a data warehouse for downstream analytics.

Key takeaways

  • Trustpilot's public review pages are accessible via AlterLab's Smart Rendering API, which handles JavaScript and bot challenges.
  • Use CSS selectors (.review-card, .review-title, etc.) to extract review title, rating, text, and date.
  • Apply responsible scraping: respect robots.txt, limit request rates, and handle errors gracefully.
  • Scale with batch requests, scheduled jobs, and cost‑effective pricing tiers.
  • Always verify that the data you collect is publicly available and compliant with Trustpilot's terms.
Try it yourself

Try scraping Trustpilot with AlterLab

Share

Was this article helpful?

Frequently Asked Questions

Scraping publicly accessible data is generally permissible under rulings like hiQ v LinkedIn, but you must review Trustpilot's robots.txt and Terms of Service, apply rate limiting, and avoid private or login‑protected information.
Trustpilot employs JavaScript rendering, rate limits, and bot detection mechanisms that block raw HTTP requests; AlterLab's Smart Rendering API handles headless browsers, proxies, and automatic retries to retrieve public pages reliably.
AlterLab charges per successful scrape; volume discounts lower the effective price. See the pricing page for tiered rates and estimate costs based on your request frequency and concurrency.