How to Scrape G2 Data: Complete Guide for 2026

Learn how to scrape G2 reviews with Python using AlterLab's API, handling anti-bot measures and extracting structured data.

This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.

TL;DR

To scrape G2 reviews with AlterLab, send a POST request to the scrape endpoint with the target URL and your API key. The response returns the rendered HTML, which you can parse with Python libraries like BeautifulSoup to extract review titles, ratings, and dates. Use the Python SDK or cURL examples below to get started in under five minutes.

Why collect review data from G2?

G2 hosts verified user reviews for thousands of SaaS products, making it a rich source for market research. Teams extract this data to monitor competitor sentiment, identify feature gaps, and inform product roadmaps. Analysts also aggregate ratings over time to track market trends and validate pricing strategies.

Technical challenges

Scraping G2 presents three main obstacles. First, the site enforces strict per-IP rate limits, returning 429 responses after a few requests per minute. Second, many review sections load via JavaScript after the initial HTML, so a raw HTTP request misses the content. Third, G2 employs bot detection that challenges headless browsers with CAPTCHAs or JavaScript checks. AlterLab's Smart Rendering API addresses these by rotating residential proxies, executing pages in a real Chromium environment, and solving challenges automatically, while keeping access limited to public data.
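AlterLab retries requests on its side. If you call any rate-limited endpoint directly, though, the usual way to handle 429 responses is exponential backoff with jitter. The following is a minimal sketch of that pattern. It assumes a hypothetical `fetch` callable that returns a status code and a body, and it is not part of AlterLab's SDK.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical helper, not part of AlterLab's SDK: `fetch` is any callable
# that takes a URL and returns (status_code, body).
def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url), retrying on HTTP 429 with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429 or attempt == max_retries:
            # Success, a non-rate-limit status, or out of retries
            return status, body
        # Wait base_delay * 2**attempt seconds plus up to 10% random jitter
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable: the loop returns on its final pass")
```

The function takes `sleep` as a parameter, so you can test it without waiting for real delays.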

Quick start with AlterLab API

Begin by installing the AlterLab Python SDK and making your first request. See the Getting started guide for installation details.

Python
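import alterlab

# Create a client with your AlterLab API key
client = alterlab.Client("YOUR_API_KEY")
# Fetch the fully rendered G2 page
response = client.scrape("https://g2.com/project-management")
# Preview the first 500 characters of the returned HTML
print(response.text[:500])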
Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://g2.com/project-management"}'

The response contains the fully rendered page, ready for parsing.
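If you would rather skip the SDK, the cURL call maps onto Python's standard library. This sketch builds the same POST request without sending it. The endpoint and the `X-API-Key` header come from the cURL example. The JSON `Content-Type` header is an assumption, so check AlterLab's API reference for the current request schema.

```python
import json
import urllib.request

# Endpoint taken from the cURL example; the request schema is an assumption.
API_URL = "https://api.alterlab.io/v1/scrape"

def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    payload = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_scrape_request("https://g2.com/project-management", "YOUR_API_KEY")
# To send it: urllib.request.urlopen(req, timeout=60).read()
```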

Extracting structured data

Once you have the HTML, use CSS selectors to pull the visible review fields. G2 structures each review inside a <div class="review-item">. Within that, the title lives in an <h3>, the rating in a <span class="rating">, and the date in a <time> tag.

Python
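from bs4 import BeautifulSoup
# Parse the rendered HTML returned by the scrape call
soup = BeautifulSoup(response.text, "html.parser")

def text_of(block, selector):
    # Return stripped text for the selector, or None if the element is missing
    el = block.select_one(selector)
    return el.get_text(strip=True) if el else None

reviews = []
for block in soup.select("div.review-item"):
    time_tag = block.select_one("time")
    reviews.append({
        "title": text_of(block, "h3"),
        "rating": text_of(block, "span.rating"),
        "date": time_tag.get("datetime") if time_tag else None,
    })
print(reviews[:3])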

Alternatively, you can request formats=["json"] to receive pre-extracted fields, though the HTML route works for any custom data point.
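The parsing loop stores each rating and date as a raw string. For trend analysis you usually want numbers and real dates instead. The following is a minimal sketch that assumes two formats: rating text such as "4.5", "4.5/5" or "4.5 out of 5", and a `datetime` attribute that begins with an ISO date. The `parse_rating`, `parse_date` and `normalize` helpers are hypothetical, and the sample review is illustrative input, not real G2 data.

```python
import re
from datetime import date
from typing import Optional

# Assumption: the rating text contains the score as its first number.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")

def parse_rating(text: Optional[str]) -> Optional[float]:
    """Return the first number in the rating text, or None if absent."""
    if not text:
        return None
    match = _NUMBER.search(text)
    return float(match.group()) if match else None

def parse_date(value: Optional[str]) -> Optional[date]:
    """Parse the leading YYYY-MM-DD part of an ISO timestamp."""
    if not value:
        return None
    try:
        return date.fromisoformat(value[:10])
    except ValueError:
        return None

def normalize(review: dict) -> dict:
    """Convert one scraped review's string fields to typed values."""
    return {
        "title": review.get("title"),
        "rating": parse_rating(review.get("rating")),
        "date": parse_date(review.get("date")),
    }

# Illustrative input only
sample = {"title": "Great for sprints", "rating": "4.5 out of 5", "date": "2025-11-03T10:00:00Z"}
print(normalize(sample))
```

These helpers return None for missing or malformed fields rather than raising, so one odd review does not abort a whole batch.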

Best practices

Respect G2's crawling policies by throttling requests to no more than one per second, or slower if robots.txt asks for it. Use AlterLab's built-in rate limiting via the max_concurrency parameter to avoid 429 errors. Always check /robots.txt before scaling and honor any Crawl-delay it sets. At the time of writing, the file permits scraping of /product/* pages with a crawl delay of 10 seconds, which is stricter than the one-request-per-second ceiling. Handle dynamic content by enabling JavaScript rendering (render_js: true) and waiting for network idle. Finally, store results in a time-series database so you can analyze trends without re-scraping unchanged pages.
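Python's standard library can read robots.txt for you. The following is a minimal sketch built on `urllib.robotparser`. The rules below are an illustrative stand-in modeled on the policy described above, not G2's actual file. The sketch sets the wait interval to the larger of one second and any Crawl-delay. A small hypothetical `Throttle` class then enforces that interval between requests. Note that `urllib.robotparser` applies the first matching rule in file order, so an Allow line must come before a broad Disallow.

```python
import time
import urllib.robotparser

# Illustrative rules only; fetch the real file before relying on any values.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 10
Allow: /product/
Disallow: /
"""

def min_interval(rp: urllib.robotparser.RobotFileParser, agent: str, floor: float = 1.0) -> float:
    """Seconds between requests: the larger of `floor` and any Crawl-delay."""
    delay = rp.crawl_delay(agent)
    return max(floor, float(delay)) if delay is not None else floor

class Throttle:
    """Hypothetical helper: keeps consecutive wait() calls `interval` apart."""

    def __init__(self, interval: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

rp = urllib.robotparser.RobotFileParser()
rp.parse(SAMPLE_ROBOTS.splitlines())
print(rp.can_fetch("*", "https://www.g2.com/product/example/reviews"))
print(min_interval(rp, "*"))
```

In a real run you would call `rp.set_url(...)` and `rp.read()` instead of parsing a string. You would then call `throttle.wait()` before each scrape request.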

Scaling up

For large-scale jobs, batch URLs into chunks of 100 and process them with asynchronous calls. Schedule nightly runs using AlterLab's cron-based scheduling to keep datasets fresh. Monitor usage and costs on the dashboard; the pricing page shows tiered rates that drop as volume increases. If you need to extract data from thousands of product pages, consider setting the min_tier parameter to skip lower tiers that do not render JavaScript. This avoids failed attempts on G2's dynamic pages, reducing both time and expense.
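Batching plus asynchronous calls can be sketched with `asyncio`. This is a minimal sketch in which `scrape` is a hypothetical async callable that you would wire to AlterLab's SDK or HTTP API. Check whether the SDK offers an async client. The `concurrency` limit is a client-side semaphore, separate from AlterLab's max_concurrency parameter. The example URLs are placeholders.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def chunked(items: Sequence[T], size: int = 100) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def scrape_all(
    urls: Sequence[str],
    scrape: Callable[[str], Awaitable[str]],
    batch_size: int = 100,
    concurrency: int = 5,
) -> List[str]:
    """Scrape URLs batch by batch, with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(url: str) -> str:
        async with semaphore:
            return await scrape(url)

    results: List[str] = []
    for batch in chunked(urls, batch_size):
        # gather() preserves input order, so results line up with `urls`
        results.extend(await asyncio.gather(*(guarded(u) for u in batch)))
    return results

# Hypothetical stand-in for a real scrape call
async def fake_scrape(url: str) -> str:
    await asyncio.sleep(0)
    return f"<html>{url}</html>"

urls = [f"https://example.com/item/{i}" for i in range(250)]
pages = asyncio.run(scrape_all(urls, fake_scrape))
print(len(pages))
```

Finishing each batch before starting the next one gives you natural checkpoints for saving progress and checking costs between chunks.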

Key takeaways

  • Use AlterLab's Smart Rendering API to handle JavaScript rendering and bot defenses on G2.
  • Parse the returned HTML with CSS selectors to extract review titles, ratings, and dates.
  • Apply rate limiting, respect robots.txt, and cache unchanged pages to scrape responsibly.
  • Automate recurring jobs with scheduling, monitor usage on the dashboard, and check the pricing page for volume rates.

Frequently Asked Questions

Is it legal to scrape G2?
Scraping publicly accessible data is generally permissible under rulings like hiQ Labs v. LinkedIn, but you must review G2's robots.txt and Terms of Service, respect rate limits, and avoid private or login-restricted information.
Why is G2 difficult to scrape?
G2 employs rate limiting, JavaScript-rendered content, and bot detection mechanisms that block raw HTTP requests; AlterLab's Smart Rendering API handles headless browsing, proxy rotation, and automatic retries to retrieve public data reliably.
How much does it cost to scrape G2 with AlterLab?
AlterLab charges per successful scrape; costs scale with request volume and rendering tier, and the /pricing page details volume discounts for high-frequency jobs.