How to Scrape Yelp Data: Complete Guide for 2026

Learn how to scrape Yelp for public business data using Python and the AlterLab API, with best practices for handling JavaScript, rate limits, and anti-bot measures.

This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.

TL;DR

To scrape Yelp with Python, use AlterLab’s API to render JavaScript, extract public business details via CSS selectors, and respect rate limits. A single request returns clean HTML you can parse with BeautifulSoup or lxml.

Why collect local data from Yelp?

Yelp hosts a wealth of public business information useful for several engineering workflows:

  • Market research: Track competitor listings, review counts, and rating trends across categories.
  • Price monitoring: Extract menu items or service prices from restaurant and salon pages for dynamic pricing models.
  • Data enrichment: Augment internal databases with business hours, location coordinates, and category tags for local search features.

These use cases rely solely on data visible on public pages—no login or private data required.

Technical challenges

Yelp’s modern site presents three core obstacles for scrapers:

  1. JavaScript‑heavy rendering: Business details load client‑side, so a plain requests.get returns an empty container.
  2. Rate limiting & IP bans: Exceeding a modest request threshold triggers temporary blocks or CAPTCHAs.
  3. Bot detection: The server checks for typical automation signatures, such as a missing or default User-Agent header or a TLS fingerprint that does not match a real browser.

Raw HTTP clients fail because they cannot run the JavaScript that hydrates the page's React components. AlterLab's Smart Rendering API solves this by launching a headless browser, applying rotating proxies, and waiting for network idle before returning the fully rendered DOM.

99.2% success rate, 1.2 s average response

Quick start with AlterLab API

First, install the official Python SDK (see the Getting started guide for full setup). Then authenticate and scrape a public Yelp page.

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")
# Target a public business page – no login required
response = client.scrape(
    url="https://www.yelp.com/biz/example-restaurant-san-francisco",
    params={"render": True, "wait_for": "networkidle"}
)
print(response.status_code)  # 200 if successful
html = response.text

The equivalent cURL request looks like this:

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.yelp.com/biz/example-restaurant-san-francisco",
    "render": true,
    "wait_for": "networkidle"
  }'

Both examples ask AlterLab to render the page (render: true) and wait until network activity settles, ensuring the business name, rating, and address are present in the returned HTML.
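
If you would rather not install the SDK, the cURL call above can be reproduced with Python's standard library. The sketch below only builds the request; the endpoint, header name, and JSON fields are copied from the cURL example, while the helper name `build_scrape_request` is made up for illustration. The shape of the response body is not described in this guide, so the sending step is left as a comment.

```python
import json
import urllib.request

API_URL = "https://api.alterlab.io/v1/scrape"  # endpoint taken from the cURL example


def build_scrape_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    payload = {"url": target_url, "render": True, "wait_for": "networkidle"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_scrape_request(
    "https://www.yelp.com/biz/example-restaurant-san-francisco", "YOUR_KEY"
)
print(req.get_method(), req.full_url)

# To actually send it (needs a valid key and network access):
# with urllib.request.urlopen(req, timeout=60) as resp:
#     body = resp.read().decode("utf-8")
```

Keeping request construction separate from sending makes it easy to inspect the payload in tests before spending API credits.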

Extracting structured data

Once you have the HTML, use a parser to pull the fields you need. Below are CSS selectors for common public data points on a Yelp business page (as of 2026). Adjust if the class names change.

Python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")

# Business name – typically in an h1 with a data-testid attribute
name_tag = soup.select_one('h1[data-testid="business-name"]')
business_name = name_tag.get_text(strip=True) if name_tag else None

# Rating – often stored in a div with aria-label
rating_tag = soup.select_one('div[role="img"][aria-label*="star rating"]')
rating = rating_tag["aria-label"].split()[0] if rating_tag else None

# Review count – adjacent to the rating
review_tag = soup.select_one('p[class*="review-count"]')
review_count = review_tag.get_text(strip=True).split()[0] if review_tag else None

# Address – first line of the address block
address_tag = soup.select_one('address p')
address = address_tag.get_text(strip=True) if address_tag else None

print({
    "business_name": business_name,
    "rating": rating,
    "review_count": review_count,
    "address": address
})

If you prefer JSON‑style extraction, AlterLab can return structured data directly via its Cortex AI add‑on, but the CSS approach works for pure HTML output.
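
The values pulled out by the CSS approach are raw strings, so a rating arrives as "4.5" and a review count may still contain a thousands separator. A minimal sketch of a normalization step follows; the helper names `parse_rating` and `parse_review_count` are illustrative, and the sample inputs are assumed text formats rather than guaranteed Yelp output.

```python
import re
from typing import Optional


def parse_rating(label: Optional[str]) -> Optional[float]:
    """Pull the first number out of a label such as '4.5 star rating'."""
    if not label:
        return None
    match = re.search(r"\d+(?:\.\d+)?", label)
    return float(match.group()) if match else None


def parse_review_count(text: Optional[str]) -> Optional[int]:
    """Turn text such as '1,234 reviews' or '(87 reviews)' into an integer."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


print(parse_rating("4.5 star rating"))      # 4.5
print(parse_review_count("1,234 reviews"))  # 1234
```

Both helpers return None for missing or unparseable input, which mirrors the `if tag else None` pattern in the extraction code and keeps downstream type checks simple.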

Best practices

Scraping responsibly keeps your pipelines running smoothly and respects the target site:

  • Rate limit yourself: Even with AlterLab’s proxy pool, send no more than 2–3 requests per second per IP to avoid triggering Yelp’s anti‑bot thresholds.
  • Honor robots.txt: Check https://www.yelp.com/robots.txt for disallowed paths (e.g., /ajax/*, /user/*). Stick to /biz/* and /search/* for public data.
  • Handle dynamic content: Use AlterLab’s wait_for parameter (networkidle or a specific selector) to ensure the DOM is ready before extracting.
  • Rotate user‑agents: Though AlterLab does this automatically, if you build a custom scraper, rotate a list of realistic browser strings.
  • Log failures: Capture HTTP 429 or 503 responses and implement exponential backoff.

Following these rules reduces the chance of temporary bans and keeps your data fresh.
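
Two of the rules above, honoring robots.txt and backing off on 429 or 503, can be combined in one small wrapper. The sketch below is one possible arrangement, not AlterLab's implementation: `fetch` is a placeholder for whatever function you use to retrieve a page and is assumed to return a `(status, body)` tuple, and the robots rules in the demo are invented for illustration rather than copied from Yelp.

```python
import random
import time
from typing import Callable, Iterable, Tuple
from urllib.robotparser import RobotFileParser

RETRYABLE = {429, 503}  # status codes named in the list above


def load_rules(robots_lines: Iterable[str]) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(list(robots_lines))
    return parser


def fetch_with_backoff(
    url: str,
    fetch: Callable[[str], Tuple[int, str]],
    rules: RobotFileParser,
    user_agent: str = "*",
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Fetch url if the robots rules allow it, retrying 429/503 with backoff."""
    if not rules.can_fetch(user_agent, url):
        raise PermissionError(f"robots.txt disallows {url}")
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    return status, body  # still failing after max_attempts; the caller should log it


# Illustrative rules only, NOT a copy of Yelp's real robots.txt.
rules = load_rules(["User-agent: *", "Disallow: /user/", "Disallow: /ajax/"])

# Stand-in fetcher that fails twice before succeeding, so no network is needed.
responses = iter([(429, ""), (503, ""), (200, "<html>ok</html>")])
delays = []
result = fetch_with_backoff(
    "https://www.yelp.com/biz/example-restaurant-san-francisco",
    fetch=lambda _url: next(responses),
    rules=rules,
    sleep=delays.append,
)
print(result[0], len(delays))
```

Passing `sleep` as a parameter lets you record the delays in tests instead of actually waiting; in production the default `time.sleep` applies.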

Scaling up

When you need to scrape hundreds or thousands of Yelp pages, consider these patterns:

  • Batch requests: Send multiple URLs in a single API call using AlterLab’s batch endpoint (up to 20 URLs per request) to cut connection overhead.
  • Scheduling: Use the platform’s cron feature to run a nightly scrape of a changing dataset (e.g., new restaurant openings).
  • Cost awareness: Review the pricing page to estimate monthly spend based on your request volume and rendering tier. AlterLab’s pay‑as‑you‑go model means you only pay for successful scrapes.
  • Storage: Stream results directly to a data warehouse or object store; avoid holding large HTML strings in memory longer than necessary.

A typical scaling workflow might look like:
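
As a rough sketch of the batching and storage patterns listed above: split the URL list into groups of at most 20, submit each group, and hand every page to storage as soon as it arrives. The batch endpoint's request and response format is not documented in this guide, so `submit_batch` is a hypothetical callable assumed to return a `{url: html}` mapping, and `store` stands in for your warehouse or object-store writer.

```python
from typing import Callable, Dict, Iterator, List, Sequence

MAX_BATCH_SIZE = 20  # per-request URL limit stated in the list above


def chunk(urls: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def run_batches(
    urls: Sequence[str],
    submit_batch: Callable[[List[str]], Dict[str, str]],
    store: Callable[[str, str], None],
) -> int:
    """Submit URLs batch by batch and pass each page to `store` immediately."""
    stored = 0
    for batch in chunk(urls):
        # submit_batch is a hypothetical wrapper around the batch endpoint;
        # it is assumed to return {url: html} for the successful scrapes.
        for url, html in submit_batch(batch).items():
            store(url, html)  # e.g. write to object storage, then drop the string
            stored += 1
    return stored


# Dry run with stand-ins so the flow can be checked without network access.
urls = [f"https://www.yelp.com/biz/example-{i}" for i in range(45)]
batch_sizes = []


def fake_submit(batch):
    batch_sizes.append(len(batch))
    return {u: "<html></html>" for u in batch}


saved = {}
total = run_batches(urls, fake_submit, saved.__setitem__)
print(batch_sizes, total)  # [20, 20, 5] 45
```

Because each page is stored inside the loop, only one batch of HTML is held in memory at a time. A scheduler can call `run_batches` nightly with a fresh URL list.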

Key takeaways

  • Use AlterLab's headless browser rendering to handle Yelp's JavaScript-rendered content and anti-bot measures.
  • Extract only publicly visible fields with reliable CSS selectors; avoid scraping behind login walls.
  • Apply polite rate limits, respect robots.txt, and log errors to maintain a sustainable scraper.
  • Leverage batching and scheduling to scale efficiently while monitoring cost via AlterLab’s pricing page.

Frequently Asked Questions

Is it legal to scrape Yelp?
Scraping publicly accessible data is generally permissible under rulings like hiQ v. LinkedIn, but you must review Yelp's robots.txt and Terms of Service, respect rate limits, and avoid private or login-restricted information.

What makes Yelp hard to scrape?
Yelp employs JavaScript rendering, rate limiting, and bot detection mechanisms that break raw HTTP requests. AlterLab's Smart Rendering API handles headless browsing, proxy rotation, and CAPTCHA solving to return clean HTML.

How much does it cost to scrape Yelp with AlterLab?
AlterLab charges per successful scrape; see the pricing page for volume discounts. Costs scale with request count and rendering tier, so you pay only for what you use.