AlterLab
Tutorials

How to Scrape Google Search Results in 2026: Python, APIs, and What Actually Works

Google blocks most scraping attempts within a few requests. Here is what works for extracting SERP data at scale in 2026, from raw Python to headless browsers to scraping APIs.

Yash Dubey

February 17, 2026

10 min read

Google search results are one of the most valuable data sources on the internet. SEO tools, market research platforms, ad intelligence products, and AI training pipelines all depend on SERP data. Google does not make this easy.

Google runs some of the most aggressive anti-bot systems on the web. A basic Python script will get blocked after a handful of requests. Headless browsers last slightly longer before hitting CAPTCHAs. Proxy rotation helps, but Google fingerprints far more than your IP address.

Here is what actually works in 2026, with code you can run today.

What You Get From Google SERPs

Before writing any code, know what data is available in a Google results page:

  • Organic results: Title, URL, description snippet, position
  • Featured snippets: The answer box at the top
  • People Also Ask: Related questions with expandable answers
  • Knowledge panels: Entity information cards
  • Ads: Sponsored listings with advertiser info
  • Local pack: Map results with business details
  • Shopping results: Product cards with prices
  • Image and video carousels: Media results

Each result type has its own HTML structure. A scraper that only grabs the ten blue links misses most of the page.
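As a quick illustration, you can probe fetched HTML for which features are present. The selectors below are illustrative examples (the same ones used later in this article) and, like all of Google's class names, unstable:

```python
from bs4 import BeautifulSoup

# Illustrative (and unstable) markers for a few common SERP features.
FEATURE_MARKERS = {
    "organic": "div.tF2Cxc",
    "people_also_ask": ".related-question-pair",
    "result_stats": "#result-stats",
}

def detect_features(html: str) -> dict[str, int]:
    """Count how many nodes match each feature marker."""
    soup = BeautifulSoup(html, "html.parser")
    return {name: len(soup.select(sel)) for name, sel in FEATURE_MARKERS.items()}
```

A page where every count comes back zero usually means you were blocked, served a consent wall, or the selectors have rotted.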

10+ SERP feature types · ~60% of clicks go to the top 3 results · $50B+ SEO tools market by 2027

Method 1: Raw HTTP Requests (The Naive Approach)

The simplest approach is sending a GET request to Google. It works for about 5 minutes.

python
import requests
from bs4 import BeautifulSoup

def scrape_google(query):
    url = "https://www.google.com/search"
    params = {"q": query, "num": 10, "hl": "en"}
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
    }

    response = requests.get(url, params=params, headers=headers)

    if response.status_code != 200:
        print(f"Blocked: HTTP {response.status_code}")
        return []

    soup = BeautifulSoup(response.text, "html.parser")
    results = []

    for div in soup.select("div.tF2Cxc"):
        title_el = div.select_one("h3")
        link_el = div.select_one("a")
        snippet_el = div.select_one(".VwiC3b")

        if title_el and link_el:
            results.append({
                "title": title_el.text,
                "url": link_el["href"],
                "snippet": snippet_el.text if snippet_el else "",
            })

    return results

results = scrape_google("best web scraping api")
for r in results:
    print(f"{r['title']}\n  {r['url']}\n")

This will return results for your first few queries. Then Google serves a CAPTCHA page, a 429 status code, or a consent form that blocks further requests.

Why it fails:

  • Google tracks request patterns across your IP
  • The User-Agent alone is not enough to look human
  • Missing cookies, TLS fingerprint, and JavaScript execution are all signals
  • Google's selectors (like div.tF2Cxc) change periodically, breaking your parser

This approach is fine for a one-off test. It is not viable for production use.
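Even for one-off tests, it helps to recognize how you were blocked. A rough classifier, using commonly observed signals (the marker strings are heuristics, not a guaranteed list):

```python
def classify_response(status_code: int, body: str) -> str:
    """Rough classification of a Google search response.
    Marker strings are heuristics observed in the wild, not a contract."""
    if status_code == 429:
        return "rate_limited"
    # Redirects to google.com/sorry/ are the classic CAPTCHA wall
    if status_code in (301, 302) or "/sorry/" in body:
        return "captcha"
    if "unusual traffic" in body.lower():
        return "captcha"
    # EU consent interstitial
    if "consent.google.com" in body:
        return "consent_wall"
    if status_code != 200:
        return "error"
    return "ok"
```

Knowing *why* a request failed tells you whether to slow down, rotate proxies, or fix a parser.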

Method 2: Headless Browser with Playwright

Using a real browser solves the JavaScript execution problem and gives you a more realistic fingerprint.

python
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup

async def scrape_google_playwright(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()

        await page.goto(
            f"https://www.google.com/search?q={query}&hl=en",
            wait_until="networkidle",
        )

        # Handle consent screen (common in EU)
        try:
            consent_btn = page.locator("button:has-text('Accept all')")
            if await consent_btn.is_visible(timeout=3000):
                await consent_btn.click()
                await page.wait_for_load_state("networkidle")
        except Exception:
            pass

        html = await page.content()
        await browser.close()

    soup = BeautifulSoup(html, "html.parser")
    results = []

    for div in soup.select("div.tF2Cxc"):
        title_el = div.select_one("h3")
        link_el = div.select_one("a")
        snippet_el = div.select_one(".VwiC3b")

        if title_el and link_el:
            results.append({
                "title": title_el.text,
                "url": link_el["href"],
                "snippet": snippet_el.text if snippet_el else "",
            })

    return results

results = asyncio.run(scrape_google_playwright("best web scraping api"))
for r in results:
    print(f"{r['title']}\n  {r['url']}\n")

Playwright gets you further than raw requests. Google sees a real Chromium browser executing JavaScript and rendering the page. But headless detection has gotten sophisticated in 2026.

Where Playwright breaks down:

  • Google detects headless Chromium through WebGL fingerprinting, navigator properties, and behavioral analysis
  • Each browser instance uses 200-400 MB of RAM, making scale expensive
  • CAPTCHAs still appear after 20-50 queries from the same IP
  • Consent screens, cookie banners, and localized results add parsing complexity

Method 3: Stealth Patches and Fingerprint Spoofing

You can make Playwright harder to detect by patching the browser fingerprint:

python
import asyncio
import random
from playwright.async_api import async_playwright

async def stealth_scrape(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--disable-features=IsolateOrigins,site-per-process",
            ],
        )
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
            timezone_id="America/New_York",
            geolocation={"longitude": -73.935242, "latitude": 40.730610},
            permissions=["geolocation"],
        )

        page = await context.new_page()

        # Patch navigator properties that headless Chromium exposes
        await page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en']
            });
            Object.defineProperty(navigator, 'plugins', {
                get: () => [1, 2, 3, 4, 5]
            });
        """)

        # Add a random delay to look human
        await page.goto(f"https://www.google.com/search?q={query}&hl=en")
        await page.wait_for_timeout(random.randint(1000, 3000))

        html = await page.content()
        await browser.close()
        return html

This buys you more queries before detection, but it is a treadmill. Google updates their detection signatures regularly. The stealth patches that work today may not work next month.

Method 4: Proxy Rotation

Adding proxies distributes your requests across many IP addresses, which is necessary at any meaningful scale.

python
import requests
from itertools import cycle

proxies = [
    "http://user:[email protected]:8080",
    "http://user:[email protected]:8080",
    "http://user:[email protected]:8080",
]

proxy_pool = cycle(proxies)

def scrape_with_proxy(query):
    proxy = next(proxy_pool)
    response = requests.get(
        "https://www.google.com/search",
        params={"q": query, "num": 10, "hl": "en"},
        headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
        },
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
    return response

Proxy rotation gets more complex than this quickly. You need to handle:

  • Proxy health checks: Dead proxies waste time and quota
  • Geographic targeting: Google results vary by location
  • Proxy type selection: Datacenter proxies get flagged fast, residential proxies are expensive
  • Session management: Some queries need consistent IP across pagination
  • Cost tracking: Residential bandwidth is billed per GB
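Of these, health checks are the easiest to automate: probe each proxy before putting it into rotation. A minimal sketch (the test URL and timeout are arbitrary choices; `generate_204` is a lightweight Google endpoint that returns an empty 204 response):

```python
import requests

def healthy_proxies(proxies, test_url="https://www.google.com/generate_204", timeout=5):
    """Return the subset of proxies that respond; dead ones are dropped."""
    alive = []
    for proxy in proxies:
        try:
            r = requests.get(
                test_url,
                proxies={"http": proxy, "https": proxy},
                timeout=timeout,
            )
            if r.status_code in (200, 204):
                alive.append(proxy)
        except requests.RequestException:
            pass  # connection refused, timeout, bad auth -- skip it
    return alive

# Rebuild the rotation pool from the survivors before each batch:
# proxy_pool = cycle(healthy_proxies(proxies))
```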

| Proxy Type | Detection Rate | Cost per GB | Speed |
|---|---|---|---|
| Datacenter | High | ~$1 | Fast |
| Residential | Low | ~$10-15 | Medium |
| Mobile | Very Low | ~$20-30 | Variable |
| ISP (Static Residential) | Low | ~$3-5 | Fast |

For Google specifically, residential proxies are the minimum viable option. Datacenter IPs get blocked within minutes.

Method 5: Scraping API

A scraping API handles the proxy rotation, browser rendering, CAPTCHA solving, and fingerprint management for you. You send a URL, you get back the HTML.

python
import requests

def scrape_google_api(query):
    response = requests.post(
        "https://alterlab.io/api/v1/scrape",
        headers={
            "X-API-Key": "your-api-key",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.google.com/search?q={query}&num=10&hl=en",
            "render_js": True,
        },
    )

    data = response.json()
    return data.get("content", "")

Your parsing logic stays the same. The difference is that someone else manages the infrastructure that keeps requests from getting blocked.

  1. Send Query: POST your Google search URL to the scraping API
  2. Smart Routing: The API selects the optimal proxy, browser, and fingerprint
  3. Anti-Bot Bypass: CAPTCHAs, consent screens, and detection evasion are handled for you
  4. Get Results: You receive clean HTML or structured data back
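Failures still happen, so it is worth wrapping whatever fetch function you use (such as `scrape_google_api` above) in a retry with exponential backoff. A transport-agnostic sketch:

```python
import time

def with_retry(fetch, url, max_attempts=3, base_delay=1.0):
    """Call fetch(url), retrying on any exception with exponential
    backoff (base_delay, 2x, 4x, ...). `fetch` is whatever transport
    you use -- e.g. the scrape_google_api function above."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
```

Keeping the transport injectable means the same retry logic works whether you are calling an API, Playwright, or raw requests.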

Parsing Google Results Properly

Regardless of how you fetch the HTML, you need to parse it. Google's DOM structure is deeply nested and changes without notice. Here is a more robust parser that handles multiple result types:

python
from bs4 import BeautifulSoup
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class OrganicResult:
    position: int
    title: str
    url: str
    snippet: str
    displayed_url: str = ""

@dataclass
class PeopleAlsoAsk:
    question: str
    snippet: Optional[str] = None

@dataclass
class SERPData:
    query: str
    organic: list[OrganicResult] = field(default_factory=list)
    people_also_ask: list[PeopleAlsoAsk] = field(default_factory=list)
    featured_snippet: Optional[str] = None
    total_results: Optional[str] = None

def parse_serp(html: str, query: str) -> SERPData:
    soup = BeautifulSoup(html, "html.parser")
    serp = SERPData(query=query)

    # Total results count
    stats = soup.select_one("#result-stats")
    if stats:
        serp.total_results = stats.text.strip()

    # Featured snippet
    featured = soup.select_one(".xpdopen .hgKElc")
    if featured:
        serp.featured_snippet = featured.text.strip()

    # Organic results
    position = 1
    for div in soup.select("div.tF2Cxc"):
        title_el = div.select_one("h3")
        link_el = div.select_one("a")
        snippet_el = div.select_one(".VwiC3b")
        cite_el = div.select_one("cite")

        if title_el and link_el:
            serp.organic.append(OrganicResult(
                position=position,
                title=title_el.text.strip(),
                url=link_el.get("href", ""),
                snippet=snippet_el.text.strip() if snippet_el else "",
                displayed_url=cite_el.text.strip() if cite_el else "",
            ))
            position += 1

    # People Also Ask
    for paa in soup.select(".related-question-pair"):
        question_el = paa.select_one(".CSkcDe")
        answer_el = paa.select_one(".wDYxhc")
        if question_el:
            serp.people_also_ask.append(PeopleAlsoAsk(
                question=question_el.text.strip(),
                snippet=answer_el.text.strip() if answer_el else None,
            ))

    return serp

Important note on selectors: Google uses obfuscated class names that change over time. Classes like tF2Cxc, VwiC3b, and CSkcDe are not stable identifiers. Production scrapers need a selector update mechanism, either manual monitoring or automated detection when parsing starts returning empty results.
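One way to automate that detection, as a minimal sketch: treat an empty parse of a successfully fetched page as a selector-rot alarm. This works with the `SERPData` object from `parse_serp` above (or anything with `.query` and `.organic` attributes):

```python
def check_parser_health(serp, min_expected=1):
    """Flag likely selector rot: a fetched page that parses to zero
    organic results usually means Google changed its class names
    (or you were served a CAPTCHA/consent page instead of results)."""
    if len(serp.organic) < min_expected:
        # Hook this into your alerting instead of printing
        print(f"WARNING: only {len(serp.organic)} organic results for "
              f"'{serp.query}' -- selectors may be stale")
        return False
    return True
```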

Handling Pagination and Scale

Google paginates results using the start parameter. Page 2 is start=10, page 3 is start=20, and so on.

python
import time
import random

def scrape_multiple_pages(query, pages=3):
    all_results = []

    for page_num in range(pages):
        start = page_num * 10
        url = f"https://www.google.com/search?q={query}&num=10&start={start}&hl=en"

        # Fetch using your preferred method
        html = fetch_with_api(url)  # or playwright, or requests+proxy
        serp = parse_serp(html, query)
        all_results.extend(serp.organic)

        # Random delay between pages
        if page_num < pages - 1:
            time.sleep(2 + random.uniform(0, 3))

    return all_results

At scale (thousands of queries per day), you also need:

  • Query queuing: Spread requests over time to avoid burst patterns
  • Result caching: Same query within 24 hours can use cached results
  • Deduplication: Google sometimes returns the same URL at different positions across pages
  • Error classification: Distinguish between blocks (retry with different proxy), CAPTCHAs (solve or rotate), and genuine errors (skip)
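The deduplication step can be as simple as keeping the first-seen position for each URL. Shown here for the dict results the Method 1 parser returns; adapt the key access for the dataclass version:

```python
def dedupe_results(results):
    """Keep the first occurrence of each URL, preserving order.
    Expects dicts with a 'url' key, as returned by scrape_google above."""
    seen = set()
    unique = []
    for r in results:
        if r["url"] not in seen:
            seen.add(r["url"])
            unique.append(r)
    return unique
```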

Localized and Device-Specific Results

Google returns different results based on location and device. Control this with URL parameters:

python
# Location-specific results
params = {
    "q": "coffee shops",
    "gl": "us",       # Country (ISO 3166-1 alpha-2)
    "hl": "en",       # Language
    "uule": "w+CAIQICI...",  # Encoded location for city-level targeting
    "num": 10,
}

# Mobile results (use mobile User-Agent)
mobile_ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"

The gl parameter controls the country. For city-level targeting, you need the uule parameter, which is a base64-encoded location string. The format is documented in various SEO tool blogs, but it amounts to encoding the canonical name of the location from Google's geographic targeting database.
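For illustration, here is the uule construction as commonly described in SEO tooling write-ups: a fixed prefix, one character whose position in the base64 alphabet equals the length of the canonical location name, then the name itself. Treat this as reverse-engineered convention, not a stable Google contract:

```python
# Base64 alphabet; the character at index len(name) encodes the length.
_KEY = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def build_uule(canonical_name: str) -> str:
    """Build a uule value from a canonical location name, e.g.
    'New York,New York,United States'. Format per SEO community
    reverse engineering -- not officially documented by Google.
    Remember to URL-encode the result in the query string."""
    if len(canonical_name) >= len(_KEY):
        raise ValueError("name too long for single-character length encoding")
    return "w+CAIQICI" + _KEY[len(canonical_name)] + canonical_name
```

The canonical name must match Google's geographic targeting database exactly, or the parameter is silently ignored.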

Structured Output: Skip the HTML Entirely

If you use a scraping API, you can request structured data formats instead of parsing raw HTML yourself.

python
import requests

response = requests.post(
    "https://alterlab.io/api/v1/scrape",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.google.com/search?q=best+crm+software&num=10",
        "render_js": True,
        "formats": ["json", "markdown"],
    },
)

data = response.json()
# Structured JSON with titles, URLs, snippets already extracted
# Plus markdown for LLM ingestion or documentation

This saves you from maintaining brittle CSS selectors. When Google changes their DOM structure, the API provider updates their parsers, not you.

Cost Comparison: DIY vs API

Here is what Google SERP scraping costs at 50,000 queries per month.

| Component | DIY Stack | Scraping API |
|---|---|---|
| Residential Proxies | $500-800/mo | Included |
| Server (Browser Instances) | $100-200/mo | Included |
| CAPTCHA Solving | $50-150/mo | Included |
| Engineering Maintenance | 10-20 hrs/mo | 0 hrs |
| Total Cost | $650-1,150+ | ~$250-500 |

The engineering time is the hidden cost. When Google updates their anti-bot detection, someone has to debug why success rates dropped from 95% to 40% overnight. That someone is you.

Common Pitfalls

Scraping too fast. Google correlates request timing. Sending 100 queries per minute from the same IP block gets every IP in that block flagged. Space requests 3-10 seconds apart minimum.

Ignoring Google's Terms of Service. Google's ToS prohibit automated access. Whether this is legally enforceable depends on your jurisdiction and use case. The hiQ v. LinkedIn litigation in the US established some precedent for scraping publicly available data, though the case ultimately settled and the law remains unsettled. Consult a lawyer if your business depends on this.

Parsing only organic results. Modern SERPs are mostly features: knowledge panels, shopping carousels, video results, related searches. If you only parse the ten blue links, you miss what most users actually see and click on.

Not caching results. Google results for most queries change slowly. Caching results for 4-24 hours reduces your request volume and costs without losing data freshness for most use cases.
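A minimal in-memory sketch of that caching (the 6-hour default and in-process storage are assumptions; production setups usually back this with Redis or disk):

```python
import time

class SerpCache:
    """In-memory TTL cache keyed by query. The 6-hour default sits in
    the middle of the 4-24 hour range suggested above."""

    def __init__(self, ttl_seconds=6 * 3600):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, query):
        entry = self._store.get(query)
        if entry is None:
            return None
        timestamp, value = entry
        if time.time() - timestamp > self.ttl:
            del self._store[query]  # expired: evict and miss
            return None
        return value

    def put(self, query, value):
        self._store[query] = (time.time(), value)
```

Check the cache before dispatching a query, and write back every successful parse.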

Hardcoding selectors. Google's CSS class names are machine-generated and change without notice. Build your parser to fail gracefully when selectors break, and add monitoring to detect when it happens.

When Each Method Makes Sense

| Method | Best For | Volume Limit |
|---|---|---|
| Raw requests | Quick one-off tests | ~10-20 queries |
| Playwright + stealth | Small projects with fixed targets | ~100-500/day with proxies |
| Proxy rotation + browser | Medium scale with engineering capacity | ~1K-10K/day |
| Scraping API | Production workloads at any scale | Unlimited |

Start with the simplest method that meets your needs. Move to the next level when you are spending more time maintaining infrastructure than using the data.

AlterLab handles Google SERP scraping with automatic proxy rotation, JS rendering, and anti-bot bypass. Pay per successful request. If a request fails, you do not pay for it.

Quick Reference

| Parameter | Value | Purpose |
|---|---|---|
| q | Your search query | The search terms |
| num | 10, 20, 50, 100 | Results per page |
| start | 0, 10, 20... | Pagination offset |
| hl | en, es, fr, de... | Interface language |
| gl | us, uk, de, in... | Country for results |
| tbm | nws, isch, vid, shop | Search type (news, images, video, shopping) |
| tbs | qdr:d, qdr:w, qdr:m | Time filter (day, week, month) |

These parameters work in the URL regardless of your scraping method. Combine them to get exactly the SERP data your application needs.
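As a convenience, a small builder that composes these parameters into a search URL (the function name and defaults are my own):

```python
from urllib.parse import urlencode

def build_search_url(query, country="us", language="en", page=0,
                     search_type=None, time_filter=None):
    """Compose a Google search URL from the parameters in the table above."""
    params = {
        "q": query,
        "num": 10,
        "start": page * 10,   # page 0 -> start=0, page 2 -> start=20
        "hl": language,
        "gl": country,
    }
    if search_type:
        params["tbm"] = search_type    # nws, isch, vid, shop
    if time_filter:
        params["tbs"] = time_filter    # qdr:d, qdr:w, qdr:m
    return "https://www.google.com/search?" + urlencode(params)
```

urlencode handles the escaping, so queries with spaces or special characters come out valid.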
