How to Scrape Google Search Results in 2026: Python, APIs, and What Actually Works
Google blocks most scraping attempts within a few requests. Here is what works for extracting SERP data at scale in 2026, from raw Python to headless browsers to scraping APIs.
Yash Dubey
February 17, 2026
Google search results are one of the most valuable data sources on the internet. SEO tools, market research platforms, ad intelligence products, and AI training pipelines all depend on SERP data. Google does not make this easy.
Google runs some of the most aggressive anti-bot systems on the web. A basic Python script will get blocked after a handful of requests. Headless browsers last slightly longer before hitting CAPTCHAs. Proxy rotation helps, but Google fingerprints far more than your IP address.
Here is what actually works in 2026, with code you can run today.
What You Get From Google SERPs
Before writing any code, know what data is available in a Google results page:
- Organic results: Title, URL, description snippet, position
- Featured snippets: The answer box at the top
- People Also Ask: Related questions with expandable answers
- Knowledge panels: Entity information cards
- Ads: Sponsored listings with advertiser info
- Local pack: Map results with business details
- Shopping results: Product cards with prices
- Image and video carousels: Media results
Each result type has its own HTML structure. A scraper that only grabs the ten blue links misses most of the page.
Method 1: Raw HTTP Requests (The Naive Approach)
The simplest approach is sending a GET request to Google. It works for about 5 minutes.
```python
import requests
from bs4 import BeautifulSoup

def scrape_google(query):
    url = "https://www.google.com/search"
    params = {"q": query, "num": 10, "hl": "en"}
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
    }
    response = requests.get(url, params=params, headers=headers)
    if response.status_code != 200:
        print(f"Blocked: HTTP {response.status_code}")
        return []
    soup = BeautifulSoup(response.text, "html.parser")
    results = []
    for div in soup.select("div.tF2Cxc"):
        title_el = div.select_one("h3")
        link_el = div.select_one("a")
        snippet_el = div.select_one(".VwiC3b")
        if title_el and link_el:
            results.append({
                "title": title_el.text,
                "url": link_el["href"],
                "snippet": snippet_el.text if snippet_el else "",
            })
    return results

results = scrape_google("best web scraping api")
for r in results:
    print(f"{r['title']}\n  {r['url']}\n")
```

This will return results for your first few queries. Then Google serves a CAPTCHA page, a 429 status code, or a consent form that blocks further requests.
Why it fails:
- Google tracks request patterns across your IP
- The User-Agent alone is not enough to look human
- Missing cookies, TLS fingerprint, and JavaScript execution are all signals
- Google's selectors (like `div.tF2Cxc`) change periodically, breaking your parser
This approach is fine for a one-off test. It is not viable for production use.
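Even for a one-off test, it helps to detect the block explicitly so you stop before burning more requests. A minimal sketch; the marker strings are assumptions based on common Google block pages, not an exhaustive or guaranteed list:

```python
def looks_blocked(status_code: int, html: str) -> bool:
    """Heuristic check for Google block responses.

    Assumed markers: HTTP 429/503 plus fragments commonly seen on
    Google's CAPTCHA and consent pages. Tune against real responses.
    """
    if status_code in (429, 503):
        return True
    lowered = html.lower()
    markers = (
        "our systems have detected unusual traffic",
        "/sorry/index",          # Google's CAPTCHA interstitial path
        "g-recaptcha",
        "consent.google.com",    # EU consent redirect
    )
    return any(marker in lowered for marker in markers)
```

Call this after every response; on a hit, back off or rotate identity rather than retrying immediately on the same IP.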
Method 2: Headless Browser with Playwright
Using a real browser solves the JavaScript execution problem and gives you a more realistic fingerprint.
```python
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup

async def scrape_google_playwright(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()
        await page.goto(
            f"https://www.google.com/search?q={query}&hl=en",
            wait_until="networkidle",
        )
        # Handle consent screen (common in EU)
        try:
            consent_btn = page.locator("button:has-text('Accept all')")
            if await consent_btn.is_visible(timeout=3000):
                await consent_btn.click()
                await page.wait_for_load_state("networkidle")
        except Exception:
            pass
        html = await page.content()
        await browser.close()

    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.select("div.tF2Cxc"):
        title_el = div.select_one("h3")
        link_el = div.select_one("a")
        snippet_el = div.select_one(".VwiC3b")
        if title_el and link_el:
            results.append({
                "title": title_el.text,
                "url": link_el["href"],
                "snippet": snippet_el.text if snippet_el else "",
            })
    return results

results = asyncio.run(scrape_google_playwright("best web scraping api"))
for r in results:
    print(f"{r['title']}\n  {r['url']}\n")
```

Playwright gets you further than raw requests. Google sees a real Chromium browser executing JavaScript and rendering the page. But headless detection has gotten sophisticated in 2026.
Where Playwright breaks down:
- Google detects headless Chromium through WebGL fingerprinting, navigator properties, and behavioral analysis
- Each browser instance uses 200-400 MB of RAM, making scale expensive
- CAPTCHAs still appear after 20-50 queries from the same IP
- Consent screens, cookie banners, and localized results add parsing complexity
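Because each instance eats hundreds of megabytes, production Playwright scrapers bound how many browsers run at once. Here is a sketch of that pattern with an asyncio semaphore, using a stubbed `fetch_serp` coroutine standing in for the Playwright call above (the pool size of 4 is an assumed budget, not a recommendation):

```python
import asyncio

MAX_BROWSERS = 4  # assumed budget: roughly 4 contexts at ~300 MB each

async def fetch_serp(query: str) -> str:
    # Stand-in for the real Playwright fetch shown earlier
    await asyncio.sleep(0.01)
    return f"<html>results for {query}</html>"

async def scrape_all(queries):
    sem = asyncio.Semaphore(MAX_BROWSERS)

    async def bounded(query):
        async with sem:  # at most MAX_BROWSERS fetches in flight
            return await fetch_serp(query)

    return await asyncio.gather(*(bounded(q) for q in queries))

pages = asyncio.run(scrape_all([f"query {i}" for i in range(10)]))
```

The semaphore caps peak memory while still letting queries queue up, which is usually the right trade-off for a fixed-size server.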
Method 3: Stealth Patches and Fingerprint Spoofing
You can make Playwright harder to detect by patching the browser fingerprint:
```python
import asyncio
import random
from playwright.async_api import async_playwright

async def stealth_scrape(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--disable-features=IsolateOrigins,site-per-process",
            ],
        )
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
            timezone_id="America/New_York",
            geolocation={"longitude": -73.935242, "latitude": 40.730610},
            permissions=["geolocation"],
        )
        page = await context.new_page()
        # Patch navigator properties that betray automation
        await page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en']
            });
            Object.defineProperty(navigator, 'plugins', {
                get: () => [1, 2, 3, 4, 5]
            });
        """)
        await page.goto(f"https://www.google.com/search?q={query}&hl=en")
        # Random delay to look human
        await page.wait_for_timeout(random.randint(1000, 3000))
        html = await page.content()
        await browser.close()
        return html
```

This buys you more queries before detection, but it is a treadmill. Google updates their detection signatures regularly. The stealth patches that work today may not work next month.
Method 4: Proxy Rotation
Adding proxies distributes your requests across many IP addresses, which is necessary at any meaningful scale.
```python
import requests
from itertools import cycle

# Placeholder proxy URLs; substitute your provider's credentials and hosts
proxies = [
    "http://username:password@proxy1.example.com:8080",
    "http://username:password@proxy2.example.com:8080",
    "http://username:password@proxy3.example.com:8080",
]
proxy_pool = cycle(proxies)

def scrape_with_proxy(query):
    proxy = next(proxy_pool)
    response = requests.get(
        "https://www.google.com/search",
        params={"q": query, "num": 10, "hl": "en"},
        headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
        },
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
    return response
```

Proxy rotation gets more complex than this quickly. You need to handle:
- Proxy health checks: Dead proxies waste time and quota
- Geographic targeting: Google results vary by location
- Proxy type selection: Datacenter proxies get flagged fast, residential proxies are expensive
- Session management: Some queries need consistent IP across pagination
- Cost tracking: Residential bandwidth is billed per GB
| Proxy Type | Detection Rate | Cost per GB | Speed |
|---|---|---|---|
| Datacenter | High | ~$1 | Fast |
| Residential | Low | ~$10-15 | Medium |
| Mobile | Very Low | ~$20-30 | Variable |
| ISP (Static Residential) | Low | ~$3-5 | Fast |
For Google specifically, residential proxies are the minimum viable option. Datacenter IPs get blocked within minutes.
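A sketch of the health-check side of that list: a round-robin pool that evicts a proxy after repeated failures. The eviction threshold and the bookkeeping are illustrative, not a complete implementation:

```python
class ProxyPool:
    """Round-robin proxy pool that drops proxies after repeated failures."""

    def __init__(self, proxies, max_failures=3):
        self.proxies = list(proxies)
        self.failures = {p: 0 for p in self.proxies}
        self.max_failures = max_failures
        self._idx = 0

    def get(self):
        if not self.proxies:
            raise RuntimeError("all proxies exhausted")
        proxy = self.proxies[self._idx % len(self.proxies)]
        self._idx += 1
        return proxy

    def report_failure(self, proxy):
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures and proxy in self.proxies:
            self.proxies.remove(proxy)  # evict dead proxy

    def report_success(self, proxy):
        self.failures[proxy] = 0  # reset the failure count on success
```

In use, call `report_failure` whenever a request through the proxy times out or returns a block page, and `report_success` otherwise; a background task can periodically re-test evicted proxies.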
Method 5: Scraping API
A scraping API handles the proxy rotation, browser rendering, CAPTCHA solving, and fingerprint management for you. You send a URL, you get back the HTML.
```python
import requests

def scrape_google_api(query):
    response = requests.post(
        "https://alterlab.io/api/v1/scrape",
        headers={
            "X-API-Key": "your-api-key",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.google.com/search?q={query}&num=10&hl=en",
            "render_js": True,
        },
    )
    data = response.json()
    return data.get("content", "")
```

Your parsing logic stays the same. The difference is that someone else manages the infrastructure that keeps requests from getting blocked.
1. Send Query: POST your Google search URL to the scraping API
2. Smart Routing: The API selects an optimal proxy, browser, and fingerprint
3. Anti-Bot Bypass: CAPTCHAs, consent screens, and detection evasion are handled for you
4. Get Results: You receive clean HTML or structured data back
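Whichever fetch method you settle on, transient failures still happen, and a thin retry wrapper keeps the calling code clean. A sketch with exponential backoff; the fetch callable is injected, so the same wrapper works for the raw-requests, Playwright, or API approaches (the function name and defaults here are assumptions):

```python
import time

def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on exceptions.

    Delays are 1s, 2s, 4s... by default; `sleep` is injectable for testing.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Pair this with block detection: a CAPTCHA page is not an exception, so the fetch function should raise when it sees one, or the wrapper will happily return the block page.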
Parsing Google Results Properly
Regardless of how you fetch the HTML, you need to parse it. Google's DOM structure is deeply nested and changes without notice. Here is a more robust parser that handles multiple result types:
```python
from bs4 import BeautifulSoup
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class OrganicResult:
    position: int
    title: str
    url: str
    snippet: str
    displayed_url: str = ""

@dataclass
class PeopleAlsoAsk:
    question: str
    snippet: Optional[str] = None

@dataclass
class SERPData:
    query: str
    organic: list[OrganicResult] = field(default_factory=list)
    people_also_ask: list[PeopleAlsoAsk] = field(default_factory=list)
    featured_snippet: Optional[str] = None
    total_results: Optional[str] = None

def parse_serp(html: str, query: str) -> SERPData:
    soup = BeautifulSoup(html, "html.parser")
    serp = SERPData(query=query)

    # Total results count
    stats = soup.select_one("#result-stats")
    if stats:
        serp.total_results = stats.text.strip()

    # Featured snippet
    featured = soup.select_one(".xpdopen .hgKElc")
    if featured:
        serp.featured_snippet = featured.text.strip()

    # Organic results
    position = 1
    for div in soup.select("div.tF2Cxc"):
        title_el = div.select_one("h3")
        link_el = div.select_one("a")
        snippet_el = div.select_one(".VwiC3b")
        cite_el = div.select_one("cite")
        if title_el and link_el:
            serp.organic.append(OrganicResult(
                position=position,
                title=title_el.text.strip(),
                url=link_el.get("href", ""),
                snippet=snippet_el.text.strip() if snippet_el else "",
                displayed_url=cite_el.text.strip() if cite_el else "",
            ))
            position += 1

    # People Also Ask
    for paa in soup.select(".related-question-pair"):
        question_el = paa.select_one(".CSkcDe")
        answer_el = paa.select_one(".wDYxhc")
        if question_el:
            serp.people_also_ask.append(PeopleAlsoAsk(
                question=question_el.text.strip(),
                snippet=answer_el.text.strip() if answer_el else None,
            ))

    return serp
```

Important note on selectors: Google uses obfuscated class names that change over time. Classes like `tF2Cxc`, `VwiC3b`, and `CSkcDe` are not stable identifiers. Production scrapers need a selector update mechanism, either manual monitoring or automated detection when parsing starts returning empty results.
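One way to build that detection is to keep an ordered list of candidate parsers per field and fall through until one returns results. A generic sketch; in practice each parser would wrap a `soup.select(...)` call with a different candidate class, and an alert whenever the index is above zero tells you the primary selector broke:

```python
def first_nonempty(parsers, html):
    """Try parse functions in order; return the first non-empty result.

    Each parser takes the HTML and returns a list. The index of the
    parser that worked is returned alongside the results, so callers
    can alert when the primary selector (index 0) stops matching.
    """
    for index, parser in enumerate(parsers):
        results = parser(html)
        if results:
            return results, index
    return [], -1
```

A returned index of -1 means every candidate failed, which is the signal to page whoever maintains the selector list.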
Handling Pagination and Scale
Google paginates results using the `start` parameter. Page 2 is `start=10`, page 3 is `start=20`, and so on.
```python
import time
import random

def scrape_multiple_pages(query, pages=3):
    all_results = []
    for page_num in range(pages):
        start = page_num * 10
        url = f"https://www.google.com/search?q={query}&num=10&start={start}&hl=en"
        # Fetch using your preferred method
        html = fetch_with_api(url)  # or playwright, or requests+proxy
        serp = parse_serp(html, query)
        all_results.extend(serp.organic)
        # Random delay between pages
        if page_num < pages - 1:
            time.sleep(2 + random.uniform(0, 3))
    return all_results
```

At scale (thousands of queries per day), you also need:
- Query queuing: Spread requests over time to avoid burst patterns
- Result caching: Same query within 24 hours can use cached results
- Deduplication: Google sometimes returns the same URL at different positions across pages
- Error classification: Distinguish between blocks (retry with different proxy), CAPTCHAs (solve or rotate), and genuine errors (skip)
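The deduplication step can be as simple as keeping the best (lowest) position per URL across pages. A sketch assuming plain dicts with `url` and `position` keys; adapt the field access if you use the dataclasses from the parser above:

```python
def dedupe_results(results):
    """Keep one entry per URL, preferring the lowest (best) position."""
    best = {}
    for result in results:
        url = result["url"]
        if url not in best or result["position"] < best[url]["position"]:
            best[url] = result
    # Preserve ranking order in the output
    return sorted(best.values(), key=lambda r: r["position"])
```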
Localized and Device-Specific Results
Google returns different results based on location and device. Control this with URL parameters:
```python
# Location-specific results
params = {
    "q": "coffee shops",
    "gl": "us",              # Country (ISO 3166-1 alpha-2)
    "hl": "en",              # Language
    "uule": "w+CAIQICI...",  # Encoded location for city-level targeting
    "num": 10,
}

# Mobile results (use a mobile User-Agent)
mobile_ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
```

The `gl` parameter controls the country. For city-level targeting, you need the `uule` parameter, which is a base64-encoded location string. The format is documented in various SEO tool blogs, but it amounts to encoding the canonical name of the location from Google's geographic targeting database.
Structured Output: Skip the HTML Entirely
If you use a scraping API, you can request structured data formats instead of parsing raw HTML yourself.
```python
import requests

response = requests.post(
    "https://alterlab.io/api/v1/scrape",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.google.com/search?q=best+crm+software&num=10",
        "render_js": True,
        "formats": ["json", "markdown"],
    },
)
data = response.json()
# Structured JSON with titles, URLs, snippets already extracted
# Plus markdown for LLM ingestion or documentation
```

This saves you from maintaining brittle CSS selectors. When Google changes their DOM structure, the API provider updates their parsers, not you.
Cost Comparison: DIY vs API
Here is what Google SERP scraping costs at 50,000 queries per month.
| Component | DIY Stack | Scraping API |
|---|---|---|
| Residential Proxies | $500-800/mo | Included |
| Server (Browser Instances) | $100-200/mo | Included |
| CAPTCHA Solving | $50-150/mo | Included |
| Engineering Maintenance | 10-20 hrs/mo | 0 hrs |
| Total Cost | $650-1,150+ | ~$250-500 |
The engineering time is the hidden cost. When Google updates their anti-bot detection, someone has to debug why success rates dropped from 95% to 40% overnight. That someone is you.
Common Pitfalls
Scraping too fast. Google correlates request timing. Sending 100 queries per minute from the same IP block gets every IP in that block flagged. Space requests 3-10 seconds apart minimum.
Ignoring Google's Terms of Service. Google's ToS prohibit automated access. Whether this is legally enforceable depends on your jurisdiction and use case. The hiQ Labs v. LinkedIn ruling in the US established some precedent for scraping publicly available data. Consult a lawyer if your business depends on this.
Parsing only organic results. Modern SERPs are mostly features: knowledge panels, shopping carousels, video results, related searches. If you only parse the ten blue links, you miss what most users actually see and click on.
Not caching results. Google results for most queries change slowly. Caching results for 4-24 hours reduces your request volume and costs without losing data freshness for most use cases.
Hardcoding selectors. Google's CSS class names are machine-generated and change without notice. Build your parser to fail gracefully when selectors break, and add monitoring to detect when it happens.
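The caching pitfall is simple to avoid in-process. A sketch of a TTL cache keyed by query, with an injectable clock so it can be tested; in production you would likely back this with Redis or disk instead:

```python
import time

class SerpCache:
    """In-memory cache of SERP results with a per-entry TTL."""

    def __init__(self, ttl_seconds=4 * 3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # query -> (timestamp, results)

    def get(self, query):
        entry = self._store.get(query)
        if entry is None:
            return None
        timestamp, results = entry
        if self.clock() - timestamp > self.ttl:
            del self._store[query]  # expired; force a fresh fetch
            return None
        return results

    def put(self, query, results):
        self._store[query] = (self.clock(), results)
```

Check the cache before every fetch; the 4-hour default matches the 4-24 hour window suggested above and can be tuned per use case.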
When Each Method Makes Sense
| Method | Best For | Volume Limit |
|---|---|---|
| Raw requests | Quick one-off tests | ~10-20 queries |
| Playwright + stealth | Small projects with fixed targets | ~100-500/day with proxies |
| Proxy rotation + browser | Medium scale with engineering capacity | ~1K-10K/day |
| Scraping API | Production workloads at any scale | Unlimited |
Start with the simplest method that meets your needs. Move to the next level when you are spending more time maintaining infrastructure than using the data.
AlterLab handles Google SERP scraping with automatic proxy rotation, JS rendering, and anti-bot bypass. Pay per successful request. If a request fails, you do not pay for it.
Quick Reference
| Parameter | Value | Purpose |
|---|---|---|
| `q` | Your search query | The search terms |
| `num` | 10, 20, 50, 100 | Results per page |
| `start` | 0, 10, 20... | Pagination offset |
| `hl` | en, es, fr, de... | Interface language |
| `gl` | us, uk, de, in... | Country for results |
| `tbm` | nws, isch, vid, shop | Search type (news, images, video, shopping) |
| `tbs` | qdr:d, qdr:w, qdr:m | Time filter (day, week, month) |
These parameters work in the URL regardless of your scraping method. Combine them to get exactly the SERP data your application needs.
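These parameters compose with ordinary URL encoding, so a small standard-library helper covers every method in this article (the helper name is ours; the parameters mirror the table, nothing Google-specific is added):

```python
from urllib.parse import urlencode

def build_search_url(query, **params):
    """Build a Google search URL from the parameters in the table above."""
    merged = {"q": query, **params}
    return "https://www.google.com/search?" + urlencode(merged)

url = build_search_url("best crm software", num=10, hl="en", gl="us", tbs="qdr:w")
```

`urlencode` handles escaping (spaces, colons in `tbs` values), which avoids the subtle bugs that come from pasting parameters into f-strings.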