
Scrape Google Search Results Without Getting Blocked (2026)
Google's bot defenses have hardened in 2026. Learn detection signals, bypass techniques, and production-ready Python code to scrape SERPs reliably at scale.
March 27, 2026
Most Google SERP scrapers fail at the proxy, protocol, and header layers simultaneously. The fix: residential proxies + TLS fingerprint impersonation + browser-consistent headers. Everything else is implementation detail.
Most scrapers return a CAPTCHA page — or worse, silently return one and parse zero results without logging the failure. This post explains exactly which detection layers Google operates, how to defeat each one, and how to build a parser that holds up across Google's class name rotations.
Why Google Blocks Most Scrapers Immediately
Google's bot detection is not a single check — it's five concurrent scoring signals evaluated before any HTML is served. Address all five or expect consistent failures.
Layer 1 — IP Reputation
Every datacenter ASN is pre-flagged. AWS (54.x.x.x), GCP (34.x.x.x), Azure, Hetzner, DigitalOcean, Vultr — all scored as high-bot-probability before your request is processed. Rotating 10,000 datacenter IPs does not help; the entire ASN range carries the penalty. Even clean residential IPs get scored for velocity: more than 20–30 Google requests per hour from a single IP triggers rate scoring.
Layer 2 — TLS Fingerprinting
The TLS ClientHello exposes your HTTP client before a single application-layer byte is read. Python's requests (backed by urllib3) produces a distinct cipher suite order and extension set — different from Chrome, different from curl, identifiable in under a millisecond. Google scores this fingerprint independently of your User-Agent header.
Layer 3 — HTTP/2 Fingerprinting
Chrome negotiates HTTP/2 with specific SETTINGS frames (HEADER_TABLE_SIZE, MAX_CONCURRENT_STREAMS, INITIAL_WINDOW_SIZE) and HEADERS priority values. httpx, aiohttp, and raw h2 all produce different SETTINGS sequences than Chrome. Google captures this fingerprint alongside TLS.
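To see why the SETTINGS frame is a fingerprint at all, compare representative values. The Chrome numbers below are drawn from publicly documented Chrome HTTP/2 fingerprints and the baseline comes from the RFC 7540 defaults; treat the Chrome side as illustrative, since exact values shift between releases:

```python
# Representative Chrome HTTP/2 SETTINGS (assumption: taken from published
# fingerprints of Chrome-era builds; capture your own traffic to confirm).
CHROME_H2_SETTINGS = {
    "HEADER_TABLE_SIZE": 65536,
    "ENABLE_PUSH": 0,
    "MAX_CONCURRENT_STREAMS": 1000,
    "INITIAL_WINDOW_SIZE": 6291456,
    "MAX_HEADER_LIST_SIZE": 262144,
}

# RFC 7540 protocol defaults, which a naive client advertises implicitly
# by not overriding them in its SETTINGS frame.
RFC7540_DEFAULTS = {
    "HEADER_TABLE_SIZE": 4096,
    "ENABLE_PUSH": 1,
    "INITIAL_WINDOW_SIZE": 65535,
}

# Every differing key is one more bit of fingerprint entropy for Google.
diff = {k for k in RFC7540_DEFAULTS if CHROME_H2_SETTINGS[k] != RFC7540_DEFAULTS[k]}
print(sorted(diff))
```

A client that ships the spec defaults unmodified is distinguishable from Chrome before a single header is parsed.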
Layer 4 — JavaScript / Browser Fingerprint
For persistent challenge scenarios, injected JavaScript reads navigator.webdriver (set true by default in headless Chrome), canvas entropy, WebGL renderer string, and plugin enumeration. Missing or spoofed values elevate CAPTCHA probability.
Layer 5 — Behavioral Signals
Uniform request intervals (fixed time.sleep(2)), missing referrer headers on paginated requests, and zero dwell time between sequential page fetches are all behavioral anomalies that compound the bot score over a session.
Fixing Layers 2 and 3: TLS and HTTP/2 Fingerprint Impersonation
The curl_cffi library binds to curl-impersonate, a patched libcurl build that reproduces Chrome's exact TLS cipher suite order, extension list, and HTTP/2 SETTINGS frames. It's the most reliable open-source solution to protocol-level fingerprinting.
from curl_cffi import requests as cffi_requests

# impersonate="chrome120" patches TLS ClientHello + HTTP/2 SETTINGS
session = cffi_requests.Session(impersonate="chrome120")

params = {
    "q": "web scraping api 2026",
    "hl": "en",
    "gl": "us",
    "num": "10",
}

response = session.get(
    "https://www.google.com/search",
    params=params,
    proxies={"https": "http://user:[email protected]:8080"},
    timeout=15,
)

print(response.status_code)  # 200 means fingerprint passed
print(len(response.text))    # Verify HTML length — CAPTCHA pages are short

curl_cffi versions track Chrome releases. Pin to a specific version in your requirements.txt and update after major Chrome bumps — Google begins scoring outdated fingerprints within weeks of a new Chrome stable release.
Fixing Header Consistency
A Chrome 120 TLS fingerprint paired with User-Agent: python-requests/2.31.0 is an immediate contradiction. Every header must match the impersonated browser version.
CHROME_120_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
    "Sec-CH-UA": '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"Windows"',
    "DNT": "1",
    "Upgrade-Insecure-Requests": "1",
}

Sec-Fetch-* headers have been standard in Chrome since version 80. Their absence is a strong non-browser signal. Sec-CH-UA-* values must match the version in your User-Agent string exactly — a mismatch (Chrome/120 UA with Sec-CH-UA: ...Chromium;v="119") is scored as a fingerprint inconsistency.
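A pre-flight check catches this class of mistake before a request ever leaves your machine. A minimal sketch (the function name is mine, not a library API) that verifies the Chrome major version agrees between User-Agent and Sec-CH-UA:

```python
import re

def headers_are_consistent(headers: dict[str, str]) -> bool:
    """Return True only if the Sec-CH-UA brand list advertises the same
    Chrome major version as the User-Agent string (hypothetical helper)."""
    ua_match = re.search(r"Chrome/(\d+)\.", headers.get("User-Agent", ""))
    if not ua_match:
        return False
    major = ua_match.group(1)
    sec_ch_ua = headers.get("Sec-CH-UA", "")
    # Both Chromium-family brand entries must carry the UA's major version
    return (f'"Chromium";v="{major}"' in sec_ch_ua
            and f'"Google Chrome";v="{major}"' in sec_ch_ua)
```

Run it once at startup against your header template; a version bump in one place but not the other is exactly the inconsistency Google scores.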
Using a Managed API for Production Scale
Building and maintaining this stack — proxy rotation, TLS impersonation, CAPTCHA solving, header consistency — requires ongoing engineering investment as Google evolves its detection. When a new Chrome version ships, your fingerprint silently starts failing until you update curl_cffi and re-validate headers.
For production pipelines, the anti-bot bypass API handles all of this transparently. You send a URL; it manages proxy selection, fingerprint matching, and JavaScript challenges.
Python SDK
The Python scraping API ships a batteries-included client that covers the common SERP workflow:
import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    url="https://www.google.com/search",
    params={
        "q": "best web scraping API 2026",
        "hl": "en",
        "gl": "us",
        "num": "10",
    },
    render_js=False,  # set True for JS-rendered content (slower, costs more)
    country="us",
)

soup = BeautifulSoup(response.html, "html.parser")

results = []
for g in soup.select("div.g"):
    title_el = g.select_one("h3")
    link_el = g.select_one("a[href]")
    # VwiC3b is the primary snippet class; data-sncf is the fallback attribute
    snippet_el = g.select_one(".VwiC3b") or g.select_one("div[data-sncf]")
    if title_el and link_el:
        results.append({
            "title": title_el.get_text(strip=True),
            "url": link_el["href"],
            "snippet": snippet_el.get_text(strip=True) if snippet_el else "",
        })

print(f"Extracted {len(results)} organic results")

cURL Equivalent
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.google.com/search?q=web+scraping+api+2026&hl=en&gl=us&num=10",
    "render_js": false,
    "country": "us"
  }'

Parsing SERP HTML Reliably
Google's class names rotate on an irregular cadence. Hard-coding .LC20lb as your title selector will break without warning. Use h3 inside div.g (structural selectors) as your primary strategy, with class-based selectors as a fast path and attribute selectors as fallback.
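One way to structure that strategy is a fallback chain that walks from the fast class-based selector to progressively more generic structural ones, logging every miss so rotations are visible. A sketch (the selector list is illustrative, not a guaranteed-current set):

```python
from bs4 import BeautifulSoup

# Ordered fast-path -> structural fallback; extend as Google rotates classes.
TITLE_SELECTORS = ["h3.LC20lb", "div.g h3", "a[href] h3"]

def select_with_fallback(soup: BeautifulSoup, selectors: list[str]):
    """Return matches from the first selector that yields any, logging misses."""
    for sel in selectors:
        matches = soup.select(sel)
        if matches:
            return matches
        print(f"[selector-miss] {sel!r} matched nothing, trying next")
    return []
```

When the class-based fast path starts logging misses while the structural fallback still matches, you know a rotation happened without losing a single crawl.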
Google wraps organic result URLs in redirect links (/url?q=https://...). Always unwrap them:
from urllib.parse import urlparse, parse_qs

def unwrap_google_url(href: str) -> str:
    """Extract the real target URL from a Google redirect href.

    unwrap_google_url("/url?q=https://example.com&sa=U") -> "https://example.com"
    """
    if href.startswith("/url"):
        params = parse_qs(urlparse(href).query)
        return params.get("q", [href])[0]
    # Newer SERP format: direct URLs without redirect wrapper
    return href

Handling Pagination
Paginate via the start parameter. Page 1 is start=0, page 2 is start=10 (when num=10). Always set a Referer header on pages 2+ — a direct hit on page 5 with no referrer is an anomaly signal.
import time
import random

import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

def scrape_serp_pages(query: str, pages: int = 5) -> list[dict]:
    results = []
    for page in range(pages):
        start = page * 10
        response = client.scrape(
            url="https://www.google.com/search",
            params={
                "q": query,
                "start": str(start),
                "num": "10",
                "hl": "en",
                "gl": "us",
            },
            country="us",
        )
        html = response.html
        if not _is_valid_serp(html):
            print(f"[WARN] Page {page + 1} returned a challenge page — skipping")
            continue
        soup = BeautifulSoup(html, "html.parser")
        for g in soup.select("div.g"):
            title_el = g.select_one("h3")
            link_el = g.select_one("a[href]")
            if title_el and link_el:
                results.append({
                    "title": title_el.get_text(strip=True),
                    "url": link_el["href"],
                    "page": page + 1,
                })
        # Jitter delay: uniform fixed intervals are a bot signal
        if page < pages - 1:
            time.sleep(random.uniform(2.0, 5.0))
    return results

def _is_valid_serp(html: str) -> bool:
    challenge_strings = [
        "Our systems have detected unusual traffic",
        "www.google.com/recaptcha",
        "/sorry/index",
    ]
    return not any(s in html for s in challenge_strings)

DIY Stack vs Managed API
Common Mistakes That Get You Blocked
Datacenter IPs. The entire ASN range is pre-scored. No amount of fingerprint tuning recovers from a 34.x.x.x source IP for Google requests.
Reusing proxies too frequently. Even residential IPs have velocity ceilings. Rotate per request, and distribute across geographies to avoid single-IP velocity scoring.
Missing Sec-Fetch-* headers. These have been standard in Chrome since v80. A request without them did not come from a real browser — full stop.
Fixed sleep intervals. time.sleep(2) repeated identically across every request is a bot pattern. Use random.uniform(lower, upper) in a human-realistic range (2–8 seconds for SERP-level pacing).
No referrer on paginated requests. Page 2+ requests from a real user always carry Referer: https://www.google.com/search?q=.... Direct hits on deep pages with no referrer compound the anomaly score.
Parsing without response validation. CAPTCHA pages return HTTP 200. Your BeautifulSoup parser will run against them and return zero results silently. Always call a validation function before parsing, and log the raw HTML on zero-result responses.
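The last two mistakes combine naturally into one policy: validate the body first, and on a challenge page retry with jittered exponential backoff, ideally through a freshly rotated proxy. A hedged sketch (function names are mine, and the challenge check is any predicate like the _is_valid_serp helper above):

```python
import random
import time

def backoff_delays(attempts: int, base: float = 2.0, cap: float = 60.0) -> list[float]:
    """Jittered exponential backoff: uniform(ceiling/2, ceiling), ceiling = min(cap, base * 2**n)."""
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * (2 ** n))
        delays.append(random.uniform(ceiling * 0.5, ceiling))
    return delays

def fetch_serp_with_retry(fetch, is_valid, max_attempts: int = 4, sleep=time.sleep):
    """fetch() returns SERP HTML; is_valid() is a challenge-page check."""
    html = fetch()
    for delay in backoff_delays(max_attempts - 1):
        if is_valid(html):
            return html
        # Log and back off instead of parsing a CAPTCHA page silently
        print(f"[retry] challenge page detected, retrying in {delay:.1f}s")
        sleep(delay)
        html = fetch()  # ideally through a freshly rotated proxy
    if is_valid(html):
        return html
    raise RuntimeError("SERP fetch still blocked after retries; rotate the proxy pool")
```

The jitter keeps retry timing off any fixed grid, and raising after exhaustion forces the failure into your logs instead of a silent zero-result parse.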
Takeaways
- Datacenter IPs are a dead end for Google. Residential or mobile proxies are required from the first request.
- TLS and HTTP/2 fingerprinting catches most scripted clients. Use curl_cffi with impersonate="chrome120" or a managed API that handles this at the infrastructure level.
- All Sec-Fetch-* and Sec-CH-UA-* headers must be internally consistent with your User-Agent. Mismatches are scored as synthetic traffic signals.
- Jitter every delay. Replace any time.sleep(N) constant with random.uniform(min, max).
- Validate before parsing. CAPTCHA pages return 200 — check the response body for challenge strings before running your parser.
- Build SERP selectors defensively. Prioritize structural selectors (h3 inside div.g) over volatile class names. Implement fallback chains and log failures.
To get a working API key and run your first SERP request in minutes, follow the quickstart guide. AlterLab's pay-as-you-go pricing means there's no minimum commitment while you validate your pipeline.