How to Scrape Google Search Results in 2026: Python, APIs, and What Actually Works
Google blocks most scraping attempts within a few requests. Here is what works for extracting SERP data at scale in 2026, from raw Python to headless browsers to scraping APIs.
Yash Dubey
February 17, 2026
Google search results are one of the most valuable data sources on the internet. SEO tools, market research platforms, ad intelligence products, and AI training pipelines all depend on SERP data. Google does not make this easy.
Google runs some of the most aggressive anti-bot systems on the web. A basic Python script will get blocked after a handful of requests. Headless browsers last slightly longer before hitting CAPTCHAs. Proxy rotation helps, but Google fingerprints far more than your IP address.
Here is what actually works in 2026, with code you can run today.
What You Get From Google SERPs
Before writing any code, know what data is available in a Google results page:
- Organic results: Title, URL, description snippet, position
- Featured snippets: The answer box at the top
- People Also Ask: Related questions with expandable answers
- Knowledge panels: Entity information cards
- Ads: Sponsored listings with advertiser info
- Local pack: Map results with business details
- Shopping results: Product cards with prices
- Image and video carousels: Media results
Each result type has its own HTML structure. A scraper that only grabs the ten blue links misses most of the page.
Method 1: Raw HTTP Requests (The Naive Approach)
The simplest approach is sending a GET request to Google. It works for about 5 minutes.
```python
import requests
from bs4 import BeautifulSoup

def scrape_google(query):
    url = "https://www.google.com/search"
    params = {"q": query, "num": 10, "hl": "en"}
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
    }
    response = requests.get(url, params=params, headers=headers)
    if response.status_code != 200:
        print(f"Blocked: HTTP {response.status_code}")
        return []
    soup = BeautifulSoup(response.text, "html.parser")
    results = []
    for div in soup.select("div.tF2Cxc"):
        title_el = div.select_one("h3")
        link_el = div.select_one("a")
        snippet_el = div.select_one(".VwiC3b")
        if title_el and link_el:
            results.append({
                "title": title_el.text,
                "url": link_el["href"],
                "snippet": snippet_el.text if snippet_el else "",
            })
    return results

results = scrape_google("best web scraping api")
for r in results:
    print(f"{r['title']}\n  {r['url']}\n")
```

This will return results for your first few queries. Then Google serves a CAPTCHA page, a 429 status code, or a consent form that blocks further requests.
Why it fails:
- Google tracks request patterns across your IP
- The User-Agent alone is not enough to look human
- Missing cookies, TLS fingerprint, and JavaScript execution are all signals
- Google's selectors (like `div.tF2Cxc`) change periodically, breaking your parser
This approach is fine for a one-off test. It is not viable for production use.
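Even for a one-off test, it helps to detect the block explicitly so you stop before burning more requests. A minimal sketch; the marker strings are assumptions based on common Google block pages, not an exhaustive or guaranteed list:

```python
def looks_blocked(status_code: int, html: str) -> bool:
    """Heuristic check for Google block responses.

    Assumed markers: HTTP 429/503 plus fragments commonly seen on
    Google's CAPTCHA and consent pages. Tune against real responses.
    """
    if status_code in (429, 503):
        return True
    lowered = html.lower()
    markers = (
        "our systems have detected unusual traffic",
        "/sorry/index",          # Google's CAPTCHA interstitial path
        "g-recaptcha",
        "consent.google.com",    # EU consent redirect
    )
    return any(marker in lowered for marker in markers)
```

Call this after every response; on a hit, back off or rotate identity rather than retrying immediately on the same IP.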
Method 2: Headless Browser with Playwright
Using a real browser solves the JavaScript execution problem and gives you a more realistic fingerprint.
```python
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup

async def scrape_google_playwright(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()
        await page.goto(
            f"https://www.google.com/search?q={query}&hl=en",
            wait_until="networkidle",
        )
        # Handle consent screen (common in EU)
        try:
            consent_btn = page.locator("button:has-text('Accept all')")
            if await consent_btn.is_visible(timeout=3000):
                await consent_btn.click()
                await page.wait_for_load_state("networkidle")
        except Exception:
            pass
        html = await page.content()
        await browser.close()

    soup = BeautifulSoup(html, "html.parser")
    results = []
    for div in soup.select("div.tF2Cxc"):
        title_el = div.select_one("h3")
        link_el = div.select_one("a")
        snippet_el = div.select_one(".VwiC3b")
        if title_el and link_el:
            results.append({
                "title": title_el.text,
                "url": link_el["href"],
                "snippet": snippet_el.text if snippet_el else "",
            })
    return results

results = asyncio.run(scrape_google_playwright("best web scraping api"))
for r in results:
    print(f"{r['title']}\n  {r['url']}\n")
```

Playwright gets you further than raw requests. Google sees a real Chromium browser executing JavaScript and rendering the page. But headless detection has gotten sophisticated in 2026.
Where Playwright breaks down:
- Google detects headless Chromium through WebGL fingerprinting, navigator properties, and behavioral analysis
- Each browser instance uses 200-400 MB of RAM, making scale expensive
- CAPTCHAs still appear after 20-50 queries from the same IP
- Consent screens, cookie banners, and localized results add parsing complexity
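Because each instance eats hundreds of megabytes, production Playwright scrapers bound how many browsers run at once. Here is a sketch of that pattern with an asyncio semaphore, using a stubbed `fetch_serp` coroutine standing in for the Playwright call above (the pool size of 4 is an assumed budget, not a recommendation):

```python
import asyncio

MAX_BROWSERS = 4  # assumed budget: roughly 4 contexts at ~300 MB each

async def fetch_serp(query: str) -> str:
    # Stand-in for the real Playwright fetch shown earlier
    await asyncio.sleep(0.01)
    return f"<html>results for {query}</html>"

async def scrape_all(queries):
    sem = asyncio.Semaphore(MAX_BROWSERS)

    async def bounded(query):
        async with sem:  # at most MAX_BROWSERS fetches in flight
            return await fetch_serp(query)

    return await asyncio.gather(*(bounded(q) for q in queries))

pages = asyncio.run(scrape_all([f"query {i}" for i in range(10)]))
```

The semaphore caps peak memory while still letting queries queue up, which is usually the right trade-off for a fixed-size server.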
Method 3: Stealth Patches and Fingerprint Spoofing
You can make Playwright harder to detect by patching the browser fingerprint:
```python
import asyncio
import random
from playwright.async_api import async_playwright

async def stealth_scrape(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--disable-features=IsolateOrigins,site-per-process",
            ],
        )
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
            timezone_id="America/New_York",
            geolocation={"longitude": -73.935242, "latitude": 40.730610},
            permissions=["geolocation"],
        )
        page = await context.new_page()
        # Patch navigator properties that betray automation
        await page.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en']
            });
            Object.defineProperty(navigator, 'plugins', {
                get: () => [1, 2, 3, 4, 5]
            });
        """)
        await page.goto(f"https://www.google.com/search?q={query}&hl=en")
        # Random delay to look human
        await page.wait_for_timeout(random.randint(1000, 3000))
        html = await page.content()
        await browser.close()
        return html
```

This buys you more queries before detection, but it is a treadmill. Google updates their detection signatures regularly. The stealth patches that work today may not work next month.
Method 4: Proxy Rotation
Adding proxies distributes your requests across many IP addresses, which is necessary at any meaningful scale.
```python
import requests
from itertools import cycle

# Placeholder proxy URLs; substitute your provider's credentials and hosts
proxies = [
    "http://username:password@proxy1.example.com:8080",
    "http://username:password@proxy2.example.com:8080",
    "http://username:password@proxy3.example.com:8080",
]
proxy_pool = cycle(proxies)

def scrape_with_proxy(query):
    proxy = next(proxy_pool)
    response = requests.get(
        "https://www.google.com/search",
        params={"q": query, "num": 10, "hl": "en"},
        headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
        },
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
    return response
```

Proxy rotation gets more complex than this quickly. You need to handle:
- Proxy health checks: Dead proxies waste time and quota
- Geographic targeting: Google results vary by location
- Proxy type selection: Datacenter proxies get flagged fast, residential proxies are expensive
- Session management: Some queries need consistent IP across pagination
- Cost tracking: Residential bandwidth is billed per GB
| Proxy Type | Detection Rate | Cost per GB | Speed |
|---|---|---|---|
| Datacenter | High | ~$1 | Fast |
| Residential | Low | ~$10-15 | Medium |
| Mobile | Very Low | ~$20-30 | Variable |
| ISP (Static Residential) | Low | ~$3-5 | Fast |
For Google specifically, residential proxies are the minimum viable option. Datacenter IPs get blocked within minutes.
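A sketch of the health-check side of that list: a round-robin pool that evicts a proxy after repeated failures. The eviction threshold and the bookkeeping are illustrative, not a complete implementation:

```python
class ProxyPool:
    """Round-robin proxy pool that drops proxies after repeated failures."""

    def __init__(self, proxies, max_failures=3):
        self.proxies = list(proxies)
        self.failures = {p: 0 for p in self.proxies}
        self.max_failures = max_failures
        self._idx = 0

    def get(self):
        if not self.proxies:
            raise RuntimeError("all proxies exhausted")
        proxy = self.proxies[self._idx % len(self.proxies)]
        self._idx += 1
        return proxy

    def report_failure(self, proxy):
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures and proxy in self.proxies:
            self.proxies.remove(proxy)  # evict dead proxy

    def report_success(self, proxy):
        self.failures[proxy] = 0  # reset the failure count on success
```

In use, call `report_failure` whenever a request through the proxy times out or returns a block page, and `report_success` otherwise; a background task can periodically re-test evicted proxies.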
Method 5: Scraping API
A scraping API handles the proxy rotation, browser rendering, CAPTCHA solving, and fingerprint management for you. You send a URL, you get back the HTML.
```python
import requests

def scrape_google_api(query):
    response = requests.post(
        "https://alterlab.io/api/v1/scrape",
        headers={
            "X-API-Key": "your-api-key",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.google.com/search?q={query}&num=10&hl=en",
            "render_js": True,
        },
    )
    data = response.json()
    return data.get("content", "")
```

Your parsing logic stays the same. The difference is that someone else manages the infrastructure that keeps requests from getting blocked.
1. Send Query: POST your Google search URL to the scraping API
2. Smart Routing: The API selects an optimal proxy, browser, and fingerprint
3. Anti-Bot Bypass: CAPTCHAs, consent screens, and detection evasion are handled for you
4. Get Results: You receive clean HTML or structured data back
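Whichever fetch method you settle on, transient failures still happen, and a thin retry wrapper keeps the calling code clean. A sketch with exponential backoff; the fetch callable is injected, so the same wrapper works for the raw-requests, Playwright, or API approaches (the function name and defaults here are assumptions):

```python
import time

def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on exceptions.

    Delays are 1s, 2s, 4s... by default; `sleep` is injectable for testing.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Pair this with block detection: a CAPTCHA page is not an exception, so the fetch function should raise when it sees one, or the wrapper will happily return the block page.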
Parsing Google Results Properly
Regardless of how you fetch the HTML, you need to parse it. Google's DOM structure is deeply nested and changes without notice. Here is a more robust parser that handles multiple result types:
```python
from bs4 import BeautifulSoup
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class OrganicResult:
    position: int
    title: str
    url: str
    snippet: str
    displayed_url: str = ""

@dataclass
class PeopleAlsoAsk:
    question: str
    snippet: Optional[str] = None

@dataclass
class SERPData:
    query: str
    organic: list[OrganicResult] = field(default_factory=list)
    people_also_ask: list[PeopleAlsoAsk] = field(default_factory=list)
    featured_snippet: Optional[str] = None
    total_results: Optional[str] = None

def parse_serp(html: str, query: str) -> SERPData:
    soup = BeautifulSoup(html, "html.parser")
    serp = SERPData(query=query)

    # Total results count
    stats = soup.select_one("#result-stats")
    if stats:
        serp.total_results = stats.text.strip()

    # Featured snippet
    featured = soup.select_one(".xpdopen .hgKElc")
    if featured:
        serp.featured_snippet = featured.text.strip()

    # Organic results
    position = 1
    for div in soup.select("div.tF2Cxc"):
        title_el = div.select_one("h3")
        link_el = div.select_one("a")
        snippet_el = div.select_one(".VwiC3b")
        cite_el = div.select_one("cite")
        if title_el and link_el:
            serp.organic.append(OrganicResult(
                position=position,
                title=title_el.text.strip(),
                url=link_el.get("href", ""),
                snippet=snippet_el.text.strip() if snippet_el else "",
                displayed_url=cite_el.text.strip() if cite_el else "",
            ))
            position += 1

    # People Also Ask
    for paa in soup.select(".related-question-pair"):
        question_el = paa.select_one(".CSkcDe")
        answer_el = paa.select_one(".wDYxhc")
        if question_el:
            serp.people_also_ask.append(PeopleAlsoAsk(
                question=question_el.text.strip(),
                snippet=answer_el.text.strip() if answer_el else None,
            ))

    return serp
```

Important note on selectors: Google uses obfuscated class names that change over time. Classes like `tF2Cxc`, `VwiC3b`, and `CSkcDe` are not stable identifiers. Production scrapers need a selector update mechanism, either manual monitoring or automated detection when parsing starts returning empty results.
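One way to build that detection is to keep an ordered list of candidate parsers per field and fall through until one returns results. A generic sketch; in practice each parser would wrap a `soup.select(...)` call with a different candidate class, and an alert whenever the index is above zero tells you the primary selector broke:

```python
def first_nonempty(parsers, html):
    """Try parse functions in order; return the first non-empty result.

    Each parser takes the HTML and returns a list. The index of the
    parser that worked is returned alongside the results, so callers
    can alert when the primary selector (index 0) stops matching.
    """
    for index, parser in enumerate(parsers):
        results = parser(html)
        if results:
            return results, index
    return [], -1
```

A returned index of -1 means every candidate failed, which is the signal to page whoever maintains the selector list.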
Handling Pagination and Scale
Google paginates results using the `start` parameter. Page 2 is `start=10`, page 3 is `start=20`, and so on.
```python
import time
import random

def scrape_multiple_pages(query, pages=3):
    all_results = []
    for page_num in range(pages):
        start = page_num * 10
        url = f"https://www.google.com/search?q={query}&num=10&start={start}&hl=en"
        # Fetch using your preferred method
        html = fetch_with_api(url)  # or playwright, or requests+proxy
        serp = parse_serp(html, query)
        all_results.extend(serp.organic)
        # Random delay between pages
        if page_num < pages - 1:
            time.sleep(2 + random.uniform(0, 3))
    return all_results
```

At scale (thousands of queries per day), you also need:
- Query queuing: Spread requests over time to avoid burst patterns
- Result caching: Same query within 24 hours can use cached results
- Deduplication: Google sometimes returns the same URL at different positions across pages
- Error classification: Distinguish between blocks (retry with different proxy), CAPTCHAs (solve or rotate), and genuine errors (skip)
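The deduplication step can be as simple as keeping the best (lowest) position per URL across pages. A sketch assuming plain dicts with `url` and `position` keys; adapt the field access if you use the dataclasses from the parser above:

```python
def dedupe_results(results):
    """Keep one entry per URL, preferring the lowest (best) position."""
    best = {}
    for result in results:
        url = result["url"]
        if url not in best or result["position"] < best[url]["position"]:
            best[url] = result
    # Preserve ranking order in the output
    return sorted(best.values(), key=lambda r: r["position"])
```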
Localized and Device-Specific Results
Google returns different results based on location and device. Control this with URL parameters:
```python
# Location-specific results
params = {
    "q": "coffee shops",
    "gl": "us",              # Country (ISO 3166-1 alpha-2)
    "hl": "en",              # Language
    "uule": "w+CAIQICI...",  # Encoded location for city-level targeting
    "num": 10,
}

# Mobile results (use a mobile User-Agent)
mobile_ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
```

The `gl` parameter controls the country. For city-level targeting, you need the `uule` parameter, which is a base64-encoded location string. The format is documented in various SEO tool blogs, but it amounts to encoding the canonical name of the location from Google's geographic targeting database.
Structured Output: Skip the HTML Entirely
If you use a scraping API, you can request structured data formats instead of parsing raw HTML yourself.
```python
import requests

response = requests.post(
    "https://alterlab.io/api/v1/scrape",
    headers={
        "X-API-Key": "your-api-key",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.google.com/search?q=best+crm+software&num=10",
        "render_js": True,
        "formats": ["json", "markdown"],
    },
)
data = response.json()
# Structured JSON with titles, URLs, snippets already extracted
# Plus markdown for LLM ingestion or documentation
```

This saves you from maintaining brittle CSS selectors. When Google changes their DOM structure, the API provider updates their parsers, not you.
Cost Comparison: DIY vs API
Here is what Google SERP scraping costs at 50,000 queries per month.
| Component | DIY Stack | Scraping API |
|---|---|---|
| Residential Proxies | $500-800/mo | Included |
| Server (Browser Instances) | $100-200/mo | Included |
| CAPTCHA Solving | $50-150/mo | Included |
| Engineering Maintenance | 10-20 hrs/mo | 0 hrs |
| Total Cost | $650-1,150+ | ~$250-500 |
The engineering time is the hidden cost. When Google updates their anti-bot detection, someone has to debug why success rates dropped from 95% to 40% overnight. That someone is you.
Common Pitfalls
Scraping too fast. Google correlates request timing. Sending 100 queries per minute from the same IP block gets every IP in that block flagged. Space requests 3-10 seconds apart minimum.
Ignoring Google's Terms of Service. Google's ToS prohibit automated access. Whether this is legally enforceable depends on your jurisdiction and use case. The hiQ Labs v. LinkedIn ruling in the US established some precedent for scraping publicly available data. Consult a lawyer if your business depends on this.
Parsing only organic results. Modern SERPs are mostly features: knowledge panels, shopping carousels, video results, related searches. If you only parse the ten blue links, you miss what most users actually see and click on.
Not caching results. Google results for most queries change slowly. Caching results for 4-24 hours reduces your request volume and costs without losing data freshness for most use cases.
Hardcoding selectors. Google's CSS class names are machine-generated and change without notice. Build your parser to fail gracefully when selectors break, and add monitoring to detect when it happens.
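The caching pitfall is simple to avoid in-process. A sketch of a TTL cache keyed by query, with an injectable clock so it can be tested; in production you would likely back this with Redis or disk instead:

```python
import time

class SerpCache:
    """In-memory cache of SERP results with a per-entry TTL."""

    def __init__(self, ttl_seconds=4 * 3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # query -> (timestamp, results)

    def get(self, query):
        entry = self._store.get(query)
        if entry is None:
            return None
        timestamp, results = entry
        if self.clock() - timestamp > self.ttl:
            del self._store[query]  # expired; force a fresh fetch
            return None
        return results

    def put(self, query, results):
        self._store[query] = (self.clock(), results)
```

Check the cache before every fetch; the 4-hour default matches the 4-24 hour window suggested above and can be tuned per use case.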
When Each Method Makes Sense
| Method | Best For | Volume Limit |
|---|---|---|
| Raw requests | Quick one-off tests | ~10-20 queries |
| Playwright + stealth | Small projects with fixed targets | ~100-500/day with proxies |
| Proxy rotation + browser | Medium scale with engineering capacity | ~1K-10K/day |
| Scraping API | Production workloads at any scale | Unlimited |
Start with the simplest method that meets your needs. Move to the next level when you are spending more time maintaining infrastructure than using the data.
AlterLab handles Google SERP scraping with automatic proxy rotation, JS rendering, and anti-bot bypass. Pay per successful request. If a request fails, you do not pay for it.
Quick Reference
| Parameter | Value | Purpose |
|---|---|---|
| `q` | Your search query | The search terms |
| `num` | 10, 20, 50, 100 | Results per page |
| `start` | 0, 10, 20... | Pagination offset |
| `hl` | en, es, fr, de... | Interface language |
| `gl` | us, uk, de, in... | Country for results |
| `tbm` | nws, isch, vid, shop | Search type (news, images, video, shopping) |
| `tbs` | qdr:d, qdr:w, qdr:m | Time filter (day, week, month) |
These parameters work in the URL regardless of your scraping method. Combine them to get exactly the SERP data your application needs.
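These parameters compose with ordinary URL encoding, so a small standard-library helper covers every method in this article (the helper name is ours; the parameters mirror the table, nothing Google-specific is added):

```python
from urllib.parse import urlencode

def build_search_url(query, **params):
    """Build a Google search URL from the parameters in the table above."""
    merged = {"q": query, **params}
    return "https://www.google.com/search?" + urlencode(merged)

url = build_search_url("best crm software", num=10, hl="en", gl="us", tbs="qdr:w")
```

`urlencode` handles escaping (spaces, colons in `tbs` values), which avoids the subtle bugs that come from pasting parameters into f-strings.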