How to Bypass Cloudflare Bot Protection When Web Scraping
Cloudflare blocks most scraping tools by default. Here is what actually works in 2026 to get past their bot detection without getting your IP banned.
Yash Dubey
February 8, 2026
Cloudflare sits in front of roughly 20% of the web. If you are scraping at any real scale, you will hit their bot detection. Most guides tell you to "just use headers" or "rotate user agents." That stopped working years ago.
Here is what actually works right now.
Why Cloudflare Blocks Your Scraper
Cloudflare runs multiple detection layers. Understanding them saves you from wasting time on fixes that do not address the real problem.
TLS fingerprinting. Every HTTP client has a unique TLS handshake signature. Python requests, Go net/http, Node axios - they all look different from real browsers. Cloudflare checks this before your request even reaches the server.
JavaScript challenges. Cloudflare injects JS that checks for browser APIs, canvas rendering, WebGL, and other signals that headless browsers often miss or implement incorrectly.
Behavioral analysis. Request timing, mouse movements, scroll patterns. A script that fires 100 requests per second from the same IP with identical timing is not subtle.
IP reputation. Datacenter IPs are flagged by default. Residential and mobile IPs get much more leeway.
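On the behavioral side, the cheapest mitigation is to stop looking like a metronome. Here is a minimal sketch of randomized request pacing; the delay values are illustrative choices, not a known Cloudflare threshold:

```python
import random
import time

def next_delay(base=2.0, jitter=1.5):
    """Randomized gap between requests. Uniform jitter breaks the
    machine-regular timing that behavioral analysis flags."""
    return base + random.uniform(0, jitter)

def fetch_all(urls, fetch, base=2.0, jitter=1.5):
    """Fetch each URL with a randomized pause in between."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(next_delay(base, jitter))  # e.g. 2.0-3.5 s, never identical
    return results
```

Jitter alone will not beat the other layers, but identical inter-request timing is one of the easiest signals to remove.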
The Approaches That Work
1. Patched Browsers
Tools like puppeteer-extra with the stealth plugin patch many of the fingerprint leaks in headless Chrome. The key patches:
- Override navigator.webdriver to return false
- Fix the Chrome runtime signatures
- Patch the cdc_ markers that ChromeDriver injects
- Emulate proper plugin and language arrays
# Using Playwright with stealth patches
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,  # headed mode passes more checks
        args=[
            "--disable-blink-features=AutomationControlled",
        ],
    )
    context = browser.new_context(
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    )
    page = context.new_page()
    page.goto("https://target-site.com")

The problem: Cloudflare updates their detection regularly. Stealth plugins lag behind. You end up in a maintenance cycle, patching one leak after another.
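Under the hood, most of these stealth patches are just init scripts that run before any page script does. Here is a sketch of a single such patch using Playwright's add_init_script; this covers one leak, while real stealth plugins patch dozens:

```python
# One example of the kind of patch a stealth plugin applies: hide the
# navigator.webdriver flag before the site's own JavaScript can read it.
STEALTH_JS = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
"""

def apply_stealth(context):
    # Playwright runs init scripts in every new page before page scripts,
    # so the override is in place by the time Cloudflare's JS executes.
    context.add_init_script(STEALTH_JS)
```

Each patch like this is a moving target: when Cloudflare starts probing a new API, the plugin needs a new override.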
2. Residential Proxies
Switching from datacenter to residential IPs solves the IP reputation problem immediately. Services like Bright Data, Oxylabs, and IPRoyal sell access to residential proxy pools.
The economics get rough fast. Residential bandwidth costs $8-15 per GB depending on the provider. A typical product page is 2-5 MB with all assets loaded. At scale, you are looking at serious costs just for the proxy layer.
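To put numbers on it, here is a back-of-the-envelope calculation using the midpoints of those ranges, 3.5 MB per page and $11.50 per GB. Both defaults are assumptions; check your provider's actual rates:

```python
def bandwidth_cost(pages, mb_per_page=3.5, usd_per_gb=11.5):
    """Rough residential-proxy bandwidth bill, using decimal GB.
    Defaults are midpoints of the ranges quoted above."""
    gb = pages * mb_per_page / 1000
    return gb * usd_per_gb

# e.g. bandwidth_cost(10_000) -> 402.5 USD for 10K product pages
```

Blocking images, fonts, and analytics scripts at the proxy or browser level can cut the per-page payload substantially, so measure what your targets actually transfer before extrapolating.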
3. Browser Farms
Running real browsers at scale through services like Browserless or your own infrastructure. Full Chrome instances with real rendering, proper TLS stacks, the whole thing.
This works well but costs $50-200/month minimum for the compute. You also need to manage sessions, handle crashes, and deal with memory leaks from long-running browser instances.
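The memory-leak problem in particular is usually handled by recycling rather than debugging: tear the browser down every N pages and start a fresh one. A sketch of the pattern, with the browser factory and fetch callables left abstract; the names and the 50-page threshold are illustrative, not any service's API:

```python
def scrape_with_recycling(urls, make_browser, fetch, pages_per_browser=50):
    """Fetch each URL, restarting the browser every `pages_per_browser`
    pages so leaked memory dies with the old process."""
    results = []
    browser = make_browser()
    used = 0
    try:
        for url in urls:
            if used >= pages_per_browser:
                browser.close()          # discard the leaky instance
                browser = make_browser() # fresh process, fresh heap
                used = 0
            results.append(fetch(browser, url))
            used += 1
    finally:
        browser.close()
    return results
```

In production you would also wrap fetch in retry logic so a crashed instance triggers an early recycle instead of failing the batch.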
4. Scraping APIs
This is the approach that makes the most sense for most teams. Instead of assembling the proxy layer, browser management, CAPTCHA solving, and fingerprint patching yourself, you send a URL and get back HTML or structured data.
Services like AlterLab, ScraperAPI, and ScrapingBee handle the Cloudflare bypass internally. The key differences between them are pricing model, success rates on hard targets, and whether they support JS rendering.
AlterLab uses a tiered approach - light scrapes on static pages cost less, while JS-rendered pages with anti-bot bypass cost more. You only pay for the complexity your target actually requires.
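The request shape is similar across providers: you pass the target URL and a few options, and the service decides how much machinery to throw at it. A hedged sketch follows; the endpoint, parameter names, and key are placeholders, so check your provider's docs for the real contract:

```python
import requests

API_KEY = "YOUR_KEY"  # placeholder credential

def build_params(url, render_js=False, api_key=API_KEY):
    # Typical knobs: the target URL, whether to run a headless browser,
    # and the credential. Names here are illustrative only.
    return {"api_key": api_key, "url": url, "render_js": str(render_js).lower()}

def scrape(url, render_js=False):
    resp = requests.get(
        "https://api.example-provider.com/v1/scrape",  # placeholder endpoint
        params=build_params(url, render_js),
        timeout=60,  # anti-bot bypasses can take tens of seconds
    )
    resp.raise_for_status()
    return resp.text
```

The generous timeout matters: a request that triggers a challenge solve on the provider's side can take far longer than a direct fetch.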
What Does Not Work Anymore
Just setting headers. Cloudflare stopped relying on User-Agent strings alone around 2022. Headers are necessary but not sufficient.
curl with impersonate flags. curl-impersonate was great for a while. Cloudflare has adapted to detect its specific TLS patterns.
Headless Chrome with default settings. The navigator.webdriver flag is the most basic check. Cloudflare runs dozens more.
Free proxy lists. Those IPs are already burned. Every scraping bot on the internet has used them.
Pick Your Tradeoff
Every approach trades off between cost, reliability, and maintenance burden.
If you scrape fewer than 10K pages per month, a patched browser with residential proxies works fine. The maintenance is manageable at that scale.
If you scrape more than that, the math starts favoring a scraping API. The time you spend maintaining proxy rotation, browser patching, and CAPTCHA solving infrastructure is time not spent on whatever you are actually building.
The key metric is cost per successful request. Factor in your engineering time, not just the API or proxy bill.
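As a sketch of that metric, with an illustrative $75/hour engineering rate (substitute your own numbers):

```python
def cost_per_success(requests_sent, success_rate, infra_usd,
                     eng_hours=0.0, hourly_rate=75.0):
    """Effective cost per successful request, counting engineering time.
    The default hourly rate is an illustrative assumption."""
    total = infra_usd + eng_hours * hourly_rate
    return total / (requests_sent * success_rate)

# e.g. cost_per_success(100_000, 0.8, infra_usd=500, eng_hours=10) -> 0.015625
```

A cheap setup with a 40% success rate and ten hours a month of babysitting often loses to a pricier API with a 95% success rate once you run this arithmetic.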
Testing Your Setup
Before running any scraper at scale, verify against known Cloudflare-protected sites:
# Quick check if your approach works
curl -s -o /dev/null -w "%{http_code}" https://www.target-site.com
# 200 = success
# 403 = blocked
# 503 = Cloudflare challenge page

If you are getting 403s or challenge pages, your fingerprint is getting caught. Go back and check your TLS stack first - that is where most detection starts.
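Beyond the status code, the challenge page itself has recognizable markers: Cloudflare is commonly observed setting a cf-mitigated: challenge response header and serving a "Just a moment..." interstitial. A small triage helper built on those observations; the markers are what shows up in the wild, not a documented contract:

```python
def classify_response(status, headers, body):
    """Rough triage of a response from a Cloudflare-fronted site.
    Marker strings are common field observations, not guarantees."""
    # A 200 can still be a challenge page, so check markers first.
    if headers.get("cf-mitigated") == "challenge" or "Just a moment" in body:
        return "challenge"   # JS challenge interstitial, not real content
    if status == 403:
        return "blocked"     # fingerprint or IP reputation failure
    if status == 503:
        return "challenge"
    if status == 200:
        return "ok"
    return "other"
```

Checking the body matters because a scraper that trusts status codes alone will happily store challenge HTML as if it were data.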