How to Bypass Cloudflare Bot Protection When Web Scraping
Cloudflare blocks most scraping tools by default. Here is what actually works in 2026 to get past their bot detection without getting your IP banned.
Yash Dubey
February 8, 2026
Cloudflare sits in front of roughly 20% of the web. If you are scraping at any real scale, you will hit their bot detection. Most guides tell you to "just use headers" or "rotate user agents." That stopped working years ago.
Here is what actually works right now.
Why Cloudflare Blocks Your Scraper
Cloudflare runs multiple detection layers. Understanding them saves you from wasting time on fixes that do not address the real problem.
TLS fingerprinting. Every HTTP client has a unique TLS handshake signature. Python requests, Go net/http, Node axios - they all look different from real browsers. Cloudflare checks this before your request even reaches the server.
JavaScript challenges. Cloudflare injects JS that checks for browser APIs, canvas rendering, WebGL, and other signals that headless browsers often miss or implement incorrectly.
Behavioral analysis. Request timing, mouse movements, scroll patterns. A script that fires 100 requests per second from the same IP with identical timing is not subtle.
IP reputation. Datacenter IPs are flagged by default. Residential and mobile IPs get much more leeway.
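On the behavioral side, the cheapest mitigation is to stop looking like a metronome. Here is a minimal sketch of randomized request pacing; the delay values are illustrative choices, not a known Cloudflare threshold:

```python
import random
import time

def next_delay(base=2.0, jitter=1.5):
    """Randomized gap between requests. Uniform jitter breaks the
    machine-regular timing that behavioral analysis flags."""
    return base + random.uniform(0, jitter)

def fetch_all(urls, fetch, base=2.0, jitter=1.5):
    """Fetch each URL with a randomized pause in between."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(next_delay(base, jitter))  # e.g. 2.0-3.5 s, never identical
    return results
```

Jitter alone will not beat the other layers, but identical inter-request timing is one of the easiest signals to remove.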
The Approaches That Work
1. Patched Browsers
Tools like puppeteer-extra with the stealth plugin patch many of the fingerprint leaks in headless Chrome. The key patches:
- Override navigator.webdriver to return false
- Fix the Chrome runtime signatures
- Patch the cdc_ markers that ChromeDriver injects
- Emulate proper plugin and language arrays
# Using Playwright with stealth patches
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,  # headed mode passes more checks
        args=[
            "--disable-blink-features=AutomationControlled",
        ],
    )
    context = browser.new_context(
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    )
    page = context.new_page()
    page.goto("https://target-site.com")

The problem: Cloudflare updates their detection regularly. Stealth plugins lag behind. You end up in a maintenance cycle, patching one leak after another.
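Under the hood, most of these stealth patches are just init scripts that run before any page script does. Here is a sketch of a single such patch using Playwright's add_init_script; this covers one leak, while real stealth plugins patch dozens:

```python
# One example of the kind of patch a stealth plugin applies: hide the
# navigator.webdriver flag before the site's own JavaScript can read it.
STEALTH_JS = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
"""

def apply_stealth(context):
    # Playwright runs init scripts in every new page before page scripts,
    # so the override is in place by the time Cloudflare's JS executes.
    context.add_init_script(STEALTH_JS)
```

Each patch like this is a moving target: when Cloudflare starts probing a new API, the plugin needs a new override.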
2. Residential Proxies
Switching from datacenter to residential IPs solves the IP reputation problem immediately. Services like Bright Data, Oxylabs, and IPRoyal sell access to residential proxy pools.
The economics get rough fast. Residential bandwidth costs $8-15 per GB depending on the provider. A typical product page is 2-5 MB with all assets loaded. At scale, you are looking at serious costs just for the proxy layer.
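To put numbers on it, here is a back-of-the-envelope calculation using the midpoints of those ranges, 3.5 MB per page and $11.50 per GB. Both defaults are assumptions; check your provider's actual rates:

```python
def bandwidth_cost(pages, mb_per_page=3.5, usd_per_gb=11.5):
    """Rough residential-proxy bandwidth bill, using decimal GB.
    Defaults are midpoints of the ranges quoted above."""
    gb = pages * mb_per_page / 1000
    return gb * usd_per_gb

# e.g. bandwidth_cost(10_000) -> 402.5 USD for 10K product pages
```

Blocking images, fonts, and analytics scripts at the proxy or browser level can cut the per-page payload substantially, so measure what your targets actually transfer before extrapolating.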
3. Browser Farms
Running real browsers at scale through services like Browserless or your own infrastructure. Full Chrome instances with real rendering, proper TLS stacks, the whole thing.
This works well but costs $50-200/month minimum for the compute. You also need to manage sessions, handle crashes, and deal with memory leaks from long-running browser instances.
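The memory-leak problem in particular is usually handled by recycling rather than debugging: tear the browser down every N pages and start a fresh one. A sketch of the pattern, with the browser factory and fetch callables left abstract; the names and the 50-page threshold are illustrative, not any service's API:

```python
def scrape_with_recycling(urls, make_browser, fetch, pages_per_browser=50):
    """Fetch each URL, restarting the browser every `pages_per_browser`
    pages so leaked memory dies with the old process."""
    results = []
    browser = make_browser()
    used = 0
    try:
        for url in urls:
            if used >= pages_per_browser:
                browser.close()          # discard the leaky instance
                browser = make_browser() # fresh process, fresh heap
                used = 0
            results.append(fetch(browser, url))
            used += 1
    finally:
        browser.close()
    return results
```

In production you would also wrap fetch in retry logic so a crashed instance triggers an early recycle instead of failing the batch.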
4. Scraping APIs
This is the approach that makes the most sense for most teams. Instead of assembling the proxy layer, browser management, CAPTCHA solving, and fingerprint patching yourself, you send a URL and get back HTML or structured data.
Services like AlterLab, ScraperAPI, and ScrapingBee handle the Cloudflare bypass internally. The key differences between them are pricing model, success rates on hard targets, and whether they support JS rendering.
AlterLab uses a tiered approach - light scrapes on static pages cost less, while JS-rendered pages with anti-bot bypass cost more. You only pay for the complexity your target actually requires.
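The request shape is similar across providers: you pass the target URL and a few options, and the service decides how much machinery to throw at it. A hedged sketch follows; the endpoint, parameter names, and key are placeholders, so check your provider's docs for the real contract:

```python
import requests

API_KEY = "YOUR_KEY"  # placeholder credential

def build_params(url, render_js=False, api_key=API_KEY):
    # Typical knobs: the target URL, whether to run a headless browser,
    # and the credential. Names here are illustrative only.
    return {"api_key": api_key, "url": url, "render_js": str(render_js).lower()}

def scrape(url, render_js=False):
    resp = requests.get(
        "https://api.example-provider.com/v1/scrape",  # placeholder endpoint
        params=build_params(url, render_js),
        timeout=60,  # anti-bot bypasses can take tens of seconds
    )
    resp.raise_for_status()
    return resp.text
```

The generous timeout matters: a request that triggers a challenge solve on the provider's side can take far longer than a direct fetch.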
What Does Not Work Anymore
Just setting headers. Cloudflare stopped relying on User-Agent strings alone around 2022. Headers are necessary but not sufficient.
curl with impersonate flags. curl-impersonate was great for a while. Cloudflare has adapted to detect its specific TLS patterns.
Headless Chrome with default settings. The navigator.webdriver flag is the most basic check. Cloudflare runs dozens more.
Free proxy lists. Those IPs are already burned. Every scraping bot on the internet has used them.
Pick Your Tradeoff
Every approach trades off between cost, reliability, and maintenance burden.
If you scrape fewer than 10K pages per month, a patched browser with residential proxies works fine. The maintenance is manageable at that scale.
If you scrape more than that, the math starts favoring a scraping API. The time you spend maintaining proxy rotation, browser patching, and CAPTCHA solving infrastructure is time not spent on whatever you are actually building.
The key metric is cost per successful request. Factor in your engineering time, not just the API or proxy bill.
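As a sketch of that metric, with an illustrative $75/hour engineering rate (substitute your own numbers):

```python
def cost_per_success(requests_sent, success_rate, infra_usd,
                     eng_hours=0.0, hourly_rate=75.0):
    """Effective cost per successful request, counting engineering time.
    The default hourly rate is an illustrative assumption."""
    total = infra_usd + eng_hours * hourly_rate
    return total / (requests_sent * success_rate)

# e.g. cost_per_success(100_000, 0.8, infra_usd=500, eng_hours=10) -> 0.015625
```

A cheap setup with a 40% success rate and ten hours a month of babysitting often loses to a pricier API with a 95% success rate once you run this arithmetic.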
Testing Your Setup
Before running any scraper at scale, verify against known Cloudflare-protected sites:
# Quick check if your approach works
curl -s -o /dev/null -w "%{http_code}" https://www.target-site.com
# 200 = success
# 403 = blocked
# 503 = Cloudflare challenge page

If you are getting 403s or challenge pages, your fingerprint is getting caught. Go back and check your TLS stack first - that is where most detection starts.
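Beyond the status code, the challenge page itself has recognizable markers: Cloudflare is commonly observed setting a cf-mitigated: challenge response header and serving a "Just a moment..." interstitial. A small triage helper built on those observations; the markers are what shows up in the wild, not a documented contract:

```python
def classify_response(status, headers, body):
    """Rough triage of a response from a Cloudflare-fronted site.
    Marker strings are common field observations, not guarantees."""
    # A 200 can still be a challenge page, so check markers first.
    if headers.get("cf-mitigated") == "challenge" or "Just a moment" in body:
        return "challenge"   # JS challenge interstitial, not real content
    if status == 403:
        return "blocked"     # fingerprint or IP reputation failure
    if status == 503:
        return "challenge"
    if status == 200:
        return "ok"
    return "other"
```

Checking the body matters because a scraper that trusts status codes alone will happily store challenge HTML as if it were data.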