Playwright Anti-Bot Detection: What Actually Works in 2026
Yash Dubey
February 19, 2026
You picked Playwright because it is fast, has a clean API, and supports all major browsers. But within five minutes of scraping a real target, you hit a Cloudflare challenge page. Your headless browser is getting detected, and no amount of await page.wait_for_load_state("networkidle") is going to fix it.
This guide covers the specific detection vectors that flag Playwright, what stealth techniques actually work in 2026, and how to handle the major anti-bot providers (Cloudflare, DataDome, PerimeterX) with working Python code.
Why Playwright Gets Detected
Playwright is not invisible. Out of the box, it leaves a trail of signals that anti-bot systems check in milliseconds. Understanding these signals is the first step to avoiding them.
The navigator.webdriver Flag
Every Playwright browser instance sets navigator.webdriver to true. This is a W3C WebDriver spec requirement. Anti-bot scripts check this property first because it is the cheapest detection method available.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    # This returns True in default Playwright — instant detection
    is_bot = page.evaluate("() => navigator.webdriver")
    print(f"webdriver flag: {is_bot}")  # True
    browser.close()

Headless Browser Tells
Headless Chromium differs from headed Chrome in dozens of subtle ways. Anti-bot systems check for:
- Missing plugins: navigator.plugins is empty in headless mode. Real Chrome reports PDF Viewer, Chrome PDF Plugin, etc.
- Missing WebGL renderer: Headless Chrome uses SwiftShader as its GPU renderer. Real browsers report actual GPU hardware like "ANGLE (NVIDIA GeForce RTX 3080)".
- Screen dimensions: Headless defaults to 800x600 with 0 values for screen.availHeight and screen.availWidth.
- Permissions API behavior: Notification.permission returns unexpected values in headless mode.
- Chrome runtime objects: Real Chrome injects window.chrome with runtime, loadTimes, and csi objects. Headless Chrome is missing these or has incomplete versions.
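These tells are cheap to evaluate once the fingerprint has been shipped back to the server. As a rough sketch of that server-side logic -- assuming a fingerprint dict already collected in the browser (for example via page.evaluate), with hypothetical field names -- the classification is just a handful of comparisons:

```python
def looks_headless(fp: dict) -> list[str]:
    """Return the list of classic headless tells present in a fingerprint."""
    tells = []
    if fp.get("webdriver"):
        tells.append("navigator.webdriver is true")
    if fp.get("plugins", 0) == 0:
        tells.append("navigator.plugins is empty")
    if "SwiftShader" in (fp.get("webgl") or ""):
        tells.append("WebGL renderer is SwiftShader")
    if not fp.get("chrome"):
        tells.append("window.chrome is missing")
    if fp.get("screen") == (800, 600):
        tells.append("default 800x600 headless screen size")
    return tells

# A default headless Playwright profile trips every check:
default_playwright = {
    "webdriver": True,
    "plugins": 0,
    "webgl": "Google SwiftShader",
    "chrome": False,
    "screen": (800, 600),
}
print(looks_headless(default_playwright))
```

Real systems weigh dozens more signals, but the shape is the same: each tell is a cheap boolean, and a default automation profile fails all of them at once.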
JavaScript Fingerprinting
Modern anti-bot systems build a fingerprint from 50+ browser properties and compare it against known profiles. Playwright's fingerprint is distinctive:
# What anti-bot scripts collect (simplified)
fingerprint_checks = """
() => ({
    webdriver: navigator.webdriver,
    plugins: navigator.plugins.length,
    languages: navigator.languages,
    platform: navigator.platform,
    hardwareConcurrency: navigator.hardwareConcurrency,
    deviceMemory: navigator.deviceMemory,
    webgl: (() => {
        const canvas = document.createElement('canvas');
        const gl = canvas.getContext('webgl');
        return gl ? gl.getParameter(gl.RENDERER) : null;
    })(),
    chrome: !!window.chrome,
    permissions: typeof navigator.permissions,
})
"""

If any of these values look like a default automation tool, you are flagged before the page even finishes loading.
Detection Vectors Specific to Playwright
Beyond generic headless detection, Playwright has its own unique fingerprint that anti-bot vendors specifically target.
| Feature | Default Playwright | Real Browser |
|---|---|---|
| navigator.webdriver | true | false |
| navigator.plugins.length | 0 | 5+ |
| WebGL renderer | SwiftShader | GPU hardware |
| window.chrome.runtime | Missing | Present |
| Notification.permission | denied | default |
| CDP detection | Detectable | Not applicable |
| TLS fingerprint (JA3) | Known automation signature | Matches a real Chrome release |
CDP Protocol Leak
Playwright communicates with the browser through Chrome DevTools Protocol (CDP). Some anti-bot scripts detect this by checking for the presence of CDP-related runtime objects or by measuring timing anomalies introduced by the protocol layer.
TLS Fingerprinting (JA3/JA4)
This is the hardest to fix. When your browser makes an HTTPS connection, the TLS handshake includes a unique ordering of cipher suites, extensions, and supported curves. Anti-bot services like Cloudflare fingerprint this handshake (JA3/JA4 hash) and compare it against known browser signatures.
Playwright's Chromium binary has a JA3 fingerprint that does not match any real Chrome release. This alone can get you blocked before any JavaScript even runs.
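For intuition on why this is hard to spoof: a JA3 fingerprint is an MD5 digest of the ClientHello fields in a fixed order, so any deviation in the cipher or extension list changes the hash. A minimal sketch of the computation (the field values in the example are illustrative, not a real Chrome handshake):

```python
import hashlib

def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Compute a JA3 hash from ClientHello fields (decimal values)."""
    ja3_string = ",".join([
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ])
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Illustrative values only — not an actual Chrome handshake
print(ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0]))
```

Because the hash is computed from the raw handshake bytes, nothing you do in page JavaScript can change it; only a different TLS stack, or a network-level proxy that re-terminates TLS, produces a different JA3.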
Stealth Techniques That Work
playwright-stealth
The playwright-stealth package patches the most common detection vectors. It is not a silver bullet, but it is the minimum viable starting point.
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--no-first-run",
            "--no-default-browser-check",
        ],
    )
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        locale="en-US",
        timezone_id="America/New_York",
    )
    page = context.new_page()
    stealth_sync(page)
    page.goto("https://bot.sannysoft.com")
    page.screenshot(path="stealth_test.png")
    browser.close()

What playwright-stealth patches:
- Sets navigator.webdriver to false
- Fakes navigator.plugins and navigator.mimeTypes
- Patches chrome.runtime to look like a real Chrome extension environment
- Fixes Notification.permission behavior
- Overrides navigator.permissions.query responses
What it does not fix: WebGL fingerprint, TLS fingerprint, CDP detection, or behavioral analysis.
Browser Context Hardening
Beyond stealth patches, your browser context configuration matters. Here is a hardened context that covers most fingerprint vectors:
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def create_stealth_context(playwright):
    browser = playwright.chromium.launch(
        headless=True,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-features=IsolateOrigins,site-per-process",
            "--disable-dev-shm-usage",
            "--disable-accelerated-2d-canvas",
            "--disable-gpu-sandbox",
            "--no-first-run",
            "--no-zygote",
        ],
    )
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},
        screen={"width": 1920, "height": 1080},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        locale="en-US",
        timezone_id="America/New_York",
        geolocation={"latitude": 40.7128, "longitude": -74.0060},
        permissions=["geolocation"],
        color_scheme="light",
        has_touch=False,
        is_mobile=False,
        java_script_enabled=True,
        extra_http_headers={
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "sec-ch-ua": (
                '"Chromium";v="124", '
                '"Google Chrome";v="124", '
                '"Not-A.Brand";v="99"'
            ),
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": '"Windows"',
        },
    )
    return browser, context

Cookie and Session Persistence
Anti-bot systems track whether your browser maintains cookies between requests. A real user has cookies from previous visits. A bot starts fresh every time.
import json
from pathlib import Path
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

COOKIE_FILE = Path("cookies.json")

def load_cookies(context):
    """Load cookies from a previous session."""
    if COOKIE_FILE.exists():
        cookies = json.loads(COOKIE_FILE.read_text())
        context.add_cookies(cookies)

def save_cookies(context):
    """Persist cookies for future sessions."""
    cookies = context.cookies()
    COOKIE_FILE.write_text(json.dumps(cookies, indent=2))

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    stealth_sync(page)
    # Load cookies from a previous scraping session
    load_cookies(context)
    page.goto("https://target-site.com")
    # ... do your scraping ...
    # Save cookies for the next run
    save_cookies(context)
    browser.close()

For persistent browser profiles that survive between runs (including localStorage, IndexedDB, and service workers):
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    # Use a persistent context to maintain full browser state
    context = p.chromium.launch_persistent_context(
        user_data_dir="./browser_profile",
        headless=True,
        viewport={"width": 1920, "height": 1080},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        args=["--disable-blink-features=AutomationControlled"],
    )
    page = context.pages[0] if context.pages else context.new_page()
    stealth_sync(page)
    page.goto("https://target-site.com")
    # Full browser state persists between runs
    context.close()

Request Interception
Intercept and modify requests to strip automation headers and add missing ones:
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def handle_route(route):
    headers = route.request.headers.copy()
    # Remove headers that leak automation
    headers.pop("x-playwright", None)
    headers.pop("x-devtools", None)
    # Ensure a consistent Accept header on document requests
    if route.request.resource_type == "document":
        headers["Accept"] = (
            "text/html,application/xhtml+xml,"
            "application/xml;q=0.9,image/avif,"
            "image/webp,image/apng,*/*;q=0.8"
        )
        headers["Upgrade-Insecure-Requests"] = "1"
    route.continue_(headers=headers)

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    stealth_sync(page)
    # Intercept all requests to fix headers
    page.route("**/*", handle_route)
    # Block tracking scripts that report automation
    page.route(
        "**/{datadome,px,perimeterx,kasada}*.js",
        lambda route: route.abort(),
    )
    page.goto("https://target-site.com")
    browser.close()

Warning: Blocking anti-bot scripts outright can be counterproductive. Some sites check whether their detection scripts ran and block you if they did not load. Use this selectively.
Handling Cloudflare, DataDome, and PerimeterX
Each anti-bot provider has different detection strategies. A technique that works against Cloudflare may fail against DataDome.
Detect the provider
Check response headers and page source. Cloudflare returns cf-ray headers. DataDome sets datadome cookies. PerimeterX uses _px cookies and loads px scripts.
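This triage step can be a small helper. The sketch below keys off the markers just described, plus Cloudflare's __cf* cookie prefix (an added assumption); real deployments mix providers and rename cookies, so treat the function as a heuristic:

```python
def detect_antibot(headers: dict, cookies: dict):
    """Best-effort guess at the anti-bot provider protecting a response.

    headers: response headers, cookies: cookie name -> value mapping.
    Returns a provider name string, or None if nothing matches.
    """
    header_keys = {k.lower() for k in headers}
    cookie_keys = {k.lower() for k in cookies}
    # Cloudflare: cf-ray response header, __cf* cookies (e.g. __cf_bm)
    if "cf-ray" in header_keys or any(c.startswith("__cf") for c in cookie_keys):
        return "cloudflare"
    # DataDome: a cookie literally named "datadome"
    if "datadome" in cookie_keys:
        return "datadome"
    # PerimeterX: _px-prefixed cookies (_px2, _px3, _pxvid, ...)
    if any(c.startswith("_px") for c in cookie_keys):
        return "perimeterx"
    return None

print(detect_antibot({"cf-ray": "8f2a1b-EWR"}, {}))  # cloudflare
```

Knowing the provider up front lets you branch to the right handling strategy instead of applying every patch to every target.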
Apply provider-specific patches
Each system checks different fingerprint vectors. Cloudflare focuses on TLS and JS challenges. DataDome checks behavioral patterns. PerimeterX does deep browser fingerprinting.
Handle challenges
Wait for challenge pages to resolve. Some require JavaScript execution time, others need CAPTCHA solving, and some need specific cookie values from previous visits.
Validate success
Check that the response contains actual content, not a challenge page. Verify status codes and look for challenge page markers in the HTML.
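That check is worth codifying so every scrape validates its own output. A heuristic sketch -- the marker list and the 2,000-character minimum are assumptions to tune per target:

```python
# Strings that commonly appear on challenge/block pages (assumed list)
CHALLENGE_MARKERS = (
    "just a moment",
    "checking your browser",
    "cf-challenge",
    "captcha",
    "access denied",
)

def is_real_content(status: int, html: str, min_length: int = 2000) -> bool:
    """Heuristic: did we get past the challenge, or is this a block page?"""
    # Common challenge/block status codes
    if status in (403, 429, 503):
        return False
    lowered = html.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return False
    # Challenge pages tend to be short; real pages usually are not
    return len(html) >= min_length
```

Call it after every navigation and retry (or rotate identity) on failure, rather than parsing a challenge page as if it were data.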
Cloudflare
Cloudflare is the most common anti-bot system. Their detection layers include TLS fingerprinting, JavaScript challenges (Turnstile), and behavioral analysis.
import time
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def scrape_cloudflare_site(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
        )
        page = context.new_page()
        stealth_sync(page)
        page.goto(url, wait_until="domcontentloaded")
        # Cloudflare challenge pages take 3-8 seconds to resolve.
        # Poll until the challenge markers disappear.
        for attempt in range(15):
            title = page.title()
            content = page.content()
            if (
                "just a moment" not in title.lower()
                and "checking your browser" not in content.lower()
                and "cf-challenge" not in content.lower()
            ):
                break
            time.sleep(1)
        # Verify we got real content
        if "just a moment" in page.title().lower():
            print("Failed to bypass Cloudflare challenge")
            browser.close()
            return None
        html = page.content()
        browser.close()
        return html

Cloudflare Turnstile is their CAPTCHA replacement. It runs in the background and does not always require user interaction. But when it does, you need a CAPTCHA solving service or a real user session. More on this in the CAPTCHA section below.
DataDome
DataDome is behavioral-heavy. It watches mouse movements, scroll patterns, and typing cadence. A browser that navigates directly to a URL without any human-like interaction gets flagged.
import random
import time
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def human_like_mouse(page):
    """Simulate realistic mouse movement."""
    width = page.viewport_size["width"]
    height = page.viewport_size["height"]
    # Move to 3-5 random positions with realistic timing
    for _ in range(random.randint(3, 5)):
        x = random.randint(100, width - 100)
        y = random.randint(100, height - 100)
        page.mouse.move(x, y, steps=random.randint(10, 25))
        time.sleep(random.uniform(0.1, 0.4))

def human_like_scroll(page):
    """Scroll down in chunks like a human reader."""
    total_scroll = random.randint(500, 1500)
    scrolled = 0
    while scrolled < total_scroll:
        delta = random.randint(80, 200)
        page.mouse.wheel(0, delta)
        scrolled += delta
        time.sleep(random.uniform(0.1, 0.3))

def scrape_datadome_site(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
        )
        page = context.new_page()
        stealth_sync(page)
        page.goto(url, wait_until="domcontentloaded")
        page.wait_for_timeout(2000)
        # DataDome checks for human-like behavior
        human_like_mouse(page)
        human_like_scroll(page)
        # Wait for any DataDome challenge to resolve
        page.wait_for_timeout(3000)
        # Check for the DataDome block page
        content = page.content()
        if "datadome" in content.lower() and "blocked" in content.lower():
            print("DataDome blocked the request")
            browser.close()
            return None
        html = page.content()
        browser.close()
        return html

PerimeterX (now HUMAN Security)
PerimeterX runs deep fingerprinting via their _px scripts. They check canvas fingerprints, AudioContext fingerprints, WebGL parameters, and font enumeration.
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def patch_fingerprint(page):
    """Inject scripts to override fingerprint vectors."""
    page.add_init_script("""
        // Override canvas fingerprint with a one-bit pixel shift
        const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
        HTMLCanvasElement.prototype.toDataURL = function(type) {
            if (type === 'image/png') {
                const ctx = this.getContext('2d');
                if (ctx) {
                    const imageData = ctx.getImageData(
                        0, 0, this.width, this.height
                    );
                    for (let i = 0; i < imageData.data.length; i += 4) {
                        imageData.data[i] ^= 1;
                    }
                    ctx.putImageData(imageData, 0, 0);
                }
            }
            return originalToDataURL.apply(this, arguments);
        };

        // Add low-amplitude noise to the AudioContext fingerprint
        const originalGetFloatFrequencyData =
            AnalyserNode.prototype.getFloatFrequencyData;
        AnalyserNode.prototype.getFloatFrequencyData = function(array) {
            originalGetFloatFrequencyData.call(this, array);
            for (let i = 0; i < array.length; i++) {
                array[i] += Math.random() * 0.0001;
            }
        };
    """)

def scrape_perimeterx_site(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--disable-web-security",
            ],
        )
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
        )
        page = context.new_page()
        stealth_sync(page)
        patch_fingerprint(page)
        page.goto(url, wait_until="networkidle")
        # Check for the PerimeterX block widget
        if page.query_selector("[data-testid='px-captcha']"):
            print("PerimeterX CAPTCHA detected")
            browser.close()
            return None
        html = page.content()
        browser.close()
        return html

CAPTCHA Handling in Playwright
When stealth techniques fail, you hit CAPTCHAs. Automating CAPTCHA solving requires integrating with a solving service. Here is how to handle the two most common types.
Cloudflare Turnstile
Turnstile works differently from traditional CAPTCHAs. It runs a background challenge and injects a token into a hidden form field. You can extract the site key and send it to a solving service.
import json
import time
import urllib.request
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

CAPSOLVER_API_KEY = "your_capsolver_key"

def solve_turnstile(site_key, page_url):
    """Send a Turnstile challenge to a solving service."""
    payload = json.dumps({
        "clientKey": CAPSOLVER_API_KEY,
        "task": {
            "type": "AntiTurnstileTaskProxyLess",
            "websiteURL": page_url,
            "websiteKey": site_key,
        },
    }).encode()
    req = urllib.request.Request(
        "https://api.capsolver.com/createTask",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    resp = json.loads(urllib.request.urlopen(req).read())
    task_id = resp["taskId"]
    # Poll for the solution
    for _ in range(30):
        time.sleep(2)
        check = json.dumps({
            "clientKey": CAPSOLVER_API_KEY,
            "taskId": task_id,
        }).encode()
        req = urllib.request.Request(
            "https://api.capsolver.com/getTaskResult",
            data=check,
            headers={"Content-Type": "application/json"},
        )
        result = json.loads(urllib.request.urlopen(req).read())
        if result["status"] == "ready":
            return result["solution"]["token"]
    return None

def scrape_with_turnstile(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        stealth_sync(page)
        page.goto(url, wait_until="domcontentloaded")
        page.wait_for_timeout(3000)
        # Find the Turnstile widget and extract the site key
        turnstile_frame = page.query_selector(
            "iframe[src*='challenges.cloudflare.com']"
        )
        if turnstile_frame:
            site_key = page.evaluate("""
                () => {
                    const widget = document.querySelector('[data-sitekey]');
                    return widget
                        ? widget.getAttribute('data-sitekey')
                        : null;
                }
            """)
            if site_key:
                token = solve_turnstile(site_key, url)
                if token:
                    # Inject the solved token back into the page
                    page.evaluate(
                        """(token) => {
                            const input = document.querySelector(
                                '[name="cf-turnstile-response"]'
                            );
                            if (input) input.value = token;
                            const callback = window.turnstileCallback
                                || window._cf_chl_opt?.clCb;
                            if (callback) callback(token);
                        }""",
                        token,
                    )
                    page.wait_for_timeout(2000)
        html = page.content()
        browser.close()
        return html

hCaptcha
hCaptcha is used by Cloudflare on some domains and by many other sites directly.
import json
import time
import urllib.request

CAPSOLVER_API_KEY = "your_capsolver_key"

def solve_hcaptcha(site_key, page_url):
    """Send an hCaptcha challenge to a solving service."""
    payload = json.dumps({
        "clientKey": CAPSOLVER_API_KEY,
        "task": {
            "type": "HCaptchaTaskProxyLess",
            "websiteURL": page_url,
            "websiteKey": site_key,
        },
    }).encode()
    req = urllib.request.Request(
        "https://api.capsolver.com/createTask",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    resp = json.loads(urllib.request.urlopen(req).read())
    task_id = resp["taskId"]
    # Poll for the solution
    for _ in range(60):
        time.sleep(2)
        check = json.dumps({
            "clientKey": CAPSOLVER_API_KEY,
            "taskId": task_id,
        }).encode()
        req = urllib.request.Request(
            "https://api.capsolver.com/getTaskResult",
            data=check,
            headers={"Content-Type": "application/json"},
        )
        result = json.loads(urllib.request.urlopen(req).read())
        if result["status"] == "ready":
            return result["solution"]["gRecaptchaResponse"]
    return None

The pattern is the same regardless of the CAPTCHA type: extract the site key from the page, send it to a solving service, and inject the response token back into the page. Budget $2-5 per thousand solves, depending on the provider and CAPTCHA type.
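That budget line is worth sanity-checking against your volume before wiring in a solver. A back-of-envelope estimate at the $2-5 per thousand range quoted above, assuming a flat 30-day month:

```python
def monthly_captcha_cost(solves_per_day: int, rate_per_1000: float) -> float:
    """Estimated monthly spend on CAPTCHA solving (30-day month assumed)."""
    return solves_per_day * 30 * rate_per_1000 / 1000

# 5,000 solves/day across the quoted price range:
print(monthly_captcha_cost(5000, 2.0))  # 300.0
print(monthly_captcha_cost(5000, 5.0))  # 750.0
```

At scale the solving bill alone can rival a managed-service subscription, which feeds into the cost-benefit math at the end of this article.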
Wait Strategies That Prevent Detection
Naive waits are one of the most common reasons scrapes fail. Using the wrong wait strategy either triggers anti-bot detection (too fast) or wastes time (too slow).
networkidle vs domcontentloaded vs Custom Waits
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Option 1: domcontentloaded — fires when HTML is parsed.
    # Fast, but JS-rendered content is not ready yet.
    page.goto("https://example.com", wait_until="domcontentloaded")

    # Option 2: networkidle — waits until no network requests
    # for 500ms. Works for most sites but can hang on sites
    # with persistent WebSocket connections or polling.
    page.goto("https://example.com", wait_until="networkidle")

    # Option 3: Custom wait — the most reliable approach.
    # Wait for a specific element that indicates content loaded.
    page.goto("https://example.com", wait_until="domcontentloaded")
    page.wait_for_selector("div.product-list", timeout=10000)

    browser.close()

When to use each:
- domcontentloaded -- Static sites, server-rendered pages. Fast and reliable.
- networkidle -- Sites with standard JS rendering (React, Vue, Angular). A good default for dynamic rendering with headless browsers, but set a timeout.
- Custom selector waits -- Best for known targets. Wait for the exact element you need to scrape.
Waiting a Fixed Duration
Sometimes you just need to wait for a specific duration. This is common when dealing with anti-bot challenges that need time to resolve.
# Playwright's built-in timeout (non-blocking, better than time.sleep)
page.wait_for_timeout(5000)  # Wait for 5 seconds

# Conditional wait with timeout
try:
    page.wait_for_selector(
        "div.content",
        state="visible",
        timeout=5000,
    )
except Exception:
    # Element did not appear in 5 seconds — take a screenshot
    page.screenshot(path="debug_timeout.png")

Waiting for Dynamic Content
For SPAs and sites that load content via XHR/fetch after the initial page load:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # Wait for a specific API response before scraping
    with page.expect_response(
        lambda resp: "/api/products" in resp.url and resp.status == 200,
        timeout=15000,
    ) as response_info:
        page.goto(
            "https://spa-site.com/products",
            wait_until="domcontentloaded",
        )
    api_response = response_info.value
    data = api_response.json()
    print(f"Got {len(data['items'])} products from API")
    browser.close()

Screenshot Debugging for Anti-Bot Issues
When a scrape fails, a screenshot tells you more than any log message. Build screenshot debugging into your scraping pipeline from the start.
import os
from datetime import datetime
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

DEBUG_DIR = "debug_screenshots"
os.makedirs(DEBUG_DIR, exist_ok=True)

def scrape_with_debug(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(
            viewport={"width": 1920, "height": 1080}
        )
        stealth_sync(page)
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        domain = url.split("//")[-1].split("/")[0].replace(".", "_")
        try:
            response = page.goto(
                url, wait_until="networkidle", timeout=30000
            )
            # Screenshot on non-200 status
            if response and response.status != 200:
                path = (
                    f"{DEBUG_DIR}/{domain}_{timestamp}"
                    f"_status{response.status}.png"
                )
                page.screenshot(path=path, full_page=True)
                print(
                    f"Non-200 status ({response.status}). "
                    f"Screenshot: {path}"
                )
            # Screenshot if a challenge page is detected
            content = page.content()
            challenge_markers = [
                "just a moment",
                "checking your browser",
                "access denied",
                "blocked",
                "captcha",
                "cf-challenge",
                "datadome",
            ]
            for marker in challenge_markers:
                if marker in content.lower():
                    slug = marker.replace(" ", "_")
                    path = (
                        f"{DEBUG_DIR}/{domain}"
                        f"_{timestamp}_{slug}.png"
                    )
                    page.screenshot(path=path, full_page=True)
                    print(
                        f"Challenge detected: {marker}. "
                        f"Screenshot: {path}"
                    )
                    break
            return content
        except Exception as e:
            path = f"{DEBUG_DIR}/{domain}_{timestamp}_error.png"
            try:
                page.screenshot(path=path, full_page=True)
                print(f"Error: {e}. Screenshot: {path}")
            except Exception:
                print(f"Error: {e}. Could not take screenshot.")
            return None
        finally:
            browser.close()

This pattern -- a Playwright screenshot on every failure -- saves hours of debugging. When you see the actual challenge page or error screen, you know exactly which anti-bot system blocked you and at what stage.
When DIY Fails: Knowing the Limits
There is a ceiling to what stealth patches and browser configuration can achieve. The arms race between anti-bot systems and automation tools is constant, and the detection side has structural advantages:
- TLS fingerprinting cannot be fixed from JavaScript. You would need to patch Chromium's TLS stack at the C++ level and rebuild the binary.
- Behavioral analysis gets more sophisticated every month. Anti-bot vendors now use ML models trained on billions of real user sessions.
- Maintaining stealth is a full-time job. Patches that work today break with the next Chrome update or anti-bot vendor release.
- Scale amplifies problems. A technique that works for 100 pages per day fails at 10,000 because rate limiting and IP reputation compound.
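The rate-limiting half of that last problem is at least partly self-inflicted: hammering a target with instant retries after a block burns IP reputation fast. A standard exponential-backoff-with-jitter sketch (the parameters are arbitrary defaults, not a recommendation for any specific target):

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 2.0, cap: float = 60.0):
    """Yield exponentially growing retry delays with random jitter.

    Jitter spreads retries out so a fleet of workers does not retry
    in lockstep; the cap bounds the worst-case wait.
    """
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        yield delay * random.uniform(0.5, 1.0)

for i, d in enumerate(backoff_delays()):
    print(f"retry {i + 1}: sleep ~{d:.1f}s")
```

Backoff does not beat fingerprinting, but it keeps a recoverable block (rate limit) from escalating into a persistent one (IP reputation).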
If you are spending more time maintaining your stealth setup than building your actual product, it is time to consider a managed solution. Services like AlterLab handle the anti-bot layer at the infrastructure level -- TLS fingerprint rotation, browser profile management, residential proxy networks, and automatic challenge solving -- so your code stays a simple API call:
import requests

response = requests.post(
    "https://api.alterlab.io/v1/scrape",
    headers={"X-API-Key": "your_api_key"},
    json={
        "url": "https://cloudflare-protected-site.com",
        "formats": ["html", "markdown"],
    },
)
data = response.json()
print(data["content"])

No stealth patches. No fingerprint maintenance. No CAPTCHA integration. The anti-bot bypass happens at the network level, which is fundamentally harder to detect than anything you can do from within a browser.
Quick Reference
| Technique | Cloudflare | DataDome | PerimeterX |
|---|---|---|---|
| playwright-stealth | Basic tiers only | Basic tiers only | Basic tiers only |
| Browser context hardening | Helps | Helps | Helps |
| Cookie persistence | Helps | Helps | Helps |
| Request interception | Helps | Situational | Situational |
| Human-like behavior | Helps | Essential | Helps |
| CAPTCHA solving service | Needed for Turnstile | Needed when challenged | Needed when challenged |
| Managed API (AlterLab) | Reliable | Reliable | Reliable |
The honest summary: playwright-stealth plus browser context hardening will get you past basic bot detection and work on sites without dedicated anti-bot services. For Cloudflare, DataDome, and PerimeterX, you need a combination of stealth, behavioral simulation, and CAPTCHA solving. For reliable production-scale scraping against these systems, the cost-benefit math usually points toward a managed service.
Start with stealth patches. Add behavioral simulation when you hit challenges. Integrate CAPTCHA solving when you hit CAPTCHAs. And when the maintenance overhead crosses your tolerance threshold, switch to an API.