
Playwright Anti-Bot Detection: What Actually Works in 2026


Yash Dubey

February 19, 2026

You picked Playwright because it is fast, has a clean API, and supports all major browsers. But within five minutes of scraping a real target, you hit a Cloudflare challenge page. Your headless browser is getting detected, and no amount of page.wait_for_load_state("networkidle") is going to fix it.

This guide covers the specific detection vectors that flag Playwright, what stealth techniques actually work in 2026, and how to handle the major anti-bot providers (Cloudflare, DataDome, PerimeterX) with working Python code.

  • 87% -- sites using at least one anti-bot service
  • <2 sec -- average bot detection time
  • 12+ -- fingerprint vectors checked
  • 3x -- detection rate increase since 2024

Why Playwright Gets Detected

Playwright is not invisible. Out of the box, it leaves a trail of signals that anti-bot systems check in milliseconds. Understanding these signals is the first step to avoiding them.

The navigator.webdriver Flag

Every Playwright browser instance sets navigator.webdriver to true. The property is defined by the W3C WebDriver spec, and Chromium sets it whenever it runs under automation. Anti-bot scripts check this property first because it is the cheapest detection method available.

python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")

    # This returns True in default Playwright — instant detection
    is_bot = page.evaluate("() => navigator.webdriver")
    print(f"webdriver flag: {is_bot}")  # True
    browser.close()

Headless Browser Tells

Headless Chromium differs from headed Chrome in dozens of subtle ways. Anti-bot systems check for:

  • Missing plugins: navigator.plugins is empty in headless mode. Real Chrome reports PDF Viewer, Chrome PDF Plugin, etc.
  • Missing WebGL renderer: Headless Chrome uses SwiftShader as its GPU renderer. Real browsers report actual GPU hardware like "ANGLE (NVIDIA GeForce RTX 3080)".
  • Screen dimensions: Headless defaults to 800x600 with 0 values for screen.availHeight and screen.availWidth.
  • Missing permissions API behavior: Notification.permission returns unexpected values in headless mode.
  • Chrome runtime objects: Real Chrome injects window.chrome with runtime, loadTimes, and csi objects. Headless Chrome is missing or has incomplete versions of these.
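A quick way to audit these tells is to evaluate a probe inside the page and inspect the results from Python. This is a sketch: the `HEADLESS_PROBE` script and the `looks_headless` helper are illustrative names, not a standard API, and the checks mirror only the signals listed above.

```python
HEADLESS_PROBE = """() => ({
    plugins: navigator.plugins.length,
    webgl: (() => {
        const canvas = document.createElement('canvas');
        const gl = canvas.getContext('webgl');
        if (!gl) return null;
        const ext = gl.getExtension('WEBGL_debug_renderer_info');
        return gl.getParameter(ext ? ext.UNMASKED_RENDERER_WEBGL
                                   : gl.RENDERER);
    })(),
    availWidth: screen.availWidth,
    availHeight: screen.availHeight,
})"""

def looks_headless(probe: dict) -> list:
    """Return the headless tells present in a probe result."""
    tells = []
    if probe.get("plugins", 0) == 0:
        tells.append("no plugins")
    if "SwiftShader" in str(probe.get("webgl") or ""):
        tells.append("software WebGL renderer")
    if not probe.get("availWidth") or not probe.get("availHeight"):
        tells.append("zeroed screen dimensions")
    return tells

if __name__ == "__main__":
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com")
        print(looks_headless(page.evaluate(HEADLESS_PROBE)))
        browser.close()
```

Run it against your own configuration before a target does: an empty list means these particular tells are patched, not that you are undetectable.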

JavaScript Fingerprinting

Modern anti-bot systems build a fingerprint from 50+ browser properties and compare it against known profiles. Playwright's fingerprint is distinctive:

python
# What anti-bot scripts collect (simplified)
fingerprint_checks = """
() => ({
    webdriver: navigator.webdriver,
    plugins: navigator.plugins.length,
    languages: navigator.languages,
    platform: navigator.platform,
    hardwareConcurrency: navigator.hardwareConcurrency,
    deviceMemory: navigator.deviceMemory,
    webgl: (() => {
        const canvas = document.createElement('canvas');
        const gl = canvas.getContext('webgl');
        return gl ? gl.getParameter(gl.RENDERER) : null;
    })(),
    chrome: !!window.chrome,
    permissions: typeof navigator.permissions,
})
"""

If any of these values look like a default automation tool, you are flagged before the page even finishes loading.

Detection Vectors Specific to Playwright

Beyond generic headless detection, Playwright has its own unique fingerprint that anti-bot vendors specifically target.

Feature | Default Playwright | Real Browser
navigator.webdriver | true | false
navigator.plugins.length | 0 | 5+
WebGL renderer | SwiftShader | GPU hardware
window.chrome.runtime | missing | present
Notification.permission | denied | prompt
CDP detection | possible | no
Consistent TLS fingerprint | no | yes

CDP Protocol Leak

Playwright communicates with the browser through Chrome DevTools Protocol (CDP). Some anti-bot scripts detect this by checking for the presence of CDP-related runtime objects or by measuring timing anomalies introduced by the protocol layer.
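One widely circulated check exploits the fact that a CDP client with the Runtime domain enabled serializes objects passed to console calls, which fires property getters that an unattended browser would not touch. The sketch below runs that check from Playwright; treat it as illustrative, since its reliability varies across Chrome and Playwright versions.

```python
CDP_CHECK = """() => {
    let touched = false;
    const err = new Error();
    Object.defineProperty(err, 'stack', {
        get() { touched = true; return ''; },
    });
    // If a CDP client is attached with Runtime enabled, serializing
    // the error for the console preview reads .stack and fires the
    // getter; in a plain browser nothing inspects it.
    console.debug(err);
    return touched;
}"""

if __name__ == "__main__":
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("about:blank")
        print("CDP visible:", page.evaluate(CDP_CHECK))
        browser.close()
```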

TLS Fingerprinting (JA3/JA4)

This is the hardest to fix. When your browser makes an HTTPS connection, the TLS handshake includes a unique ordering of cipher suites, extensions, and supported curves. Anti-bot services like Cloudflare fingerprint this handshake (JA3/JA4 hash) and compare it against known browser signatures.

Playwright's Chromium binary has a JA3 fingerprint that does not match any real Chrome release. This alone can get you blocked before any JavaScript even runs.
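You cannot patch the handshake from Python, but you can avoid the bundled Chromium entirely: Playwright's `channel` launch option drives an installed Google Chrome binary, whose TLS stack is by definition the one real Chrome ships. A sketch, assuming Chrome is installed on the machine:

```python
# Launch options that keep the network stack closest to real Chrome.
# channel="chrome" uses the installed Google Chrome binary instead of
# the bundled Chromium build and its mismatched JA3 fingerprint.
LAUNCH_OPTIONS = {
    "channel": "chrome",
    "headless": False,  # headed mode also removes headless-only tells
}

if __name__ == "__main__":
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(**LAUNCH_OPTIONS)
        page = browser.new_page()
        page.goto("https://example.com")
        print(page.title())
        browser.close()
```

This narrows the TLS gap but does not remove the other detection layers Cloudflare stacks on top, such as JavaScript challenges.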

Stealth Techniques That Work

playwright-stealth

The playwright-stealth package patches the most common detection vectors. It is not a silver bullet, but it is the minimum viable starting point.

python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--no-first-run",
            "--no-default-browser-check",
        ],
    )
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        locale="en-US",
        timezone_id="America/New_York",
    )
    page = context.new_page()
    stealth_sync(page)

    page.goto("https://bot.sannysoft.com")
    page.screenshot(path="stealth_test.png")
    browser.close()

What playwright-stealth patches:

  • Sets navigator.webdriver to false
  • Fakes navigator.plugins and navigator.mimeTypes
  • Patches chrome.runtime to look like a real Chrome extension environment
  • Fixes Notification.permission behavior
  • Overrides navigator.permissions.query responses

What it does not fix: WebGL fingerprint, TLS fingerprint, CDP detection, or behavioral analysis.
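The WebGL gap can be narrowed with an init script that intercepts `getParameter` for the unmasked vendor and renderer queries. A sketch: the spoofed strings are illustrative, and they must stay consistent with the rest of your claimed fingerprint (user agent, platform, screen) or the mismatch becomes a tell of its own.

```python
# 0x9245 / 0x9246 are UNMASKED_VENDOR_WEBGL / UNMASKED_RENDERER_WEBGL
# from the WEBGL_debug_renderer_info extension.
WEBGL_SPOOF = """
(() => {
    const getParameter = WebGLRenderingContext.prototype.getParameter;
    WebGLRenderingContext.prototype.getParameter = function (param) {
        if (param === 0x9245) return 'Google Inc. (NVIDIA)';
        if (param === 0x9246)
            return 'ANGLE (NVIDIA, NVIDIA GeForce RTX 3080 Direct3D11)';
        return getParameter.call(this, param);
    };
})();
"""

if __name__ == "__main__":
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Init scripts run before any page script on every navigation.
        page.add_init_script(WEBGL_SPOOF)
        page.goto("https://example.com")
        browser.close()
```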

Browser Context Hardening

Beyond stealth patches, your browser context configuration matters. Here is a hardened context that covers most fingerprint vectors:

python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def create_stealth_context(playwright):
    browser = playwright.chromium.launch(
        headless=True,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-features=IsolateOrigins,site-per-process",
            "--disable-dev-shm-usage",
            "--disable-accelerated-2d-canvas",
            "--disable-gpu-sandbox",
            "--no-first-run",
            "--no-zygote",
        ],
    )

    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},
        screen={"width": 1920, "height": 1080},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        locale="en-US",
        timezone_id="America/New_York",
        geolocation={"latitude": 40.7128, "longitude": -74.0060},
        permissions=["geolocation"],
        color_scheme="light",
        has_touch=False,
        is_mobile=False,
        java_script_enabled=True,
        extra_http_headers={
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "sec-ch-ua": (
                '"Chromium";v="124", '
                '"Google Chrome";v="124", '
                '"Not-A.Brand";v="99"'
            ),
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": '"Windows"',
        },
    )
    return browser, context

Cookie and Session Persistence

Anti-bot systems track whether your browser maintains cookies between requests. A real user arrives with cookies from previous visits; a bot starts fresh every time.

python
import json
from pathlib import Path
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

COOKIE_FILE = Path("cookies.json")

def load_cookies(context):
    """Load cookies saved by a previous session."""
    if COOKIE_FILE.exists():
        cookies = json.loads(COOKIE_FILE.read_text())
        context.add_cookies(cookies)

def save_cookies(context):
    """Persist cookies for future sessions."""
    cookies = context.cookies()
    COOKIE_FILE.write_text(json.dumps(cookies, indent=2))

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    stealth_sync(page)

    # Load cookies from a previous scraping session
    load_cookies(context)

    page.goto("https://target-site.com")
    # ... do your scraping ...

    # Save cookies for the next run
    save_cookies(context)
    browser.close()

For persistent browser profiles that survive between runs (including localStorage, IndexedDB, and service workers):

python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    # Use persistent context to maintain full browser state
    context = p.chromium.launch_persistent_context(
        user_data_dir="./browser_profile",
        headless=True,
        viewport={"width": 1920, "height": 1080},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        args=["--disable-blink-features=AutomationControlled"],
    )
    page = context.pages[0] if context.pages else context.new_page()
    stealth_sync(page)

    page.goto("https://target-site.com")
    # Full browser state persists between runs
    context.close()

Request Interception

Intercept and modify requests to strip automation headers and add missing ones:

python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def handle_route(route):
    headers = route.request.headers.copy()

    # Remove headers that leak automation
    headers.pop("x-playwright", None)
    headers.pop("x-devtools", None)

    # Ensure consistent Accept header
    if route.request.resource_type == "document":
        headers["Accept"] = (
            "text/html,application/xhtml+xml,"
            "application/xml;q=0.9,image/avif,"
            "image/webp,image/apng,*/*;q=0.8"
        )
        headers["Upgrade-Insecure-Requests"] = "1"

    route.continue_(headers=headers)

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    page = context.new_page()
    stealth_sync(page)

    # Intercept all requests to fix headers
    page.route("**/*", handle_route)

    # Block tracking scripts that report automation
    page.route(
        "**/{datadome,px,perimeterx,kasada}*.js",
        lambda route: route.abort(),
    )

    page.goto("https://target-site.com")
    browser.close()

Warning: Blocking anti-bot scripts outright can be counterproductive. Some sites check if their detection scripts ran and block you if they did not load. Use this selectively.

Handling Cloudflare, DataDome, and PerimeterX

Each anti-bot provider has different detection strategies. A technique that works against Cloudflare may fail against DataDome.

1. Detect the provider. Check response headers and page source: Cloudflare returns cf-ray headers, DataDome sets datadome cookies, and PerimeterX uses _px cookies and loads px scripts.

2. Apply provider-specific patches. Each system checks different fingerprint vectors: Cloudflare focuses on TLS and JS challenges, DataDome on behavioral patterns, PerimeterX on deep browser fingerprinting.

3. Handle challenges. Wait for challenge pages to resolve. Some require JavaScript execution time, others need CAPTCHA solving, and some need specific cookie values from previous visits.

4. Validate success. Check that the response contains actual content, not a challenge page. Verify status codes and look for challenge-page markers in the HTML.
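The detection step can be automated with a small helper built from the markers just described: a cf-ray header for Cloudflare, a datadome cookie for DataDome, _px cookies for PerimeterX. The function name and return values are illustrative.

```python
def detect_antibot(headers: dict, cookies: list) -> str:
    """Guess the anti-bot vendor from response headers and cookies.

    headers: response headers (e.g. response.headers in Playwright)
    cookies: context.cookies() style list of {"name": ...} dicts
    """
    header_names = {k.lower() for k in headers}
    cookie_names = {c["name"].lower() for c in cookies}
    if "cf-ray" in header_names:
        return "cloudflare"
    if "datadome" in cookie_names:
        return "datadome"
    if any(name.startswith("_px") for name in cookie_names):
        return "perimeterx"
    return "unknown"
```

In a scrape, call it right after navigation: `detect_antibot(response.headers, context.cookies())`, where `response` is the return value of `page.goto(...)`.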

Cloudflare

Cloudflare is the most common anti-bot system. Their detection layers include TLS fingerprinting, JavaScript challenges (Turnstile), and behavioral analysis.

python
import time
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def scrape_cloudflare_site(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
        )
        page = context.new_page()
        stealth_sync(page)

        page.goto(url, wait_until="domcontentloaded")

        # Cloudflare challenge pages take 3-8 seconds to resolve
        # Wait for the challenge to complete
        for attempt in range(15):
            title = page.title()
            content = page.content()

            # Check if we are past the challenge
            if "just a moment" not in title.lower() and \
               "checking your browser" not in content.lower() and \
               "cf-challenge" not in content.lower():
                break

            time.sleep(1)

        # Verify we got real content
        if "just a moment" in page.title().lower():
            print("Failed to bypass Cloudflare challenge")
            browser.close()
            return None

        html = page.content()
        browser.close()
        return html

Cloudflare Turnstile is their CAPTCHA replacement. It runs in the background and does not always require user interaction. But when it does, you need a CAPTCHA solving service or a real user session. More on this in the CAPTCHA section below.

DataDome

DataDome is behavioral-heavy. It watches mouse movements, scroll patterns, and typing cadence. A browser that navigates directly to a URL without any human-like interaction gets flagged.

python
import random
import time
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def human_like_mouse(page):
    """Simulate realistic mouse movement."""
    width = page.viewport_size["width"]
    height = page.viewport_size["height"]

    # Move to 3-5 random positions with realistic timing
    for _ in range(random.randint(3, 5)):
        x = random.randint(100, width - 100)
        y = random.randint(100, height - 100)
        page.mouse.move(x, y, steps=random.randint(10, 25))
        time.sleep(random.uniform(0.1, 0.4))

def human_like_scroll(page):
    """Scroll down in chunks like a human reader."""
    total_scroll = random.randint(500, 1500)
    scrolled = 0
    while scrolled < total_scroll:
        delta = random.randint(80, 200)
        page.mouse.wheel(0, delta)
        scrolled += delta
        time.sleep(random.uniform(0.1, 0.3))

def scrape_datadome_site(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
        )
        page = context.new_page()
        stealth_sync(page)

        page.goto(url, wait_until="domcontentloaded")
        page.wait_for_timeout(2000)

        # DataDome checks for human-like behavior
        human_like_mouse(page)
        human_like_scroll(page)

        # Wait for any DataDome challenge to resolve
        page.wait_for_timeout(3000)

        # Check for DataDome block page
        content = page.content()
        if "datadome" in content.lower() and "blocked" in content.lower():
            print("DataDome blocked the request")
            browser.close()
            return None

        html = page.content()
        browser.close()
        return html

PerimeterX (now HUMAN Security)

PerimeterX runs deep fingerprinting via their _px scripts. They check canvas fingerprints, AudioContext fingerprints, WebGL parameters, and font enumeration.

python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def patch_fingerprint(page):
    """Inject scripts to override fingerprint vectors."""
    page.add_init_script("""
        // Override canvas fingerprint
        const originalToDataURL = HTMLCanvasElement.prototype.toDataURL;
        HTMLCanvasElement.prototype.toDataURL = function(type) {
            if (type === 'image/png') {
                const ctx = this.getContext('2d');
                if (ctx) {
                    const imageData = ctx.getImageData(
                        0, 0, this.width, this.height
                    );
                    for (let i = 0; i < imageData.data.length; i += 4) {
                        imageData.data[i] ^= 1;
                    }
                    ctx.putImageData(imageData, 0, 0);
                }
            }
            return originalToDataURL.apply(this, arguments);
        };

        // Override AudioContext fingerprint
        const originalGetFloatFrequencyData =
            AnalyserNode.prototype.getFloatFrequencyData;
        AnalyserNode.prototype.getFloatFrequencyData = function(array) {
            originalGetFloatFrequencyData.call(this, array);
            for (let i = 0; i < array.length; i++) {
                array[i] += Math.random() * 0.0001;
            }
        };
    """)

def scrape_perimeterx_site(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--disable-web-security",
            ],
        )
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
        )
        page = context.new_page()
        stealth_sync(page)
        patch_fingerprint(page)

        page.goto(url, wait_until="networkidle")

        # Check for PerimeterX block
        if page.query_selector("[data-testid='px-captcha']"):
            print("PerimeterX CAPTCHA detected")
            browser.close()
            return None

        html = page.content()
        browser.close()
        return html

CAPTCHA Handling in Playwright

When stealth techniques fail, you hit CAPTCHAs. Automating CAPTCHA solving requires integrating with a solving service. Here is how to handle the two most common types.

Cloudflare Turnstile

Turnstile works differently from traditional CAPTCHAs. It runs a background challenge and injects a token into a hidden form field. You can extract the site key and send it to a solving service.

python
import time
import urllib.request
import json
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

CAPSOLVER_API_KEY = "your_capsolver_key"

def solve_turnstile(site_key, page_url):
    """Send Turnstile challenge to a solving service."""
    payload = json.dumps({
        "clientKey": CAPSOLVER_API_KEY,
        "task": {
            "type": "AntiTurnstileTaskProxyLess",
            "websiteURL": page_url,
            "websiteKey": site_key,
        },
    }).encode()

    req = urllib.request.Request(
        "https://api.capsolver.com/createTask",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    resp = json.loads(urllib.request.urlopen(req).read())
    task_id = resp["taskId"]

    # Poll for solution
    for _ in range(30):
        time.sleep(2)
        check = json.dumps({
            "clientKey": CAPSOLVER_API_KEY,
            "taskId": task_id,
        }).encode()
        req = urllib.request.Request(
            "https://api.capsolver.com/getTaskResult",
            data=check,
            headers={"Content-Type": "application/json"},
        )
        result = json.loads(urllib.request.urlopen(req).read())
        if result["status"] == "ready":
            return result["solution"]["token"]

    return None

def scrape_with_turnstile(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        stealth_sync(page)

        page.goto(url, wait_until="domcontentloaded")
        page.wait_for_timeout(3000)

        # Find Turnstile widget and extract site key
        turnstile_frame = page.query_selector(
            "iframe[src*='challenges.cloudflare.com']"
        )
        if turnstile_frame:
            site_key = page.evaluate("""
                () => {
                    const widget = document.querySelector(
                        '[data-sitekey]'
                    );
                    return widget
                        ? widget.getAttribute('data-sitekey')
                        : null;
                }
            """)

            if site_key:
                token = solve_turnstile(site_key, url)
                if token:
                    page.evaluate(
                        """(token) => {
                            const input = document.querySelector(
                                '[name="cf-turnstile-response"]'
                            );
                            if (input) input.value = token;

                            const callback = window.turnstileCallback
                                || window._cf_chl_opt?.clCb;
                            if (callback) callback(token);
                        }""",
                        token,
                    )
                    page.wait_for_timeout(2000)

        html = page.content()
        browser.close()
        return html

hCaptcha

hCaptcha is used by Cloudflare on some domains and by many other sites directly.

python
import json
import time
import urllib.request

CAPSOLVER_API_KEY = "your_capsolver_key"

def solve_hcaptcha(site_key, page_url):
    """Send hCaptcha to a solving service."""
    payload = json.dumps({
        "clientKey": CAPSOLVER_API_KEY,
        "task": {
            "type": "HCaptchaTaskProxyLess",
            "websiteURL": page_url,
            "websiteKey": site_key,
        },
    }).encode()

    req = urllib.request.Request(
        "https://api.capsolver.com/createTask",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    resp = json.loads(urllib.request.urlopen(req).read())
    task_id = resp["taskId"]

    for _ in range(60):
        time.sleep(2)
        check = json.dumps({
            "clientKey": CAPSOLVER_API_KEY,
            "taskId": task_id,
        }).encode()
        req = urllib.request.Request(
            "https://api.capsolver.com/getTaskResult",
            data=check,
            headers={"Content-Type": "application/json"},
        )
        result = json.loads(urllib.request.urlopen(req).read())
        if result["status"] == "ready":
            return result["solution"]["gRecaptchaResponse"]

    return None

The pattern is the same regardless of the CAPTCHA type: extract the site key from the page, send it to a solving service, inject the response token back into the page. Budget $2-5 per thousand solves depending on the provider and CAPTCHA type.
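The injection step generalizes across CAPTCHA types. The hidden-field names below are the conventional ones each widget uses; some sites additionally require invoking a success callback, as in the Turnstile example above, so treat this helper as a starting sketch.

```python
# Conventional hidden-field selectors for solved-token injection.
TOKEN_FIELDS = {
    "turnstile": '[name="cf-turnstile-response"]',
    "hcaptcha": '[name="h-captcha-response"]',
    "recaptcha": '[name="g-recaptcha-response"]',
}

def inject_token(page, captcha_type: str, token: str) -> None:
    """Write a solved CAPTCHA token into the widget's response field."""
    selector = TOKEN_FIELDS[captcha_type]
    page.evaluate(
        """([selector, token]) => {
            const field = document.querySelector(selector);
            if (field) field.value = token;
        }""",
        [selector, token],
    )
```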

Wait Strategies That Prevent Detection

Naive waits are one of the most common reasons scrapes fail. Using the wrong wait strategy either triggers anti-bot detection (too fast) or wastes time (too slow).

networkidle vs domcontentloaded vs Custom Waits

python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Option 1: domcontentloaded — fires when HTML is parsed
    # Fast, but JS-rendered content is not ready yet
    page.goto("https://example.com", wait_until="domcontentloaded")

    # Option 2: networkidle — waits until no network requests
    # for 500ms. Works for most sites but can hang on sites
    # with persistent WebSocket connections or polling.
    page.goto("https://example.com", wait_until="networkidle")

    # Option 3: Custom wait — the most reliable approach
    # Wait for a specific element that indicates content loaded
    page.goto("https://example.com", wait_until="domcontentloaded")
    page.wait_for_selector("div.product-list", timeout=10000)

    browser.close()

When to use each:

  • domcontentloaded -- Static sites, server-rendered pages. Fast and reliable.
  • networkidle -- Sites with standard JS rendering (React, Vue, Angular). Good default for dynamic rendering with headless browsers, but set a timeout.
  • Custom selector waits -- Best for known targets. Wait for the exact element you need to scrape.

Waiting a Fixed Duration

Sometimes you just need to wait for a specific duration. This is common when dealing with anti-bot challenges that need time to resolve.

python
# Playwright's built-in timeout (non-blocking, better than time.sleep)
page.wait_for_timeout(5000)  # Wait for 5 seconds

# Conditional wait with timeout
try:
    page.wait_for_selector(
        "div.content",
        state="visible",
        timeout=5000,
    )
except Exception:
    # Element did not appear in 5 seconds — take a screenshot
    page.screenshot(path="debug_timeout.png")

Waiting for Dynamic Content

For SPAs and sites that load content via XHR/fetch after the initial page load:

python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Wait for a specific API response before scraping
    with page.expect_response(
        lambda resp: "/api/products" in resp.url
        and resp.status == 200,
        timeout=15000,
    ) as response_info:
        page.goto(
            "https://spa-site.com/products",
            wait_until="domcontentloaded",
        )

    api_response = response_info.value
    data = api_response.json()
    print(f"Got {len(data['items'])} products from API")

    browser.close()

Screenshot Debugging for Anti-Bot Issues

When a scrape fails, a screenshot tells you more than any log message. Build screenshot debugging into your scraping pipeline from the start.

python
import os
from datetime import datetime
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

DEBUG_DIR = "debug_screenshots"
os.makedirs(DEBUG_DIR, exist_ok=True)

def scrape_with_debug(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(
            viewport={"width": 1920, "height": 1080}
        )
        stealth_sync(page)

        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        domain = (
            url.split("//")[-1].split("/")[0].replace(".", "_")
        )

        try:
            response = page.goto(
                url, wait_until="networkidle", timeout=30000
            )

            # Screenshot on non-200 status
            if response and response.status != 200:
                path = (
                    f"{DEBUG_DIR}/{domain}_{timestamp}"
                    f"_status{response.status}.png"
                )
                page.screenshot(path=path, full_page=True)
                print(
                    f"Non-200 status ({response.status}). "
                    f"Screenshot: {path}"
                )

            # Screenshot if challenge page detected
            content = page.content()
            challenge_markers = [
                "just a moment",
                "checking your browser",
                "access denied",
                "blocked",
                "captcha",
                "cf-challenge",
                "datadome",
            ]
            for marker in challenge_markers:
                if marker in content.lower():
                    slug = marker.replace(" ", "_")
                    path = (
                        f"{DEBUG_DIR}/{domain}"
                        f"_{timestamp}_{slug}.png"
                    )
                    page.screenshot(
                        path=path, full_page=True
                    )
                    print(
                        f"Challenge detected: {marker}. "
                        f"Screenshot: {path}"
                    )
                    break

            return content

        except Exception as e:
            path = (
                f"{DEBUG_DIR}/{domain}_{timestamp}_error.png"
            )
            try:
                page.screenshot(
                    path=path, full_page=True
                )
                print(f"Error: {e}. Screenshot: {path}")
            except Exception:
                print(
                    f"Error: {e}. "
                    f"Could not take screenshot."
                )
            return None

        finally:
            browser.close()

This pattern -- screenshot on failure -- saves hours of debugging. When you see the actual challenge page or error screen, you know exactly which anti-bot system blocked you and at what stage.

When DIY Fails: Knowing the Limits

There is a ceiling to what stealth patches and browser configuration can achieve. The arms race between anti-bot systems and automation tools is constant, and the detection side has structural advantages:

  • TLS fingerprinting cannot be fixed from JavaScript. You would need to patch Chromium's TLS stack at the C++ level and rebuild the binary.
  • Behavioral analysis gets more sophisticated every month. Anti-bot vendors now use ML models trained on billions of real user sessions.
  • Maintaining stealth is a full-time job. Patches that work today break with the next Chrome update or anti-bot vendor release.
  • Scale amplifies problems. A technique that works for 100 pages per day fails at 10,000 because rate limiting and IP reputation compound.

If you are spending more time maintaining your stealth setup than building your actual product, it is time to consider a managed solution. Services like AlterLab handle the anti-bot layer at the infrastructure level -- TLS fingerprint rotation, browser profile management, residential proxy networks, and automatic challenge solving -- so your code stays a simple API call:

python
import requests

response = requests.post(
    "https://api.alterlab.io/v1/scrape",
    headers={"X-API-Key": "your_api_key"},
    json={
        "url": "https://cloudflare-protected-site.com",
        "formats": ["html", "markdown"],
    },
)
data = response.json()
print(data["content"])

No stealth patches. No fingerprint maintenance. No CAPTCHA integration. The anti-bot bypass happens at the network level, which is fundamentally harder to detect than anything you can do from within a browser.

Quick Reference

Technique | Cloudflare | DataDome | PerimeterX
playwright-stealth | partial | partial | partial
Browser context hardening | partial | partial | partial
Cookie persistence | helps | helps | helps
Request interception | helps | helps | helps
Human-like behavior | helps | essential | helps
CAPTCHA solving service | yes | yes | yes
Managed API (AlterLab) | yes | yes | yes

The honest summary: playwright-stealth plus browser context hardening will get you past basic bot detection and work on sites without dedicated anti-bot services. For Cloudflare, DataDome, and PerimeterX, you need a combination of stealth, behavioral simulation, and CAPTCHA solving. For reliable production-scale scraping against these systems, the cost-benefit math usually points toward a managed service.

Start with stealth patches. Add behavioral simulation when you hit challenges. Integrate CAPTCHA solving when you hit CAPTCHAs. And when the maintenance overhead crosses your tolerance threshold, switch to an API.
