Frequently Asked Questions

When should I use Playwright vs a scraping API?

Use Playwright when you need complex browser interactions: logging in, filling forms, clicking through multi-step flows, or triggering user events. Use a scraping API like AlterLab when you just need the rendered HTML — you get the same output without managing browser processes, handling crashes, or dealing with detection challenges.

Can Playwright scrape JavaScript-rendered pages?

Yes. That is Playwright's primary strength. It launches a real browser (Chromium in the examples here, with Firefox and WebKit also supported), executes JavaScript, and lets you wait for specific elements before extracting data. The tradeoff is resource cost: each browser instance uses hundreds of megabytes and takes seconds to start.

Does Playwright work with Python?

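Yes. Playwright has official libraries for Python and Node.js, as well as Java and .NET. Install the Python package with `pip install playwright`, then download a browser with `playwright install chromium`. The Python API mirrors the JavaScript API closely, using snake_case method names, and is available in both synchronous and async/await forms.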

Is Playwright detected as a bot?

Headless browsers can be identified through signals such as navigator properties (for example `navigator.webdriver`), timing patterns, and rendering characteristics. Some sites run bot-detection systems that identify and restrict automated browser traffic. The detection rate varies significantly by site. Routing browsers through residential proxies reduces IP-based detection but does not help against fingerprint-based detection.

How do I wait for content to load in Playwright?

The most reliable wait strategy depends on your target page. Use `wait_until='networkidle'` for pages whose network activity settles after loading, `wait_for_selector()` to wait for a specific element to appear, or `wait_for_function()` to wait for a custom JavaScript condition. Avoid fixed time delays: they waste time when the page loads quickly and fail when it loads more slowly than the delay.


Web Scraping with Playwright — Complete Guide

How Playwright works, how to set up a scraper, and when a cloud rendering API is more practical than running browsers yourself.


Playwright is a browser automation library from Microsoft — it controls Chromium, Firefox, and WebKit programmatically. It is a strong tool for scraping JavaScript-heavy pages that require real browser execution: single-page applications, pages with infinite scroll, forms that need to be filled, and content that only loads after user interactions. This guide covers setup, basic scraping patterns, and the practical tradeoffs of running browsers locally versus using a managed rendering API.

Installing Playwright

Playwright is available for Python and Node.js. Install the package and then download the browser binaries.

# Python
pip install playwright
playwright install chromium

# Node.js
npm install playwright
npx playwright install chromium

Your First Playwright Scraper

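Playwright's API is straightforward: launch a browser, open a page, navigate to a URL, and query the DOM. The examples below use the async API, which suits scrapers that process several pages concurrently; Playwright for Python also ships an equivalent synchronous API.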

import asyncio
from playwright.async_api import async_playwright

async def scrape_page(url: str) -> list[dict]:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")

        products = await page.eval_on_selector_all(
            "div.product-card",
            """els => els.map(el => ({
                title: el.querySelector('h2')?.textContent?.trim() ?? '',
                price: el.querySelector('.price')?.textContent?.trim() ?? '',
            }))"""
        )

        await browser.close()
        return products

results = asyncio.run(scrape_page("https://example.com/products"))
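
The scraper above returns titles and prices as raw strings. A minimal sketch of a cleanup step follows, assuming prices use a dot as the decimal separator and commas as thousands separators. The helper names `parse_price` and `clean_products` are illustrative and not part of Playwright.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Convert a scraped price string such as '$1,299.00' to a Decimal.

    Assumes '.' is the decimal separator and ',' groups thousands.
    Returns None when the text contains no number.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def clean_products(products: list[dict]) -> list[dict]:
    """Drop rows without a title and attach a numeric price to the rest."""
    cleaned = []
    for row in products:
        if not row.get("title"):
            continue  # skip cards where the title selector matched nothing
        cleaned.append({**row, "price_value": parse_price(row.get("price", ""))})
    return cleaned
```

Calling `clean_products(results)` on the scraper output keeps the original `price` text next to the parsed value, so rows that failed to parse can still be inspected.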

Handling Dynamic Content and Waiting

The most common Playwright challenge: knowing when the page has loaded enough data to scrape. Playwright provides several wait strategies — use the most specific one for your target page.

# Wait for network to settle (no requests for 500ms)
await page.goto(url, wait_until="networkidle")

# Wait for a specific element to appear
await page.wait_for_selector("div.product-card", timeout=10000)

# Wait until at least one matching element exists (raise the threshold to wait for more)
await page.wait_for_function("document.querySelectorAll('div.product-card').length > 0")

# Wait for an XHR response
async with page.expect_response(lambda r: "/api/products" in r.url) as response_info:
    await page.goto(url)
response = await response_info.value
data = await response.json()  # often easier than DOM scraping
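
To illustrate why condition-based waits are preferable to fixed delays, here is a plain-asyncio sketch of the underlying polling pattern. It is not Playwright's implementation. The function name `wait_until` and its parameters are assumptions made for this example.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(
    condition: Callable[[], Awaitable[bool]],
    timeout: float = 10.0,
    interval: float = 0.1,
) -> float:
    """Poll `condition` until it returns True and return the seconds waited.

    Raises asyncio.TimeoutError if the condition is still false at the deadline.
    """
    start = time.monotonic()
    while True:
        if await condition():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise asyncio.TimeoutError(f"condition not met within {timeout}s")
        await asyncio.sleep(interval)
```

The wait returns as soon as the condition holds, so a fast page costs almost nothing, while a slow page gets the full timeout before an error is raised. A fixed delay offers neither property.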

Handling Infinite Scroll

Infinite-scroll pages load more content as you scroll down. Use Playwright to scroll the page incrementally and wait for new content to load before scrolling again.

from playwright.async_api import async_playwright

async def scrape_infinite_scroll(url: str, max_scrolls: int = 10) -> list[str]:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")

        all_items: set[str] = set()

        for _ in range(max_scrolls):
            items = await page.eval_on_selector_all(
                ".item-title",
                "els => els.map(el => el.textContent.trim())"
            )
            all_items.update(items)

            prev_count = len(all_items)
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await page.wait_for_timeout(2000)  # fixed pause for simplicity; a condition-based wait is more robust

            new_items = await page.eval_on_selector_all(
                ".item-title",
                "els => els.map(el => el.textContent.trim())"
            )
            all_items.update(new_items)

            if len(all_items) == prev_count:
                break  # no new content loaded

        await browser.close()
        return list(all_items)
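
The loop above stops the first time a scroll produces nothing new, which can end early on a slow page. One possible refinement, sketched here with no Playwright dependency, is to stop only after several consecutive rounds add nothing. `collect_until_stable`, `fetch_batch`, and `patience` are hypothetical names for this example.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def collect_until_stable(
    fetch_batch: Callable[[], Awaitable[Iterable[str]]],
    max_rounds: int = 10,
    patience: int = 2,
) -> list[str]:
    """Call `fetch_batch` repeatedly until `patience` rounds in a row add nothing new.

    Items are returned in first-seen order.
    """
    seen: dict[str, None] = {}  # a dict keeps insertion order, unlike a set
    stale_rounds = 0
    for _ in range(max_rounds):
        before = len(seen)
        for item in await fetch_batch():
            seen.setdefault(item, None)
        if len(seen) == before:
            stale_rounds += 1
            if stale_rounds >= patience:
                break  # several rounds without new items: assume the end of the feed
        else:
            stale_rounds = 0
    return list(seen)
```

In a Playwright scraper, `fetch_batch` would be a coroutine that scrolls the page, waits for content, and returns the result of `page.eval_on_selector_all(...)`. Keeping the stop logic separate also makes it testable without launching a browser.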

Practical Limitations of Running Playwright Locally

Playwright is powerful, but comes with significant operational costs for production scraping:

Memory: Each browser instance uses 300–500 MB. Scaling to 20 concurrent browsers requires 6–10 GB of RAM.

Speed: Browser startup takes 1–3 seconds. Page load takes 3–10 seconds per page. Throughput is low compared to API-based approaches.

Detection: Headless browsers can be identified by timing patterns, navigator properties, and rendering characteristics. Many sites run bot-detection systems that identify and block automated browser traffic.

Infrastructure: You need to manage Chromium binaries, handle crashes, implement restarts, and configure proxy rotation yourself.

When to use Playwright locally: Complex interaction sequences (login flows, multi-step forms), testing/QA pipelines, or one-off data collection runs.

When to use a rendering API instead: Production scraping of JavaScript-heavy pages at scale, when you need reliable IP rotation, when you cannot maintain browser infrastructure.
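
The memory figures above translate directly into a concurrency limit. The following sketch estimates how many browsers fit in a given amount of RAM and caps the number of jobs in flight with `asyncio.Semaphore`. The 1 GB reserve for the operating system and the helper names are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


def max_concurrent_browsers(
    ram_gb: float, mb_per_browser: int = 500, reserve_gb: float = 1.0
) -> int:
    """Estimate how many browsers fit in RAM using the worst-case per-browser figure.

    `reserve_gb` is an assumed allowance for the OS and the scraper process itself.
    """
    usable_mb = max(ram_gb - reserve_gb, 0) * 1000
    return int(usable_mb // mb_per_browser)


async def run_bounded(
    jobs: Iterable[Callable[[], Awaitable[T]]], limit: int
) -> list[T]:
    """Run async jobs with at most `limit` in flight; results keep the input order."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

With 11 GB of RAM and the defaults, `max_concurrent_browsers(11)` gives 20, matching the 20-browser figure above at 500 MB each. Each job passed to `run_bounded` would typically wrap one page scrape.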

Extracting Data After Rendering — Playwright vs API

Both approaches produce the same outcome: rendered HTML you can parse. The difference is where the browser runs.

# Approach A: Local Playwright browser
import asyncio
from playwright.async_api import async_playwright

async def render(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")
        html = await page.content()
        await browser.close()
        return html

html = asyncio.run(render("https://example.com/spa-page"))

# Approach B: AlterLab rendering API (same result, no browser management)
import requests

response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY", "Content-Type": "application/json"},
    json={"url": "https://example.com/spa-page", "render_js": True},
)
html = response.json()["html"]

# Either way, parse with BeautifulSoup or lxml
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "lxml")

Playwright Scraper — SPA with Pagination

A complete Playwright scraper with pagination, a custom user agent and viewport, and basic error handling.

import asyncio
import json

from playwright.async_api import TimeoutError as PlaywrightTimeoutError
from playwright.async_api import async_playwright

async def scrape_spa(base_url: str, max_pages: int = 10) -> list[dict]:
    results = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            context = await browser.new_context(
                user_agent="Mozilla/5.0 (compatible; DataBot/1.0)",
                viewport={"width": 1280, "height": 900},
            )
            page = await context.new_page()

            for page_num in range(1, max_pages + 1):
                url = f"{base_url}?page={page_num}"
                print(f"Scraping page {page_num}...")

                try:
                    await page.goto(url, wait_until="networkidle", timeout=30000)
                    await page.wait_for_selector(".product-card", timeout=10000)
                except PlaywrightTimeoutError:
                    # The page did not load or has no product cards: keep what we have
                    print(f"Timed out on page {page_num}, stopping")
                    break

                items = await page.eval_on_selector_all(
                    ".product-card",
                    """els => els.map(el => ({
                        title: el.querySelector('h2')?.textContent?.trim() ?? '',
                        price: el.querySelector('.price')?.textContent?.trim() ?? '',
                        url: el.querySelector('a')?.href ?? '',
                    }))"""
                )

                if not items:
                    print(f"No items on page {page_num}, stopping")
                    break

                results.extend(items)

                # Stop when the page has no link to a next page
                next_btn = await page.query_selector("a.next-page")
                if not next_btn:
                    break
        finally:
            # Close the browser even if a page raises an unexpected error
            await browser.close()

    return results

results = asyncio.run(scrape_spa("https://example.com/products"))
with open("products.json", "w") as f:
    json.dump(results, f, indent=2)
print(f"Saved {len(results)} products")
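
The scraper builds page URLs with `f"{base_url}?page={page_num}"`, which produces an invalid URL if `base_url` already contains a query string. A small standard-library sketch of a safer alternative follows. The helper name `with_page` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page_num: int, param: str = "page") -> str:
    """Return `url` with its `param` query value set to `page_num`.

    Other query parameters are preserved; an existing `param` is replaced.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, str(page_num)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside the pagination loop, `url = with_page(base_url, page_num)` would replace the f-string, so a base URL such as `https://example.com/products?sort=price` keeps its `sort` parameter.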

Same Result, No Browser Process

When you just need rendered HTML — not complex interactions — AlterLab handles the browser server-side. No Playwright install, no browser binary management, no memory overhead. From $0.0002/request with 5,000 free requests to start.

import requests
from bs4 import BeautifulSoup
import json

API_KEY = "YOUR_API_KEY"  # Get free at alterlab.io

def scrape_spa_page(url: str) -> list[dict]:
    """AlterLab renders the page server-side — no local browser required."""
    response = requests.post(
        "https://api.alterlab.io/api/v1/scrape",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json={"url": url, "render_js": True},
        timeout=30,
    )
    response.raise_for_status()
    html = response.json().get("html", "")

    soup = BeautifulSoup(html, "lxml")
    return [
        {
            "title": card.select_one("h2").get_text(strip=True),
            "price": card.select_one(".price").get_text(strip=True),
        }
        for card in soup.select(".product-card")
        if card.select_one("h2") and card.select_one(".price")
    ]

all_results = []
for page_num in range(1, 11):
    url = f"https://example.com/products?page={page_num}"
    items = scrape_spa_page(url)
    if not items:
        break
    all_results.extend(items)

with open("products.json", "w") as f:
    json.dump(all_results, f, indent=2)
print(f"Saved {len(all_results)} products — no browser process running")
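
In the loop above, a single failed request aborts the whole run. A generic retry wrapper with exponential backoff is one way to soften that. The sketch below uses only the standard library; the name `retry` and its defaults are assumptions, and which errors are worth retrying depends on the API's documented behavior.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on `retry_on` with delays of base_delay * 1, 2, 4, ...

    The last failure is re-raised once `attempts` calls have been made.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

For the loop above, one option is `items = retry(lambda: scrape_spa_page(url), retry_on=(requests.RequestException,))`, which retries network and HTTP errors raised by `requests` and lets other exceptions propagate.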


Playwright vs Alternatives

Playwright (local browser)

Pros
+ Full browser interaction (clicks, forms, scroll)
+ Free to run
+ Direct DOM access

Cons
− 300–500 MB per browser instance
− 3–10 seconds per page
− Browser detection common
− Complex infrastructure management
− Crashes require restart logic

Playwright + proxy rotation (DIY)

Pros
+ Handles IP-based rate limiting
+ More reliable than plain browser

Cons
− Proxy cost + browser cost
− Complex integration
− Still slow and memory-heavy
− Detection still possible

AlterLab rendering API

Pros
+ No browser management
+ Automatic IP rotation
+ 5-tier compatibility escalation
+ From $0.0002/request
+ No memory or CPU overhead

Cons
− Per-request cost
− Cannot perform complex interactions
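
To make the comparison concrete, the figures quoted above can be turned into rough estimates. This is back-of-the-envelope arithmetic only. It treats $0.0002 as the per-request price even though the text gives it as a starting price, and it ignores browser startup time, server costs, and proxy costs. The function names are illustrative.

```python
from decimal import Decimal


def api_cost(
    request_count: int, price_per_request: Decimal = Decimal("0.0002")
) -> Decimal:
    """Total API cost in dollars at a flat per-request price (assumed floor price)."""
    return request_count * price_per_request


def browser_hours(pages: int, seconds_per_page: float, concurrency: int = 1) -> float:
    """Wall-clock hours to load `pages` pages with `concurrency` browsers in parallel."""
    return pages * seconds_per_page / concurrency / 3600


pages = 100_000
print(f"API cost at the floor price: ${api_cost(pages)}")
print(f"One browser: {browser_hours(pages, 3):.1f} to {browser_hours(pages, 10):.1f} hours")
print(f"20 browsers: {browser_hours(pages, 3, 20):.1f} to {browser_hours(pages, 10, 20):.1f} hours")
```

For 100,000 pages this works out to $20 at the floor price, against roughly 83 to 278 hours of page loading on one local browser, or about 4 to 14 hours across 20 concurrent browsers.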


