
Web Scraping with Playwright — Complete Guide

How Playwright works, how to set up a scraper, and when a cloud rendering API is more practical than running browsers yourself.

Playwright is a browser automation library from Microsoft — it controls Chromium, Firefox, and WebKit programmatically. It is a strong tool for scraping JavaScript-heavy pages that require real browser execution: single-page applications, pages with infinite scroll, forms that need to be filled, and content that only loads after user interactions. This guide covers setup, basic scraping patterns, and the practical tradeoffs of running browsers locally versus using a managed rendering API.

Installing Playwright

Playwright is available for Python and Node.js. Install the package and then download the browser binaries.

# Python
pip install playwright
playwright install chromium

# Node.js
npm install playwright
npx playwright install chromium

Your First Playwright Scraper

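Playwright's API is straightforward: launch a browser, open a page, navigate to a URL, and query the DOM. Playwright ships both a synchronous and an asynchronous API. The examples in this guide use the async API, which lets one process drive several pages concurrently.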

import asyncio
from playwright.async_api import async_playwright

async def scrape_page(url: str) -> list[dict]:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")

        products = await page.eval_on_selector_all(
            "div.product-card",
            """els => els.map(el => ({
                title: el.querySelector('h2')?.textContent?.trim() ?? '',
                price: el.querySelector('.price')?.textContent?.trim() ?? '',
            }))"""
        )

        await browser.close()
        return products

results = asyncio.run(scrape_page("https://example.com/products"))
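Because `scrape_page` is a coroutine, the async API makes it easy to scrape several URLs at once. The sketch below caps how many scrapes run at the same time with an `asyncio.Semaphore`, which keeps memory use predictable. `scrape_many` and its `limit` default are hypothetical names chosen for illustration. The per-URL scraper is passed in as a parameter, so you could hand it `scrape_page`. Since `scrape_page` launches one browser per call, `limit` is then also the number of browsers running at once.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def scrape_many(
    urls: list[str],
    scrape_one: Callable[[str], Awaitable[T]],
    limit: int = 4,
) -> list[T]:
    """Run scrape_one over urls with at most `limit` calls in flight.

    Results come back in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> T:
        # Only `limit` coroutines can hold the semaphore at the same time
        async with semaphore:
            return await scrape_one(url)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(u) for u in urls))
```

For example, `asyncio.run(scrape_many(urls, scrape_page, limit=4))` returns one result per URL, in the order of `urls`.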

Handling Dynamic Content and Waiting

The most common Playwright challenge is knowing when the page has loaded enough data to scrape. Playwright provides several wait strategies. Use the most specific one that works for your target page.

# Wait for network to settle (no requests for 500ms)
await page.goto(url, wait_until="networkidle")

# Wait for a specific element to appear
await page.wait_for_selector("div.product-card", timeout=10000)

# Wait until a JavaScript condition holds (here: at least one card is present)
await page.wait_for_function("document.querySelectorAll('div.product-card').length > 0")

# Wait for an XHR response
async with page.expect_response(lambda r: "/api/products" in r.url) as response_info:
    await page.goto(url)
response = await response_info.value
data = await response.json()  # often easier than DOM scraping
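Each of the waits above targets one condition. When a list renders in several batches, `wait_for_selector` can return as soon as the first card appears, before the rest have loaded. One workaround, sketched here under that assumption, is to poll the element count until it stops changing. `wait_for_stable_count` and its defaults are hypothetical and not part of Playwright. The counting coroutine is injected, so with Playwright you might pass `lambda: page.locator("div.product-card").count()`.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_for_stable_count(
    get_count: Callable[[], Awaitable[int]],
    interval: float = 0.5,
    stable_rounds: int = 3,
    timeout: float = 15.0,
) -> int:
    """Poll get_count() until it returns the same non-zero value
    `stable_rounds` times in a row, then return that value.

    Raises TimeoutError if the count has not settled within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last = -1
    repeats = 0
    while True:
        count = await get_count()
        if count > 0 and count == last:
            repeats += 1  # same non-zero reading as last time
        else:
            repeats = 1 if count > 0 else 0  # start a new streak (or none)
        last = count
        if repeats >= stable_rounds:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(f"count did not settle (last seen: {count})")
        await asyncio.sleep(interval)
```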

Handling Infinite Scroll

Infinite-scroll pages load more content as you scroll down. Use Playwright to scroll the page incrementally and wait for new content to load before scrolling again.

from playwright.async_api import async_playwright

async def scrape_infinite_scroll(url: str, max_scrolls: int = 10) -> list[str]:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")

        # A set de-duplicates titles across scrolls (page order is not kept)
        all_items: set[str] = set()

        for _ in range(max_scrolls):
            items = await page.eval_on_selector_all(
                ".item-title",
                "els => els.map(el => el.textContent.trim())"
            )
            all_items.update(items)

            prev_count = len(all_items)
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await page.wait_for_timeout(2000)  # wait for new content

            new_items = await page.eval_on_selector_all(
                ".item-title",
                "els => els.map(el => el.textContent.trim())"
            )
            all_items.update(new_items)

            if len(all_items) == prev_count:
                break  # no new content loaded

        await browser.close()
        return list(all_items)
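The scraper above keeps titles in a set, so the returned list is not in page order, and it stops after a single scroll that adds nothing. The variant below is a sketch that keeps items in first-seen order (dict keys preserve insertion order) and tolerates `patience` quiet scrolls before giving up, which may help when a batch loads slowly. `collect_until_exhausted`, `playwright_reader`, and the `patience` parameter are hypothetical. As in the original, identical titles from different items are merged, so keying on a product URL may be safer on real sites.

```python
from typing import Awaitable, Callable


async def collect_until_exhausted(
    scroll_and_read: Callable[[], Awaitable[list[str]]],
    max_scrolls: int = 10,
    patience: int = 2,
) -> list[str]:
    """Call scroll_and_read() repeatedly, keeping items in first-seen order.

    Stops after `max_scrolls` calls, or once `patience` consecutive calls
    add nothing new.
    """
    seen: dict[str, None] = {}  # dict keys stay unique and keep insertion order
    idle_rounds = 0
    for _ in range(max_scrolls):
        before = len(seen)
        for item in await scroll_and_read():
            seen.setdefault(item, None)
        if len(seen) == before:
            idle_rounds += 1
            if idle_rounds >= patience:
                break
        else:
            idle_rounds = 0
    return list(seen)


def playwright_reader(page, selector: str = ".item-title", pause_ms: int = 2000):
    """Adapter for a Playwright async Page: scroll, pause, read titles."""
    async def scroll_and_read() -> list[str]:
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await page.wait_for_timeout(pause_ms)
        return await page.eval_on_selector_all(
            selector, "els => els.map(el => el.textContent.trim())"
        )
    return scroll_and_read
```

With an open Playwright page, `titles = await collect_until_exhausted(playwright_reader(page), max_scrolls=10)` would replace the loop in `scrape_infinite_scroll`.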

Practical Limitations of Running Playwright Locally

Playwright is powerful, but comes with significant operational costs for production scraping:

Memory: Each browser instance typically uses 300–500 MB of RAM, so scaling to 20 concurrent browsers requires roughly 6–10 GB.

Speed: Browser startup takes 1–3 seconds, and a page load typically takes another 3–10 seconds. Throughput is low compared with plain HTTP requests or calling a site's JSON API directly.

Detection: Headless browsers can be identified by timing patterns, navigator properties, and rendering characteristics. Many sites with bot-protection layers detect and block automated browser traffic.

Infrastructure: You need to manage Chromium binaries, handle crashes, implement restarts, and configure proxy rotation yourself.
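The memory and speed figures above can be turned into a rough capacity estimate. This sketch assumes each browser handles one page at a time and ignores startup time. `capacity_estimate` is a hypothetical helper, and the inputs are the rough ranges quoted in this section, not measurements.

```python
def capacity_estimate(
    concurrent_browsers: int,
    mb_per_browser: float,
    seconds_per_page: float,
) -> tuple[float, float]:
    """Return (RAM in GB, pages per hour) for a pool of browsers.

    Assumes one page at a time per browser and ignores startup time.
    Uses 1 GB = 1,000 MB to match the rough figures in the text.
    """
    ram_gb = concurrent_browsers * mb_per_browser / 1000
    pages_per_hour = concurrent_browsers * 3600 / seconds_per_page
    return ram_gb, pages_per_hour


# Optimistic and pessimistic ends of the quoted ranges, for 20 browsers
print(capacity_estimate(20, 300, 3))   # (6.0, 24000.0)
print(capacity_estimate(20, 500, 10))  # (10.0, 7200.0)
```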

When to use Playwright locally: Complex interaction sequences (login flows, multi-step forms), testing/QA pipelines, or one-off data collection runs.

When to use a rendering API instead: Production scraping of JavaScript-heavy pages at scale, when you need reliable IP rotation, when you cannot maintain browser infrastructure.
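For the crash handling and proxy rotation listed under Infrastructure, a common pattern is to retry a failed scrape with exponential backoff and a different proxy on each attempt. The sketch below is generic. `with_retries` is a hypothetical helper, the proxy list is a placeholder, and it catches every `Exception`, which a production scraper would likely narrow (for example to Playwright's `TimeoutError`). With Playwright, the `task` coroutine could launch a fresh browser on each attempt via `p.chromium.launch(proxy={"server": proxy})`, which doubles as restart-after-crash logic.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    task: Callable[[str], Awaitable[T]],
    proxies: list[str],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call task(proxy), rotating through `proxies` until one attempt succeeds.

    Sleeps base_delay * 2**n seconds after the n-th failure (exponential
    backoff). The final attempt's error, if any, propagates to the caller.
    """
    if attempts < 1 or not proxies:
        raise ValueError("need at least one attempt and one proxy")
    proxy_cycle = itertools.cycle(proxies)
    for attempt in range(attempts - 1):
        try:
            return await task(next(proxy_cycle))
        except Exception:
            # Back off 1x, 2x, 4x ... base_delay before trying the next proxy
            await asyncio.sleep(base_delay * 2 ** attempt)
    # Final attempt: let any error reach the caller
    return await task(next(proxy_cycle))
```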

Extracting Data After Rendering — Playwright vs API

Both approaches produce the same outcome: rendered HTML you can parse. The difference is where the browser runs.

# Approach A: Local Playwright browser
import asyncio
from playwright.async_api import async_playwright

async def render_locally(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")
        html = await page.content()  # serialized DOM after JavaScript has run
        await browser.close()
        return html

html = asyncio.run(render_locally("https://example.com/spa-page"))

# Approach B: AlterLab rendering API (same result, no browser management)
import requests

response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_KEY", "Content-Type": "application/json"},
    json={"url": "https://example.com/spa-page", "render_js": True},
    timeout=30,
)
response.raise_for_status()
html = response.json()["html"]

# Either way, parse with BeautifulSoup or lxml
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "lxml")

Playwright Scraper — SPA with Pagination

Complete working Playwright scraper with pagination, a custom user agent and viewport, and error handling.

import asyncio
import json
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeoutError

async def scrape_spa(base_url: str, max_pages: int = 10) -> list[dict]:
    results = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (compatible; DataBot/1.0)",
            viewport={"width": 1280, "height": 900},
        )
        page = await context.new_page()

        for page_num in range(1, max_pages + 1):
            url = f"{base_url}?page={page_num}"
            print(f"Scraping page {page_num}...")

            try:
                await page.goto(url, wait_until="networkidle", timeout=30000)
                await page.wait_for_selector(".product-card", timeout=10000)
            except PlaywrightTimeoutError:
                # Slow page, or no cards at all (e.g. past the last page)
                print(f"Timed out on page {page_num}, stopping")
                break

            items = await page.eval_on_selector_all(
                ".product-card",
                """els => els.map(el => ({
                    title: el.querySelector('h2')?.textContent?.trim() ?? '',
                    price: el.querySelector('.price')?.textContent?.trim() ?? '',
                    url: el.querySelector('a')?.href ?? '',
                }))"""
            )

            if not items:
                print(f"No items on page {page_num} — stopping")
                break

            results.extend(items)

            # Check for next page
            next_btn = await page.query_selector("a.next-page")
            if not next_btn:
                break

        await browser.close()

    return results

results = asyncio.run(scrape_spa("https://example.com/products"))
with open("products.json", "w") as f:
    json.dump(results, f, indent=2)
print(f"Saved {len(results)} products")
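The scraper builds page URLs with `f"{base_url}?page={page_num}"`, which produces a malformed URL such as `https://example.com/products?sort=price?page=2` if `base_url` already carries a query string. The hypothetical helper below uses the standard library's `urllib.parse` to set the parameter safely. It assumes the site paginates with a `page` query parameter, as in the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_url(base_url: str, page_num: int, param: str = "page") -> str:
    """Return base_url with its `param` query parameter set to page_num.

    Other query parameters are kept; an existing `param` value is replaced.
    """
    parts = urlsplit(base_url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query[param] = [str(page_num)]
    # doseq=True expands list values back into key=value pairs
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
```

Inside `scrape_spa`, `url = page_url(base_url, page_num)` would replace the f-string.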

Same Result, No Browser Process

When you just need rendered HTML — not complex interactions — AlterLab handles the browser server-side. No Playwright install, no browser binary management, no memory overhead. From $0.0002/request with 5,000 free requests to start.

import requests
from bs4 import BeautifulSoup
import json

API_KEY = "YOUR_API_KEY"  # Get a free key at alterlab.io

def scrape_spa_page(url: str) -> list[dict]:
    """AlterLab renders the page server-side — no local browser required."""
    response = requests.post(
        "https://api.alterlab.io/api/v1/scrape",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json={"url": url, "render_js": True},
        timeout=30,
    )
    response.raise_for_status()
    html = response.json().get("html", "")

    soup = BeautifulSoup(html, "lxml")
    return [
        {
            "title": card.select_one("h2").get_text(strip=True),
            "price": card.select_one(".price").get_text(strip=True),
        }
        for card in soup.select(".product-card")
        if card.select_one("h2") and card.select_one(".price")
    ]

all_results = []
for page_num in range(1, 11):
    url = f"https://example.com/products?page={page_num}"
    items = scrape_spa_page(url)
    if not items:
        break
    all_results.extend(items)

with open("products.json", "w") as f:
    json.dump(all_results, f, indent=2)
print(f"Saved {len(all_results)} products — no browser process running")
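The quoted starting price makes cost easy to estimate: 5,000 requests × $0.0002 = $1.00, which matches the free allowance. The sketch below treats $0.0002 as a lower bound, since "From" implies some requests may cost more. `api_cost` is a hypothetical helper, and `Decimal` avoids float rounding noise in currency arithmetic.

```python
from decimal import Decimal


def api_cost(num_requests: int, price_per_request: str = "0.0002") -> Decimal:
    """Lower-bound cost in dollars at the quoted starting price."""
    return num_requests * Decimal(price_per_request)


print(api_cost(5_000))      # 1.0000
print(api_cost(1_000_000))  # 200.0000
```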

Playwright vs Alternatives

Playwright (local browser)

Pros

  • Full browser interaction (clicks, forms, scroll)
  • Free and open source
  • Direct DOM access

Cons

  • 300–500 MB per browser instance
  • 3–10 seconds per page
  • Browser detection common
  • Complex infrastructure management
  • Crashes require restart logic

Playwright + proxy rotation (DIY)

Pros

  • Handles IP-based rate limiting
  • More reliable than a plain browser

Cons

  • Proxy cost + browser cost
  • Complex integration
  • Still slow and memory-heavy
  • Detection still possible

AlterLab rendering API

Pros

  • No browser management
  • Automatic IP rotation
  • 5-tier compatibility escalation
  • From $0.0002/request
  • No local memory or CPU overhead

Cons

  • Per-request cost
  • Cannot perform complex interactions



Web Scraping with Playwright 2026: Setup, Code & Limitations | AlterLab