Is Puppeteer still worth using in 2026?

Puppeteer is still a solid tool for browser automation and testing. For web scraping specifically, Microsoft's Playwright has largely superseded it: Playwright drives Chromium, Firefox, and WebKit through a single API and builds auto-waiting into its element actions. For high-volume production scraping, a managed rendering API is usually more practical than either.

What is the difference between Puppeteer and Cheerio?

Puppeteer launches a full Chrome browser and executes JavaScript, which is necessary for pages that load content dynamically. Cheerio is a lightweight HTML parser that works on already-fetched HTML and cannot execute JavaScript. Cheerio is much faster and uses far less memory, but it only sees what is in the HTML you give it: a static page fetched over HTTP, or rendered HTML returned by a rendering API such as AlterLab.

How do I avoid getting blocked with Puppeteer?

Set a realistic user agent and viewport. Add delays between requests, and vary them so the timing is not perfectly regular. Rotate IP addresses through a proxy service. Avoid patterns that expose automation: predictable timing, missing browser headers, or unusual navigator properties. Even with these measures, sites with anti-bot protection may identify and restrict automated browser traffic. A managed rendering API handles this infrastructure for you.
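As a minimal sketch of the "delays without predictable timing" advice, the helper below draws each pause from a range instead of using a fixed value. The function names and default values are illustrative assumptions, not part of Puppeteer.

```javascript
// Returns a delay in milliseconds drawn uniformly from
// [baseMs - jitterMs, baseMs + jitterMs], never below zero.
// `rng` is injectable so the function can be tested deterministically.
function jitteredDelayMs(baseMs, jitterMs, rng = Math.random) {
  const offset = (rng() * 2 - 1) * jitterMs; // in [-jitterMs, +jitterMs)
  return Math.max(0, Math.round(baseMs + offset));
}

// Resolves after a randomized pause (hypothetical defaults: 1.5s +/- 0.5s).
function politeSleep(baseMs = 1500, jitterMs = 500) {
  const ms = jitteredDelayMs(baseMs, jitterMs);
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Calling `await politeSleep()` between page loads would replace a fixed `setTimeout` delay.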

Can Puppeteer handle JavaScript-heavy pages?

Yes. Executing JavaScript is Puppeteer's primary purpose. Use waitForSelector() to wait for specific elements to load, or waitForResponse() to capture the underlying API calls that load dynamic content. Capturing API responses is often more reliable than DOM polling, since those responses are usually structured JSON.

How does Puppeteer compare to Selenium?

Both automate browsers. Puppeteer primarily targets Chrome/Chromium through the Chrome DevTools Protocol, which makes it faster and more feature-rich for Chrome. Selenium supports multiple browsers (Chrome, Firefox, Safari, Edge) through WebDriver, which makes it more portable but generally slower. For modern web scraping, Puppeteer or Playwright is usually preferred over Selenium.


Web Scraping with Puppeteer — Complete Guide

How Puppeteer works for web scraping, working code examples, and when a cloud rendering API replaces the need to run Chrome locally.


Puppeteer is a Node.js library from Google that provides a high-level API to control headless Chrome. It is a natural choice for scraping JavaScript-heavy pages — it executes JavaScript, handles dynamic content, and lets you interact with the page before extracting data. This guide covers installation, basic and advanced scraping patterns, common pitfalls, and the practical tradeoffs of running Chrome locally versus using a managed rendering API.

Installing Puppeteer

The full puppeteer package downloads a compatible build of Chrome automatically on install. The puppeteer-core package skips the download, so you point it at a browser you already have.

npm install puppeteer
# or lightweight version (bring your own Chrome)
npm install puppeteer-core
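With puppeteer-core, the launch call needs to be told where the browser binary lives, which is what the `executablePath` launch option is for. The sketch below assumes the path is supplied through an environment variable; `CHROME_PATH` and `buildLaunchOptions` are hypothetical names chosen for illustration.

```javascript
// Builds launch options for puppeteer-core, which does not download a browser.
// CHROME_PATH is a hypothetical variable name chosen for this sketch.
function buildLaunchOptions(env = process.env) {
  const executablePath = env.CHROME_PATH;
  if (!executablePath) {
    throw new Error("Set CHROME_PATH to the Chrome or Chromium binary");
  }
  return { headless: true, executablePath };
}

// Usage (requires `npm install puppeteer-core`):
//   import puppeteer from "puppeteer-core";
//   const browser = await puppeteer.launch(buildLaunchOptions());
```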

Your First Puppeteer Scraper

The core Puppeteer pattern: launch a browser, open a page, navigate to a URL, and extract data. Always close the browser when done — leaving browser processes open leaks memory.

import puppeteer from "puppeteer";

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();

await page.goto("https://example.com/products", {
  waitUntil: "networkidle2",
  timeout: 30000,
});

// Extract data from the DOM
const products = await page.evaluate(() => {
  return Array.from(document.querySelectorAll("div.product-card")).map((el) => ({
    title: el.querySelector("h2")?.textContent?.trim() ?? "",
    price: el.querySelector(".price")?.textContent?.trim() ?? "",
    url: el.querySelector("a")?.href ?? "",
  }));
});

await browser.close();
console.log(`Extracted ${products.length} products:`, products);
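If `page.goto` or `page.evaluate` throws in the example above, `browser.close()` is never reached. One way to guarantee cleanup is a small wrapper built on `try`/`finally`. This is a sketch, and `withBrowser` is a hypothetical helper name rather than a Puppeteer API.

```javascript
// Runs `task(browser)` and always closes the browser afterwards,
// even if the task throws. `launch` is any function that returns a
// browser-like object with a close() method,
// e.g. () => puppeteer.launch({ headless: true }).
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close();
  }
}

// Usage:
//   const products = await withBrowser(
//     () => puppeteer.launch({ headless: true }),
//     async (browser) => {
//       const page = await browser.newPage();
//       await page.goto("https://example.com/products");
//       return page.title();
//     }
//   );
```

Passing the launch function in, rather than importing Puppeteer inside the helper, keeps the wrapper usable with either puppeteer or puppeteer-core.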

Waiting for Dynamic Content

The most common Puppeteer challenge: knowing when the page has loaded the data you need. Puppeteer provides several wait strategies.

// Wait for specific selector to appear
await page.waitForSelector("div.product-card", { timeout: 10000 });

// Wait for network to settle (at most 2 open connections for 500ms)
await page.goto(url, { waitUntil: "networkidle2" });

// Wait for XHR response — often cleaner than DOM polling
const [response] = await Promise.all([
  page.waitForResponse((res) => res.url().includes("/api/products")),
  page.goto(url),
]);
const data = await response.json(); // structured data from the API
console.log("API response:", data);
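A predicate that checks only the URL also matches redirects and error responses for that path, and calling `response.json()` on those can throw. The sketch below tightens the predicate using `url()`, `status()`, and `headers()` from Puppeteer's response object; `jsonResponseMatcher` is a hypothetical helper name, and requiring status 200 is an assumption that may be too strict for some APIs.

```javascript
// Builds a predicate for page.waitForResponse().
// `pathFragment` is whatever identifies the data endpoint, e.g. "/api/products".
function jsonResponseMatcher(pathFragment) {
  return (res) => {
    // Puppeteer reports header names in lower case.
    const contentType = res.headers()["content-type"] ?? "";
    return (
      res.url().includes(pathFragment) &&
      res.status() === 200 &&
      contentType.includes("application/json")
    );
  };
}

// Usage:
//   page.waitForResponse(jsonResponseMatcher("/api/products"))
```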

Setting User Agent and Viewport

Sites with anti-bot protection can recognize Puppeteer's default configuration. Set a realistic user agent and viewport to match a real browser more closely.

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();

await page.setUserAgent(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
);
await page.setViewport({ width: 1280, height: 900 });
await page.setExtraHTTPHeaders({
  "Accept-Language": "en-US,en;q=0.9",
});

Handling Pagination with Puppeteer

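Navigate through paginated sites by clicking the next-page button or constructing page URLs. The example here constructs page URLs and stops when a page returns no items or the next-page link is missing.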

const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
const allResults = [];

let currentPage = 1;
while (currentPage <= 20) {
  await page.goto(`https://example.com/articles?page=${currentPage}`, {
    waitUntil: "networkidle2",
  });

  const items = await page.$$eval(".article-card", (els) =>
    els.map((el) => ({
      title: el.querySelector("h2")?.textContent?.trim() ?? "",
      date: el.querySelector("time")?.getAttribute("datetime") ?? "",
    }))
  );

  if (items.length === 0) break;
  allResults.push(...items);

  const nextBtn = await page.$("a.next-page");
  if (!nextBtn) break;
  currentPage++;
  await new Promise((r) => setTimeout(r, 1000)); // polite delay
}

await browser.close();
console.log(`Collected ${allResults.length} articles`);
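The template string in the loop works because the base URL has no query string of its own. When the listing URL already carries parameters (a sort order, a category filter), building the address with the standard `URL` API avoids producing a second `?`. This is a small sketch; `buildPageUrl` and the `page` parameter name are assumptions about the target site.

```javascript
// Returns `baseUrl` with its page query parameter set to `pageNum`,
// preserving any other query parameters already present.
function buildPageUrl(baseUrl, pageNum, param = "page") {
  const url = new URL(baseUrl);
  url.searchParams.set(param, String(pageNum));
  return url.toString();
}

console.log(buildPageUrl("https://example.com/articles?sort=new", 2));
// https://example.com/articles?sort=new&page=2
```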

Practical Limitations for Production Scraping

Puppeteer is excellent for development and interactive scraping, but has significant costs at production scale:

Memory: Each Chrome instance uses 200–500 MB. Running 10 parallel scrapers requires 2–5 GB RAM.

Speed: Browser launch + page render takes 3–15 seconds per page. Run sequentially, a 100-page scrape takes 5–25 minutes.

Maintenance: Each Puppeteer release targets a specific Chrome build, so browser updates arrive as dependency updates. Chrome version mismatches cause failures.

Detection: Headless Chrome exposes signals through navigator properties, timing, and rendering characteristics. Sites with anti-bot protection often identify and restrict headless traffic.

When Puppeteer is the right choice: Interaction-heavy flows (login, form submission, multi-step navigation), browser testing/QA, or low-volume one-off data collection.

When a rendering API is more practical: High-volume production scraping, when you cannot maintain browser infrastructure, or when you need reliable IP rotation without additional proxy setup.
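The memory and speed figures above can be turned into a rough capacity estimate. The sketch below assumes one Chrome instance per concurrent worker, pages processed in equal-length batches, and 1 GB = 1000 MB. It is back-of-the-envelope arithmetic, not a benchmark, and `estimateScrape` is a hypothetical helper.

```javascript
// Rough capacity estimate from per-page time and per-instance memory.
function estimateScrape({ pages, secondsPerPage, concurrency, mbPerInstance }) {
  const batches = Math.ceil(pages / concurrency);
  return {
    minutes: (batches * secondsPerPage) / 60,
    memoryGb: (concurrency * mbPerInstance) / 1000,
  };
}

// Best case from the figures above: 100 pages, 3 s each, one browser.
console.log(estimateScrape({ pages: 100, secondsPerPage: 3, concurrency: 1, mbPerInstance: 200 }));
// { minutes: 5, memoryGb: 0.2 }

// Worst case per page, but 10 browsers in parallel.
console.log(estimateScrape({ pages: 100, secondsPerPage: 15, concurrency: 10, mbPerInstance: 500 }));
// { minutes: 2.5, memoryGb: 5 }
```

The second call shows the tradeoff: parallelism cuts wall-clock time by spending memory.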

Complete Puppeteer Scraper — Multi-Page Data Collection

A complete Puppeteer scraper with pagination, realistic browser configuration, and basic error handling: the browser is always closed, and the loop stops when no products appear.

import puppeteer from "puppeteer";
import { writeFileSync } from "fs";

async function scrapeSite(baseUrl, maxPages = 10) {
  const browser = await puppeteer.launch({
    headless: true,
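    // These flags disable Chrome's sandbox; keep them only where Chrome
    // cannot start otherwise (e.g. running as root in a container)
    args: ["--no-sandbox", "--disable-setuid-sandbox"],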
  });

  const page = await browser.newPage();
  await page.setUserAgent(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
  );
  await page.setViewport({ width: 1280, height: 900 });

  const allResults = [];
  let pageNum = 1;

  try {
    while (pageNum <= maxPages) {
      const url = `${baseUrl}?page=${pageNum}`;
      console.log(`Scraping ${url}...`);

      await page.goto(url, { waitUntil: "networkidle2", timeout: 30000 });

      try {
        await page.waitForSelector(".product-card", { timeout: 5000 });
      } catch {
        console.log("No products found — stopping");
        break;
      }

      const products = await page.$$eval(".product-card", (els) =>
        els.map((el) => ({
          title: el.querySelector("h2")?.textContent?.trim() ?? "",
          price: el.querySelector(".price")?.textContent?.trim() ?? "",
          url: el.querySelector("a")?.href ?? "",
        }))
      );

      if (products.length === 0) break;
      allResults.push(...products);

      const hasNext = await page.$("a.next-page");
      if (!hasNext) break;

      pageNum++;
      await new Promise((r) => setTimeout(r, 1500)); // polite delay
    }
  } finally {
    await browser.close();
  }

  return allResults;
}

const results = await scrapeSite("https://example.com/products");
writeFileSync("products.json", JSON.stringify(results, null, 2));
console.log(`Saved ${results.length} products`);
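In the scraper above, a single `page.goto` timeout rejects the whole run, and pages collected so far are lost. One common mitigation is to retry transient failures with exponential backoff. The sketch below is one way to do that; `withRetry`, its option names, and its defaults are hypothetical and not part of Puppeteer.

```javascript
// Retries an async operation with exponential backoff.
// `sleep` is injectable so tests can skip real waiting.
async function withRetry(operation, {
  retries = 3,
  baseDelayMs = 1000,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        await sleep(baseDelayMs * 2 ** attempt); // 1s, 2s, 4s, ...
      }
    }
  }
  throw lastError; // every attempt failed
}

// Usage inside the scraper loop:
//   await withRetry(() =>
//     page.goto(url, { waitUntil: "networkidle2", timeout: 30000 })
//   );
```

Retrying only makes sense for failures that may be transient, such as timeouts. A page that consistently returns no products should still end the loop.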

Same Result, No Chrome Process

When you just need rendered HTML — not complex browser interactions — AlterLab handles the browser server-side. No Puppeteer install, no Chrome binary, no memory overhead. From $0.0002/request with 5,000 free requests to start.

import * as cheerio from "cheerio";
import { writeFileSync } from "fs";

const API_KEY = "YOUR_API_KEY"; // Get free at alterlab.io

async function scrapeWithAlterLab(url, renderJs = false) {
  const response = await fetch("https://api.alterlab.io/api/v1/scrape", {
    method: "POST",
    headers: {
      "X-API-Key": API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, render_js: renderJs }),
    signal: AbortSignal.timeout(30000),
  });
  if (!response.ok) throw new Error(`API error: ${response.status}`);
  const data = await response.json();
  return data.html ?? "";
}

const allResults = [];
for (let pageNum = 1; pageNum <= 10; pageNum++) {
  const url = `https://example.com/products?page=${pageNum}`;
  const html = await scrapeWithAlterLab(url, true); // render_js: true

  const $ = cheerio.load(html);
  const products = [];
  $(".product-card").each((_i, el) => {
    const title = $(el).find("h2").text().trim();
    const price = $(el).find(".price").text().trim();
    if (title) products.push({ title, price });
  });

  if (products.length === 0) break;
  allResults.push(...products);
}

writeFileSync("products.json", JSON.stringify(allResults, null, 2));
console.log(`Saved ${allResults.length} products — no Chrome process running`);
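Both the Puppeteer and the Cheerio versions store `price` as display text. If numeric values are needed downstream, a small normalizer can run before writing the file. This sketch assumes "," is a thousands separator and "." the decimal point, which will not hold for every locale; `parsePrice` is a hypothetical helper.

```javascript
// Converts display text such as "$1,299.00" into a number,
// or returns null if the text contains no digits.
// Assumes "," is a thousands separator and "." the decimal point.
function parsePrice(text) {
  const match = text.replace(/,/g, "").match(/\d+(?:\.\d+)?/);
  return match ? Number(match[0]) : null;
}

console.log(parsePrice("$1,299.00")); // 1299
console.log(parsePrice("Sold out")); // null
```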


Puppeteer vs Alternatives

Puppeteer (local Chrome)

Pros

+Full browser interaction
+Intercept network requests
+Free to run

Cons

−200–500 MB per browser instance
−3–15 seconds per page
−Chrome detection is common
−Complex scaling and crash handling

Puppeteer + proxies

Pros

+Handles IP-based rate limiting
+More reliable on protected sites

Cons

−Proxy cost + browser cost
−Complex proxy rotation setup
−Still slow and memory-heavy

AlterLab rendering API

Pros

+No Chrome management
+Automatic IP rotation
+5-tier compatibility escalation
+From $0.0002/request
+No CPU or memory overhead

Cons

−Per-request cost
−Cannot perform complex UI interactions

Frequently Asked Questions

