Web Scraping with Node.js and Puppeteer: The Complete 2026 Guide
Yash Dubey
February 19, 2026
Puppeteer is Google's official Node.js library for controlling Chrome and Chromium. Unlike HTTP-based scraping tools like Axios or Cheerio, Puppeteer runs a real browser. That means it can render JavaScript, click buttons, fill forms, scroll through infinite feeds, and screenshot pages exactly as a human would see them.
This guide walks through every technique you will need to build production-grade scrapers with Puppeteer in 2026 — from basic page fetches to handling anti-bot systems, proxy rotation, and browser pooling. Every code example is tested and ready to use.
Why Puppeteer for Web Scraping
HTTP-only scrapers (Axios + Cheerio, Got + JSDOM) send a raw request and parse whatever HTML comes back. That works for static sites. But the modern web runs on JavaScript. React, Vue, and Angular apps render their content client-side. Product listings load via XHR calls after the initial page load. Prices appear only after a scroll event fires.
Puppeteer solves this by giving you a full Chromium instance. You get:
- JavaScript execution — SPAs render completely before you extract data
- User interaction — click buttons, fill forms, navigate pagination
- Network control — intercept requests, block resources, capture API responses
- Screenshots and PDFs — visual verification and archival
- Cookie and session management — handle logins and authenticated scraping
The tradeoff is resource usage. Each Puppeteer instance launches a Chromium process that consumes 100-300 MB of RAM. For static HTML scraping, Cheerio is 50x lighter. Use Puppeteer when the target site requires a real browser to produce the data you need.
Setup and Installation
You need Node.js 18+ (LTS recommended). Puppeteer bundles its own Chromium binary, so there is nothing else to install.
mkdir my-scraper && cd my-scraper
npm init -y
npm install puppeteer

If you are deploying to a server and want to use an existing Chrome installation instead:
npm install puppeteer-core

With puppeteer-core, you pass the browser path manually, which is useful for Docker containers where you install Chromium via apt.
Verify the installation:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
const title = await page.title();
console.log(`Page title: ${title}`);
await browser.close();
})();

Run it with node index.js. You should see "Page title: Example Domain" in your terminal.
Basic Page Scraping
Every Puppeteer scraper follows the same pattern: launch a browser, open a page, navigate to a URL, wait for content, extract data, and close.
const puppeteer = require('puppeteer');
async function scrapeProduct(url) {
const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
// Set a realistic viewport and user agent
await page.setViewport({ width: 1920, height: 1080 });
await page.setUserAgent(
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
'(KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36'
);
await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
// Wait for a specific element before extracting
await page.waitForSelector('.product-title', { timeout: 10000 });
const product = await page.evaluate(() => {
return {
title: document.querySelector('.product-title')?.textContent.trim(),
price: document.querySelector('.product-price')?.textContent.trim(),
description: document.querySelector('.product-desc')?.textContent.trim(),
inStock: !document.querySelector('.out-of-stock'),
};
});
await browser.close();
return product;
}

Key points:
- waitUntil: 'networkidle2' waits until there have been no more than 2 network connections for 500 ms. This handles most SPA rendering.
- waitForSelector ensures the element you need is actually in the DOM before you try to read it.
- page.evaluate runs JavaScript inside the browser context. The function you pass has access to document, window, and the full DOM, but not your Node.js variables.
Handling Dynamic Content
Single-Page Applications
SPAs often load data after the initial HTML. The page source is just a <div id="root"></div> with a JavaScript bundle. Puppeteer handles this by default since it runs the JavaScript, but you need to wait for the right moment.
// Wait for data to render (not just the shell)
await page.waitForFunction(
() => document.querySelectorAll('.product-card').length > 0,
{ timeout: 15000 }
);

Infinite Scroll
Many sites load content as you scroll down. You need to simulate scrolling and wait for new items to appear.
async function scrapeInfiniteScroll(page, maxItems = 100) {
let items = [];
let previousHeight = 0;
while (items.length < maxItems) {
// Scroll to bottom
previousHeight = await page.evaluate(() => document.body.scrollHeight);
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
// Wait for new content to load
try {
await page.waitForFunction(
`document.body.scrollHeight > ${previousHeight}`,
{ timeout: 5000 }
);
} catch {
break; // No more content to load
}
// Small delay for rendering
await new Promise(r => setTimeout(r, 1000));
// Extract current items
items = await page.evaluate(() =>
[...document.querySelectorAll('.item')].map(el => ({
name: el.querySelector('.name')?.textContent.trim(),
price: el.querySelector('.price')?.textContent.trim(),
}))
);
}
return items.slice(0, maxItems);
}

Lazy-Loaded Images
Images with loading="lazy" or intersection observer patterns only load when they enter the viewport. Scroll them into view first:
async function loadLazyImages(page) {
await page.evaluate(async () => {
const images = document.querySelectorAll('img[loading="lazy"]');
for (const img of images) {
img.scrollIntoView({ behavior: 'instant' });
await new Promise(r => setTimeout(r, 200));
}
// Scroll back to top
window.scrollTo(0, 0);
});
// Wait for images to finish loading
await new Promise(r => setTimeout(r, 2000));
}

Form Interaction
Login Flows
Many scraping targets require authentication. Puppeteer can fill forms and submit them just like a user.
async function loginAndScrape(url, username, password) {
const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
await page.goto('https://example.com/login', { waitUntil: 'networkidle2' });
// Type credentials with realistic delays
await page.type('#username', username, { delay: 50 });
await page.type('#password', password, { delay: 50 });
// Click submit and wait for navigation
await Promise.all([
page.waitForNavigation({ waitUntil: 'networkidle2' }),
page.click('#login-button'),
]);
// Now scrape authenticated content
await page.goto(url, { waitUntil: 'networkidle2' });
const data = await page.evaluate(() => {
return document.querySelector('.dashboard-data')?.textContent;
});
await browser.close();
return data;
}

Search Forms
// Type into a search box, wait for autocomplete, select a result
await page.type('#search-input', 'web scraping api', { delay: 100 });
await page.waitForSelector('.autocomplete-results', { timeout: 5000 });
await page.click('.autocomplete-results li:first-child');
await page.waitForNavigation({ waitUntil: 'networkidle2' });

Handling Pagination
Most scraping jobs involve multiple pages. Here is a reliable pattern for numbered pagination:
async function scrapeAllPages(baseUrl) {
const browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
let allResults = [];
let currentPage = 1;
while (true) {
const url = `${baseUrl}?page=${currentPage}`;
await page.goto(url, { waitUntil: 'networkidle2' });
const pageResults = await page.evaluate(() =>
[...document.querySelectorAll('.result-item')].map(el => ({
title: el.querySelector('h3')?.textContent.trim(),
link: el.querySelector('a')?.href,
}))
);
if (pageResults.length === 0) break;
allResults.push(...pageResults);
console.log(`Page ${currentPage}: ${pageResults.length} results`);
// Check if there is a next page
const hasNext = await page.$('.pagination .next:not(.disabled)');
if (!hasNext) break;
currentPage++;
// Respectful delay between pages
await new Promise(r => setTimeout(r, 1500 + Math.random() * 1000));
}
await browser.close();
return allResults;
}

For "Load More" button pagination:
async function scrapeLoadMore(page) {
let clickCount = 0;
while (clickCount < 20) {
const loadMoreBtn = await page.$('.load-more-button');
if (!loadMoreBtn) break;
// Capture the item count before clicking so the wait condition is not racy
const prevCount = await page.evaluate(() => document.querySelectorAll('.item').length);
await loadMoreBtn.click();
try {
await page.waitForFunction(
(prev) => document.querySelectorAll('.item').length > prev,
{ timeout: 8000 },
prevCount
);
} catch {
break; // No new items loaded before the timeout
}
clickCount++;
}
}

Network Request Interception
Blocking unnecessary resources makes your scraper faster and reduces bandwidth. Images, fonts, and CSS are rarely needed for data extraction.
async function setupRequestInterception(page) {
await page.setRequestInterception(true);
page.on('request', (request) => {
const blockedTypes = ['image', 'stylesheet', 'font', 'media'];
const blockedDomains = ['google-analytics.com', 'facebook.net', 'doubleclick.net'];
if (blockedTypes.includes(request.resourceType())) {
request.abort();
} else if (blockedDomains.some(d => request.url().includes(d))) {
request.abort();
} else {
request.continue();
}
});
}

This can reduce page load times by 40-60% and cut bandwidth by 70%+. Always block analytics and tracking scripts: they slow down scraping and serve no purpose for data extraction.
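The blocking policy above can also be factored into a pure predicate, which makes it unit-testable without launching a browser. A sketch, using the same type and domain lists as the handler above:

```javascript
// Decide whether a request should be aborted, given its resource type and URL.
// Keeping this logic out of the page.on('request') handler lets you test the
// policy in isolation.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);
const BLOCKED_DOMAINS = ['google-analytics.com', 'facebook.net', 'doubleclick.net'];

function shouldBlock(resourceType, url) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  return BLOCKED_DOMAINS.some((d) => url.includes(d));
}
```

The interception handler then reduces to one branch: call shouldBlock with request.resourceType() and request.url(), and abort or continue accordingly.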
Screenshots and PDF Generation
Puppeteer can capture full-page screenshots and generate PDFs, which is useful for visual verification, archival, or monitoring changes.
// Full page screenshot
await page.screenshot({
path: 'page.png',
fullPage: true,
});
// Screenshot of a specific element
const element = await page.$('.product-card');
if (element) await element.screenshot({ path: 'product.png' }); // guard: page.$ returns null if no match
// Generate PDF (works only in headless mode)
await page.pdf({
path: 'page.pdf',
format: 'A4',
printBackground: true,
margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
});

A practical use case: take before/after screenshots of product pages to detect price changes or layout shifts.
Advanced Data Extraction
Scraping Tables
HTML tables are everywhere — product specs, financial data, comparison pages. Here is a generic table scraper:
async function scrapeTable(page, tableSelector) {
return await page.evaluate((selector) => {
const table = document.querySelector(selector);
if (!table) return null;
const headers = [...table.querySelectorAll('thead th')].map(
th => th.textContent.trim()
);
const rows = [...table.querySelectorAll('tbody tr')].map(tr => {
const cells = [...tr.querySelectorAll('td')].map(
td => td.textContent.trim()
);
return Object.fromEntries(headers.map((h, i) => [h, cells[i]]));
});
return { headers, rows };
}, tableSelector);
}
// Usage
const data = await scrapeTable(page, '#pricing-table');
// Returns: { headers: ['Plan', 'Price', 'Features'], rows: [{...}, {...}] }

Shadow DOM
Some modern web components use Shadow DOM, which hides elements from regular querySelector calls. You need to pierce through shadow roots:
const shadowData = await page.evaluate(() => {
const host = document.querySelector('product-card');
const shadow = host?.shadowRoot; // null if the host is missing or the root is closed
if (!shadow) return null;
return {
title: shadow.querySelector('.title')?.textContent.trim(),
price: shadow.querySelector('.price')?.textContent.trim(),
};
});

Intercepting XHR/Fetch Responses
Sometimes the cleanest approach is to intercept the API calls the page makes internally, rather than parsing the rendered HTML:
async function interceptApiData(page, url) {
const apiData = [];
page.on('response', async (response) => {
const reqUrl = response.url();
if (reqUrl.includes('/api/products') && response.status() === 200) {
try {
const json = await response.json();
apiData.push(...json.results);
} catch (e) {
// Not JSON, skip
}
}
});
await page.goto(url, { waitUntil: 'networkidle2' });
return apiData;
}

This technique is extremely powerful. Many sites fetch structured JSON from their own APIs and then render it into HTML. By intercepting the response, you get clean, structured data without parsing DOM elements at all.
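One wrinkle: different endpoints wrap their payloads differently. A small normalizer keeps the response handler simple. The shapes handled below are hypothetical; inspect the target site's actual responses in DevTools first:

```javascript
// Normalize intercepted API payloads to a flat array. The shapes handled here
// ({ results }, { data: { items } }, or a bare array) are illustrative.
function extractResults(json) {
  if (Array.isArray(json)) return json;
  if (json && Array.isArray(json.results)) return json.results;
  if (json && json.data && Array.isArray(json.data.items)) return json.data.items;
  return []; // unknown shape: return nothing rather than crash the handler
}
```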
Error Handling and Retry Patterns
Production scrapers need robust error handling. Network timeouts, selector changes, and rate limiting will all happen.
async function scrapeWithRetry(url, maxRetries = 3) {
let browser;
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
browser = await puppeteer.launch({ headless: 'new' });
const page = await browser.newPage();
page.setDefaultTimeout(15000);
await page.goto(url, { waitUntil: 'networkidle2' });
await page.waitForSelector('.data-container');
const data = await page.evaluate(() => {
return document.querySelector('.data-container')?.textContent.trim();
});
if (!data) throw new Error('Empty data extracted');
return data;
} catch (error) {
console.error(`Attempt ${attempt}/${maxRetries} failed: ${error.message}`);
if (attempt === maxRetries) {
throw new Error(`All ${maxRetries} attempts failed for ${url}`);
}
// Exponential backoff: 2s, 4s, 8s...
const delay = Math.pow(2, attempt) * 1000;
await new Promise(r => setTimeout(r, delay));
} finally {
if (browser) await browser.close();
}
}
}

Key practices:
- Always close the browser in a finally block to prevent orphan Chromium processes
- Validate extracted data: an empty result is as bad as an error
- Use exponential backoff: hammering a rate-limited server with immediate retries makes things worse
- Set explicit timeouts: the default 30-second navigation timeout is too long for most scraping jobs
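The backoff schedule is worth pulling into a helper so every retry loop shares it. A sketch that adds jitter so parallel workers don't retry in lockstep (the constants are assumptions to tune per target):

```javascript
// Jittered exponential backoff: 2s, 4s, 8s, ... capped at `cap`, plus up to
// `jitter` ms of randomness.
function backoffDelay(attempt, { base = 2000, cap = 60000, jitter = 500 } = {}) {
  const exponential = Math.min(cap, base * 2 ** (attempt - 1));
  return exponential + Math.random() * jitter;
}
```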
Anti-Bot Detection and Stealth
Out of the box, Puppeteer is trivially detectable. The navigator.webdriver flag is set to true, plugin arrays are empty, and WebGL reports SwiftShader instead of real GPU hardware. Every anti-bot service checks these in milliseconds.
The puppeteer-extra-plugin-stealth package patches most of these signals:
npm install puppeteer-extra puppeteer-extra-plugin-stealth

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
(async () => {
const browser = await puppeteer.launch({
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-blink-features=AutomationControlled',
],
});
const page = await browser.newPage();
// Randomize viewport to look less bot-like
const width = 1280 + Math.floor(Math.random() * 640);
const height = 800 + Math.floor(Math.random() * 280);
await page.setViewport({ width, height });
await page.goto('https://bot.sannysoft.com', { waitUntil: 'networkidle2' });
await page.screenshot({ path: 'stealth-test.png', fullPage: true });
await browser.close();
})();

The stealth plugin handles:
- Removing the navigator.webdriver flag
- Faking navigator.plugins with realistic entries
- Spoofing WebGL vendor and renderer strings
- Patching Chrome runtime objects (window.chrome)
- Fixing navigator.permissions behavior
- Spoofing navigator.languages properly
Even with stealth, advanced anti-bot systems like Cloudflare Turnstile, DataDome, and PerimeterX use behavioral analysis — mouse movement patterns, scroll velocity, and timing between actions. You can partially address this:
// Simulate human-like mouse movement
async function humanMove(page, x, y) {
const steps = 10 + Math.floor(Math.random() * 15);
await page.mouse.move(x, y, { steps });
await new Promise(r => setTimeout(r, 100 + Math.random() * 300));
}
// Random delays between actions
async function humanDelay(min = 500, max = 2000) {
const delay = min + Math.random() * (max - min);
await new Promise(r => setTimeout(r, delay));
}Proxy Rotation
Sending all requests from one IP address gets you blocked fast. Proxy rotation distributes your requests across many IPs.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
const proxies = [
// Placeholder proxy URLs; substitute your provider's host, port, and credentials
'http://user:pass@proxy1.example.com:8080',
'http://user:pass@proxy2.example.com:8080',
'http://user:pass@proxy3.example.com:8080',
];
async function scrapeWithProxy(url) {
const proxy = proxies[Math.floor(Math.random() * proxies.length)];
const proxyUrl = new URL(proxy);
const browser = await puppeteer.launch({
headless: 'new',
args: [`--proxy-server=${proxyUrl.host}`],
});
const page = await browser.newPage();
// Authenticate with the proxy
if (proxyUrl.username) {
await page.authenticate({
username: proxyUrl.username,
password: proxyUrl.password,
});
}
try {
await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
return await page.content();
} finally {
await browser.close();
}
}

A smarter approach rotates proxies based on failure rate. If a proxy gets blocked, move it to a cooldown list:
class ProxyPool {
constructor(proxies) {
this.available = [...proxies];
this.cooldown = new Map(); // proxy -> cooldown expiry timestamp
}
getProxy() {
// Move expired cooldowns back to available
const now = Date.now();
for (const [proxy, expiry] of this.cooldown) {
if (now > expiry) {
this.available.push(proxy);
this.cooldown.delete(proxy);
}
}
if (this.available.length === 0) {
throw new Error('No proxies available');
}
const index = Math.floor(Math.random() * this.available.length);
return this.available[index];
}
markFailed(proxy) {
this.available = this.available.filter(p => p !== proxy);
this.cooldown.set(proxy, Date.now() + 5 * 60 * 1000); // 5 min cooldown
}
}

Scaling with Browser Pools
Running one browser at a time is fine for small jobs. For serious scraping, you need a pool of browsers running in parallel with concurrency limits.
const puppeteer = require('puppeteer');
class BrowserPool {
constructor(maxBrowsers = 5) {
this.maxBrowsers = maxBrowsers;
this.activeBrowsers = 0;
this.queue = [];
}
async acquire() {
if (this.activeBrowsers < this.maxBrowsers) {
this.activeBrowsers++;
return await puppeteer.launch({
headless: 'new',
args: ['--no-sandbox', '--disable-dev-shm-usage'],
});
}
// Wait for a browser to become available
return new Promise((resolve) => {
this.queue.push(resolve);
});
}
async release(browser) {
await browser.close();
if (this.queue.length > 0) {
const next = this.queue.shift();
const newBrowser = await puppeteer.launch({
headless: 'new',
args: ['--no-sandbox', '--disable-dev-shm-usage'],
});
next(newBrowser);
} else {
this.activeBrowsers--;
}
}
}
// Usage: scrape 100 URLs with max 5 concurrent browsers
async function scrapeMany(urls) {
const pool = new BrowserPool(5);
const results = [];
const tasks = urls.map(async (url) => {
const browser = await pool.acquire();
try {
const page = await browser.newPage();
await page.goto(url, { waitUntil: 'networkidle2', timeout: 20000 });
const title = await page.title();
results.push({ url, title });
} catch (error) {
results.push({ url, error: error.message });
} finally {
await pool.release(browser);
}
});
await Promise.all(tasks);
return results;
}

Scaling Puppeteer beyond a single machine gets complicated fast. Each browser instance needs 100-300 MB of RAM, so a 4 GB server can realistically run 10-15 concurrent browsers. For higher throughput, you are looking at container orchestration, process monitoring, and failure recovery: essentially building your own scraping infrastructure.
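If a full pool class is more than you need, a small concurrency limiter (the pattern libraries like p-limit implement) enforces the same cap with less machinery. A sketch:

```javascript
// Run async task factories with at most `limit` in flight at once.
// Results come back in the original order.
async function runLimited(taskFns, limit) {
  const results = new Array(taskFns.length);
  let next = 0;
  async function worker() {
    while (next < taskFns.length) {
      const i = next++; // claim the next index (safe: JS is single-threaded)
      results[i] = await taskFns[i]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, taskFns.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Each task factory would launch a page (or browser), scrape, and clean up; the limiter guarantees no more than `limit` of them exist at once.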
The full scraping loop, step by step:
1. Launch Browser: Puppeteer starts a Chromium instance with stealth patches and proxy configuration
2. Navigate and Wait: go to the target URL, wait for JavaScript rendering and dynamic content to load
3. Extract Data: run page.evaluate() to pull structured data from the DOM, or intercept API responses
4. Handle Errors: retry on failure with exponential backoff, rotate proxies on blocks
5. Store Results: save extracted data to JSON, CSV, or database, validating before writing
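For the CSV option in the storage step, a minimal serializer is enough for flat result objects. A sketch; for anything beyond this, reach for a library such as csv-stringify:

```javascript
// Serialize an array of flat objects to CSV. Fields containing commas, quotes,
// or newlines are quoted, with embedded quotes doubled (RFC 4180 style).
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const escape = (value) => {
    const s = String(value ?? '');
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(','));
  }
  return lines.join('\n');
}
```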
When to Use a Scraping API Instead
Building and maintaining a Puppeteer scraping pipeline is real engineering work. You are responsible for browser management, proxy infrastructure, anti-bot bypass, CAPTCHA solving, retry logic, and monitoring. For one-off projects or simple targets, that is fine.
But when you are scraping at scale against sites with serious bot protection, the maintenance burden grows fast. Cloudflare updates its detection every few weeks. Proxy providers rotate your IPs into already-burned ranges. Your stealth patches break when Chrome updates. You spend more time maintaining infrastructure than building the product that needs the data.
This is where a scraping API makes sense. Instead of managing browsers, proxies, and stealth patches yourself, you send an HTTP request and get back the rendered HTML or structured data.
AlterLab handles the hard parts — anti-bot bypass across Cloudflare, DataDome, and PerimeterX, automatic proxy rotation through residential and datacenter pools, and headless browser rendering. A single API call replaces hundreds of lines of Puppeteer infrastructure code:
curl -X POST https://alterlab.io/api/v1/scrape \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/products", "formats": ["text", "markdown"]}'

No browser pools to manage. No proxy rotation to implement. No stealth plugin updates to track.
| Feature | DIY Puppeteer | AlterLab API |
|---|---|---|
| Setup Time | Hours to days | 5 minutes |
| Anti-Bot Bypass | Manual (breaks often) | Built-in (auto-updated) |
| Proxy Management | Self-managed pool | Included |
| Browser Infrastructure | Your servers | Managed |
| Maintenance | Ongoing | None |
| Scaling | Complex orchestration | Increase API calls |
| Cost at 10K pages/day | $200-500/mo servers + proxies | $49/mo |
| JavaScript Rendering | Yes | Yes |
The decision is straightforward: use Puppeteer when you need full browser control (custom interactions, screenshots, specific workflows). Use a scraping API when you need reliable data extraction at scale without the infrastructure overhead.
Complete Example: E-Commerce Product Scraper
Here is a complete, production-ready scraper that extracts product data from an e-commerce listing page, handles pagination, retries on failure, and saves results to JSON.
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const fs = require('fs').promises;
puppeteer.use(StealthPlugin());
const CONFIG = {
baseUrl: 'https://example-store.com/category/electronics',
maxPages: 10,
maxRetries: 3,
concurrency: 3,
outputFile: 'products.json',
delayBetweenPages: [1500, 3000], // min/max ms
};
async function randomDelay([min, max]) {
const delay = min + Math.random() * (max - min);
await new Promise(r => setTimeout(r, delay));
}
async function extractProducts(page) {
return await page.evaluate(() => {
return [...document.querySelectorAll('.product-card')].map(card => ({
name: card.querySelector('.product-name')?.textContent.trim() || '',
price: card.querySelector('.product-price')?.textContent.trim() || '',
rating: card.querySelector('.rating-value')?.textContent.trim() || '',
reviewCount: card.querySelector('.review-count')?.textContent.trim() || '',
url: card.querySelector('a.product-link')?.href || '',
image: card.querySelector('img.product-image')?.src || '',
availability: card.querySelector('.stock-status')?.textContent.trim() || '',
scraped_at: new Date().toISOString(),
}));
});
}
async function scrapePage(browser, url, retries = CONFIG.maxRetries) {
const page = await browser.newPage();
try {
// Block heavy resources
await page.setRequestInterception(true);
page.on('request', req => {
if (['image', 'stylesheet', 'font', 'media'].includes(req.resourceType())) {
req.abort();
} else {
req.continue();
}
});
await page.setViewport({ width: 1366, height: 768 });
await page.goto(url, { waitUntil: 'networkidle2', timeout: 25000 });
// Wait for product cards to render
await page.waitForSelector('.product-card', { timeout: 10000 });
const products = await extractProducts(page);
console.log(` Extracted ${products.length} products from ${url}`);
return products;
} catch (error) {
if (retries > 0) {
console.warn(` Retry (${CONFIG.maxRetries - retries + 1}): ${error.message}`);
await new Promise(r => setTimeout(r, 3000));
return scrapePage(browser, url, retries - 1);
}
console.error(` Failed after ${CONFIG.maxRetries} retries: ${url}`);
return [];
} finally {
await page.close();
}
}
async function main() {
console.log('Starting e-commerce scraper...');
const browser = await puppeteer.launch({
headless: 'new',
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled',
],
});
const allProducts = [];
try {
for (let pageNum = 1; pageNum <= CONFIG.maxPages; pageNum++) {
const url = `${CONFIG.baseUrl}?page=${pageNum}`;
console.log(`Scraping page ${pageNum}/${CONFIG.maxPages}: ${url}`);
const products = await scrapePage(browser, url);
if (products.length === 0) {
console.log('No products found — reached last page.');
break;
}
allProducts.push(...products);
// Respectful delay between pages
if (pageNum < CONFIG.maxPages) {
await randomDelay(CONFIG.delayBetweenPages);
}
}
// Save results
await fs.writeFile(
CONFIG.outputFile,
JSON.stringify(allProducts, null, 2),
'utf-8'
);
console.log(`\nDone. Scraped ${allProducts.length} products across ${CONFIG.maxPages} pages.`);
console.log(`Results saved to ${CONFIG.outputFile}`);
} finally {
await browser.close();
}
}
main().catch(console.error);

This scraper includes everything covered in this guide: stealth mode, request interception, proper waits, retry logic, respectful delays, and clean resource management. To adapt it to a real target, you only need to update the CSS selectors in extractProducts and the base URL.
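One way to make that adaptation even easier is to lift the selectors into the config alongside the URL. The values below are hypothetical, mirroring the example selectors used above:

```javascript
// Centralize target-specific selectors so pointing the scraper at a new site
// means editing one object rather than hunting through extractProducts. A map
// like this can be passed into page.evaluate as an argument, since evaluate
// serializes plain objects across the browser boundary.
const SELECTORS = {
  card: '.product-card',
  name: '.product-name',
  price: '.product-price',
  rating: '.rating-value',
  reviewCount: '.review-count',
  link: 'a.product-link',
  image: 'img.product-image',
  availability: '.stock-status',
};
```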
What to Remember
Puppeteer gives you complete control over a real browser, which makes it the right choice for scraping JavaScript-heavy sites, handling complex interactions, and extracting data that HTTP-only tools cannot reach. The cost is complexity — you are managing browser processes, memory, proxies, and stealth.
For projects where you need that level of control, the patterns in this guide will get you to production. For projects where you just need the data, consider whether the infrastructure overhead is worth it. A scraping API like AlterLab can reduce weeks of Puppeteer infrastructure work to a single HTTP call — letting you focus on what you are building instead of how you are scraping.