How-To Guides

Web Scraping Tutorials

Step-by-step guides for extracting data from public websites, from beginner to advanced, with working code examples and AlterLab API integration.

Data Extraction

Extract structured data from any public website — product pages, directories, news sites, and more.

Beginner · 4 steps

Scrape Amazon Product Data

Amazon product pages render dynamically and include compatibility layers that block most simple scrapers. Extracting prices, ratings, and availability requires a reliable browser-rendering pipeline that handles these compatibility requirements automatically.

Beginner · 4 steps

Extract Emails from a Website

Finding publicly listed contact email addresses across multiple pages of a website requires fetching each page and applying pattern matching. Doing this at scale means handling pagination, varying page structures, and consistent data delivery.
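
The pattern-matching step described above can be sketched roughly as follows. This is a minimal illustration, not the guide's own code: `extract_emails` is a hypothetical helper, the page texts are assumed to be already fetched, and the regular expression is deliberately simple rather than a full RFC 5322 matcher.

```python
import re

# Simplified pattern: covers common addresses, not every valid one.
# It can also produce false positives such as "logo@2x.png".
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(pages):
    """Return unique, lowercased addresses in first-seen order.

    `pages` is an iterable of page texts (HTML or plain text).
    """
    seen = {}
    for text in pages:
        for match in EMAIL_RE.findall(text):
            seen.setdefault(match.lower(), None)
    return list(seen)
```

Deduplicating across pages matters because the same contact address usually appears in the header or footer of every page.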

Intermediate · 5 steps

Scrape E-commerce Prices

Price monitoring requires fetching the same product URLs repeatedly and extracting current prices reliably. E-commerce sites frequently update their page structure, and prices are often loaded dynamically — making standard HTTP scrapers unreliable.

Advanced · 4 steps

Scrape Google Search Results

Google Search result pages render dynamically, vary by location and device, and frequently update their HTML structure. Extracting organic rankings, featured snippets, and related questions reliably requires a rendering-capable pipeline that handles these variables.

Intermediate · 4 steps

Scrape Real Estate Listings

Real estate listing pages are dynamically rendered with extensive JavaScript and include map-based interfaces, filter states, and paginated listing grids. Extracting property data reliably requires handling all these layers consistently.

Beginner · 4 steps

Scrape News Articles

News sites serve article content dynamically, require JavaScript to reveal full article text behind paywalls or subscription prompts, and change their HTML structure frequently. Reliable news extraction requires a robust rendering pipeline and flexible selectors.

Advanced · 5 steps

Build a Price Comparison Tool

Comparing prices across multiple retailers requires fetching product pages from different sites simultaneously, normalizing diverse price formats, and matching the same product across different naming conventions — all on a recurring schedule.
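
Normalizing diverse price formats is the part of this pipeline that benefits most from a concrete example. The sketch below is a heuristic, not a definitive parser: `normalize_price` is a hypothetical helper, and it assumes that a separator followed by one or two digits is the decimal mark while anything else is a thousands separator.

```python
import re
from decimal import Decimal


def normalize_price(raw):
    """Convert a displayed price such as '$1,299.99' or '1.299,99 €' to Decimal."""
    # Keep digits and separators only; drop a trailing separator ("kr.").
    digits = re.sub(r"[^\d.,]", "", raw).rstrip(".,")
    if not any(ch.isdigit() for ch in digits):
        raise ValueError(f"no numeric price in {raw!r}")
    last_sep = max(digits.rfind("."), digits.rfind(","))
    # Assumption: 1 or 2 digits after the last separator means decimals.
    if last_sep != -1 and len(digits) - last_sep - 1 in (1, 2):
        whole = re.sub(r"[.,]", "", digits[:last_sep]) or "0"
        return Decimal(f"{whole}.{digits[last_sep + 1:]}")
    # Otherwise every separator is treated as thousands grouping.
    return Decimal(re.sub(r"[.,]", "", digits))
```

Using `Decimal` instead of `float` avoids rounding surprises when prices from different retailers are compared or summed.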

Intermediate · 4 steps

Scrape Job Listings

Job boards render listings dynamically, apply geographic and device-based filtering, and paginate results across hundreds of pages. Building a job data pipeline requires handling JavaScript rendering and systematic pagination.

Beginner · 4 steps

Scrape Data with Python

Python is one of the most popular languages for web scraping, with mature libraries for HTTP requests, HTML parsing, and data processing. This guide covers the complete setup, from API key to clean extracted data.

Beginner · 4 steps

Scrape Data with Node.js

Node.js is an excellent platform for web scraping pipelines — native async/await makes concurrent requests natural, and the npm ecosystem provides powerful HTML parsing tools. This guide covers the complete setup using modern Node.js.

Beginner · 4 steps

Extract Data from a Website Without an API

Most websites don't expose their data through a formal API. When you need structured data from a site that has no API, web scraping is the standard approach — fetching the publicly visible pages and extracting the data programmatically.

Intermediate · 4 steps

Scrape Product Reviews

Product reviews are spread across paginated review sections, often loaded lazily or hidden behind "Show more" interactions. Collecting a complete review dataset requires handling JavaScript rendering, pagination, and possibly scroll-triggered loading.

Intermediate · 4 steps

Scrape Publicly Listed Profile Data

Professional directories and public-facing profile pages contain publicly listed information — names, job titles, locations, and professional summaries — that businesses use for market research and lead generation.

JavaScript Rendering

Handle single-page applications, React/Vue/Angular frontends, and dynamically loaded content.

Beginner · 4 steps

Handle JavaScript-Rendered Pages

Single-page applications built with React, Vue, or Angular load content dynamically after the initial HTML response. A standard HTTP request often returns only an empty shell, so you need full browser execution to get the actual content.

Website Compatibility

Reliably access websites with compatibility layers, challenge pages, and rate limiting.

Beginner · 3 steps

Handle Website Challenges Automatically

Many websites present challenge pages to verify that visitors are real users before delivering content. Standard HTTP scrapers receive the challenge page instead of the data — requiring a full browser environment with challenge resolution capability to proceed.

Beginner · 3 steps

Use Proxies for Web Scraping

Scraping the same website from a single IP address quickly triggers rate limits and blocks. Proxy rotation distributes requests across multiple IP addresses, making your scraper appear as natural user traffic rather than automated requests from one source.
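
Proxy rotation as described above can be as simple as cycling through a pool. The following is a minimal round-robin sketch: `ProxyRotator` is a hypothetical class, and the proxy URLs in the usage note are placeholders, not real endpoints.

```python
from itertools import cycle


class ProxyRotator:
    """Hand out proxies round-robin so consecutive requests use different IPs."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)

    def next_proxy(self):
        proxy = next(self._pool)
        # Same mapping shape that urllib.request.ProxyHandler accepts.
        return {"http": proxy, "https": proxy}
```

For example, `ProxyRotator(["http://proxy-a.example:8080", "http://proxy-b.example:8080"])` would alternate between the two placeholder proxies on each call to `next_proxy()`. Real pools usually also track failures and retire dead proxies, which this sketch omits.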

Intermediate · 4 steps

Avoid Getting Blocked When Scraping

Web scrapers that send requests too fast, use identifiable patterns, or send unusual headers get blocked. Consistent, reliable data collection requires managing request pacing, rotating identifiers, and handling compatibility requirements automatically.
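
One way to manage request pacing is to enforce a minimum gap between consecutive requests. The sketch below assumes that approach; `Pacer` is a hypothetical helper, and the injectable `clock` and `sleep` parameters exist only so the behavior can be tested without real waiting.

```python
import time


class Pacer:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `pacer.wait()` immediately before each request keeps the rate below one request per `min_interval` seconds, regardless of how fast the rest of the loop runs.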

Pagination & Crawling

Collect data across multiple pages, infinite scroll, and cursor-based pagination.

Intermediate · 4 steps

Scrape Paginated Results

Data spread across multiple pages requires iterating through each page systematically. Pagination patterns vary widely — some use page numbers in the URL, others use query parameters, cursors, or infinite scroll — each requiring a different approach.
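
For the simplest of these patterns, a page number carried in a query parameter, the URLs can be generated up front. A minimal sketch follows, assuming the parameter is named `page` and the last page number is already known; `page_urls` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(base_url, last_page, param="page"):
    """Yield base_url with the page parameter set to 1..last_page."""
    parts = urlsplit(base_url)
    # Keep other query parameters, drop any existing page value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    for number in range(1, last_page + 1):
        new_query = urlencode(query + [(param, str(number))])
        yield urlunsplit(parts._replace(query=new_query))
```

When the last page is not known in advance, the usual alternative is to keep requesting pages until one comes back empty.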

Advanced · 4 steps

Handle Infinite Scroll When Scraping

Infinite scroll pages load new content dynamically as the user scrolls down — meaning a standard HTTP fetch only returns the initially visible content. Collecting the full dataset requires either simulating scroll actions or intercepting the underlying data requests.
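
When the underlying data requests can be intercepted, they often behave like a cursor-based API. The sketch below assumes exactly that: `fetch_page` is a hypothetical callable you supply, and the `items` and `next_cursor` keys are an assumed response shape, not one documented by any particular site.

```python
def collect_all(fetch_page, max_pages=100):
    """Follow next-cursor values until the endpoint reports no further page.

    `fetch_page(cursor)` must return a dict with an "items" list and an
    optional "next_cursor" value; pass cursor=None for the first page.
    """
    items = []
    cursor = None
    for _ in range(max_pages):  # hard cap guards against cursor loops
        page = fetch_page(cursor)
        items.extend(page["items"])
        cursor = page.get("next_cursor")
        if not cursor:
            break
    return items
```

The `max_pages` cap is a safety net: a misbehaving endpoint that keeps returning the same cursor would otherwise loop forever.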

Structured Data

Extract clean JSON, tables, product schemas, and structured content from raw HTML.

Beginner · 4 steps

Extract Structured Data from HTML

Raw HTML contains the data you need buried in nested tags, inconsistent formatting, and multiple possible locations. Extracting clean, structured output requires a systematic approach using CSS selectors, JSON-LD parsing, or table extraction.
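
Of the approaches named above, JSON-LD parsing is the easiest to show with the standard library alone. This is a minimal sketch: `JsonLdParser` and `extract_json_ld` are hypothetical names, and malformed blocks are silently skipped, which may not suit every pipeline.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than abort the page


def extract_json_ld(html):
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Pages that publish JSON-LD often expose product names, prices, and ratings there in a more stable form than the visible markup, so it is worth checking before writing CSS selectors.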

Performance & Scale

Run high-volume scraping jobs efficiently with batching, concurrency, and cost control.

Intermediate · 5 steps

Monitor Competitor Websites

Staying aware of competitor pricing, product launches, and messaging changes requires checking multiple websites repeatedly. Manual monitoring doesn't scale — you need an automated pipeline that tracks changes and sends alerts.
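
The change-tracking part of such a pipeline can be sketched by comparing content fingerprints between runs. The helpers below are hypothetical, and the sketch assumes the page text has already been fetched and that whitespace-only differences should not count as changes.

```python
import hashlib


def fingerprint(text):
    """Hash page text after collapsing whitespace, so reflows are ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous, pages):
    """Compare fresh page texts against stored fingerprints.

    `previous` maps URL -> fingerprint from the last run; `pages` maps
    URL -> current text. Returns (changed URLs, fingerprints to store).
    URLs not seen before are reported as changed.
    """
    current = {url: fingerprint(text) for url, text in pages.items()}
    changed = sorted(
        url for url, digest in current.items() if previous.get(url) != digest
    )
    return changed, current
```

The list of changed URLs is what an alerting step would consume; hashing the whole page is coarse, so production setups often fingerprint only the extracted fields of interest.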

Advanced · 4 steps

Build a Web Scraper API Endpoint

If multiple services or team members need access to scraped data, a dedicated scraper API endpoint is more efficient than running scrapers in each service independently. A single API layer handles the scraping, caching, and data normalization.
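
The caching role of that single API layer can be illustrated with a small time-to-live cache. This is a sketch under stated assumptions: `CachedScraper` is a hypothetical class, `scrape` stands in for whatever function actually fetches a page, and the cache is in-memory only.

```python
import time


class CachedScraper:
    """Serve repeated requests for the same URL from memory until the TTL expires."""

    def __init__(self, scrape, ttl_seconds=300, clock=time.monotonic):
        self._scrape = scrape        # callable: url -> scraped result
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for testing
        self._cache = {}             # url -> (stored_at, result)

    def get(self, url):
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = self._scrape(url)
        self._cache[url] = (now, result)
        return result
```

With this in front of the scraper, several services asking for the same URL within the TTL trigger only one upstream fetch.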

Looking for use case guides?

Browse use cases for industry-specific data extraction guides — price monitoring, lead generation, AI training data, and more.

Your first scrape. Sixty seconds.

$1 free credit — up to 5,000 scrapes. No credit card.
Just a POST request.

terminal

curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'
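
The same POST request can be built in Python with the standard library. The endpoint, headers, and JSON body below are taken from the curl command; the function names are hypothetical, and the response is printed as raw text because its format is not described here.

```python
import json
import urllib.request

API_URL = "https://api.alterlab.io/v1/scrape"


def build_scrape_request(target_url, api_key, formats=("markdown",)):
    """Build the POST request that mirrors the curl command."""
    body = json.dumps({"url": target_url, "formats": list(formats)}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call such as `print(send(build_scrape_request("https://example.com", "YOUR_KEY")))` would then perform the scrape, with `YOUR_KEY` replaced by a real API key.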


No credit card required · $1 free credit, up to 5,000 scrapes · Balance never expires
