An open-source Python framework for large-scale web crawling and scraping with async request scheduling, deduplication, and output pipelines.

Scrapy — Web Scraping Glossary

Scrapy is an open-source Python web crawling and scraping framework designed for large-scale data extraction. It provides a complete, production-ready architecture: asynchronous request scheduling via Twisted, request deduplication to avoid revisiting pages, downloader and spider middlewares for processing requests and responses, item pipelines for cleaning and storing extracted data, and built-in feed exports to JSON, JSON Lines, CSV, and XML. Writing to a database is usually done in a custom item pipeline.
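To make the deduplication idea concrete, here is a minimal standard-library sketch. It is not Scrapy's implementation (Scrapy's default duplicate filter, `RFPDupeFilter`, computes its own request fingerprints); the `fingerprint` function and `SeenFilter` class are hypothetical names used only to show the principle of hashing a canonical form of each request and skipping ones already seen.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fingerprint(url: str, method: str = "GET") -> str:
    """Hash a canonical form of the request so equivalent URLs collide."""
    parts = urlsplit(url)
    # Sort query parameters and drop the fragment, which is never sent to the server
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    canonical = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )
    return hashlib.sha1(f"{method.upper()} {canonical}".encode()).hexdigest()


class SeenFilter:
    """Remembers fingerprints of requests that were already scheduled."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, url: str, method: str = "GET") -> bool:
        fp = fingerprint(url, method)
        if fp in self._seen:
            return True
        self._seen.add(fp)
        return False


seen = SeenFilter()
urls = [
    "https://example.com/products?page=2&sort=asc",
    "https://EXAMPLE.com/products?sort=asc&page=2#reviews",  # same page as above
    "https://example.com/products?page=3&sort=asc",
]
# Keep only URLs whose fingerprint has not been seen yet
to_crawl = [u for u in urls if not seen.is_duplicate(u)]
```

In this sketch the second URL is filtered out because it differs from the first only in host casing, parameter order, and fragment. In a real Scrapy project this step happens inside the scheduler, and a request can bypass it with `dont_filter=True`.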

Scrapy's architecture is built around Spiders: Python classes that define how to crawl a site (start URLs and link-following rules) and how to extract data (CSS or XPath selectors). Spiders yield items (Item objects or plain dicts) that flow through configurable item pipelines for cleaning, validation, deduplication, and storage. The framework handles asynchronous I/O, request scheduling, and retries automatically.
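The pipeline stage can be sketched as a plain Python class with a `process_item` method, which is the hook Scrapy calls for each yielded item. The class name `CleanProductPipeline` and the `name`, `sku`, and `price` fields are assumptions made for illustration, and the loop at the bottom only simulates what the framework would do; the `ImportError` fallback exists so the sketch also runs where Scrapy is not installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:  # fallback so the sketch runs without Scrapy installed
    class DropItem(Exception):
        pass


class CleanProductPipeline:
    """Hypothetical pipeline: normalize fields, validate, drop repeats."""

    def __init__(self):
        self.seen_skus = set()

    def process_item(self, item, spider):
        # Validation: an item without a name is discarded
        name = (item.get("name") or "").strip()
        if not name:
            raise DropItem("missing name")
        # Deduplication: keep only the first item per SKU
        sku = item.get("sku")
        if sku in self.seen_skus:
            raise DropItem(f"duplicate sku: {sku}")
        self.seen_skus.add(sku)
        # Cleaning: turn a price string such as "$1,299.00" into a float
        price = float(str(item.get("price", "0")).replace("$", "").replace(",", ""))
        return {"name": name, "sku": sku, "price": price}


# Simulate the framework feeding items through the pipeline
pipeline = CleanProductPipeline()
raw_items = [
    {"name": "  Laptop ", "sku": "A1", "price": "$1,299.00"},
    {"name": "Laptop", "sku": "A1", "price": "$1,299.00"},  # duplicate SKU
    {"name": "", "sku": "B2", "price": "$5.00"},  # fails validation
]
stored = []
for raw in raw_items:
    try:
        stored.append(pipeline.process_item(raw, spider=None))
    except DropItem:
        pass  # Scrapy would log the drop and skip the remaining pipelines
```

In a project, a pipeline like this is enabled through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"myproject.pipelines.CleanProductPipeline": 300}`, where the module path is a placeholder and the number sets the order in which pipelines run.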

For JavaScript-rendered pages, Scrapy alone is insufficient: it sends plain HTTP requests and does not execute JavaScript. A common integration pattern is to keep Scrapy as the crawling and pipeline framework while routing requests through AlterLab's API, which renders the page and acts as a transparent HTTP rendering proxy for Scrapy spiders. The `scrapy-playwright` and `scrapy-splash` integrations provide alternative browser rendering backends.
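For comparison, the `scrapy-playwright` route is mostly configuration. The fragment below is a sketch of the settings that package documents, assuming it is installed alongside a Playwright browser; `PLAYWRIGHT_META` is a hypothetical helper name, not part of the library.

```python
# settings.py fragment (assumes the scrapy-playwright package is installed)
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
# scrapy-playwright requires the asyncio-based Twisted reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Per-request opt-in: pass this dict as `meta` when building a scrapy.Request
PLAYWRIGHT_META = {"playwright": True}
```

A spider would then issue `scrapy.Request(url, meta=PLAYWRIGHT_META)` for pages that need rendering. Requests without the `playwright` meta key are downloaded as ordinary HTTP requests, so rendering cost is paid only where it is needed.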

Examples

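import json

import scrapy


class ProductSpider(scrapy.Spider):
    name = 'products'

    def start_requests(self):
        # Route through AlterLab for rendering
        yield scrapy.Request(
            'https://api.alterlab.io/v1/scrape',
            method='POST',
            # Serialize the payload with json.dumps instead of a hand-written string
            body=json.dumps({'url': 'https://example.com', 'render_js': True}),
            headers={
                'Content-Type': 'application/json',
                'X-API-Key': 'sk_live_...',
            },
            callback=self.parse,
        )

    def parse(self, response):
        # The payload shape depends on AlterLab's API, so only log its size here
        self.logger.info('Received %d bytes', len(response.body))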
