Crawl Depth

Crawl depth is the maximum number of link hops a crawler will follow from its seed URLs, limiting the scope of the crawl to pages within N clicks of the starting point.

In a web crawl, depth 0 is the seed URL itself, depth 1 is every page linked from the seed, depth 2 is pages linked from depth-1 pages, and so on. Setting a maximum crawl depth prevents the crawler from venturing into deeply nested pages that are unlikely to contain the target data and helps control crawl budget.
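The depth numbering above can be sketched on a small in-memory link graph. This is a minimal illustration rather than a real crawler: the `LINKS` dictionary and the `crawl_depths` helper are hypothetical stand-ins for fetching pages and extracting their links.

```python
from collections import deque

def crawl_depths(links, seed, max_depth):
    """Return {url: depth} for every page within max_depth hops of seed."""
    depths = {seed: 0}          # depth 0 is the seed itself
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if depths[page] >= max_depth:
            continue            # at the limit: keep the page, do not follow its links
        for target in links.get(page, []):
            if target not in depths:    # first discovery fixes the page's depth
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical site structure: page -> pages it links to
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/a/1"],
    "/a/1": ["/a/1/x"],
    "/a/1/x": [],
}

result = crawl_depths(LINKS, "/", 2)
print(result)  # {'/': 0, '/a': 1, '/b': 1, '/a/1': 2}
```

With a limit of 2, `/a/1/x` is never reached because it sits three hops from the seed.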

Depth limits are especially important for sites with infinite or near-infinite link graphs generated by parameter combinations, such as calendar pages and faceted search results. A crawler with no depth limit that encounters such a site keeps discovering new URLs, so its crawl frontier grows without bound.
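To make the calendar case concrete, here is a small sketch in which every page links to the next month forever. The `calendar_links` function and its URL pattern are invented for illustration; the point is only that a depth limit is what makes the loop terminate.

```python
from collections import deque

def calendar_links(url):
    """Hypothetical site: each calendar page links to the following month."""
    month = int(url.rsplit("=", 1)[1])
    return [f"/calendar?month={month + 1}"]

def crawl(seed, get_links, max_depth):
    """Visit pages breadth-first, never following links past max_depth."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    visited = []
    while frontier:
        url, depth = frontier.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # without this check the frontier would never empty
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited

pages = crawl("/calendar?month=0", calendar_links, max_depth=3)
print(len(pages))  # 4
```

Deduplicating URLs with `seen` does not help here, since every generated URL is new; only the depth check bounds the crawl.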

In Scrapy, depth is controlled by the `DEPTH_LIMIT` setting; the default value of 0 means no limit. Scrapy records each response's depth in its request metadata, so an individual spider can enforce a tighter limit than the global one by checking `response.meta.get('depth')` and declining to follow links past a threshold. Some crawlers use a breadth-first strategy (processing all depth-N pages before any depth-N+1 page) so that the highest-level pages are captured first even if the crawl is terminated early.
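The per-spider check might look roughly like the sketch below. `DepthCappedMixin` and `max_follow_depth` are hypothetical names, not part of Scrapy. The callback is written as a mixin so the block runs without Scrapy installed, and it relies only on `response.meta`, `response.css(...).getall()`, and `response.follow(...)`.

```python
class DepthCappedMixin:
    """Hypothetical mixin; in a real project combine it with scrapy.Spider,
    e.g. class DocsSpider(DepthCappedMixin, scrapy.Spider): ...
    """

    # Illustrative threshold; only has an effect if it is tighter than DEPTH_LIMIT.
    max_follow_depth = 2

    def parse(self, response):
        # Depth of this response as recorded in the request metadata (0 for seeds).
        depth = response.meta.get("depth", 0)
        yield {"url": response.url, "depth": depth}

        # Decline to follow links once the per-spider threshold is reached.
        if depth >= self.max_follow_depth:
            return

        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```

Pages at the threshold are still scraped; only their outgoing links are skipped. That behaves like `DEPTH_LIMIT` but can vary from spider to spider.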

Examples

# Scrapy settings: limit crawl depth
# settings.py
DEPTH_LIMIT = 3  # only follow links up to 3 hops from seed
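
Scrapy crawls in depth-first order by default. If the breadth-first behaviour described earlier is wanted, the settings fragment below is one way to request it. The setting names and queue class paths follow the breadth-first recipe in the Scrapy documentation, but treat them as an assumption and verify them against the Scrapy version in use.

```python
# settings.py (sketch: depth limit combined with breadth-first ordering)
DEPTH_LIMIT = 3      # same limit as above: at most 3 hops from the seed
DEPTH_PRIORITY = 1   # positive value lowers the priority of deeper requests

# FIFO queues make the scheduler process requests in the order they were found
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"
```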


    Crawl Depth — Web Scraping Glossary | AlterLab