
Crawl Depth

Crawl depth is the maximum number of link hops a crawler will follow from its seed URLs, limiting the scope of the crawl to pages within N clicks of the starting point.

In a web crawl, depth 0 is the seed URL itself, depth 1 is every page linked from the seed, depth 2 is pages linked from depth-1 pages, and so on. Setting a maximum crawl depth prevents the crawler from venturing into deeply nested pages that are unlikely to contain the target data and helps control crawl budget.
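The depth numbering described above can be sketched as a breadth-first walk over a small in-memory link graph. The `LINKS` dictionary and the `assign_depths` helper are hypothetical, introduced only for illustration; a real crawler would fetch pages and extract links instead of reading a dictionary.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/team"],
    "/products": ["/products/1", "/"],
    "/team": [],
    "/products/1": ["/products/1/reviews"],
    "/products/1/reviews": [],
}


def assign_depths(seed, links, max_depth):
    """Return {url: depth} for every page within max_depth hops of seed."""
    depths = {seed: 0}  # depth 0 is the seed itself
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if depths[url] == max_depth:
            continue  # at the limit: do not follow this page's links
        for target in links.get(url, []):
            if target not in depths:  # first visit gives the shortest hop count
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


depths = assign_depths("/", LINKS, max_depth=2)
print(depths)
```

With `max_depth=2`, `/products/1/reviews` is never reached because it sits three hops from the seed.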

Depth limits are especially important for sites with infinite or near-infinite link graphs generated by parameter combinations (calendar pages, faceted search results). A crawler with no depth limit encountering such a site will generate an unbounded crawl frontier.
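To make the unbounded-frontier problem concrete, here is a minimal sketch of a made-up calendar site where every month page links to the previous and next month. The URL pattern and the `calendar_links` and `crawl` functions are assumptions for illustration, not part of any library. Without the depth check the loop would never finish; with it, the crawl stops after a predictable number of pages.

```python
from collections import deque


def calendar_links(url):
    """Hypothetical site: every month page links to the previous and next month."""
    month = int(url.rsplit("=", 1)[1])
    return [f"/calendar?month={month - 1}", f"/calendar?month={month + 1}"]


def crawl(seed, max_depth):
    """Breadth-first crawl that stops expanding pages once they reach max_depth."""
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        url, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # page is recorded, but its links are not followed
        for link in calendar_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return seen


pages = crawl("/calendar?month=0", max_depth=3)
print(len(pages))  # 7: months -3 through 3
```

On this toy site a limit of N yields 2N + 1 pages, so the frontier grows linearly with the limit instead of without bound.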

In Scrapy, depth is controlled by the `DEPTH_LIMIT` setting; its default value of 0 imposes no limit. Individual spiders can also enforce their own cutoff by reading `response.meta.get('depth')` in a callback and declining to follow links past a threshold. Some crawlers use a breadth-first strategy (processing all depth-N pages before any depth-N+1 pages) so that the highest-level pages are captured first even if the crawl is terminated early.

Examples

# Scrapy settings: limit crawl depth
# settings.py
DEPTH_LIMIT = 3  # only follow links up to 3 hops from seed
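For the breadth-first strategy mentioned earlier, Scrapy needs a few more settings, because by default it keeps pending requests in LIFO queues and so crawls roughly depth-first. The fragment below follows the combination given in the Scrapy FAQ; treat it as a starting point and check it against the documentation for your Scrapy version.

```python
# settings.py (sketch): depth limit plus breadth-first ordering
DEPTH_LIMIT = 3  # only follow links up to 3 hops from seed

# Positive values deprioritise deeper requests, so shallow pages go first.
DEPTH_PRIORITY = 1

# Swap the default LIFO queues for FIFO queues.
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"

# Optional: collect per-depth request counts in the crawl stats.
DEPTH_STATS_VERBOSE = True
```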

Related Terms

Web Crawler

A program that automatically navigates websites by following hyperlinks, systematically indexing pages across an entire domain.

Link Following

Link following (or crawling) is the process of discovering and visiting URLs by extracting hyperlinks from already-scraped pages, enabling systematic traversal of a website.

Crawl Budget

Crawl budget is the number of pages a search engine crawler (or a custom crawler) will fetch from a site within a given timeframe, influenced by server capacity and site size.

robots.txt

A plain-text file at a website's root specifying which URL paths crawlers are permitted or disallowed from accessing.

Sitemap

An XML file listing the URLs a site wants crawled, with metadata such as last modification date, helping crawlers discover and prioritise pages.

Crawl Depth — Web Scraping Glossary | AlterLab
