
Crawl Budget

Crawl budget is the number of pages a search engine crawler (or a custom crawler) will fetch from a site within a given timeframe, influenced by server capacity and site size.

Search engines allocate a crawl budget to each domain based on server responsiveness and the perceived value of the site's content. Pages that respond slowly, frequently return errors, or appear low-value (for example, thin content or few inbound links) consume a disproportionate share of that budget and can leave high-value pages uncrawled.

For custom crawlers, crawl budget is the number of page fetches the crawler can sustain per unit time without degrading the target server's performance or triggering rate limits. Budgeting involves setting a maximum request rate (requests per second or per minute), prioritising URLs by expected value, and deprioritising or excluding low-value URLs (parameter-heavy variants, session-specific pages, printer-friendly copies).
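As a rough illustration of the budgeting steps described above, the following Python sketch combines a maximum request rate, a priority queue ordered by expected value, and a filter for low-value URLs. Everything named here is an assumption made for the example: the CrawlBudget class, the is_low_value helper, the LOW_VALUE_PARAMS set, and the MAX_PARAMS threshold are hypothetical, and a real crawler would tune them per site and would also need to honour robots.txt and server back-off signals.

```python
import heapq
import time
from urllib.parse import urlsplit, parse_qs

# Hypothetical heuristics: query parameters that usually mark session-specific
# or printer-friendly variants, and a cap on how many parameters are tolerated.
LOW_VALUE_PARAMS = {"sessionid", "sid", "print"}
MAX_PARAMS = 2


def is_low_value(url):
    """Return True for URLs this sketch treats as not worth a fetch."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if len(query) > MAX_PARAMS:
        return True  # parameter-heavy variant
    return any(name.lower() in LOW_VALUE_PARAMS for name in query)


class CrawlBudget:
    """Hands out URLs in priority order, no faster than max_per_second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next fetch may start
        self._heap = []
        self._counter = 0  # tie-breaker that keeps insertion order stable

    def add(self, url, value):
        """Queue a URL with an expected value; low-value URLs are rejected."""
        if is_low_value(url):
            return False
        # heapq is a min-heap, so the value is negated to pop the best first.
        heapq.heappush(self._heap, (-value, self._counter, url))
        self._counter += 1
        return True

    def next_url(self):
        """Wait until the rate limit allows a fetch, then return the best URL."""
        if not self._heap:
            return None
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
        return heapq.heappop(self._heap)[2]
```

A caller would add discovered URLs with whatever value estimate it has (for instance link depth or a sitemap hint) and fetch whatever next_url() returns until it yields None. Injecting clock and sleep is only a convenience for testing the pacing without real delays.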

Sitemap files help both search engine crawlers and custom crawlers allocate their budget efficiently by providing a curated list of the site's most important URLs, optionally with priority and last-modified hints.
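A custom crawler might read those hints along the following lines. This is a minimal sketch that assumes a plain urlset document in the sitemaps.org 0.9 namespace; the parse_sitemap function and the SAMPLE document are made up for the example, and sitemap index files, gzip compression, and fetching are left out. Entries without a priority fall back to 0.5, the default defined by the sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol's <urlset> documents.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, inlined so the example needs no network access.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><priority>0.3</priority></url>
  <url>
    <loc>https://example.com/b</loc>
    <lastmod>2024-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url><loc>https://example.com/c</loc></url>
</urlset>"""


def parse_sitemap(xml_text):
    """Return sitemap entries sorted by priority, highest first."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue  # an entry without a location cannot be crawled
        priority_text = url.findtext("sm:priority", default="0.5", namespaces=NS)
        try:
            priority = float(priority_text)
        except ValueError:
            priority = 0.5  # fall back to the protocol default on bad input
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append({"loc": loc, "priority": priority, "lastmod": lastmod})
    # The sort is stable, so entries with equal priority keep document order.
    entries.sort(key=lambda entry: entry["priority"], reverse=True)
    return entries


if __name__ == "__main__":
    for entry in parse_sitemap(SAMPLE):
        print(entry["priority"], entry["loc"], entry["lastmod"])
```

The priority and last-modified values are hints supplied by the site owner, so a crawler may reasonably weigh them against its own signals rather than trust them outright.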