Throttling, meaning capping requests to a fixed rate (e.g., 1 request per second per domain), is both a courtesy and a self-preservation strategy. Sending requests faster than a site can process them degrades the target's performance for real users and triggers rate-limit defences sooner. A capped rate keeps scraping traffic below common detection thresholds and closer to the pace of ordinary browsing.
Implementations range from simple `time.sleep()` calls between requests to token-bucket or leaky-bucket rate limiters that hold a steady request rate even when per-request processing time varies. A fixed sleep adds its delay on top of the processing time, so the effective rate drifts below the target whenever requests are slow. A token bucket accumulates credits at a constant rate (e.g., 1 token/second), up to a fixed capacity, and each request consumes one token; when the tokens run out, the next request waits until one has been refilled.
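The token-bucket behaviour described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production limiter: the class name `TokenBucket` and the `rate`, `capacity`, `clock`, and `sleep` parameters are assumptions made for this example, and the sketch is not thread-safe.

```python
import time


class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens that can be banked
        self.tokens = capacity        # start full, so a short initial burst is allowed
        self._clock = clock           # injectable so the sketch can be tested without waiting
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last refill, capped at capacity.
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for the missing fraction of a token to accrue.
            self._sleep((1 - self.tokens) / self.rate)
```

A scraper would typically keep one bucket per domain (for instance in a dictionary keyed by hostname) and call `acquire()` immediately before each request. Because refills are computed from elapsed time, slow responses do not lower the achieved rate the way a fixed sleep does.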
For distributed scraping across many workers, a centralised rate-limit store such as Redis coordinates throttling so that the workers collectively respect the per-domain limit. Without it, each worker throttles independently, and the aggregate rate seen by the target is the per-worker rate multiplied by the number of workers.
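One way to coordinate workers through a shared store is a fixed-window counter, sketched below. Note that this is a simpler technique than a token bucket, chosen here only because it needs just two Redis commands, `INCR` and `EXPIRE`. The function `try_acquire`, the key format, and the `InMemoryStore` stand-in are all assumptions made for illustration; in a real deployment the `store` argument would be a Redis client exposing `incr` and `expire` (as the redis-py client does), and workers are assumed to have roughly synchronised clocks.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for the two Redis commands used below (INCR, EXPIRE).

    It exists only so the sketch runs without a server; it never expires keys.
    """

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

    def expire(self, key, seconds):
        return True  # a real Redis server would delete the key after `seconds`


def try_acquire(store, domain, limit, window_seconds=1, now=None):
    """Return True if this request fits in the shared per-domain budget."""
    now = time.time() if now is None else now
    window = int(now // window_seconds)        # index of the current time window
    key = f"throttle:{domain}:{window}"        # assumed key naming scheme
    count = store.incr(key)                    # INCR is atomic in Redis, so workers share one counter
    if count == 1:
        # First request in this window: schedule cleanup of the counter key.
        store.expire(key, window_seconds * 2)
    return count <= limit


store = InMemoryStore()
if try_acquire(store, "example.com", limit=1):
    pass  # safe to send the request; otherwise wait and retry
```

Every worker calls `try_acquire` against the same store before each request and backs off when it returns `False`. A fixed window can admit up to twice the limit across a window boundary, and the separate `expire` call leaves a small gap if a worker dies between the two commands, so a stricter setup would likely move the logic into a single atomic script.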