protocol

Redirect Chain

A redirect chain is a sequence of HTTP redirects (301, 302, etc.) from an initial URL to a final destination, which scrapers must follow to reach the target content.

HTTP redirects occur when a server responds with a 3xx redirect status code (301, 302, 303, 307, or 308) and a `Location` header pointing to the new URL. Redirect chains arise when several redirects occur in sequence: for example, an HTTP URL redirects to HTTPS, which redirects to a `www` canonical, which redirects to a localised URL. Most HTTP clients can follow redirects automatically, up to a configurable limit (typically 10–30 hops; for example, httpx defaults to 20 and requests to 30).
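The single-hop mechanism described above can be sketched with the standard library alone. This is a minimal illustration rather than how any particular client implements it: `next_hop` and `REDIRECT_CODES` are hypothetical names, and the headers are assumed to arrive as a plain dict. It also shows one detail that is easy to miss: `Location` may be a relative reference, so it has to be resolved against the URL that was requested.

```python
from urllib.parse import urljoin

# Status codes that signal a redirect and carry a Location header.
# 304 Not Modified is a 3xx code but is not a redirect, so it is excluded.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_hop(current_url, status_code, headers):
    """Return the URL a response redirects to, or None if it is not a redirect.

    `headers` is assumed to be a plain dict; header names are matched
    case-insensitively.
    """
    if status_code not in REDIRECT_CODES:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    if not location:
        return None
    # Resolve relative references such as "/en/" against the requested URL.
    return urljoin(current_url, location)


print(next_hop("http://example.com/a", 301, {"Location": "https://example.com/a"}))
print(next_hop("https://example.com/a/b", 302, {"location": "/en/"}))
print(next_hop("https://example.com/", 200, {}))
```

Calling a function like this in a loop, feeding each result back in as the next request URL, is what produces the chain.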

For scrapers, redirect chains matter because the final URL after all redirects identifies the resource that is actually served, and intermediate responses may carry different HTML (often only a short placeholder body) from the destination. Tracking the full redirect chain is useful for link-shortener expansion, affiliate URL resolution, and detecting cloaking (where search-engine bots are served different content from what users see).

Long redirect chains (5 or more hops) often indicate configuration problems on the target site, and each hop adds a network round trip of latency. Circular redirects (A → B → A) must be detected and broken to prevent infinite loops.
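One way to guard against both problems when following redirects by hand is to keep a set of visited URLs alongside a hop counter. The sketch below rests on several assumptions: `follow_chain`, `RedirectLoopError`, `TooManyRedirectsError`, and the `fetch` callable (which returns a `(status_code, location)` pair for a URL) are all hypothetical names, and the `a.test` routes are made-up data standing in for real network requests.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


class RedirectLoopError(Exception):
    """Raised when a redirect points back to a URL already in the chain."""


class TooManyRedirectsError(Exception):
    """Raised when the chain would exceed the allowed number of hops."""


def follow_chain(start_url, fetch, max_hops=10):
    """Follow redirects from start_url and return the list of URLs visited.

    `fetch(url)` is a caller-supplied function returning (status_code, location),
    where location is the Location header value or None.
    """
    chain = [start_url]
    seen = {start_url}
    url = start_url
    while True:
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return chain  # reached the final destination
        next_url = urljoin(url, location)
        if next_url in seen:
            raise RedirectLoopError(" -> ".join(chain + [next_url]))
        if len(chain) - 1 >= max_hops:
            raise TooManyRedirectsError(f"more than {max_hops} redirects from {start_url}")
        seen.add(next_url)
        chain.append(next_url)
        url = next_url


# Made-up routes simulating HTTP -> HTTPS -> localised URL, with no network access.
routes = {
    "http://a.test/": (301, "https://a.test/"),
    "https://a.test/": (302, "/en/"),
    "https://a.test/en/": (200, None),
}
chain = follow_chain("http://a.test/", lambda u: routes[u])
print(" -> ".join(chain))
```

A visited set is deliberately strict: some sites redirect away to set a cookie and then back to the original URL, which this check would report as a loop. A scraper that needs to tolerate that pattern could rely on the hop limit alone.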

Examples

# httpx: capture redirect history
import httpx

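# follow_redirects=True is required: httpx does not follow redirects by default
response = httpx.get("http://example.com", follow_redirects=True)
# response.history holds the intermediate redirect responses, oldest first
for r in response.history:
    print(r.status_code, r.url, "→")
# the final response, after all redirects have been followed
print(response.status_code, response.url)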