
Retry Logic

Automated re-sending of failed requests with backoff strategies, essential for handling transient errors, rate limits, and flaky anti-bot challenges.

Retry logic is the code responsible for automatically re-attempting failed HTTP requests under appropriate conditions. In web scraping, failures are frequent and have varied causes: transient network errors (connection reset, timeout), rate-limit responses (HTTP 429), temporary server errors (HTTP 500, 503), anti-bot challenges that require a different strategy, and CAPTCHA gates. Not all failures warrant a retry: a 404 Not Found, or a 403 Forbidden that signals a permanent ban, should not be retried.
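A minimal sketch of the retry-or-not decision described above. The status-code sets are assumptions drawn from this paragraph (with 502 and 504 added as commonly transient gateway errors), and `should_retry` is a hypothetical helper, not part of any library.

```python
from typing import Optional

# Assumed transient statuses: rate limiting and temporary server/gateway errors.
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}
# Assumed permanent statuses: retrying these only wastes requests.
PERMANENT_STATUSES = {403, 404}


def should_retry(status: Optional[int], network_error: bool = False) -> bool:
    """Decide whether a failed request is worth re-attempting.

    `status` is the HTTP status code, or None when no response arrived.
    `network_error` flags transport failures such as a connection reset
    or timeout, which are treated as transient.
    """
    if network_error:
        return True
    if status is None or status in PERMANENT_STATUSES:
        return False
    # Anything not explicitly known to be transient is not retried.
    return status in TRANSIENT_STATUSES
```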

Effective retry logic uses exponential backoff with jitter: the delay doubles on each attempt (1 second before the first retry, 2 seconds before the second, 4 seconds before the third), and random jitter is added to each delay so that many concurrent scrapers do not retry in lockstep and cause a synchronised retry storm. A maximum retry count (typically 3-5) prevents infinite loops on permanently failing targets. A circuit breaker trips after a set number of consecutive failures and stops sending requests for a cooldown period, giving an overwhelmed target server time to recover.
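The backoff schedule and circuit breaker can be sketched as follows. `backoff_delay` and `CircuitBreaker` are hypothetical names, and the defaults (1 s base, 60 s cap, 5 failures, 30 s cooldown) are illustrative assumptions rather than recommended values.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (0-based): base * 2**attempt,
    capped at `cap`, plus up to `base` seconds of random jitter."""
    return min(cap, base * (2 ** attempt)) + random.uniform(0, base)


class CircuitBreaker:
    """Trips after `threshold` consecutive failures, then rejects requests
    until `cooldown` seconds have passed."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # time.monotonic() when the breaker tripped

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: requests flow normally
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Cooldown over: close again, but one more failure re-trips it.
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False  # open: skip the request so the target can recover

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()
```

Check `allow()` before each request, then report the outcome with `record_success()` or `record_failure()`. Re-closing with the counter one short of the threshold means a single failed trial request trips the breaker again, a simplified form of the usual half-open state.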

Different failure types warrant different retry strategies: HTTP 429 should honour the Retry-After header if present; HTTP 503 suggests escalating to a higher anti-bot tier; connection timeouts suggest trying a different proxy IP; and CAPTCHA responses require the challenge to be resolved before retrying. AlterLab's retry layer handles these cases automatically: developers receive either a successful response or a descriptive error once all retry attempts are exhausted.
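A sketch of this per-failure dispatch, assuming the caller already knows whether the request timed out or hit a CAPTCHA. `retry_after_seconds` handles both Retry-After forms defined by HTTP (a delay in seconds or an HTTP-date); `plan_retry` and its action strings are hypothetical and do not describe AlterLab's internal API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value into seconds to wait.

    Accepts delay-seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns None if the header is
    missing or cannot be parsed.
    """
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError here
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def plan_retry(status: Optional[int], headers: dict, *, timed_out: bool = False,
               captcha: bool = False, attempt: int = 0) -> Tuple[str, float]:
    """Map a failed request to a hypothetical (action, wait_seconds) pair."""
    if timed_out:
        return ("rotate_proxy", 0.0)    # try again from a different IP
    if captcha:
        return ("solve_captcha", 0.0)   # resolve the challenge first
    if status == 429:
        # Plain dict lookup is case-sensitive; real HTTP clients usually
        # expose case-insensitive header mappings.
        wait = retry_after_seconds(headers.get("Retry-After"))
        return ("wait", wait if wait is not None else float(2 ** attempt))
    if status == 503:
        return ("escalate_anti_bot_tier", float(2 ** attempt))
    return ("give_up", 0.0)
```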

Examples

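# Retry with exponential backoff and jitter
import random
import time

RETRYABLE_STATUSES = {429, 500, 502, 503, 504}

def scrape_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        response = scrape(url)  # placeholder: your HTTP call, returning .status
        if response.status == 200:
            return response
        if response.status not in RETRYABLE_STATUSES:
            # 403, 404 and other permanent failures: fail fast, don't retry
            raise Exception(f"Non-retryable status {response.status}")
        if attempt < max_retries - 1:
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter
            time.sleep((2 ** attempt) + random.uniform(0, 1))
    raise Exception(f"Failed after {max_retries} attempts")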
