HTTP defines how clients (browsers, scrapers) format requests and how servers respond. A request specifies a method (such as GET, POST, PUT, or DELETE), a target URL, optional headers, and an optional body. The server responds with a status code, response headers, and an optional body containing the requested resource (HTML, JSON, an image, and so on).
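To make the request and response anatomy concrete, here is a minimal sketch that serializes an HTTP/1.1 request and splits a raw response into its parts, without touching the network. The function names `build_request` and `parse_response` are hypothetical helpers invented for illustration, and the parser deliberately ignores details such as chunked transfer encoding and repeated headers that a real client library handles.

```python
def build_request(method, host, path="/", headers=None, body=b""):
    """Serialize a request: request line, headers, blank line, optional body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    all_headers = dict(headers or {})
    if body:
        # A body needs a length so the server knows where it ends.
        all_headers["Content-Length"] = str(len(body))
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def parse_response(raw):
    """Split a raw response into (status_code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers, body
```

In practice a scraper would rely on an HTTP library rather than hand-rolled parsing; the sketch only shows which pieces travel in each direction.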
Key HTTP status codes for scrapers: 200 OK (success); 301 Moved Permanently and 302 Found (redirects: follow the Location header); 403 Forbidden (access denied); 404 Not Found; 429 Too Many Requests (rate limited: respect the Retry-After header when present); 503 Service Unavailable (server temporarily unable to handle the request). Understanding these codes is essential for writing robust retry and error-handling logic.
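One way these status codes might drive retry and error handling is sketched below. The helper names `retry_after_seconds` and `plan`, the action labels, the exponential backoff defaults, and the assumption that header names arrive lowercased in a dict are all illustrative choices, not a prescribed policy.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

REDIRECTS = {301, 302, 303, 307, 308}
RETRYABLE = {429, 503}


def retry_after_seconds(value, now=None):
    """Parse Retry-After (delay in seconds or an HTTP date) into seconds."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # Unparseable header: fall back to our own backoff.
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def plan(status, headers, attempt, base_delay=1.0, max_delay=60.0):
    """Return (action, detail) for a response; headers use lowercase keys."""
    if 200 <= status < 300:
        return ("ok", None)
    if status in REDIRECTS and "location" in headers:
        return ("redirect", headers["location"])
    if status in RETRYABLE:
        delay = retry_after_seconds(headers.get("retry-after"))
        if delay is None:
            # No usable server hint: exponential backoff, capped.
            delay = min(max_delay, base_delay * 2 ** attempt)
        return ("retry", delay)
    # 403, 404, and anything else: retrying unchanged is unlikely to help.
    return ("give_up", None)
```

A caller would loop on `plan(...)`, sleeping for the returned delay on "retry" and stopping after a fixed number of attempts. Treating 403 as final is a conservative assumption; some scrapers instead rotate identity and try again.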
HTTP/1.1 uses text-based headers and persistent connections. HTTP/2 uses binary framing and multiplexes many requests over a single connection. HTTP/3 runs over QUIC, which is built on UDP. Anti-bot systems can fingerprint the HTTP version and protocol features a client uses, so a scraper should use the same HTTP version a real browser would use for the target site.
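Before choosing a client, it can help to check which version a server is willing to speak. Over TLS, HTTP/2 versus HTTP/1.1 is selected through ALPN during the handshake, and the sketch below uses Python's standard `ssl` module to offer both and report the server's choice. The helper names `alpn_context`, `describe`, and `negotiated_protocol` are hypothetical. This only probes the negotiation: the standard library does not speak HTTP/2 itself, and HTTP/3 support is advertised by other means (such as the Alt-Svc response header), so it will not show up here.

```python
import socket
import ssl

ALPN_NAMES = {"h2": "HTTP/2", "http/1.1": "HTTP/1.1"}


def alpn_context(protocols=("h2", "http/1.1")):
    """Build a verifying TLS context that offers the given ALPN protocols."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(list(protocols))
    return ctx


def describe(protocol):
    """Map an ALPN identifier to a readable HTTP version label."""
    if protocol is None:
        return "no ALPN result (typically HTTP/1.1)"
    return ALPN_NAMES.get(protocol, f"unknown ({protocol})")


def negotiated_protocol(host, port=443, timeout=5.0):
    """Open a TLS connection and return the ALPN protocol the server picked."""
    ctx = alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Calling `describe(negotiated_protocol("example.com"))` performs a real network connection, so the result depends on the server. Matching the negotiated version is only one part of a browser-like fingerprint; TLS parameters and header order can also be inspected.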