
HTTP

HTTP (HyperText Transfer Protocol) is the application-layer protocol used by web browsers and scrapers to request and receive resources from web servers.

HTTP defines how clients (browsers, scrapers) formulate requests and how servers respond. A request specifies a method (such as GET, POST, PUT, or DELETE), a URL, optional headers, and an optional body. The server responds with a status code, response headers, and an optional body containing the requested resource (HTML, JSON, an image, etc.).
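In HTTP/1.1, the request and response described above are plain text on the wire. The sketch here uses only the standard library. The hypothetical helpers `build_request` and `parse_response` serialize a request line, headers, and body into that text format, and split a raw response back into status code, headers, and body. The parser is deliberately simplified: it ignores chunked transfer encoding, and repeated headers such as `Set-Cookie` collapse into one entry. Treat it as an illustration of the message layout, not as a usable client.

```python
# Sketch: the text layout of an HTTP/1.1 request and response.
# build_request and parse_response are hypothetical helpers for illustration;
# parse_response ignores chunked encoding and collapses repeated headers.
from __future__ import annotations


def build_request(method: str, host: str, path: str,
                  headers: dict[str, str] | None = None,
                  body: bytes = b"") -> bytes:
    """Serialize a request line, headers, a blank line, and an optional body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # The header section ends with an empty line (CRLF CRLF).
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK".
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


request = build_request("GET", "example.com", "/", {"Accept": "text/html"})
print(request.decode("ascii"))

raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_response(raw))
```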

Key HTTP status codes for scrapers: 200 OK (success); 301 Moved Permanently and 302 Found (redirects: follow the URL in the Location header); 403 Forbidden (access denied); 404 Not Found; 429 Too Many Requests (rate limited: when a Retry-After header is present, wait as long as it specifies before retrying); 503 Service Unavailable (server temporarily down or overloaded). Understanding these codes is essential for writing robust retry and error-handling logic.
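One way to turn these status codes into retry and redirect decisions is sketched here. The helpers `next_action` and `retry_after_seconds` are hypothetical, header names are assumed to be lowercased already, and the exponential-backoff base of 2 seconds and the 60-second cap are arbitrary illustrative values, not recommendations. `Retry-After` may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
# Sketch: mapping status codes to "done" / "redirect" / "retry" / "fail".
# next_action and retry_after_seconds are hypothetical helpers; the backoff
# base and cap are illustrative values only. Header names must be lowercase.
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import urljoin

# 301/302 are the codes from the text; 303, 307 and 308 are the other standard redirects.
REDIRECT_CODES = (301, 302, 303, 307, 308)


def retry_after_seconds(value: str | None, now: datetime) -> float | None:
    """Parse Retry-After, which is either delay-seconds or an HTTP-date."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: the caller falls back to backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_action(status: int, headers: dict[str, str], url: str,
                attempt: int, now: datetime) -> tuple[str, object]:
    """Return ("done", None), ("redirect", url), ("retry", delay) or ("fail", status)."""
    if 200 <= status < 300:
        return "done", None
    if status in REDIRECT_CODES and "location" in headers:
        # Location may be relative, so resolve it against the request URL.
        return "redirect", urljoin(url, headers["location"])
    if status in (429, 503):
        delay = retry_after_seconds(headers.get("retry-after"), now)
        if delay is None:
            delay = min(60.0, 2.0 ** attempt)  # capped exponential backoff
        return "retry", delay
    # 403, 404 and other errors: repeating the same request rarely helps.
    return "fail", status
```

A caller would loop: send the request, call `next_action`, then sleep for the returned delay, fetch the returned URL, or stop. A real scraper would also cap the total number of retries and redirects.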

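HTTP/1.1 sends headers as plain text and keeps connections open for reuse (persistent connections). HTTP/2 replaces the text format with binary frames and multiplexes many concurrent requests over a single connection. HTTP/3 carries HTTP over QUIC, a UDP-based transport. Anti-bot systems fingerprint the HTTP version a client negotiates and how it uses protocol features (for example, HTTP/2 SETTINGS values and header order), so a scraper should use the same HTTP version, and behave the same way at the protocol level, as a real browser would when visiting the target site.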
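To make "binary framing" and "multiplexing" concrete, this sketch encodes HTTP/2 frame headers following the layout in RFC 9113: a 24-bit payload length, an 8-bit type, 8-bit flags, and a reserved bit plus a 31-bit stream identifier, which is what lets frames belonging to many requests interleave on one connection. It also builds a SETTINGS frame. Which settings a client sends, with which values and in which order, varies between browsers and HTTP libraries, which is one signal fingerprinting can key on. The function names are hypothetical and the setting values are illustrative, not those of any real browser. HTTP/3 uses QUIC-specific framing and is not covered here.

```python
# Sketch: HTTP/2 binary frame headers (RFC 9113, section 4.1).
# encode_frame_header, decode_frame_header and settings_frame are
# hypothetical helpers for illustration, not a library API.
from __future__ import annotations

import struct

HEADERS = 0x1   # frame type code for HEADERS
SETTINGS = 0x4  # frame type code for SETTINGS


def encode_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Pack the fixed 9-byte header: length(24) type(8) flags(8) R(1) stream(31)."""
    if not 0 <= length < 2 ** 24:
        raise ValueError("payload length must fit in 24 bits")
    # ">I" packs the length into 4 big-endian bytes; drop the first to keep 24 bits.
    return (struct.pack(">I", length)[1:]
            + struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF))


def decode_frame_header(header: bytes) -> tuple[int, int, int, int]:
    """Unpack a 9-byte frame header into (length, type, flags, stream_id)."""
    length = int.from_bytes(header[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", header[3:9])
    return length, frame_type, flags, stream_id & 0x7FFFFFFF  # clear the reserved bit


def settings_frame(settings: dict[int, int]) -> bytes:
    """Build a SETTINGS frame; each entry is a 16-bit id plus a 32-bit value."""
    payload = b"".join(struct.pack(">HI", ident, value)
                       for ident, value in settings.items())
    # SETTINGS applies to the whole connection, so it is sent on stream 0.
    return encode_frame_header(len(payload), SETTINGS, 0, 0) + payload


# Illustrative values only: 0x1 = SETTINGS_HEADER_TABLE_SIZE,
# 0x3 = SETTINGS_MAX_CONCURRENT_STREAMS.
frame = settings_frame({0x1: 4096, 0x3: 100})
print(frame.hex())
```

Compared with the HTTP/1.1 text format, every frame here is self-delimiting by its length field, and the stream ID tells the receiver which request a frame belongs to.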

Examples

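```python
# Minimal HTTP GET with Python httpx
import httpx

# httpx speaks HTTP/1.1 by default; HTTP/2 requires httpx.Client(http2=True)
# and the optional h2 dependency (pip install "httpx[http2]").
response = httpx.get(
    "https://example.com",
    headers={"User-Agent": "Mozilla/5.0 ..."},  # browser-like UA string (truncated here)
    follow_redirects=True,  # httpx does not follow redirects unless asked
    timeout=30,  # seconds
)
print(response.status_code, len(response.text))
```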



    HTTP — Web Scraping Glossary | AlterLab