general

URL

A URL (Uniform Resource Locator) is the address of a resource on the web, comprising a scheme, host, optional port, path, query string, and fragment.

A URL uniquely identifies a web resource. Its components are: scheme (`https`), authority (host `example.com` and optional port `:8080`), path (`/products/widget`), query string (`?page=2&sort=price`), and fragment (`#reviews`). Fragments are client-side only and are not sent to the server; scrapers targeting fragment-routed SPAs must handle them via browser-based rendering.

URL normalisation is a common preprocessing step in crawlers: converting relative URLs to absolute, lowercasing the scheme and host, removing default ports (`:80` for http, `:443` for https), sorting query parameters, and stripping tracking parameters like `utm_source`. Normalised URLs improve deduplication accuracy by ensuring that `https://example.com/p?a=1&b=2` and `https://example.com/p?b=2&a=1` are recognised as the same resource.

URLs can encode arbitrary data in query parameters, making them a key input for systematic scraping of paginated or parameterised datasets. Iterating over known URL patterns (changing a product ID or page number) is a fast and reliable scraping strategy when the URL structure is predictable.

Examples

# URL parsing and normalisation
from urllib.parse import urlparse, urlencode, urlunparse
import urllib.parse

parsed = urlparse("https://example.com/search?q=scraping&page=1#results")
print(parsed.scheme)    # https
print(parsed.netloc)    # example.com
print(parsed.path)      # /search
print(parsed.query)     # q=scraping&page=1
# Fragment is not sent to server

Related Terms

Extract URL data from any website

AlterLab returns clean, structured data from any public URL — no scraper infrastructure needed. Start free, no credit card required.

View API docs

Your first scrape.
Sixty seconds.

$1 free balance. No credit card. No SDK.Just a POST request.

terminal
curl -X POST https://api.alterlab.io/v1/scrape \
-H "X-API-Key: YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "formats": ["markdown"]}'

No credit card required · Up to 5,000 free scrapes · Balance never expires

    URL — Web Scraping Glossary | AlterLab