When the same content is accessible at multiple URLs — due to query parameters, trailing slashes, `www` vs. non-`www` subdomains, or session tokens — search engines and scrapers need to know which URL to treat as authoritative. The `rel=canonical` link tag in the HTML `<head>` signals the preferred URL.
For crawlers, following canonical URLs prevents indexing duplicate content and wasting crawl budget. A robust crawler reads the canonical tag and records the canonical URL alongside or instead of the requested URL. Deduplication logic can then match records by canonical URL rather than the exact URL that was fetched.
Canonical URLs also matter for link graph construction: inbound links to duplicate URLs should be consolidated and attributed to the canonical for accurate page-authority calculations.