Every web page's text must be encoded as bytes for transmission. The encoding specifies which byte sequence represents each character. UTF-8, which can represent every Unicode character, is the dominant encoding on the modern web, but some legacy sites still use ISO-8859-1 (Latin-1), Windows-1252, or region-specific encodings (GBK for Chinese, Shift-JIS for Japanese).
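To make the character-to-bytes mapping concrete, the following minimal sketch encodes a few characters with Python's built-in codecs. The helper name `encode_all` is hypothetical and exists only for illustration.

```python
from typing import Dict, Iterable, Optional


def encode_all(char: str, encodings: Iterable[str]) -> Dict[str, Optional[bytes]]:
    """Encode one character with each codec; None means the codec cannot represent it."""
    result: Dict[str, Optional[bytes]] = {}
    for name in encodings:
        try:
            result[name] = char.encode(name)
        except UnicodeEncodeError:
            # The character does not exist in this encoding's repertoire.
            result[name] = None
    return result


if __name__ == "__main__":
    codecs = ["utf-8", "latin-1", "cp1252", "gbk", "shift_jis"]
    for ch in ["é", "€", "中"]:
        print(ch, encode_all(ch, codecs))
```

The same character maps to different byte sequences (or to none at all) depending on the codec, which is why a scraper has to know the encoding before it can turn bytes back into text.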
The encoding can be declared in three places: the HTTP `Content-Type` header (`Content-Type: text/html; charset=utf-8`), the HTML `<meta charset='utf-8'>` tag, or the XML declaration. Scrapers must detect and apply the correct encoding before parsing text; a mismatched encoding produces mojibake: garbled sequences like `Ã©` instead of `é`, which is what appears when UTF-8 bytes are decoded as Latin-1 or Windows-1252.
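The three declaration sites can be checked with the standard library alone. This is a rough sketch, not a full implementation of browser sniffing rules: the function names, the 1024-byte scan window, and the precedence (header first, then XML declaration, then `<meta>`) are assumptions made for illustration.

```python
import re
from email.message import Message
from typing import Optional

# Simplified patterns; real-world markup can be messier than these assume.
_XML_DECL = re.compile(rb'<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9_\-:.]+)', re.I)
_META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-:.]+)', re.I)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def declared_charset(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Return the first declared charset found, or None if nothing is declared."""
    from_header = charset_from_header(content_type)
    if from_header:
        return from_header
    head = body[:1024]  # assumed scan window for in-document declarations
    for pattern in (_XML_DECL, _META):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


# Decoding UTF-8 bytes with the wrong codec reproduces the mojibake described above.
mojibake = "é".encode("utf-8").decode("cp1252")  # 'Ã©'
```

For example, `declared_charset("text/html", b"<meta charset='gbk'>")` would return `"gbk"`, while a page with no declaration at all returns `None` and needs heuristic detection.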
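Python's `chardet` and `charset-normalizer` libraries can detect the encoding heuristically when it is not declared. The `requests` library takes `response.encoding` from the `Content-Type` header. When a `text/*` response omits the charset, `requests` falls back to ISO-8859-1 rather than detecting, and it uses the heuristic `response.apparent_encoding` only when no encoding can be derived from the headers at all. Scrapers therefore commonly set `response.encoding = response.apparent_encoding` before reading `response.text`. BeautifulSoup also handles encoding detection internally when given raw bytes.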
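One way to combine a declared charset with heuristic detection is a fallback chain like the sketch below. The function `decode_body` and its ordering of steps are assumptions for illustration, not the behaviour of `requests` or BeautifulSoup; `charset_normalizer` is used only if it happens to be installed.

```python
from typing import Optional


def decode_body(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode raw response bytes using a declared charset, then fallbacks."""
    # 1. Try the declared charset; skip it if the name is unknown or decoding fails.
    if declared:
        try:
            return raw.decode(declared)
        except (LookupError, UnicodeDecodeError):
            pass
    # 2. Strict UTF-8: non-UTF-8 data rarely validates, so success is a strong signal.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # 3. Heuristic detection, only when the optional dependency is available.
    try:
        from charset_normalizer import from_bytes
    except ImportError:
        from_bytes = None
    if from_bytes is not None:
        best = from_bytes(raw).best()
        if best is not None:
            return str(best)
    # 4. Last resort (an assumed default): Windows-1252 with replacement characters.
    return raw.decode("cp1252", errors="replace")
```

With `requests`, the raw bytes would come from `response.content`. Note that single-byte codecs such as Latin-1 accept any byte sequence, so a wrong declaration of that kind passes step 1 silently and still yields mojibake.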