BeautifulSoup is a Python library for parsing HTML and XML documents and navigating the resulting parse tree. It accepts markup from any source (an HTTP response, a file, a string) and exposes the parsed document through a small search API: find elements by tag name (`soup.find('div')`), by class (`soup.find_all('p', class_='description')`), by CSS selector (`soup.select('.price')`), or by attribute presence (`soup.find('a', href=True)`). `find` returns the first match or `None`, while `find_all` and `select` return every match.
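As a minimal sketch of the four lookups mentioned above, the following parses a small, made-up HTML fragment with the standard-library parser. The markup, the `id`, and the variable names are illustrative assumptions, not taken from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for illustration.
html = """
<div id="product">
  <p class="description">A sturdy mug.</p>
  <p class="description">Dishwasher safe.</p>
  <span class="price">$12.50</span>
  <a name="top">Anchor without a link</a>
  <a href="/mugs/42">Details</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# By tag name: find() returns the first matching element (or None).
first_div = soup.find("div")

# By class: find_all() returns every match; class_ avoids the Python keyword.
descriptions = [p.get_text() for p in soup.find_all("p", class_="description")]

# By CSS selector: select() also returns every match.
prices = [el.get_text() for el in soup.select(".price")]

# By attribute presence: href=True matches only <a> tags that have an href.
link = soup.find("a", href=True)

print(first_div["id"], descriptions, prices, link["href"])
```

Because `find` can return `None`, real scraping code usually checks the result before indexing into it.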
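BeautifulSoup delegates the actual parsing to an underlying parser, chosen by name when the soup is created. `html.parser` ships with Python's standard library, needs no extra install, and handles most real-world HTML correctly. For faster parsing of large documents, `lxml` provides a C-backed HTML and XML parser. For the most lenient handling of malformed HTML, `html5lib` follows the same parsing rules as web browsers and produces a browser-compatible parse tree regardless of how broken the markup is. Both `lxml` and `html5lib` are separate packages that must be installed alongside BeautifulSoup.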
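The parser is selected through the second argument to the `BeautifulSoup` constructor. The sketch below tries each of the three parsers on a deliberately malformed fragment and skips any that are not installed. The fragment and the `parse_with` helper are hypothetical, and the trees the parsers build for broken markup can differ, so no particular output is claimed here.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed: unclosed <li> and <p> tags.
malformed = "<ul><li>one<li>two</ul><p>unclosed paragraph"


def parse_with(parser_name):
    """Return a soup built with the named parser, or None if it is not installed."""
    try:
        return BeautifulSoup(malformed, parser_name)
    except FeatureNotFound:
        # Raised by BeautifulSoup when the requested parser is unavailable.
        return None


results = {name: parse_with(name) for name in ("html.parser", "lxml", "html5lib")}

for name, soup in results.items():
    if soup is None:
        print(f"{name}: not installed")
    else:
        items = [li.get_text() for li in soup.find_all("li")]
        print(f"{name}: {items}")
```

Comparing the printed lists is a quick way to see how each parser repairs the same broken input before committing to one.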
The key limitation of BeautifulSoup is that it operates on static HTML: it does not execute JavaScript, so any content a page builds in the browser is absent from the parse tree. For JavaScript-rendered pages, the HTML must first be rendered by a browser (driven by a tool such as Playwright or Puppeteer) or by a scraping API with JavaScript rendering enabled. AlterLab's API returns the post-render HTML, which can then be parsed with BeautifulSoup for structured extraction.
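To make the limitation concrete, here is a small sketch using a made-up page whose price is filled in by an inline script. BeautifulSoup sees the script only as text, so the target element stays empty. The markup and variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page: a browser would run the script and fill in the <div>.
html = """
<html><body>
  <div id="price"></div>
  <script>document.getElementById("price").textContent = "$12.50";</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The <div> is empty because the script was never executed.
price_text = soup.find("div", id="price").get_text(strip=True)

# The script itself is available, but only as an unexecuted string.
script_source = soup.find("script").string

print(repr(price_text))
print(script_source)
```

Feeding the same parsing code the post-render HTML from a browser or rendering API, rather than the raw response, is what makes the price visible to `find`.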