
CSS Selector

A pattern that matches HTML elements by tag, class, ID, or attribute; in web scraping, it is used to extract specific nodes from parsed HTML.

A CSS selector is a pattern syntax originally designed for styling web pages and now widely used in web scraping to target specific HTML elements. CSS selectors match elements by tag name (`div`), class (`.price`), ID (`#product-title`), attribute (`a[href]`), hierarchical relationship (`article > p`), or combinations of these. The syntax is concise and readable, which makes it one of the most common extraction methods for structured HTML.
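To make the matching rules concrete, here is a toy sketch of how a single compound selector (tag, class, ID, attribute) could be checked against one element. It is not how any real selector engine is implemented, it ignores combinators such as `>`, and the names `Element`, `TOKEN`, and `matches` are hypothetical, introduced only for this illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical, minimal stand-in for a parsed HTML element."""
    tag: str
    attrs: dict = field(default_factory=dict)


# One token of a compound selector: tag, .class, #id, [attr] or [attr="value"]
TOKEN = re.compile(
    r"""(?P<tag>^[a-zA-Z][\w-]*)
      | \.(?P<cls>[\w-]+)
      | \#(?P<id>[\w-]+)
      | \[(?P<attr>[\w-]+)(?:=["']?(?P<val>[^"'\]]*)["']?)?\]
    """,
    re.VERBOSE,
)


def matches(selector: str, el: Element) -> bool:
    """Return True if every token in the selector holds for the element."""
    classes = el.attrs.get("class", "").split()
    for m in TOKEN.finditer(selector):
        if m["tag"] and m["tag"].lower() != el.tag.lower():
            return False
        if m["cls"] and m["cls"] not in classes:
            return False
        if m["id"] and m["id"] != el.attrs.get("id"):
            return False
        if m["attr"]:
            if m["attr"] not in el.attrs:
                return False
            if m["val"] is not None and el.attrs[m["attr"]] != m["val"]:
                return False
    return True


price = Element("span", {"class": "price sale"})
link = Element("a", {"href": "/buy", "data-testid": "link"})
```

With these two elements, `matches(".price", price)` and `matches('a[data-testid="link"]', link)` both hold, while `matches("div.price", price)` does not, because the tag differs.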

CSS selectors are supported natively in browsers via `document.querySelector()` and `document.querySelectorAll()`, and in server-side parsing libraries including BeautifulSoup (`soup.select()`), Cheerio (`$('.price')`), and lxml (`tree.cssselect()`, which depends on the separate `cssselect` package). Playwright and Puppeteer accept CSS selectors directly in element interaction methods.
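As a minimal sketch of the BeautifulSoup route, the snippet below runs a few selectors against an inline document. It assumes the `beautifulsoup4` package is installed; the HTML fragment and variable names are invented for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment standing in for a fetched product page
HTML = """
<article>
  <h1 id="product-title">Widget</h1>
  <div class="price"><span>19.99</span></div>
  <a href="/buy" data-testid="link">Buy</a>
</article>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select_one() returns the first match (or None); select() returns a list
title = soup.select_one("#product-title").get_text(strip=True)
price = soup.select_one("div.price > span").get_text(strip=True)
links = [a["href"] for a in soup.select('a[data-testid="link"]')]
```

Note that `select_one()` returns `None` when nothing matches, so production code would normally check the result before calling `get_text()`.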

The primary limitation of CSS selectors is fragility: they are coupled to the specific DOM structure of the target page. If the site changes its HTML (redesign, A/B test, framework migration), the selector may silently return empty results rather than raise an error. For stable internal structures, CSS selectors are the most efficient extraction method; for variable or frequently changing pages, AI-powered schema extraction is more resilient.
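One way to keep an empty result from passing unnoticed is to try several candidate selectors in order and fail loudly when none match. The sketch below is library-agnostic: `select` is any function that takes a selector string and returns a list of matches (for example `soup.select`). The names `select_with_fallback`, `SelectorMiss`, and the fake page data are hypothetical.

```python
from typing import Callable, Sequence


class SelectorMiss(Exception):
    """Raised when none of the candidate selectors match anything."""


def select_with_fallback(
    select: Callable[[str], Sequence],
    selectors: Sequence[str],
) -> Sequence:
    """Return the first non-empty result; raise instead of returning []."""
    for css in selectors:
        found = select(css)
        if found:
            return found
    raise SelectorMiss(f"no match for any of: {', '.join(selectors)}")


# Stand-in for a real query function: the page has been "redesigned",
# so only the newer selector still matches.
fake_page = {"span.price-v2": ["19.99"]}


def fake_select(css: str) -> list:
    return fake_page.get(css, [])


prices = select_with_fallback(fake_select, ["div.price > span", "span.price-v2"])
```

Raising an exception turns a silent miss into something a scraper's monitoring can catch; it does not make the selectors themselves any less tied to the page structure.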

Examples

# CSS selector examples
div.price > span        # span directly inside .price div
#product-title          # element with id="product-title"
a[data-testid="link"]   # anchor with specific data attribute
article:first-child h2  # h2 inside an article that is the first child of its parent
