XPath (XML Path Language) is a query language for selecting nodes in XML and HTML documents using path expressions. CSS selectors match elements by tag name, class, ID, attributes, and structural position. XPath can express traversals that CSS selectors cannot, or can only partly, express: selecting nodes by their text content, selecting a parent or sibling node from a known child, counting sibling elements, or applying arithmetic and comparisons to attribute values.
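XPath expressions follow a path syntax: `/` steps to child nodes, `//` matches descendants at any depth (at the start of an expression, anywhere in the document), `@` accesses attributes, `[]` applies predicates (conditions), and `text()` selects text nodes. For example, `//div[@class='price']/span[1]/text()` selects the text of the first `span` child of every `div` whose `class` attribute is exactly "price". Positions are 1-based, and `[@class='price']` compares the whole attribute value, so it does not match `class="price sale"`; `contains(@class, 'price')` is a looser test.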
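As a minimal sketch of the example expression above, the snippet below uses Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath. That subset has no `text()` step, so the sketch selects the `span` elements and reads their `.text` instead, and it writes the leading `//` as `.//` relative to the root element. The sample markup and the names `DOC`, `first_spans`, and `prices` are made up for illustration. With lxml, the full expression including `/text()` could be passed to `xpath()` directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<html><body>
  <div class="price"><span>19.99</span><span>USD</span></div>
  <div class="note"><span>ignored</span></div>
  <div class="price"><span>5.00</span><span>EUR</span></div>
</body></html>"""

root = ET.fromstring(DOC)

# .//div[@class='price']  -> any div at any depth whose class is exactly "price"
# /span[1]                -> the first span child of each matching div
first_spans = root.findall(".//div[@class='price']/span[1]")

# ElementTree's XPath subset lacks text(), so read the text from each element.
prices = [span.text for span in first_spans]
print(prices)  # ['19.99', '5.00']
```

The `div` with class "note" is skipped by the attribute predicate, and the second `span` in each price block is skipped by the `[1]` position predicate.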
XPath's greater expressiveness makes it valuable where CSS selectors cannot express the required relationship, such as selecting a node based on the text of an adjacent sibling or navigating up the DOM tree from a known child. It is the standard query language in XML processing ecosystems and is supported by lxml, Scrapy, and browser DevTools. AlterLab's structured extraction accepts either CSS selectors or XPath expressions in requests.
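The text-matching and upward navigation described above can be sketched as follows, again using the standard-library `xml.etree.ElementTree` and assuming Python 3.7 or later for the `[.='text']` predicate. The markup and variable names are hypothetical. ElementTree does not implement sibling axes such as `following-sibling::`, so this sketch reaches the sibling by stepping up to the parent with `..` and then back down; a full XPath engine such as lxml could express the same relationship in one expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical product list, invented for this illustration.
DOC = """<ul>
  <li id="a"><span class="name">Widget</span><span class="stock">In stock</span></li>
  <li id="b"><span class="name">Gadget</span><span class="stock">Sold out</span></li>
  <li id="c"><span class="name">Gizmo</span><span class="stock">In stock</span></li>
</ul>"""

root = ET.fromstring(DOC)

# Match spans by their text content, then step up to the parent <li> with "..".
in_stock_items = root.findall(".//span[.='In stock']/..")
in_stock_ids = [li.get("id") for li in in_stock_items]

# From each parent, step back down to the sibling span that holds the name.
in_stock_names = [li.find("span[@class='name']").text for li in in_stock_items]

print(in_stock_ids)    # ['a', 'c']
print(in_stock_names)  # ['Widget', 'Gizmo']
```

The selection starts from the stock label's text rather than from any attribute of the `li`, which is the kind of relationship the paragraph above describes.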