Data extraction is the step after web scraping that transforms raw HTML into usable structured data. While scraping retrieves the page content, extraction identifies and isolates the specific fields needed, such as product name, price, review count, and publication date, and outputs clean, typed records.
Extraction techniques range from CSS selectors and XPath expressions for simple, consistent DOM structures to AI-powered schema extraction for complex or variable layouts. CSS selectors like `div.price > span` and XPath expressions like `//div[@class='price']/span/text()` are precise and fast for stable page structures. (The two are not strictly equivalent: the CSS selector matches any `div` whose class list includes `price`, while the XPath predicate matches only when the `class` attribute is exactly `price`.) For pages where the DOM changes frequently, or where the same field appears in different markup from page to page, AI extraction accepts a JSON schema definition and uses a language model to locate and return the matching fields.
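As a rough illustration of selector-based extraction, the sketch below pulls three fields out of a small page fragment and converts them into a typed record. It uses only Python's standard library, whose `xml.etree.ElementTree` module supports a limited XPath subset (no `text()`, so the element's `.text` is read instead) and requires well-formed markup. The sample markup, the `Product` class, and the `extract_product` function are all hypothetical; real-world HTML would typically need a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical, well-formed sample markup standing in for a scraped page.
SAMPLE_HTML = """
<html><body>
  <div class="product">
    <h1 class="name">Example Widget</h1>
    <div class="price"><span>$19.99</span></div>
    <div class="reviews"><span>128 reviews</span></div>
  </div>
</body></html>
"""


@dataclass
class Product:
    """Typed record produced by the extraction step."""
    name: str
    price: Decimal
    review_count: int


def extract_product(markup: str) -> Product:
    root = ET.fromstring(markup.strip())

    # Locate each field with an XPath-style path, then read its text content.
    name = root.find(".//h1[@class='name']").text.strip()
    price_text = root.find(".//div[@class='price']/span").text.strip()
    reviews_text = root.find(".//div[@class='reviews']/span").text.strip()

    # Convert raw strings into typed values.
    return Product(
        name=name,
        price=Decimal(price_text.lstrip("$")),
        review_count=int(reviews_text.split()[0]),
    )


product = extract_product(SAMPLE_HTML)
print(product)
```

The selectors here are tied to the exact structure of the sample fragment, which is the trade-off described above: fast and precise while the markup is stable, but brittle if class names or nesting change.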
AlterLab supports both approaches. For structured, predictable pages, pass CSS selectors or XPath in the request. For variable layouts, pass an `extract_schema` JSON schema and AlterLab returns the extracted fields directly in the response, so no HTML parsing is required on your end.
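To make the schema-based approach more concrete, here is a minimal sketch of what a JSON schema for the fields mentioned earlier might look like, wrapped in a request body. Only the `extract_schema` parameter name comes from the text above; the `url` key, the field names, and the `build_request_body` helper are assumptions for illustration, and the actual request format should be checked against the AlterLab documentation. The code only builds and prints the JSON and does not call any service.

```python
import json

# JSON Schema describing the fields to extract.
# Field names are illustrative, not prescribed by any API.
extract_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "review_count": {"type": "integer"},
        "publication_date": {"type": "string", "format": "date"},
    },
    "required": ["product_name", "price"],
}


def build_request_body(target_url: str, schema: dict) -> str:
    """Serialize a request body containing the target page and the schema.

    ASSUMPTION: the "url" key is hypothetical; only "extract_schema"
    is named in the text above.
    """
    return json.dumps({"url": target_url, "extract_schema": schema}, indent=2)


body = build_request_body("https://example.com/product/123", extract_schema)
print(body)
```

Declaring types in the schema (`number`, `integer`) is what lets the response carry typed values rather than raw strings, which mirrors the manual type conversion needed with selector-based extraction.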