Structured data extraction is the process of transforming unstructured HTML into typed records according to a predefined schema. Rather than returning raw HTML for the caller to parse, structured extraction identifies specific fields (such as product name, price, availability, review count, or publication date) and outputs clean JSON that matches the schema.
Traditional extraction uses CSS selectors or XPath expressions to locate elements by their tag, attributes, or position in the DOM tree. This approach is fast and deterministic but brittle: a change to the markup a selector depends on, such as a renamed class or an added wrapper element, can break it. AI-powered extraction instead uses a language model to interpret the semantic meaning of page content and extract fields without relying on the exact DOM structure, which makes it more resilient to layout changes.
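The brittleness of selector-based extraction can be illustrated with a small sketch using Python's standard-library `xml.etree.ElementTree` and its limited XPath support. The two HTML snippets, the paths, and the `extract_price` helper are made up for illustration. Real-world HTML is often not well-formed XML, so a dedicated HTML parser would normally be used instead.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same product page. The redesign only
# adds a <section> wrapper; the visible content is unchanged.
ORIGINAL = (
    "<html><body><div>"
    "<h1>Widget</h1><span class='price'>19.99</span>"
    "</div></body></html>"
)
REDESIGNED = (
    "<html><body><div><section>"
    "<h1>Widget</h1><span class='price'>19.99</span>"
    "</section></div></body></html>"
)

# A path tied to the exact nesting of elements.
POSITIONAL_PATH = "./body/div/span"
# A path tied to an attribute, tolerant of extra wrapper elements.
ATTRIBUTE_PATH = ".//span[@class='price']"


def extract_price(markup, path):
    """Return the text of the first element matching path, or None."""
    root = ET.fromstring(markup)
    node = root.find(path)
    return node.text if node is not None else None


for label, markup in (("original", ORIGINAL), ("redesigned", REDESIGNED)):
    print(label, "positional:", extract_price(markup, POSITIONAL_PATH))
    print(label, "attribute: ", extract_price(markup, ATTRIBUTE_PATH))
```

The positional path stops matching once the wrapper is introduced, while the attribute-based path still finds the price. The attribute-based path is only more tolerant, not immune: renaming the `price` class would break it as well.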
AlterLab supports both approaches. For stable, consistent DOM structures, pass CSS selectors in the request. For variable or complex layouts, pass an `extract_schema` JSON schema, and AlterLab's AI extraction engine returns the matching data directly, so no HTML parsing is required on the caller side. The schema follows JSON Schema syntax and can describe nested objects, arrays, and typed fields.
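As a rough sketch, a schema for a product page might look like the following. Only the `extract_schema` field name comes from the description above. The `url` field, the `build_request` helper, and the product fields are assumptions made for illustration, not AlterLab's documented API. The example only builds and prints the request body and does not send it.

```python
import json

# Hypothetical JSON Schema for a product page, showing typed fields,
# a nested object, and an array of objects.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "seller": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "rating": {"type": "number"},
            },
        },
        "reviews": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "author": {"type": "string"},
                    "stars": {"type": "integer"},
                },
            },
        },
    },
    "required": ["name", "price"],
}


def build_request(url, schema):
    """Assemble a request body; the "url" key is an assumed field name."""
    return {"url": url, "extract_schema": schema}


payload = build_request("https://example.com/product/123", PRODUCT_SCHEMA)
print(json.dumps(payload, indent=2))
```

Listing only `name` and `price` under `required` is a design choice for this sketch: it lets optional fields such as `reviews` be absent on pages that do not show them.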