Schema.org was founded in 2011 by Google, Microsoft (Bing), Yahoo, and Yandex to create a shared vocabulary for describing web content. The vocabulary defines hundreds of types, including `Product`, `Event`, `Recipe`, `JobPosting`, `Person`, and `Organization`, each with a standardised set of properties. Webmasters embed schema.org annotations in their pages using one of three syntaxes: JSON-LD, Microdata, or RDFa.
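For scrapers and data pipelines, schema.org annotations offer publisher-curated, semantically labelled data. A product page annotated with `schema.org/Product` typically exposes fields like `name`, `description`, `sku`, `brand`, `offers`, and `aggregateRating` under the same property names on every site, which removes the need for per-site CSS selector maintenance. Schema.org itself makes no property mandatory, so a pipeline should still treat each field as optional.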
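As a rough illustration of the point above, the sketch below pulls `Product` nodes out of JSON-LD blocks using only the Python standard library. It assumes the annotations are JSON-LD (not Microdata or RDFa) and are present in the static HTML. The names `JsonLdCollector`, `iter_nodes`, `has_type`, `extract_products`, and the sample markup are hypothetical and exist only for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def iter_nodes(value):
    """Yield every dict nested in parsed JSON-LD, including @graph members."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_nodes(item)


def has_type(node, type_name):
    """True if the node's @type is, or contains, the given type name."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return isinstance(types, list) and type_name in types


def extract_products(html):
    """Return all schema.org Product nodes found in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    products = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        products.extend(n for n in iter_nodes(data) if has_type(n, "Product"))
    return products


# Hypothetical sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Kettle",
  "sku": "KET-001",
  "brand": {"@type": "Brand", "name": "ExampleCo"},
  "offers": {"@type": "Offer", "price": "24.99", "priceCurrency": "EUR"}
}
</script>
</head><body></body></html>
"""

products = extract_products(SAMPLE_HTML)
for product in products:
    # Every property is optional, so read fields with .get().
    print(product.get("name"), product.get("sku"))
```

The same function works unchanged across sites because it keys on the vocabulary, not on page layout. Reading fields with `.get()` reflects that any given property may be absent.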
Google's rich results documentation builds on schema.org types and lists the properties it requires or recommends for each one, so pages optimised for rich results tend to carry the fields a structured scraper needs. Not all sites annotate their content, but e-commerce, news, events, and recipe sites tend to have high adoption because of the SEO incentives.