
Schema.org

Schema.org is a collaborative vocabulary of structured data types used to annotate web content so search engines and automated tools can understand its meaning.

Schema.org was launched in 2011 by Google, Microsoft (Bing), and Yahoo, with Yandex joining later that year, to create a shared vocabulary for describing web content. The vocabulary defines hundreds of types, including `Product`, `Event`, `Recipe`, `JobPosting`, `Person`, and `Organization`, each with a standardised set of properties. Webmasters embed schema.org annotations in their pages using JSON-LD, Microdata, or RDFa markup.
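To make the JSON-LD embedding concrete, here is a minimal sketch of how a publisher might render a `Product` annotation. The function name `product_jsonld_script` and the sample values are hypothetical, chosen only for illustration; real pages usually carry many more properties.

```python
import json

def product_jsonld_script(name, sku, price, currency):
    """Render a schema.org Product annotation as a JSON-LD <script> tag.

    Hypothetical helper for illustration; not part of any library.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
        },
    }
    # Escape "</" so a value cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

print(product_jsonld_script("Example Kettle", "KET-001", "39.99", "EUR"))
```

The resulting tag is what a scraper later looks for: a `script` element with type `application/ld+json` whose body is ordinary JSON.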

For scrapers and data pipelines, schema.org annotations offer publisher-curated, semantically labelled data. A product page annotated with `schema.org/Product` typically exposes fields like `name`, `description`, `sku`, `brand`, `offers`, and `aggregateRating` under the same names on every site, which removes most per-site CSS selector maintenance. Most properties are optional, so a pipeline should still tolerate missing fields.
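As a rough sketch of what consuming those fields can look like, the snippet below flattens an already-parsed `Product` object into a small record. The helpers `as_list` and `summarize_product` are hypothetical names, and the sketch assumes plain `Offer` objects; an `AggregateOffer` carries `lowPrice` and `highPrice` instead of `price` and would need its own handling.

```python
def as_list(value):
    """Wrap a single value in a list; pass lists through; map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def summarize_product(product):
    """Flatten a schema.org Product dict into a small, predictable record."""
    brand = product.get("brand")
    if isinstance(brand, dict):  # brand may be a Brand/Organization object
        brand = brand.get("name")
    # offers may be a single object or a list of objects
    offers = [o for o in as_list(product.get("offers")) if isinstance(o, dict)]
    first = offers[0] if offers else {}
    rating = product.get("aggregateRating")
    rating = rating if isinstance(rating, dict) else {}
    return {
        "name": product.get("name"),
        "sku": product.get("sku"),
        "brand": brand,
        "price": first.get("price"),
        "currency": first.get("priceCurrency"),
        "rating": rating.get("ratingValue"),
    }

sample = {
    "@type": "Product",
    "name": "Example Kettle",
    "sku": "KET-001",
    "brand": {"@type": "Brand", "name": "Acme"},
    "offers": [{"@type": "Offer", "price": "39.99", "priceCurrency": "EUR"}],
}
print(summarize_product(sample))
```

Missing properties come back as `None` rather than raising, which suits annotations where publishers fill in only some fields.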

Google's rich results documentation is built on schema.org types and properties, so pages optimised for rich results are usually well suited to structured scraping too. Not all sites annotate their content, but e-commerce, news, events, and recipe sites have high adoption rates because of the SEO incentives.
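Because adoption varies by site, a pipeline may want to check which types a page declares before relying on them. The sketch below does this for JSON-LD only, using the standard library; `JsonLdCollector` and `declared_types` are hypothetical names, and Microdata or RDFa annotations would not be detected by it.

```python
import json
from html.parser import HTMLParser

class JsonLdCollector(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False

def declared_types(html):
    """Return the set of @type values found anywhere in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = set()
    for raw in collector.blocks:
        try:
            stack = [json.loads(raw)]
        except json.JSONDecodeError:
            continue  # skip empty or malformed blocks
        while stack:  # walk nested objects, lists, and @graph wrappers
            node = stack.pop()
            if isinstance(node, list):
                stack.extend(node)
            elif isinstance(node, dict):
                t = node.get("@type")
                if isinstance(t, str):
                    found.add(t)
                elif isinstance(t, list):
                    found.update(x for x in t if isinstance(x, str))
                stack.extend(node.values())
    return found

page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Kettle",
 "offers": {"@type": "Offer", "price": "39.99", "priceCurrency": "EUR"}}
</script></head><body></body></html>"""
print(sorted(declared_types(page)))
```

An empty set signals that the page has no usable JSON-LD, at which point a scraper would fall back to Microdata, RDFa, or selector-based extraction.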

Examples

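# Extract schema.org Product data from JSON-LD
import json
from bs4 import BeautifulSoup

def get_product_schema(html):
    """Return the first JSON-LD node whose @type includes Product, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue  # skip empty or malformed blocks
        # A block may hold a single object, a list of objects, or an @graph wrapper
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if not isinstance(node, dict):
                continue
            graph = node.get("@graph")
            for item in (graph if isinstance(graph, list) else [node]):
                if not isinstance(item, dict):
                    continue
                types = item.get("@type") or []  # may be a string or a list
                if "Product" in ([types] if isinstance(types, str) else types):
                    return item
    return None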



    Schema.org — Web Scraping Glossary | AlterLab