
Data Enrichment

Data enrichment augments scraped records with additional information from secondary sources — geocoding addresses, looking up company data, or appending social profiles.

Raw scraped data is rarely complete on its own. A product record may include a name and price but lack a standardised category, brand identifier, or competitor comparison. Data enrichment adds this context by joining the scraped record against one or more secondary data sources.
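A minimal sketch of that join, assuming the secondary source is a small in-memory lookup table keyed by a normalised brand name. The table contents, the field names (`brand`, `brand_id`, `category`) and the `enrich_product` helper are hypothetical. A real pipeline would more likely query a database or an external catalogue API.

```python
from typing import Any

# Hypothetical secondary source: brand reference data keyed by normalised name.
BRAND_TABLE: dict[str, dict[str, str]] = {
    "acme": {"brand_id": "BR-0001", "category": "Tools"},
    "globex": {"brand_id": "BR-0002", "category": "Electronics"},
}


def normalise(name: str) -> str:
    """Lower-case and trim so that ' ACME' and 'acme' map to the same key."""
    return name.strip().lower()


def enrich_product(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the scraped record joined against BRAND_TABLE.

    Fields already present in the scraped record take precedence over
    looked-up values; unmatched records are returned with enriched=False.
    """
    enriched = dict(record)
    match = BRAND_TABLE.get(normalise(record.get("brand") or ""))
    if match:
        for key, value in match.items():
            enriched.setdefault(key, value)  # do not overwrite scraped fields
    enriched["enriched"] = match is not None
    return enriched


scraped = {"name": "Claw Hammer", "price": 12.99, "brand": " ACME"}
print(enrich_product(scraped))
```

Normalising the join key before the lookup matters because scraped values often differ from the reference data only in case or whitespace.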

Common enrichment steps in scraping pipelines include geocoding physical addresses to latitude/longitude using a mapping API, resolving company names to standardised identifiers (such as the Dun & Bradstreet DUNS number or the Legal Entity Identifier, LEI), appending demographic data for the geographic region a record belongs to, expanding product SKUs to full catalogue entries, and classifying text into a taxonomy using an LLM.
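Because each of these steps adds fields to a record independently, one way to organise them is as a list of small functions applied in sequence. In this sketch `geocode` and `classify` are hypothetical stand-ins: the geocoder reads made-up coordinates from a hard-coded table instead of calling a mapping API, and the classifier uses keyword matching where a production pipeline might call an LLM.

```python
from typing import Any, Callable

Record = dict[str, Any]
# A step receives the record built so far and returns only the fields it adds.
Step = Callable[[Record], Record]

# Hypothetical geocoding results (placeholder values, not real coordinates).
FAKE_GEOCODES: dict[str, tuple[float, float]] = {
    "1 Example Street, Springfield": (40.0, -89.0),
}

# Hypothetical taxonomy: keyword -> category node.
TAXONOMY = {"hammer": "Hardware > Hand Tools", "laptop": "Electronics > Computers"}


def geocode(record: Record) -> Record:
    """Stand-in for a mapping API call; returns {} when the address is unknown."""
    coords = FAKE_GEOCODES.get(record.get("address", ""))
    return {"lat": coords[0], "lon": coords[1]} if coords else {}


def classify(record: Record) -> Record:
    """Stand-in for an LLM classifier; returns the first matching taxonomy node."""
    title = record.get("title", "").lower()
    for keyword, node in TAXONOMY.items():
        if keyword in title:
            return {"category": node}
    return {}


def run_steps(record: Record, steps: list[Step]) -> Record:
    """Apply each enrichment step in order, merging the fields it returns."""
    out = dict(record)
    for step in steps:
        out.update(step(out))
    return out


row = {"title": "Claw Hammer 16oz", "address": "1 Example Street, Springfield"}
print(run_steps(row, [geocode, classify]))
```

Keeping each step as a separate function makes it straightforward to add, remove, or reorder enrichment sources without touching the others.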

Enrichment introduces dependencies on external APIs, which must be accounted for in rate limiting (many enrichment APIs have strict quotas), error handling (enrichment failure should not block the primary record from being stored), and cost management (enrichment calls add variable cost proportional to record volume).
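The three concerns above can be handled in one wrapper around the enrichment call. This is a minimal sketch, assuming a single-threaded pipeline, a simple fixed-interval rate limit, and a flat (hypothetical) price per request where every attempt is billed. `GuardedEnricher`, `store` and `flaky_lookup` are illustrative names, and `flaky_lookup` stands in for a real external API client.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class GuardedEnricher:
    """Wrap an enrichment call with rate limiting, error isolation and cost tracking."""
    call: Callable[[Record], Record]  # placeholder for a real API client
    max_calls_per_sec: float
    cost_per_call: float  # hypothetical flat price per request
    spent: float = 0.0
    failures: int = 0
    _last_call: float = field(default=0.0, init=False, repr=False)

    def _wait_for_slot(self) -> None:
        # Fixed-interval limiter: keep calls at least 1/rate seconds apart.
        min_gap = 1.0 / self.max_calls_per_sec
        elapsed = time.monotonic() - self._last_call
        if elapsed < min_gap:
            time.sleep(min_gap - elapsed)
        self._last_call = time.monotonic()

    def enrich(self, record: Record) -> Record:
        self._wait_for_slot()
        self.spent += self.cost_per_call  # assumes every attempt is billed
        try:
            extra = self.call(record)
        except Exception as exc:  # deliberately broad: a failed lookup must not drop the record
            self.failures += 1
            return {**record, "enrichment_error": str(exc)}
        return {**record, **extra}


def store(records: list[Record], enricher: GuardedEnricher) -> list[Record]:
    """Enrich and 'store' records; the primary record is kept even if enrichment fails."""
    return [enricher.enrich(r) for r in records]


def flaky_lookup(record: Record) -> Record:
    """Hypothetical company lookup that times out for one input."""
    if record["name"] == "bad":
        raise TimeoutError("upstream timed out")
    return {"company_id": "ID-" + record["name"].upper()}


enricher = GuardedEnricher(call=flaky_lookup, max_calls_per_sec=20, cost_per_call=0.002)
rows = store([{"name": "acme"}, {"name": "bad"}], enricher)
print(rows, enricher.failures, round(enricher.spent, 3))
```

A fixed-interval limiter is the simplest option; a token bucket would allow short bursts while still respecting an average quota.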

Related Terms

Data Pipeline

A data pipeline is an automated sequence of steps that ingests raw data from a source, transforms it, and delivers it to a destination such as a database, data warehouse, or analytics system.

Structured Data Extraction

Converting free-form HTML into typed JSON records using explicit schemas, producing clean structured output instead of raw markup.

LLM (Large Language Model)

A Large Language Model is a neural network trained on vast text corpora that can generate, summarise, translate, and reason over natural language.

REST API

A web service that exposes data and actions through standard HTTP methods and resource-oriented URLs, returning structured JSON responses.

Deduplication

Deduplication is the process of identifying and removing or merging duplicate records in a scraped dataset, ensuring each real-world entity appears exactly once.
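A minimal sketch of key-based deduplication with merging, assuming duplicates can be detected by an exact match on a few normalised fields (here `name` and `city`, chosen for illustration). Real-world entity matching often needs fuzzy comparison, which this does not attempt. Running it before enrichment means each entity is looked up only once.

```python
from typing import Any

Record = dict[str, Any]


def dedupe(records: list[Record], key_fields: tuple[str, ...]) -> list[Record]:
    """Merge records that share the same normalised key.

    The first record seen for a key is kept; later duplicates only fill in
    fields that are missing, None, or empty in the kept record.
    """
    merged: dict[tuple[str, ...], Record] = {}
    for rec in records:
        key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        if key not in merged:
            merged[key] = dict(rec)
            continue
        kept = merged[key]
        for field_name, value in rec.items():
            if kept.get(field_name) in (None, "") and value not in (None, ""):
                kept[field_name] = value
    return list(merged.values())


rows = [
    {"name": "Acme Ltd", "city": "Leeds", "phone": ""},
    {"name": "ACME LTD ", "city": "leeds", "phone": "0113 000 0000"},
    {"name": "Globex", "city": "York", "phone": None},
]
print(dedupe(rows, key_fields=("name", "city")))
```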

Data Enrichment — Web Scraping Glossary | AlterLab