E-commerce Data Extraction
Monitor competitor prices, extract product catalogs, and detect pricing changes across any e-commerce site.
The Problem
E-commerce teams need real-time pricing intelligence to stay competitive. Manually checking competitor prices doesn't scale, and most e-commerce sites use aggressive anti-bot protection that blocks standard scrapers. You need:
- Reliable extraction from sites with Cloudflare, DataDome, and custom CAPTCHAs
- Structured product data (price, stock, rating) — not raw HTML
- Scheduled monitoring with change detection and alerting
- Batch processing for thousands of products without managing infrastructure
Solution Architecture
AlterLab handles the entire pipeline — from bypassing anti-bot protections to returning structured JSON:
1. Scrape
POST /scrape with extraction_schema to get structured product data. Anti-bot bypass is automatic.
2. Batch
POST /batch to process hundreds of product URLs in a single API call with async results via webhooks.
3. Schedule
POST /schedules to set up recurring scrapes with cron expressions. Compare results to detect price changes.
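The comparison in step 3 happens on your side once two runs are available. Below is a minimal sketch of that step, assuming each run yields a dict with the `price` and `in_stock` fields used in this page's schemas. The function name `detect_change` and the `min_delta` threshold are illustrative and not part of the AlterLab API.

```python
from typing import Optional


def detect_change(previous: dict, current: dict, min_delta: float = 0.01) -> Optional[dict]:
    """Return a summary of what changed between two runs, or None if nothing did."""
    changes = {}

    old_price, new_price = previous.get("price"), current.get("price")
    # Compare only when both runs returned a price; `is not None` keeps 0.0 valid.
    if old_price is not None and new_price is not None:
        # Round to cents first so float noise does not hide or fake a change.
        diff = round(new_price - old_price, 2)
        if abs(diff) >= min_delta:
            changes["price"] = {"old": old_price, "new": new_price, "diff": diff}

    old_stock, new_stock = previous.get("in_stock"), current.get("in_stock")
    if old_stock is not None and new_stock is not None and old_stock != new_stock:
        changes["in_stock"] = {"old": old_stock, "new": new_stock}

    return changes or None


# Hypothetical results from two consecutive scheduled runs
last_run = {"price": 89.99, "in_stock": True}
this_run = {"price": 79.99, "in_stock": False}
change = detect_change(last_run, this_run)
if change:
    print(change)
```

Returning `None` when nothing changed keeps the alerting branch to a single truthiness check, and skipping fields that are missing from either run avoids false alerts when an extraction comes back incomplete.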
Quick Example
Extract structured product data from any e-commerce page in a single API call:
import requests

response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "url": "https://example-store.com/product/wireless-headphones",
        # JSON Schema describing the fields to extract from the page
        "extraction_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
                "currency": {"type": "string"},
                "in_stock": {"type": "boolean"},
                "rating": {"type": "number"},
                "review_count": {"type": "integer"}
            }
        }
    },
    timeout=60,
)
response.raise_for_status()  # fail loudly on HTTP errors

# Structured fields are returned under "filtered_content"
product = response.json().get("filtered_content", {})
print(f"{product.get('name')}: {product.get('currency')}{product.get('price')}")
# Example output: Wireless Headphones Pro: $79.99

Advanced Patterns
Price Change Detection
Schedule recurring scrapes and compare results to detect changes. Disable caching so each run fetches live data:
import requests

def check_price(api_key, url):
    """Fetch current price with caching disabled."""
    resp = requests.post(
        "https://api.alterlab.io/api/v1/scrape",
        headers={"X-API-Key": api_key},
        json={
            "url": url,
            "cache": False,  # skip cached results so each run sees live data
            "extraction_schema": {
                "type": "object",
                "properties": {
                    "price": {"type": "number"},
                    "in_stock": {"type": "boolean"}
                }
            }
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json().get("filtered_content", {})

# Compare with previous run
previous_price = 89.99  # in practice, load this from wherever the last run was stored
current = check_price("YOUR_API_KEY", "https://store.com/product/123")
price = current.get("price")
# Use `is not None` so a legitimate price of 0 is still compared
if price is not None and price != previous_price:
    diff = price - previous_price
    direction = "dropped" if diff < 0 else "increased"
    print(f"Price {direction} by ${abs(diff):.2f}")

Catalog Extraction at Scale
Use the batch endpoint to extract hundreds of products in a single request. Combine with webhooks for async processing:
import requests

product_urls = [
    "https://store.com/product/1",
    "https://store.com/product/2",
    "https://store.com/product/3",
    # ... up to 1,000 URLs per batch
]

# One schema shared by every URL in the batch
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "sku": {"type": "string"}
    }
}

response = requests.post(
    "https://api.alterlab.io/api/v1/batch",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "requests": [
            {"url": url, "extraction_schema": schema}
            for url in product_urls
        ],
        # Results are delivered asynchronously to this endpoint
        "webhook_url": "https://your-app.com/webhooks/products"
    },
    timeout=60,
)
response.raise_for_status()

job_id = response.json().get("job_id")
print(f"Batch submitted: {job_id}")

Cost Optimization
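As a concrete illustration of capping spend, here is a minimal sketch of a request body that carries a tier cap. It assumes the dotted name `cost_controls.max_tier` corresponds to a nested `cost_controls` object with an integer `max_tier` key, sent alongside `url` in the `/scrape` body. That layout, the value type, and the helper name `build_capped_request` are assumptions, not confirmed API details.

```python
def build_capped_request(url: str, schema: dict, max_tier: int = 3) -> dict:
    """Build a /scrape JSON body with a spending cap (field layout is assumed)."""
    return {
        "url": url,
        "extraction_schema": schema,
        # ASSUMPTION: cost_controls.max_tier is sent as a nested object
        "cost_controls": {"max_tier": max_tier},
    }


# Price-only schema, capped at tier 3 by default
payload = build_capped_request(
    "https://store.com/product/123",
    {"type": "object", "properties": {"price": {"type": "number"}}},
)
print(payload["cost_controls"])
```

Building the body in one helper keeps the cap in a single place, so every monitoring request inherits the same default unless a caller raises it deliberately.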
Set cost_controls.max_tier to cap spending per request. For price monitoring, tier 3 is usually sufficient; avoid tier 5 (CAPTCHA solving) unless necessary.
Related Guides
E-commerce Scraping Tutorial
Step-by-step tutorial with code examples for product pages, categories, and variants.
Batch Scraping Guide
Submit up to 1,000 URLs in a single API call with async processing.
Scheduler Guide
Automate recurring scrapes with cron expressions.
JSON Schema Filtering
Filter extracted data to match your desired output structure.