REST API
Complete reference for the AlterLab REST API. All endpoints use JSON for requests and responses.
Base URL
https://api.alterlab.io/api/v1
Quick Start
Get started with the AlterLab API in under 2 minutes. Here's your first request:
Step 1: Get Your API Key
Sign up at alterlab.io and generate an API key from the dashboard.
Step 2: Make Your First Request
# Using cURL
curl -X POST https://api.alterlab.io/api/v1/scrape \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Using Python
import requests
response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={"url": "https://example.com"}
)
print(response.json())
# Using Python SDK (Recommended)
from alterlab import AlterLabSync
client = AlterLabSync(api_key="YOUR_API_KEY")
result = client.scrape("https://example.com")
print(result["content"][:100])  # First 100 chars
Step 3: Handle the Response
Simple requests return 200 with content immediately. Complex requests return 202 with a job_id for polling.
{
"url": "https://example.com",
"status_code": 200,
"content": "<!DOCTYPE html>...",
"title": "Example Domain",
"billing": {
"total_credits": 1,
"tier_used": "1"
}
}
What's Next?
- Learn about Python SDK for easier integration
- Explore advanced options like screenshots and OCR
- Understand tier escalation and cost controls
Getting Structured JSON
To get structured JSON data (product info, article metadata, etc.) instead of raw HTML, use the formats parameter:
# Get structured JSON from any page
curl -X POST https://api.alterlab.io/api/v1/scrape \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.amazon.com/dp/B0D5HMLP7S",
"formats": ["json"]
}'
The response will include auto-extracted structured data:
{
"url": "https://www.amazon.com/dp/B0D5HMLP7S",
"status_code": 200,
"content": {
"json": {
"@type": "Product",
"name": "Water Brush Pen Set",
"price": 8.99,
"currency": "USD",
"rating": 4.4,
"reviewCount": 47,
"availability": "InStock",
"images": ["https://..."],
"specifications": {...},
"reviews": [...]
}
}
}
No Schema Required
No schema is needed for the extraction above. Pass extraction_schema only if you want to filter the JSON to specific fields. See Structured Extraction for filtering options.
Authentication
All API requests require authentication using an API key. Include your API key in the X-API-Key header.
Keep your API key secure - never expose it in client-side code, public repositories, or logs.
Unified Scrape Endpoint Recommended
The unified endpoint handles all scraping modes through a single, intelligent interface. It automatically selects the optimal tier (1-5) based on site complexity, supports both synchronous (200) and asynchronous (202) execution patterns, and provides detailed billing breakdowns.
/api/v1/scrape
Unified scraping endpoint with intelligent tier escalation, cost controls, and sync/async execution modes.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL of the web page to scrape |
| mode | string | Optional | Scraping mode: auto (default), html, js, pdf, or ocr. Default: auto |
| sync | boolean | Optional | Enable blocking mode: API polls internally until complete (60-120s max), returns 200. Set false for immediate 202 with job_id. Default: true |
| formats | string[] | Optional | Output formats: text, json, html, markdown. Use ['json'] to get structured data extraction. Default: ['markdown', 'json'] |
| advanced | AdvancedOptions | Optional | Advanced options: render_js, screenshot, generate_pdf, ocr, use_proxy, markdown, wait_condition |
| cost_controls | CostControls | Optional | Cost controls: max_credits, max_tier, prefer_cost, prefer_speed, fail_fast |
| force_refresh | boolean | Optional | Bypass cache and force a fresh scrape. Default: false |
| include_raw_html | boolean | Optional | Include raw HTML in the response. Default: false |
| timeout | integer | Optional | Request timeout in seconds (1-300). Default: 30 |
| extraction_schema | object | Optional | JSON Schema defining the structure you want extracted. Response will include filtered_content with data matching your schema. See JSON Schema Filtering guide. |
Request Example
# Simple synchronous scrape (default)
curl -X POST https://api.alterlab.io/api/v1/scrape \
-H "X-API-Key: sk_live_..." \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"mode": "auto"
}'
# Async scrape with advanced options
curl -X POST https://api.alterlab.io/api/v1/scrape \
-H "X-API-Key: sk_live_..." \
-H "Content-Type: application/json" \
-d '{
"url": "https://spa-website.com",
"mode": "js",
"sync": false,
"advanced": {
"render_js": true,
"screenshot": true,
"markdown": true,
"wait_condition": "networkidle"
},
"cost_controls": {
"max_credits": 10,
"prefer_cost": true
}
}'
Response Example
// Sync response (200) - Simple requests with sync=true (default)
{
"url": "https://example.com",
"status_code": 200,
"content": {
"html": "<!DOCTYPE html>...",
"text": "Cleaned text content...",
"json": {
"title": "Example Domain",
"description": "Example page",
"metadata": {...}
}
},
"title": "Example Domain",
"metadata": {
"description": "Example page",
"keywords": ["example"]
},
"headers": {
"content-type": "text/html; charset=UTF-8"
},
"cached": false,
"response_time_ms": 1234,
"size_bytes": 15234,
"screenshot_url": null, // URL if screenshot: true (available 24h)
"pdf_url": null, // URL if generate_pdf: true (available 24h)
"billing": {
"total_credits": 1,
"tier_used": "1",
"escalations": [
{
"tier": "1",
"result": "success",
"credits": 1,
"duration_ms": 234
}
],
"savings": 19
},
"extraction_method": "algorithmic",
"version": "v1"
}
// Async response (202) - Complex requests with sync=false or auto-detected
{
"job_id": "550e8400-e29b-41d4-a716-446655440000"
}
Execution Modes
- Sync (sync=true, default): API queues the job to the worker service and polls internally every 100-500ms. When the job completes (within 60-120 seconds), returns 200 with full content. No manual polling needed - simpler request/response pattern. If job exceeds timeout, falls back to 202 with job_id for manual polling.
- Async (sync=false): Immediately returns 202 with job_id. You must poll /api/v1/jobs/{job_id} or use WebSocket for results. Recommended for long-running scrapes (>60s), batch operations, webhooks, or real-time updates via WebSocket.
- Key difference: Both modes queue to the same worker service. sync=true just adds automatic polling on the API side, making it simpler for clients but with a timeout constraint. sync=false gives you full control over polling/WebSocket with no timeout limits.
When to Use Async Mode (sync=false)
Use sync: false for:
- Scrapes expected to take longer than 60 seconds (large PDFs, complex sites)
- Batch operations where you want to queue multiple jobs and poll together
- When you need webhook delivery instead of polling
- Real-time status updates via WebSocket
For most use cases, sync: true is simpler and eliminates manual polling code.
How Sync Mode Works (Under the Hood)
Understanding sync mode helps you choose the right execution pattern for your use case:
Regardless of sync setting, all scrape requests are queued to the worker service. This ensures consistent anti-bot capabilities, proxy management, and resource pooling.
When sync=true, the API server holds your HTTP connection open and polls Redis every 100-500ms checking job status. When complete, it returns the full result as a 200 response. Maximum wait: 60-120 seconds.
When sync=false, the API immediately returns 202 with job_id and closes the connection. You control polling frequency and timeout. No server-side timeout constraints.
If sync=true and job exceeds timeout (60-120s), API returns 202 with job_id as fallback. You can then poll manually. This prevents hung connections while still supporting long-running jobs.
Key Insight: sync=true is purely a convenience feature. Both modes use the same worker infrastructure and have identical scraping capabilities. The only difference is who handles polling: the API server (sync=true) or your client code (sync=false).
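The fallback behavior above can be handled in a single wrapper: treat 200 as a finished result and 202 as a job to poll. A minimal sketch using the requests library against the endpoints documented on this page (replace YOUR_API_KEY; error handling kept deliberately small):

```python
import time

import requests

API = "https://api.alterlab.io/api/v1"
HEADERS = {"X-API-Key": "YOUR_API_KEY", "Content-Type": "application/json"}

def scrape(url: str, poll_interval: float = 2.0) -> dict:
    """Request with sync=true, but handle the 202 fallback for long jobs."""
    resp = requests.post(f"{API}/scrape", headers=HEADERS,
                         json={"url": url, "sync": True})
    if resp.status_code == 200:
        return resp.json()                 # completed within the sync window
    if resp.status_code == 202:
        job_id = resp.json()["job_id"]     # fell back to async: poll manually
        while True:
            job = requests.get(f"{API}/jobs/{job_id}", headers=HEADERS).json()
            if job["status"] == "completed":
                return job["result"]
            if job["status"] == "failed":
                raise RuntimeError(job["error"])
            time.sleep(poll_interval)      # recommended 2-5 second interval
    resp.raise_for_status()
    raise RuntimeError(f"Unexpected status {resp.status_code}")
```

With this wrapper, callers never need to know whether the API answered within the sync window or fell back to a job.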
Response Format
The content field may be a plain string (simple sync requests) or a structured object when using the formats parameter.
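Client code can normalize both shapes up front so downstream logic only deals with one. A small sketch (wrapping a bare string under an "html" key is our own convention, not something the API prescribes):

```python
def normalized_content(response: dict) -> dict:
    """Return the content field as a dict of format -> value.

    Simple sync responses carry content as a plain string; responses that used
    the formats parameter carry a dict like {"html": ..., "text": ...}.
    """
    content = response.get("content")
    if isinstance(content, str):
        return {"html": content}  # our own convention for the string shape
    return content or {}
```

For example, `normalized_content({"content": "<!DOCTYPE html>..."})` yields `{"html": "<!DOCTYPE html>..."}`, while a structured content object passes through unchanged.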
{
"url": "https://example.com",
"status_code": 200,
"content": "<!DOCTYPE html>...",
"billing": {"total_credits": 1}
}

// Structured response - content is an object when formats is specified:
{
"url": "https://example.com",
"status_code": 200,
"content": {
"html": "<!DOCTYPE html>...",
"text": "Clean text...",
"markdown": "# Example..."
},
"billing": {"total_credits": 1}
}
Python SDK Recommended
The official Python SDK provides a simple, intuitive interface for the AlterLab API with automatic polling, retry logic, and type hints.
Installation
pip install alterlab
Async vs Sync
The SDK provides two clients: AlterLab (async, requires await) and AlterLabSync (sync, no await needed). Choose based on your codebase.
Sync Usage (Recommended for scripts)
from alterlab import AlterLabSync

# Use AlterLabSync for synchronous code (no await needed)
with AlterLabSync(api_key="YOUR_API_KEY") as client:
    result = client.scrape("https://example.com")
    print(result["content"])
    print(f"Credits used: {result['billing']['total_credits']}")

    # With schema filtering
    result = client.scrape(
        url="https://amazon.com/dp/B0123456789",
        extraction_schema={
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
                "in_stock": {"type": "boolean"}
            }
        }
    )
    product = result["filtered_content"]  # Your filtered data
    print(f"Product: {product['name']} - ${product['price']}")
Async Usage (For async applications)
import asyncio
from alterlab import AlterLab, AdvancedOptions

async def main():
    # AlterLab is async - requires await
    async with AlterLab(api_key="YOUR_API_KEY") as client:
        result = await client.scrape("https://example.com")
        print(result["content"])

        # With JS rendering
        result = await client.scrape(
            url="https://spa-app.com",
            mode="js",
            advanced=AdvancedOptions(render_js=True, screenshot=True)
        )

asyncio.run(main())
Advanced Options
from alterlab import AlterLabSync, AdvancedOptions, CostControls

with AlterLabSync(api_key="YOUR_API_KEY") as client:
    # With advanced options
    result = client.scrape(
        url="https://example.com",
        mode="js",
        advanced=AdvancedOptions(
            render_js=True,
            screenshot=True,
            markdown=True,
            wait_condition="networkidle"
        ),
        cost_controls=CostControls(
            max_credits=10,
            max_tier="4",
            prefer_cost=True
        )
    )
    # Screenshot saved to result["screenshot_url"]
    # Markdown in result["content"]["markdown"] if formats specified
Cost Estimation
from alterlab import AlterLabSync, AdvancedOptions

with AlterLabSync(api_key="YOUR_API_KEY") as client:
    # Estimate cost before scraping
    estimate = client.estimate_cost(
        url="https://example.com",
        mode="auto",
        advanced=AdvancedOptions(render_js=True, screenshot=True)
    )
    print(f"Estimated credits: {estimate['estimated_credits']}")
    print(f"Max possible: {estimate['max_possible_credits']}")

    # Only scrape if within budget
    if estimate['estimated_credits'] <= 5:
        result = client.scrape(url="https://example.com")
Error Handling
from alterlab import AlterLabSync, AlterLabAPIError, AlterLabTimeoutError

with AlterLabSync(api_key="YOUR_API_KEY") as client:
    try:
        result = client.scrape("https://example.com")
    except AlterLabAPIError as e:
        if e.status_code == 402:
            print("Insufficient balance!")
        elif e.status_code == 429:
            print(f"Rate limited. Retry after {e.retry_after}s")
        else:
            print(f"API error: {e.detail}")
    except AlterLabTimeoutError:
        print("Request timed out")
    except Exception as e:
        print(f"Unexpected error: {e}")
Manual Job Management
# Get job status without waiting
job_id = "550e8400-e29b-41d4-a716-446655440000"
status = client.get_job_status(job_id)
print(f"Job status: {status['status']}")

# Poll job with custom settings
result = client.poll_job(
    job_id=job_id,
    poll_interval=2.0,   # Check every 2 seconds
    poll_timeout=300.0   # Give up after 5 minutes
)
Configuration
# Custom configuration (sync client)
client = AlterLabSync(
    api_key="YOUR_API_KEY",
    base_url="https://api.alterlab.io",  # API base (paths include /api/v1)
    timeout=60.0,        # Request timeout
    max_retries=3,       # Retry failed requests
    retry_backoff=2.0    # Exponential backoff multiplier
)

# Or use environment variable
# export ALTERLAB_API_KEY="YOUR_API_KEY"
client = AlterLabSync()  # Auto-loads from env
SDK Benefits
- Automatic polling: Handles 202 responses and job polling for you
- Retry logic: Automatically retries failed requests with exponential backoff
- Type hints: Full type annotations for IDE autocomplete
- Error handling: Custom exception types for different error scenarios
- Convenience methods: Mode-specific methods for common use cases
Structured Extraction (Optional Filtering)
Already Getting JSON?
You already get structured JSON with formats: ["json"] as shown in Getting Structured JSON. This section is for filtering that JSON to specific fields.
Filter and restructure extracted data to match your desired output format using JSON Schema. This is pure data transformation - no additional cost.
JSON Schema Filtering
Pass extraction_schema to filter extracted data to your desired structure. The filtered result appears in filtered_content:
# Extract product data with schema
result = client.scrape(
    url="https://amazon.com/dp/B0123456789",
    mode="auto",
    extraction_schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "currency": {"type": "string"},
            "in_stock": {"type": "boolean"},
            "rating": {"type": "number"},
            "reviews_count": {"type": "integer"}
        }
    }
)

# Access filtered data (matches your schema)
product = result["filtered_content"]
print(f"Name: {product['name']}")
print(f"Price: {product['price']}")
print(f"In Stock: {product['in_stock']}")
Field Aliases
Schema filtering automatically maps common field name variations:
- in_stock → availability
- sku → asin, product_id
- image_urls → images
- title → name, headline
See the JSON Schema Filtering guide for the complete list.
Response Structure
{
"url": "https://amazon.com/dp/B0123456789",
"status_code": 200,
"content": { ... }, // Full extraction (Schema.org, metadata, etc.)
"filtered_content": { // YOUR filtered data (only when extraction_schema provided)
"name": "Product Name",
"price": 29.99,
"in_stock": true
},
"billing": { "total_credits": 3 }
}
Zero Additional Cost
Schema filtering is pure data transformation applied after extraction - it adds no credits to your bill.
Advanced Options
Advanced options provide fine-grained control over scraping behavior and enable premium features.
| Option | Type | Cost | Description |
|---|---|---|---|
| render_js | boolean | +3 | Use headless browser for JavaScript rendering |
| screenshot | boolean | +1 | Capture full-page screenshot (requires render_js) |
| generate_pdf | boolean | +2 | Generate PDF of rendered page (requires render_js) |
| ocr | boolean | +5 | Extract text from images using OCR |
| use_proxy | boolean | +1 | Route through premium proxy network |
| markdown | boolean | Free | Convert content to Markdown format |
| wait_condition | string | Free | Wait condition: domcontentloaded, networkidle, load |
Example with Advanced Options
{
"url": "https://spa-app.com",
"mode": "auto",
"advanced": {
"render_js": true,
"screenshot": true,
"generate_pdf": true,
"markdown": true,
"use_proxy": true,
"wait_condition": "networkidle"
}
}
// Response includes download URLs (available for 24 hours):
// "screenshot_url": "https://alterlab.io/downloads/screenshots/2025-01-15/job-id.png"
// "pdf_url": "https://alterlab.io/downloads/pdfs/2025-01-15/job-id.pdf"
Cost Controls
{
"url": "https://example.com",
"mode": "auto",
"cost_controls": {
"max_credits": 5,
"max_tier": "4",
"prefer_cost": true,
"fail_fast": false
}
}
Tier Escalation & Cost Controls
AlterLab uses an intelligent 5-tier escalation system that automatically tries the cheapest method first and escalates only when needed. Each tier has different capabilities, speeds, and costs.
| Tier | Name | Cost | Per $1 | Description |
|---|---|---|---|---|
| 1 | Curl | $0.0002 | 5,000 | Ultra-fast curl binary for static sites |
| 2 | HTTP | $0.0003 | 3,333 | HTTPX with TLS fingerprinting and HTTP/2 |
| 3 | Stealth | $0.002 | 500 | curl_cffi with Chrome browser impersonation |
| 4 | Browser | $0.004 | 250 | Playwright browser automation for JS sites |
| 5 | Captcha | $0.02 | 50 | Browser with AI-powered captcha solving |
How Escalation Works
- Start cheapest: By default, starts at Tier 1 (Curl: $0.0002)
- Attempt scrape: Tries to scrape with current tier's method
- Check success: If successful (status 200, valid content), stop and return result
- Escalate if failed: If failed (timeout, blocked, error), move to next tier and retry
- Stop at success or max tier: Returns when successful or when max_tier/max_credits reached
- Detailed billing: Response includes all attempts, final tier used, and cost saved
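Purely as an illustration of the steps above, the escalation loop reads roughly like this. The real loop runs server-side; try_tier is a hypothetical stand-in for attempting a scrape at a given tier, and the costs come from the tier table:

```python
# Per-attempt costs from the tier table above (in dollars)
TIER_COSTS = {"1": 0.0002, "2": 0.0003, "3": 0.002, "4": 0.004, "5": 0.02}

def escalate(try_tier, max_tier="5", max_credits=None):
    """Illustrative client-side model of the server-side escalation loop."""
    spent, escalations = 0.0, []
    for tier in ["1", "2", "3", "4", "5"]:
        cost = TIER_COSTS[tier]
        if max_credits is not None and spent + cost > max_credits:
            break                      # next attempt would exceed the budget
        spent += cost
        succeeded = try_tier(tier)     # attempt the scrape at this tier
        escalations.append({"tier": tier,
                            "result": "success" if succeeded else "failed",
                            "cost": cost})
        if succeeded or tier == max_tier:
            break                      # stop on success or at the ceiling
    return {"total_cost": spent, "escalations": escalations}
```

A site that only yields at Tier 3 would produce an escalations list of two failures followed by a success, matching the billing breakdowns shown in the responses below.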
Cost Control Parameters
max_credits (float, optional): Maximum cost to spend on this request. The API will not escalate beyond this budget. Example: max_credits: 0.004 stops at Tier 4 (Browser).
max_tier (string, optional): Maximum tier to escalate to: "1" (Curl), "2" (HTTP), "3" (Stealth), "4" (Browser), or "5" (Captcha). Example: max_tier: "4" stops at Browser.
prefer_cost (boolean, default: false): Start with the cheapest tier (Tier 1) and try each tier sequentially. Best for known simple sites.
prefer_speed (boolean, default: false): Start with Tier 4 (Browser) for guaranteed success on most sites. Higher cost but faster overall.
fail_fast (boolean, default: false): Return an error instead of escalating to expensive tiers. Useful when you want predictable costs.
Example: Cost-Optimized Request
{
"url": "https://example.com",
"mode": "auto",
"cost_controls": {
"max_credits": 0.004,
"max_tier": "4",
"prefer_cost": true,
"fail_fast": false
}
}
// Response billing breakdown:
{
"billing": {
"total_cost": 0.002,
"tier_used": "3",
"escalations": [
{"tier": "1", "result": "failed", "cost": 0.0002, "duration_ms": 250, "error": "403 Forbidden"},
{"tier": "2", "result": "failed", "cost": 0.0003, "duration_ms": 2100, "error": "Blocked by WAF"},
{"tier": "3", "result": "success", "cost": 0.002, "duration_ms": 4200}
],
"optimization_suggestion": "Site requires Stealth tier. Consider using prefer_speed with Tier 4 for faster results."
}
}
Cost Control Best Practices
- Always set max_credits for production to prevent unexpected charges
- Use prefer_cost: true for known simple sites
- Use prefer_speed: true for critical scrapers where reliability matters more than cost
- Set fail_fast: true in testing to avoid unnecessary spending on misconfigured requests
Async Mode & Job Polling
Complex scraping requests return a 202 status with a job_id. Poll the job status endpoint to retrieve results.
/api/v1/jobs/{job_id}
Poll job status and retrieve results when completed.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| job_id | string | Required | UUID of the job returned from the scrape endpoint |
Request Example
curl -X GET https://api.alterlab.io/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
-H "X-API-Key: sk_live_..."
Response Example
// Status: pending
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "pending",
"url": "https://example.com",
"mode": "auto",
"created_at": "2025-11-05T10:30:00Z"
}
// Status: running
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "running",
"url": "https://example.com",
"mode": "auto",
"progress": 50,
"created_at": "2025-11-05T10:30:00Z"
}
// Status: completed
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"url": "https://example.com",
"mode": "auto",
"result": {
"url": "https://example.com",
"status_code": 200,
"content": {
"html": "...",
"text": "...",
"json": {...}
},
"billing": {
"total_credits": 5,
"tier_used": "4"
}
},
"created_at": "2025-11-05T10:30:00Z",
"completed_at": "2025-11-05T10:30:15Z"
}
// Status: failed
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "failed",
"url": "https://example.com",
"mode": "auto",
"error": "Timeout after 30 seconds",
"created_at": "2025-11-05T10:30:00Z",
"failed_at": "2025-11-05T10:30:30Z"
}
Polling Best Practices
- Recommended interval: Poll every 2-5 seconds
- Timeout: Set a maximum polling duration (5 minutes recommended)
- Exponential backoff: Increase polling interval if job takes longer
- Status values: pending → running → completed/failed
Example Polling Loop (JavaScript)
async function pollJobStatus(jobId, apiKey, maxWaitMs = 300000) {
const startTime = Date.now();
const pollInterval = 2000; // 2 seconds
while (Date.now() - startTime < maxWaitMs) {
const response = await fetch(
`https://api.alterlab.io/api/v1/jobs/${jobId}`,
{ headers: { 'X-API-Key': apiKey } }
);
const job = await response.json();
if (job.status === 'completed') {
return job.result;
} else if (job.status === 'failed') {
throw new Error(job.error);
}
// Wait before next poll
await new Promise(resolve => setTimeout(resolve, pollInterval));
}
throw new Error('Job polling timeout');
}
WebSocket Alternative
For real-time updates without polling, use WebSocket connections to receive job status updates as they happen. This is more efficient than polling and provides instant notifications.
WebSocket Endpoint
wss://api.alterlab.io/api/v1/ws/jobs?api_key=YOUR_API_KEY
Protocol Messages
Client → Server:
{"action": "subscribe", "job_id": "<job-uuid>"}
{"action": "unsubscribe", "job_id": "<job-uuid>"}
{"action": "ping"}
Server → Client:
{"type": "job_update", "job_id": "...", "status": "running|completed|failed", "result": {...}, "error": null, "ts": 1730451136}
{"type": "heartbeat", "ts": 1730451136}
Example: JavaScript WebSocket Client
// Connect with API key authentication
const ws = new WebSocket('wss://api.alterlab.io/api/v1/ws/jobs?api_key=sk_live_...');
// Handle connection open
ws.onopen = () => {
console.log('WebSocket connected');
// Subscribe to job updates
ws.send(JSON.stringify({
action: 'subscribe',
job_id: '550e8400-e29b-41d4-a716-446655440000'
}));
};
// Receive real-time updates
ws.onmessage = (event) => {
const message = JSON.parse(event.data);
switch (message.type) {
case 'connected':
console.log('✓ Connection established');
break;
case 'subscribed':
console.log('✓ Subscribed to job:', message.job_id);
break;
case 'job_update':
console.log('Job status:', message.status);
if (message.status === 'completed') {
console.log('✓ Job completed:', message.result);
ws.close();
} else if (message.status === 'failed') {
console.error('✗ Job failed:', message.error);
ws.close();
} else {
console.log('⟳ Job in progress...');
}
break;
case 'heartbeat':
// Server is alive
break;
case 'error':
console.error('WebSocket error:', message.message);
break;
}
};
ws.onerror = (error) => {
console.error('WebSocket connection error:', error);
};
ws.onclose = () => {
console.log('WebSocket disconnected');
};
Example: Python WebSocket Client
import asyncio
import json
import websockets

async def watch_job(api_key: str, job_id: str):
    uri = f"wss://api.alterlab.io/api/v1/ws/jobs?api_key={api_key}"
    async with websockets.connect(uri) as ws:
        # Subscribe to job
        await ws.send(json.dumps({
            "action": "subscribe",
            "job_id": job_id
        }))
        # Listen for updates
        async for message in ws:
            data = json.loads(message)
            if data["type"] == "job_update":
                status = data["status"]
                print(f"Job status: {status}")
                if status == "completed":
                    print("Job completed:", data["result"])
                    break
                elif status == "failed":
                    print("Job failed:", data["error"])
                    break

# Usage
asyncio.run(watch_job("sk_live_...", "550e8400-..."))
WebSocket vs Polling
- WebSocket: Instant updates, lower latency, persistent connection, more efficient for long-running jobs
- Polling: Simpler implementation, works through proxies/firewalls, no persistent connection needed
- Recommendation: Use WebSocket for real-time dashboards, polling for simple scripts
Batch Scraping
Submit multiple URLs for scraping in a single request. Batch requests are processed asynchronously, and you can receive results via webhook or by polling individual job statuses.
/api/v1/batch
Submit a batch of URLs for asynchronous processing with optional webhook delivery.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| urls | string[] | Required | Array of URLs to scrape (max 1000 per batch) |
| mode | string | Optional | Scraping mode applied to all URLs. Default: auto |
| webhook_url | string | Optional | URL to receive results via POST webhook |
| advanced | AdvancedOptions | Optional | Advanced options applied to all URLs |
| cost_controls | CostControls | Optional | Cost controls applied to all URLs |
Request Example
curl -X POST https://api.alterlab.io/api/v1/batch \
-H "X-API-Key: sk_live_..." \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3"
],
"mode": "auto",
"webhook_url": "https://your-app.com/webhooks/scraping",
"cost_controls": {
"max_credits": 5
}
}'
Response Example
{
"batch_id": "batch_7a3b9c8d-4e2f-1a6b-8c5d-9e0f1a2b3c4d",
"total_jobs": 3,
"job_ids": [
"550e8400-e29b-41d4-a716-446655440001",
"550e8400-e29b-41d4-a716-446655440002",
"550e8400-e29b-41d4-a716-446655440003"
],
"estimated_credits": 3,
"webhook_url": "https://your-app.com/webhooks/scraping",
"status": "pending"
}
Webhook Payload Format
When a job completes, AlterLab sends a POST request to your webhook URL with this payload:
POST https://your-app.com/webhooks/scraping
Content-Type: application/json
X-AlterLab-Signature: sha256=... // Webhook signature for verification
{
"event": "job.completed",
"batch_id": "batch_7a3b9c8d-4e2f-1a6b-8c5d-9e0f1a2b3c4d",
"job_id": "550e8400-e29b-41d4-a716-446655440001",
"url": "https://example.com/page1",
"status": "completed",
"result": {
"url": "https://example.com/page1",
"status_code": 200,
"content": "...",
"billing": {
"total_credits": 1,
"tier_used": "1"
}
},
"completed_at": "2025-11-05T10:30:15Z"
}
Python SDK Batch Example
from alterlab import AlterLabSync

client = AlterLabSync(api_key="YOUR_API_KEY")

# Submit batch with webhook
batch = client.batch_scrape(
    urls=[
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3"
    ],
    mode="auto",
    webhook_url="https://your-app.com/webhooks/scraping"
)
print(f"Batch ID: {batch['batch_id']}")
print(f"Total jobs: {batch['total_jobs']}")

# Or poll each job individually
for job_id in batch['job_ids']:
    result = client.poll_job(job_id)
    print(f"Job {job_id}: {result['status']}")
Batch Limits
- Maximum 1,000 URLs per batch request
- Webhook URL must be publicly accessible (HTTPS required)
- Webhook retries: 3 attempts with exponential backoff
- Batch jobs are processed in parallel (no guaranteed order)
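The X-AlterLab-Signature header shown in the webhook payload above can be verified before trusting a delivery. A sketch assuming the signature is an HMAC-SHA256 hex digest of the raw request body keyed with a webhook secret (the exact signing scheme is an assumption here; confirm it in your dashboard settings):

```python
import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check an X-AlterLab-Signature header of the form 'sha256=<hexdigest>'.

    Uses a constant-time comparison to avoid timing side channels.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = signature_header.removeprefix("sha256=")
    return hmac.compare_digest(expected, received)
```

Verify against the raw request bytes, not a re-serialized JSON object, since key ordering and whitespace differences would change the digest.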
Usage & Balance
Monitor your API usage, spending, and account limits.
/api/v1/usage
Get current usage statistics and remaining balance for your account.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| start_date | string | Optional | Start date for usage period (ISO 8601 format) |
| end_date | string | Optional | End date for usage period (ISO 8601 format) |
Request Example
# Current billing period
curl -X GET https://api.alterlab.io/api/v1/usage \
-H "X-API-Key: sk_live_..."
# Specific date range
curl -X GET "https://api.alterlab.io/api/v1/usage?start_date=2025-11-01&end_date=2025-11-30" \
-H "X-API-Key: sk_live_..."
Response Example
{
"period": {
"start": "2025-11-01T00:00:00Z",
"end": "2025-11-30T23:59:59Z",
"current": true
},
"balance": {
"current_cents": 7655,
"deposited_cents": 10000,
"used_cents": 2345
},
"requests": {
"total": 2345,
"successful": 2289,
"failed": 56,
"cached": 432
},
"by_mode": {
"html": 1234,
"js": 856,
"pdf": 145,
"ocr": 110
},
"by_tier": {
"1": 1234,
"2": 856,
"3": 145,
"4": 98,
"5": 12
},
"rate_limits": {
"requests_per_minute": 300,
"current_usage": 12,
"reset_at": "2025-11-05T10:31:00Z"
},
"spend_tier": {
"current": "growth",
"rolling_30d_spend_cents": 8500,
"next_tier_at_cents": 20000
}
}
Python SDK Usage Example
from alterlab import AlterLabSync
from datetime import datetime, timedelta

client = AlterLabSync(api_key="YOUR_API_KEY")

# Get current usage
usage = client.get_usage()
print(f"Balance remaining: {usage['balance']['current_cents']} cents")
print(f"Requests this period: {usage['requests']['total']}")

# Check if we have enough balance (below $1.00 here)
if usage['balance']['current_cents'] < 100:
    print("Warning: Low balance! Time to top up.")

# Get usage for a specific period
start = datetime.now() - timedelta(days=7)
weekly_usage = client.get_usage(
    start_date=start.isoformat(),
    end_date=datetime.now().isoformat()
)
print(f"Weekly requests: {weekly_usage['requests']['total']}")
Monitoring Best Practices
- Check before large batches: Query /usage before submitting batch jobs to ensure sufficient balance
- Monitor rate limits: Track rate_limits.current_usage to avoid 429 errors
- Set up alerts: Monitor balance.current_cents and alert when below a threshold
- Track by tier: Use the by_tier breakdown to optimize tier selection
- Cache optimization: Monitor the requests.cached ratio to measure cache efficiency
Balance Management Tips
- Balance never expires - deposit and use at your own pace
- Rate limits scale automatically with your 30-day rolling spend
- Check spend_tier.next_tier_at_cents to see the next rate limit upgrade threshold
- Deposit more funds anytime from the dashboard billing page
- Use cost_controls.max_credits to prevent budget overruns
Cost Estimation
/api/v1/scrape/estimate
Estimate the cost of a scrape request without actually scraping.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Required | The URL to estimate |
| mode | string | Optional | Scraping mode. Default: auto |
| advanced | AdvancedOptions | Optional | Advanced options to include in estimate |
Request Example
curl -X POST https://api.alterlab.io/api/v1/scrape/estimate \
-H "X-API-Key: sk_live_..." \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"mode": "auto"
}'
Response Example
{
"url": "https://example.com",
"estimated_tier": "1",
"estimated_credits": 1,
"confidence": "high",
"max_possible_credits": 10,
"reasoning": "Known simple site - tier 1 should work"
}
Rate Limits
Rate limits vary by plan. Monitor your usage through the response headers.
| Header | Description | When Available |
|---|---|---|
| X-AlterLab-Credits | Cost charged for this request | Success (200) |
| X-AlterLab-Tier | Tier used for scraping | Success (200) |
| X-AlterLab-Savings | Savings vs highest tier | Success (200) |
| X-AlterLab-Bytes | Response size in bytes | Success (200) |
| X-AlterLab-Cached | Whether result was cached | Success (200) |
| X-RateLimit-Limit | Maximum requests per minute | Rate limit (429) |
| X-RateLimit-Remaining | Remaining requests in window | Rate limit (429) |
| X-RateLimit-Reset | Unix timestamp when limit resets | Rate limit (429) |
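For example, the billing headers on a successful response can be collected in one place. A sketch (header names from the table above; any mapping with a .get method works, including the requests library's response.headers):

```python
def billing_headers(headers) -> dict:
    """Collect the X-AlterLab-* headers that accompany successful responses."""
    return {
        "credits": headers.get("X-AlterLab-Credits"),
        "tier": headers.get("X-AlterLab-Tier"),
        "savings": headers.get("X-AlterLab-Savings"),
        "bytes": headers.get("X-AlterLab-Bytes"),
        "cached": headers.get("X-AlterLab-Cached"),
    }
```

Logging this dict per request gives a running record of spend without an extra call to /api/v1/usage.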
Rate Limit Information
Successful responses include X-AlterLab-* headers. Rate limit errors (429) use X-RateLimit-* headers.
Error Handling
| Status Code | Meaning | Action |
|---|---|---|
| 200 | Success (sync response) | Process content directly |
| 202 | Accepted (async response) | Poll job_id for results |
| 400 | Bad Request | Check request parameters |
| 401 | Unauthorized | Verify API key |
| 402 | Payment Required | Insufficient balance, top up account |
| 404 | Not Found | Job doesn't exist or unauthorized |
| 415 | Unsupported Media Type | URL content type not supported |
| 422 | Unprocessable Entity | All tiers failed, site may be blocking |
| 429 | Too Many Requests | Wait for reset time or upgrade plan |
| 500 | Internal Server Error | Retry request, contact support if persists |
| 502 | Bad Gateway | Worker queue error, retry request |
Troubleshooting
Common Errors & Solutions
401 Unauthorized - Invalid API Key
Your API key is missing, invalid, or has been revoked.
- Verify API key format starts with sk_live_ or sk_test_
- Check that key is active in dashboard → API Keys
- Ensure header is X-API-Key (case-sensitive)
- Generate a new API key if compromised
402 Payment Required - Insufficient Balance
Your account has insufficient balance for the current billing period.
- Check remaining balance: GET /api/v1/usage
- Upgrade plan in dashboard → Billing
- Add funds or wait for next billing cycle
422 Unprocessable Entity - All Tiers Failed
Site actively blocked all scraping attempts across all tier levels.
- Check if URL requires authentication (login, cookies)
- Verify URL is publicly accessible
- Try with mode: "js" explicitly
- Set max_tier: "5" to enable CAPTCHA solving
- Contact support if the site should be accessible
429 Too Many Requests - Rate Limit Exceeded
You've exceeded your plan's rate limit (requests per minute).
- Check the X-RateLimit-Reset header for the reset time
- Implement exponential backoff in your code
- Upgrade plan for higher rate limits
- Spread requests over time instead of bursting
202 Accepted → Polling Never Completes
Job returns 202 but polling /jobs/{job_id} never shows completed status.
- Check job status shows "running" or "pending" vs "failed"
- Worker service may be down - check status page
- Timeout may be too low - increase the request timeout parameter
- Use WebSocket for real-time updates instead
- Set reasonable polling timeout (5 minutes recommended)
Rate Limit Handling Best Practices
// JavaScript: Retry with exponential backoff
async function scrapeWithRetry(url, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
try {
const response = await fetch('https://api.alterlab.io/api/v1/scrape', {
method: 'POST',
headers: {
'X-API-Key': process.env.ALTERLAB_API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({ url })
});
if (response.status === 429) {
const resetTime = response.headers.get('X-RateLimit-Reset');
const waitTime = resetTime ?
(parseInt(resetTime) * 1000 - Date.now()) :
Math.pow(2, i) * 1000;
console.log(`Rate limited. Waiting ${waitTime}ms...`);
await new Promise(resolve => setTimeout(resolve, waitTime));
continue;
}
if (response.ok) {
return await response.json();
}
throw new Error(`HTTP ${response.status}`);
} catch (error) {
if (i === maxRetries - 1) throw error;
await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
}
}
}
# Python: Retry with backoff
import os
import time
import requests
from requests.exceptions import RequestException

def scrape_with_retry(url: str, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                'https://api.alterlab.io/api/v1/scrape',
                headers={'X-API-Key': os.environ['ALTERLAB_API_KEY']},
                json={'url': url}
            )
            if response.status_code == 429:
                reset_time = response.headers.get('X-RateLimit-Reset')
                wait_time = (
                    int(reset_time) - time.time()
                    if reset_time else 2 ** attempt
                )
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(max(wait_time, 0))
                continue
            response.raise_for_status()
            return response.json()
        except RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
Cost Management Tips
- Estimate before scraping: Use POST /api/v1/scrape/estimate to check costs before running expensive requests
- Set cost controls: Always use max_credits in production to prevent unexpected charges from tier escalation
- Monitor usage: Check the X-AlterLab-Credits response header and track cumulative usage
- Enable caching: Pass cache: true to cache responses. Subsequent requests to the same URL return cached results for free (caching is opt-in, disabled by default)
- Optimize tier usage: Review billing.optimization_suggestion in responses to improve cost efficiency
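The caching tip above can be wrapped in a small request builder. A sketch using the cache and force_refresh flags documented on this page (whether a forced refresh re-populates the cache is our assumption, not documented behavior):

```python
def cached_scrape_payload(url: str, fresh: bool = False) -> dict:
    """Build a scrape request body that opts in to caching (off by default).

    With fresh=True, force_refresh bypasses the cache for this one request
    while still opting in so the new result can be served from cache later.
    """
    payload = {"url": url, "cache": True}
    if fresh:
        payload["force_refresh"] = True
    return payload
```

POST the returned dict to /api/v1/scrape; repeat requests for the same URL should then show "cached": true in the response and cost nothing.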
Still Having Issues?
If you're still experiencing problems after trying these solutions:
- Check service status: status.alterlab.io
- Review API audit report: Ensure you're using latest best practices
- Contact support: [email protected]
- Join Discord community: Get help from other developers
Include in support requests: job_id, timestamp, full error message, and API request/response