AlterLab
API Reference

REST API

Complete reference for the AlterLab REST API. All endpoints use JSON for requests and responses.

Base URL

https://api.alterlab.io/api/v1

Production-Ready Documentation

This documentation reflects the actual implementation. All code examples are tested and can be copied directly into your application.

Quick Start

Get started with the AlterLab API in under 2 minutes. Here's your first request:

Step 1: Get Your API Key

Sign up at alterlab.io and generate an API key from the dashboard.

Step 2: Make Your First Request

# Using cURL
curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Using Python
import requests

response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={"url": "https://example.com"}
)
print(response.json())

# Using Python SDK (Recommended)
from alterlab import AlterLabSync

client = AlterLabSync(api_key="YOUR_API_KEY")
result = client.scrape("https://example.com")
print(result["content"][:100])  # First 100 chars

Step 3: Handle the Response

Simple requests return 200 with content immediately. Complex requests return 202 with a job_id for polling.

{
  "url": "https://example.com",
  "status_code": 200,
  "content": "<!DOCTYPE html>...",
  "title": "Example Domain",
  "billing": {
    "total_credits": 1,
    "tier_used": "1"
  }
}
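
A response handler can branch on the two status codes; a minimal sketch (the `handle_scrape_response` helper name is ours, not part of the SDK):

```python
def handle_scrape_response(status_code: int, body: dict):
    """Classify a scrape response: 200 carries content, 202 carries a job_id.

    Returns ("done", body) for sync results or ("pending", job_id) for
    async jobs that still need polling.
    """
    if status_code == 200:
        return ("done", body)
    if status_code == 202:
        return ("pending", body["job_id"])
    raise RuntimeError(f"Unexpected status {status_code}: {body}")


# The two documented shapes:
state, payload = handle_scrape_response(
    200, {"url": "https://example.com", "content": "<!DOCTYPE html>..."}
)
state, job_id = handle_scrape_response(
    202, {"job_id": "550e8400-e29b-41d4-a716-446655440000"}
)
```

For the pending case, poll /api/v1/jobs/{job_id} as described under Async Mode & Job Polling.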

What's Next?

Getting Structured JSON

To get structured JSON data (product info, article metadata, etc.) instead of raw HTML, use the formats parameter:

# Get structured JSON from any page
curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B0D5HMLP7S",
    "formats": ["json"]
  }'

The response will include auto-extracted structured data:

{
  "url": "https://www.amazon.com/dp/B0D5HMLP7S",
  "status_code": 200,
  "content": {
    "json": {
      "@type": "Product",
      "name": "Water Brush Pen Set",
      "price": 8.99,
      "currency": "USD",
      "rating": 4.4,
      "reviewCount": 47,
      "availability": "InStock",
      "images": ["https://..."],
      "specifications": {...},
      "reviews": [...]
    }
  }
}

No Schema Required

AlterLab automatically detects the page type (product, article, recipe, etc.) and extracts relevant fields. You only need extraction_schema if you want to filter the JSON to specific fields. See Structured Extraction for filtering options.

Authentication

All API requests require authentication using an API key. Include your API key in the X-API-Key header.

Keep your API key secure

Never expose your API key in client-side code or commit it to version control. Use environment variables to store sensitive credentials.
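
For example, in Python the key can be pulled from the environment at startup (ALTERLAB_API_KEY is also the variable the Python SDK auto-loads; see Configuration below):

```python
import os

# Keep the key out of source control: read it from the environment.
# ALTERLAB_API_KEY is the same variable the Python SDK auto-loads.
api_key = os.environ.get("ALTERLAB_API_KEY", "")

headers = {
    "X-API-Key": api_key,
    "Content-Type": "application/json",
}
```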

Unified Scrape Endpoint
Recommended

The unified endpoint handles all scraping modes through a single, intelligent interface. It automatically selects the optimal tier (1-5) based on site complexity, supports both synchronous (200) and asynchronous (202) execution patterns, and provides detailed billing breakdowns.

POST
/api/v1/scrape

Unified scraping endpoint with intelligent tier escalation, cost controls, and sync/async execution modes.

Parameters

Name | Type | Required | Description
url | string | Required | The URL of the web page to scrape
mode | string | Optional | Scraping mode: auto, html, js, pdf, or ocr (default: auto)
sync | boolean | Optional | Enable blocking mode: API polls internally until complete (60-120s max) and returns 200. Set false for an immediate 202 with job_id. (default: true)
formats | string[] | Optional | Output formats: text, json, html, markdown. Use ["json"] to get structured data extraction. (default: ["markdown", "json"])
advanced | AdvancedOptions | Optional | Advanced options: render_js, screenshot, generate_pdf, ocr, use_proxy, markdown, wait_condition
cost_controls | CostControls | Optional | Cost controls: max_credits, max_tier, prefer_cost, prefer_speed, fail_fast
force_refresh | boolean | Optional | Bypass cache and force a fresh scrape (default: false)
include_raw_html | boolean | Optional | Include raw HTML in the response (default: false)
timeout | integer | Optional | Request timeout in seconds, 1-300 (default: 30)
extraction_schema | object | Optional | JSON Schema defining the structure you want extracted. The response will include filtered_content with data matching your schema. See the JSON Schema Filtering guide.

Request Example

# Simple synchronous scrape (default)
curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "mode": "auto"
  }'

# Async scrape with advanced options
curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://spa-website.com",
    "mode": "js",
    "sync": false,
    "advanced": {
      "render_js": true,
      "screenshot": true,
      "markdown": true,
      "wait_condition": "networkidle"
    },
    "cost_controls": {
      "max_credits": 10,
      "prefer_cost": true
    }
  }'

Response Example

// Sync response (200) - Simple requests with sync=true (default)
{
  "url": "https://example.com",
  "status_code": 200,
  "content": {
    "html": "<!DOCTYPE html>...",
    "text": "Cleaned text content...",
    "json": {
      "title": "Example Domain",
      "description": "Example page",
      "metadata": {...}
    }
  },
  "title": "Example Domain",
  "metadata": {
    "description": "Example page",
    "keywords": ["example"]
  },
  "headers": {
    "content-type": "text/html; charset=UTF-8"
  },
  "cached": false,
  "response_time_ms": 1234,
  "size_bytes": 15234,
  "screenshot_url": null,  // URL if screenshot: true (available 24h)
  "pdf_url": null,         // URL if generate_pdf: true (available 24h)
  "billing": {
    "total_credits": 1,
    "tier_used": "1",
    "escalations": [
      {
        "tier": "1",
        "result": "success",
        "credits": 1,
        "duration_ms": 234
      }
    ],
    "savings": 19
  },
  "extraction_method": "algorithmic",
  "version": "v1"
}

// Async response (202) - Complex requests with sync=false or auto-detected
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000"
}

Execution Modes

  • Sync (sync=true, default): API queues the job to the worker service and polls internally every 100-500ms. When the job completes (within 60-120 seconds), returns 200 with full content. No manual polling needed - simpler request/response pattern. If job exceeds timeout, falls back to 202 with job_id for manual polling.
  • Async (sync=false): Immediately returns 202 with job_id. You must poll /api/v1/jobs/{job_id} or use WebSocket for results. Recommended for long-running scrapes (>60s), batch operations, webhooks, or real-time updates via WebSocket.
  • Key difference: Both modes queue to the same worker service. sync=true just adds automatic polling on the API side, making it simpler for clients but with a timeout constraint. sync=false gives you full control over polling/WebSocket with no timeout limits.

When to Use Async Mode (sync=false)

Use sync: false for:
  • Scrapes expected to take longer than 60 seconds (large PDFs, complex sites)
  • Batch operations where you want to queue multiple jobs and poll together
  • When you need webhook delivery instead of polling
  • Real-time status updates via WebSocket
For quick scrapes (most websites), sync: true is simpler and eliminates manual polling code.

How Sync Mode Works (Under the Hood)

Understanding sync mode helps you choose the right execution pattern for your use case:

1. Job Queueing (Both Modes)

Regardless of sync setting, all scrape requests are queued to the worker service. This ensures consistent anti-bot capabilities, proxy management, and resource pooling.

2. Internal Polling (sync=true)

When sync=true, the API server holds your HTTP connection open and polls Redis every 100-500ms checking job status. When complete, it returns the full result as a 200 response. Maximum wait: 60-120 seconds.

3. Manual Polling (sync=false)

When sync=false, the API immediately returns 202 with job_id and closes the connection. You control polling frequency and timeout. No server-side timeout constraints.

4. Timeout Behavior

If sync=true and job exceeds timeout (60-120s), API returns 202 with job_id as fallback. You can then poll manually. This prevents hung connections while still supporting long-running jobs.

Key Insight: sync=true is purely a convenience feature. Both modes use the same worker infrastructure and have identical scraping capabilities. The only difference is who handles polling: the API server (sync=true) or your client code (sync=false).

Response Format

The content field may be a plain string (simple sync requests) or a structured object when using the formats parameter.

Simple sync (string)
{
  "url": "https://example.com",
  "status_code": 200,
  "content": "<!DOCTYPE html>...",
  "billing": {"total_credits": 1}
}
With formats (object)
{
  "url": "https://example.com",
  "status_code": 200,
  "content": {
    "html": "<!DOCTYPE html>...",
    "text": "Clean text...",
    "markdown": "# Example..."
  },
  "billing": {"total_credits": 1}
}
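
Client code that accepts both shapes can normalize them with a small helper (a sketch; the format preference order is our choice, not mandated by the API):

```python
def content_as_text(content):
    """Return a text view of the `content` field, which may be a plain
    string (simple sync requests) or an object keyed by format name."""
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        # Prefer the lighter formats when several are present.
        for key in ("markdown", "text", "html"):
            if key in content:
                return content[key]
    raise ValueError(f"Unrecognized content shape: {type(content)!r}")


content_as_text("<!DOCTYPE html>...")                 # passes through
content_as_text({"markdown": "# Example", "html": "<p>...</p>"})  # "# Example"
```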

Python SDK
Recommended

The official Python SDK provides a simple, intuitive interface for the AlterLab API with automatic polling, retry logic, and type hints.

Installation

pip install alterlab

Async vs Sync

The SDK provides two clients: AlterLab (async, requires await) and AlterLabSync (sync, no await needed). Choose based on your codebase.

Sync Usage (Recommended for scripts)

from alterlab import AlterLabSync

# Use AlterLabSync for synchronous code (no await needed)
with AlterLabSync(api_key="YOUR_API_KEY") as client:
    result = client.scrape("https://example.com")
    print(result["content"])
    print(f"Credits used: {result['billing']['total_credits']}")

    # With schema filtering
    result = client.scrape(
        url="https://amazon.com/dp/B0123456789",
        extraction_schema={
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
                "in_stock": {"type": "boolean"}
            }
        }
    )
    product = result["filtered_content"]  # Your filtered data
    print(f"Product: {product['name']} - ${product['price']}")

Async Usage (For async applications)

import asyncio
from alterlab import AlterLab, AdvancedOptions

async def main():
    # AlterLab is async - requires await
    async with AlterLab(api_key="YOUR_API_KEY") as client:
        result = await client.scrape("https://example.com")
        print(result["content"])

        # With JS rendering
        result = await client.scrape(
            url="https://spa-app.com",
            mode="js",
            advanced=AdvancedOptions(render_js=True, screenshot=True)
        )

asyncio.run(main())

Advanced Options

from alterlab import AlterLabSync, AdvancedOptions, CostControls

with AlterLabSync(api_key="YOUR_API_KEY") as client:
    # With advanced options
    result = client.scrape(
        url="https://example.com",
        mode="js",
        advanced=AdvancedOptions(
            render_js=True,
            screenshot=True,
            markdown=True,
            wait_condition="networkidle"
        ),
        cost_controls=CostControls(
            max_credits=10,
            max_tier="4",
            prefer_cost=True
        )
    )

    # Screenshot saved to result["screenshot_url"]
    # Markdown in result["content"]["markdown"] if formats specified

Cost Estimation

from alterlab import AlterLabSync, AdvancedOptions

with AlterLabSync(api_key="YOUR_API_KEY") as client:
    # Estimate cost before scraping
    estimate = client.estimate_cost(
        url="https://example.com",
        mode="auto",
        advanced=AdvancedOptions(render_js=True, screenshot=True)
    )

    print(f"Estimated credits: {estimate['estimated_credits']}")
    print(f"Max possible: {estimate['max_possible_credits']}")

    # Only scrape if within budget
    if estimate['estimated_credits'] <= 5:
        result = client.scrape(url="https://example.com")

Error Handling

from alterlab import AlterLabSync, AlterLabAPIError, AlterLabTimeoutError

with AlterLabSync(api_key="YOUR_API_KEY") as client:
    try:
        result = client.scrape("https://example.com")
    except AlterLabAPIError as e:
        if e.status_code == 402:
            print("Insufficient balance!")
        elif e.status_code == 429:
            print(f"Rate limited. Retry after {e.retry_after}s")
        else:
            print(f"API error: {e.detail}")
    except AlterLabTimeoutError:
        print("Request timed out")
    except Exception as e:
        print(f"Unexpected error: {e}")

Manual Job Management

from alterlab import AlterLabSync

client = AlterLabSync(api_key="YOUR_API_KEY")

# Get job status without waiting
job_id = "550e8400-e29b-41d4-a716-446655440000"
status = client.get_job_status(job_id)
print(f"Job status: {status['status']}")

# Poll job with custom settings
result = client.poll_job(
    job_id=job_id,
    poll_interval=2.0,  # Check every 2 seconds
    poll_timeout=300.0   # Give up after 5 minutes
)

Configuration

# Custom configuration (sync client)
client = AlterLabSync(
    api_key="YOUR_API_KEY",
    base_url="https://api.alterlab.io",  # API base (paths include /api/v1)
    timeout=60.0,  # Request timeout
    max_retries=3,  # Retry failed requests
    retry_backoff=2.0  # Exponential backoff multiplier
)

# Or use environment variable
# export ALTERLAB_API_KEY="YOUR_API_KEY"
client = AlterLabSync()  # Auto-loads from env

SDK Benefits

  • Automatic polling: Handles 202 responses and job polling for you
  • Retry logic: Automatically retries failed requests with exponential backoff
  • Type hints: Full type annotations for IDE autocomplete
  • Error handling: Custom exception types for different error scenarios
  • Convenience methods: Mode-specific methods for common use cases

Structured Extraction (Optional Filtering)

Already Getting JSON?

If you just want structured JSON data, use formats: ["json"] as shown in Getting Structured JSON. This section is for filtering that JSON to specific fields.

Filter and restructure extracted data to match your desired output format using JSON Schema. This is pure data transformation - no additional cost.

JSON Schema Filtering

Pass extraction_schema to filter extracted data to your desired structure. The filtered result appears in filtered_content:

# Extract product data with schema
result = client.scrape(
    url="https://amazon.com/dp/B0123456789",
    mode="auto",
    extraction_schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "currency": {"type": "string"},
            "in_stock": {"type": "boolean"},
            "rating": {"type": "number"},
            "reviews_count": {"type": "integer"}
        }
    }
)

# Access filtered data (matches your schema)
product = result["filtered_content"]
print(f"Name: {product['name']}")
print(f"Price: {product['price']}")
print(f"In Stock: {product['in_stock']}")

Field Aliases

Schema filtering automatically maps common field name variations:

in_stock → availability
sku → asin, product_id
image_urls → images
title → name, headline

See JSON Schema Filtering guide for the complete list.
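
The alias behavior can be pictured as a first-match lookup from your schema's field name to candidate keys in the extracted data (an illustrative sketch covering only the aliases listed above; it is not the complete server-side list):

```python
# Candidate source keys tried for each schema field, per the table above.
FIELD_ALIASES = {
    "in_stock": ["in_stock", "availability"],
    "sku": ["sku", "asin", "product_id"],
    "image_urls": ["image_urls", "images"],
    "title": ["title", "name", "headline"],
}

def resolve_field(extracted: dict, field: str):
    """Return the first alias of `field` present in the extracted data."""
    for key in FIELD_ALIASES.get(field, [field]):
        if key in extracted:
            return extracted[key]
    return None

product = {"name": "Water Brush Pen Set", "availability": "InStock"}
resolve_field(product, "title")     # falls through to "name"
resolve_field(product, "in_stock")  # falls through to "availability"
```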

Response Structure

{
  "url": "https://amazon.com/dp/B0123456789",
  "status_code": 200,
  "content": { ... },           // Full extraction (Schema.org, metadata, etc.)
  "filtered_content": {         // YOUR filtered data (only when extraction_schema provided)
    "name": "Product Name",
    "price": 29.99,
    "in_stock": true
  },
  "billing": { "total_credits": 3 }
}

Zero Additional Cost

Schema filtering is pure data transformation - no LLM calls, no extra charges. It filters existing structured data (Schema.org, Open Graph, playbook extractions) to match your schema.

Advanced Options

Advanced options provide fine-grained control over scraping behavior and enable premium features.

Option | Type | Cost | Description
render_js | boolean | +3 | Use headless browser for JavaScript rendering
screenshot | boolean | +1 | Capture full-page screenshot (requires render_js)
generate_pdf | boolean | +2 | Generate PDF of rendered page (requires render_js)
ocr | boolean | +5 | Extract text from images using OCR
use_proxy | boolean | +1 | Route through premium proxy network
markdown | boolean | Free | Convert content to Markdown format
wait_condition | string | Free | Wait condition: domcontentloaded, networkidle, load
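
The surcharges above can be summed client-side to preview add-on cost (a sketch; total billing also depends on the tier used, so treat this as an add-on estimate only):

```python
# Per-option credit surcharges from the table above; markdown and
# wait_condition are free, so they are omitted.
SURCHARGES = {
    "render_js": 3,
    "screenshot": 1,
    "generate_pdf": 2,
    "ocr": 5,
    "use_proxy": 1,
}

def advanced_surcharge(advanced: dict) -> int:
    """Sum the credit add-ons for the enabled advanced options."""
    return sum(cost for opt, cost in SURCHARGES.items() if advanced.get(opt))

advanced_surcharge({"render_js": True, "screenshot": True})  # 3 + 1 = 4
```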

Example with Advanced Options

{
  "url": "https://spa-app.com",
  "mode": "auto",
  "advanced": {
    "render_js": true,
    "screenshot": true,
    "generate_pdf": true,
    "markdown": true,
    "use_proxy": true,
    "wait_condition": "networkidle"
  }
}

// Response includes download URLs (available for 24 hours):
// "screenshot_url": "https://alterlab.io/downloads/screenshots/2025-01-15/job-id.png"
// "pdf_url": "https://alterlab.io/downloads/pdfs/2025-01-15/job-id.pdf"

Cost Controls

{
  "url": "https://example.com",
  "mode": "auto",
  "cost_controls": {
    "max_credits": 5,
    "max_tier": "4",
    "prefer_cost": true,
    "fail_fast": false
  }
}

Tier Escalation & Cost Controls

AlterLab uses an intelligent 5-tier escalation system that automatically tries the cheapest method first and escalates only when needed. Each tier has different capabilities, speeds, and costs.

Tier | Name | Cost | Requests per $1 | Description
1 | Curl | $0.0002 | 5,000 | Ultra-fast curl binary for static sites
2 | HTTP | $0.0003 | 3,333 | HTTPX with TLS fingerprinting and HTTP/2
3 | Stealth | $0.002 | 500 | curl_cffi with Chrome browser impersonation
4 | Browser | $0.004 | 250 | Playwright browser automation for JS sites
5 | Captcha | $0.02 | 50 | Browser with AI-powered captcha solving

How Escalation Works

  1. Start cheapest: By default, starts at Tier 1 (Curl: $0.0002)
  2. Attempt scrape: Tries to scrape with current tier's method
  3. Check success: If successful (status 200, valid content), stop and return result
  4. Escalate if failed: If failed (timeout, blocked, error), move to next tier and retry
  5. Stop at success or max tier: Returns when successful or when max_tier/max_credits reached
  6. Detailed billing: Response includes all attempts, final tier used, and cost saved
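
The escalation loop above can be sketched as follows (illustrative only; `attempt` stands in for the server-side tier execution, and failed attempts are still billed, matching the billing breakdowns shown in the examples):

```python
def scrape_with_escalation(attempt, tier_costs, max_credits=None, max_tier=None):
    """Walk tiers cheapest-first, stopping at the first success or limit.

    `attempt(tier)` returns a dict with a "success" flag; `tier_costs`
    maps tier name -> credits, ordered cheapest first.
    """
    spent = 0.0
    escalations = []
    for tier, cost in tier_costs.items():
        if max_tier is not None and tier > max_tier:
            break  # would exceed the tier ceiling
        if max_credits is not None and spent + cost > max_credits:
            break  # would exceed the budget
        result = attempt(tier)
        spent += cost  # failed attempts are billed too
        escalations.append({
            "tier": tier,
            "result": "success" if result["success"] else "failed",
            "credits": cost,
        })
        if result["success"]:
            return {"tier_used": tier, "total_credits": spent,
                    "escalations": escalations, "content": result.get("content")}
    return {"tier_used": None, "total_credits": spent,
            "escalations": escalations, "content": None}


# Illustration: a site that only succeeds at tier "3".
def fake_attempt(tier):
    return {"success": tier >= "3", "content": "ok" if tier >= "3" else None}

result = scrape_with_escalation(fake_attempt, {"1": 1, "2": 2, "3": 3, "4": 4})
# result["tier_used"] == "3"; total credits 1 + 2 + 3 = 6
```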

Cost Control Parameters

max_credits (float, optional)

Maximum cost to spend on this request. API will not escalate beyond this budget. Example: max_credits: 0.004 stops at Tier 4 (Browser).

max_tier (string, optional)

Maximum tier to escalate to: "1" (Curl), "2" (HTTP), "3" (Stealth), "4" (Browser), or "5" (Captcha). Example: max_tier: "4" stops at Browser.

prefer_cost (boolean, default: false)

Start with cheapest tier (Tier 1) and try each tier sequentially. Best for known simple sites.

prefer_speed (boolean, default: false)

Start with Tier 4 (Browser) for guaranteed success on most sites. Higher cost but faster overall.

fail_fast (boolean, default: false)

Return error instead of escalating to expensive tiers. Useful when you want predictable costs.

Example: Cost-Optimized Request

{
  "url": "https://example.com",
  "mode": "auto",
  "cost_controls": {
    "max_credits": 0.004,
    "max_tier": "4",
    "prefer_cost": true,
    "fail_fast": false
  }
}

// Response billing breakdown:
{
  "billing": {
    "total_cost": 0.002,
    "tier_used": "3",
    "escalations": [
      {"tier": "1", "result": "failed", "cost": 0.0002, "duration_ms": 250, "error": "403 Forbidden"},
      {"tier": "2", "result": "failed", "cost": 0.0003, "duration_ms": 2100, "error": "Blocked by WAF"},
      {"tier": "3", "result": "success", "cost": 0.002, "duration_ms": 4200}
    ],
    "optimization_suggestion": "Site requires Stealth tier. Consider using prefer_speed with Tier 4 for faster results."
  }
}

Cost Control Best Practices

  • Always set max_credits for production to prevent unexpected charges
  • Use prefer_cost: true for known simple sites
  • Use prefer_speed: true for critical scrapers where reliability matters more than cost
  • Set fail_fast: true in testing to avoid unnecessary spending on misconfigured requests

Async Mode & Job Polling

Complex scraping requests return a 202 status with a job_id. Poll the job status endpoint to retrieve results.

GET
/api/v1/jobs/{job_id}

Poll job status and retrieve results when completed.

Parameters

Name | Type | Required | Description
job_id | string | Required | UUID of the job returned from the scrape endpoint

Request Example

curl -X GET https://api.alterlab.io/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
  -H "X-API-Key: sk_live_..."

Response Example

// Status: pending
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "url": "https://example.com",
  "mode": "auto",
  "created_at": "2025-11-05T10:30:00Z"
}

// Status: running
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "url": "https://example.com",
  "mode": "auto",
  "progress": 50,
  "created_at": "2025-11-05T10:30:00Z"
}

// Status: completed
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "url": "https://example.com",
  "mode": "auto",
  "result": {
    "url": "https://example.com",
    "status_code": 200,
    "content": {
      "html": "...",
      "text": "...",
      "json": {...}
    },
    "billing": {
      "total_credits": 5,
      "tier_used": "4"
    }
  },
  "created_at": "2025-11-05T10:30:00Z",
  "completed_at": "2025-11-05T10:30:15Z"
}

// Status: failed
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "url": "https://example.com",
  "mode": "auto",
  "error": "Timeout after 30 seconds",
  "created_at": "2025-11-05T10:30:00Z",
  "failed_at": "2025-11-05T10:30:30Z"
}

Polling Best Practices

  • Recommended interval: Poll every 2-5 seconds
  • Timeout: Set a maximum polling duration (5 minutes recommended)
  • Exponential backoff: Increase polling interval if job takes longer
  • Status values: pending → running → completed/failed

Example Polling Loop (JavaScript)

async function pollJobStatus(jobId, apiKey, maxWaitMs = 300000) {
  const startTime = Date.now();
  const pollInterval = 2000; // 2 seconds

  while (Date.now() - startTime < maxWaitMs) {
    const response = await fetch(
      `https://api.alterlab.io/api/v1/jobs/${jobId}`,
      { headers: { 'X-API-Key': apiKey } }
    );

    const job = await response.json();

    if (job.status === 'completed') {
      return job.result;
    } else if (job.status === 'failed') {
      throw new Error(job.error);
    }

    // Wait before next poll
    await new Promise(resolve => setTimeout(resolve, pollInterval));
  }

  throw new Error('Job polling timeout');
}
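
The same loop in Python, this time with the exponential backoff the best practices recommend. `fetch_status` is injected so the polling logic stays testable; wire it to a GET on /api/v1/jobs/{job_id} with your X-API-Key header:

```python
import time

def backoff_intervals(initial=2.0, factor=1.5, cap=30.0):
    """Yield poll intervals that grow geometrically, capped at `cap` seconds."""
    interval = initial
    while True:
        yield interval
        interval = min(interval * factor, cap)

def poll_job(fetch_status, max_wait=300.0, sleep=time.sleep):
    """Poll until the job completes or fails.

    `fetch_status()` should GET /api/v1/jobs/{job_id} and return the
    decoded JSON body (as in the status examples above).
    """
    waited = 0.0
    for interval in backoff_intervals():
        job = fetch_status()
        if job["status"] == "completed":
            return job["result"]
        if job["status"] == "failed":
            raise RuntimeError(job["error"])
        if waited + interval > max_wait:
            raise TimeoutError("Job polling timeout")
        sleep(interval)
        waited += interval
```

For example, `poll_job(lambda: requests.get(url, headers=headers).json())` with the job URL and headers from earlier examples.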

WebSocket Alternative

For real-time updates without polling, use WebSocket connections to receive job status updates as they happen. This is more efficient than polling and provides instant notifications.

WebSocket Endpoint

wss://api.alterlab.io/api/v1/ws/jobs?api_key=YOUR_API_KEY

Protocol Messages

Client → Server (Subscribe):
{"action": "subscribe", "job_id": "<job-uuid>"}
Client → Server (Unsubscribe):
{"action": "unsubscribe", "job_id": "<job-uuid>"}
Client → Server (Ping):
{"action": "ping"}
Server → Client (Job Update):
{"type": "job_update", "job_id": "...", "status": "running|completed|failed", "result": {...}, "error": null, "ts": 1730451136}
Server → Client (Heartbeat):
{"type": "heartbeat", "ts": 1730451136}

Example: JavaScript WebSocket Client

// Connect with API key authentication
const ws = new WebSocket('wss://api.alterlab.io/api/v1/ws/jobs?api_key=sk_live_...');

// Handle connection open
ws.onopen = () => {
  console.log('WebSocket connected');

  // Subscribe to job updates
  ws.send(JSON.stringify({
    action: 'subscribe',
    job_id: '550e8400-e29b-41d4-a716-446655440000'
  }));
};

// Receive real-time updates
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  switch (message.type) {
    case 'connected':
      console.log('✓ Connection established');
      break;

    case 'subscribed':
      console.log('✓ Subscribed to job:', message.job_id);
      break;

    case 'job_update':
      console.log('Job status:', message.status);

      if (message.status === 'completed') {
        console.log('✓ Job completed:', message.result);
        ws.close();
      } else if (message.status === 'failed') {
        console.error('✗ Job failed:', message.error);
        ws.close();
      } else {
        console.log('⟳ Job in progress...');
      }
      break;

    case 'heartbeat':
      // Server is alive
      break;

    case 'error':
      console.error('WebSocket error:', message.message);
      break;
  }
};

ws.onerror = (error) => {
  console.error('WebSocket connection error:', error);
};

ws.onclose = () => {
  console.log('WebSocket disconnected');
};

Example: Python WebSocket Client

import asyncio
import json
import websockets

async def watch_job(api_key: str, job_id: str):
    uri = f"wss://api.alterlab.io/api/v1/ws/jobs?api_key={api_key}"

    async with websockets.connect(uri) as ws:
        # Subscribe to job
        await ws.send(json.dumps({
            "action": "subscribe",
            "job_id": job_id
        }))

        # Listen for updates
        async for message in ws:
            data = json.loads(message)

            if data["type"] == "job_update":
                status = data["status"]
                print(f"Job status: {status}")

                if status == "completed":
                    print("Job completed:", data["result"])
                    break
                elif status == "failed":
                    print("Job failed:", data["error"])
                    break

# Usage
asyncio.run(watch_job("sk_live_...", "550e8400-..."))

WebSocket vs Polling

  • WebSocket: Instant updates, lower latency, persistent connection, more efficient for long-running jobs
  • Polling: Simpler implementation, works through proxies/firewalls, no persistent connection needed
  • Recommendation: Use WebSocket for real-time dashboards, polling for simple scripts

Batch Scraping

Submit multiple URLs for scraping in a single request. Batch requests are processed asynchronously, and you can receive results via webhook or by polling individual job statuses.

POST
/api/v1/batch

Submit a batch of URLs for asynchronous processing with optional webhook delivery.

Parameters

Name | Type | Required | Description
urls | string[] | Required | Array of URLs to scrape (max 1,000 per batch)
mode | string | Optional | Scraping mode applied to all URLs (default: auto)
webhook_url | string | Optional | URL to receive results via POST webhook
advanced | AdvancedOptions | Optional | Advanced options applied to all URLs
cost_controls | CostControls | Optional | Cost controls applied to all URLs

Request Example

curl -X POST https://api.alterlab.io/api/v1/batch \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "mode": "auto",
    "webhook_url": "https://your-app.com/webhooks/scraping",
    "cost_controls": {
      "max_credits": 5
    }
  }'

Response Example

{
  "batch_id": "batch_7a3b9c8d-4e2f-1a6b-8c5d-9e0f1a2b3c4d",
  "total_jobs": 3,
  "job_ids": [
    "550e8400-e29b-41d4-a716-446655440001",
    "550e8400-e29b-41d4-a716-446655440002",
    "550e8400-e29b-41d4-a716-446655440003"
  ],
  "estimated_credits": 3,
  "webhook_url": "https://your-app.com/webhooks/scraping",
  "status": "pending"
}

Webhook Payload Format

When a job completes, AlterLab sends a POST request to your webhook URL with this payload:

POST https://your-app.com/webhooks/scraping
Content-Type: application/json
X-AlterLab-Signature: sha256=...  // Webhook signature for verification

{
  "event": "job.completed",
  "batch_id": "batch_7a3b9c8d-4e2f-1a6b-8c5d-9e0f1a2b3c4d",
  "job_id": "550e8400-e29b-41d4-a716-446655440001",
  "url": "https://example.com/page1",
  "status": "completed",
  "result": {
    "url": "https://example.com/page1",
    "status_code": 200,
    "content": "...",
    "billing": {
      "total_credits": 1,
      "tier_used": "1"
    }
  },
  "completed_at": "2025-11-05T10:30:15Z"
}
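
If you verify the X-AlterLab-Signature header, the usual pattern is to recompute an HMAC over the raw request body. The exact signing scheme and where the signing secret comes from are not specified here, so treat this as a sketch assuming HMAC-SHA256 with a dashboard-provided secret:

```python
import hashlib
import hmac

def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an `X-AlterLab-Signature: sha256=<hex>` header against the body.

    Assumes HMAC-SHA256 over the raw body; confirm the scheme and the
    secret's location in your dashboard before relying on this.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    provided = header_value.removeprefix("sha256=")
    # Constant-time comparison avoids timing leaks.
    return hmac.compare_digest(expected, provided)
```

Verify against the raw bytes of the request, not a re-serialized JSON object, since key order and whitespace change the digest.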

Python SDK Batch Example

from alterlab import AlterLabSync

client = AlterLabSync(api_key="YOUR_API_KEY")

# Submit batch with webhook
batch = client.batch_scrape(
    urls=[
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3"
    ],
    mode="auto",
    webhook_url="https://your-app.com/webhooks/scraping"
)

print(f"Batch ID: {batch['batch_id']}")
print(f"Total jobs: {batch['total_jobs']}")

# Or poll each job individually
for job_id in batch['job_ids']:
    result = client.poll_job(job_id)
    print(f"Job {job_id}: {result['status']}")

Batch Limits

  • Maximum 1,000 URLs per batch request
  • Webhook URL must be publicly accessible (HTTPS required)
  • Webhook retries: 3 attempts with exponential backoff
  • Batch jobs are processed in parallel (no guaranteed order)
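
Lists larger than the 1,000-URL cap have to be split across multiple batch requests; a small helper:

```python
def chunk_urls(urls, batch_size=1000):
    """Split a URL list into batches no larger than the 1,000-URL cap."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]

# 2,500 URLs -> three batches of 1000, 1000, and 500.
batches = chunk_urls([f"https://example.com/page{i}" for i in range(2500)])
```

Each chunk can then be submitted with `client.batch_scrape(urls=chunk, ...)`.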

Usage & Balance

Monitor your API usage, spending, and account limits.

GET
/api/v1/usage

Get current usage statistics and remaining balance for your account.

Parameters

Name | Type | Required | Description
start_date | string | Optional | Start date for the usage period (ISO 8601 format)
end_date | string | Optional | End date for the usage period (ISO 8601 format)

Request Example

# Current billing period
curl -X GET https://api.alterlab.io/api/v1/usage \
  -H "X-API-Key: sk_live_..."

# Specific date range
curl -X GET "https://api.alterlab.io/api/v1/usage?start_date=2025-11-01&end_date=2025-11-30" \
  -H "X-API-Key: sk_live_..."

Response Example

{
  "period": {
    "start": "2025-11-01T00:00:00Z",
    "end": "2025-11-30T23:59:59Z",
    "current": true
  },
  "balance": {
    "current_cents": 7655,
    "deposited_cents": 10000,
    "used_cents": 2345
  },
  "requests": {
    "total": 2345,
    "successful": 2289,
    "failed": 56,
    "cached": 432
  },
  "by_mode": {
    "html": 1234,
    "js": 856,
    "pdf": 145,
    "ocr": 110
  },
  "by_tier": {
    "1": 1234,
    "2": 856,
    "3": 145,
    "4": 98,
    "5": 12
  },
  "rate_limits": {
    "requests_per_minute": 300,
    "current_usage": 12,
    "reset_at": "2025-11-05T10:31:00Z"
  },
  "spend_tier": {
    "current": "growth",
    "rolling_30d_spend_cents": 8500,
    "next_tier_at_cents": 20000
  }
}

Python SDK Usage Example

from alterlab import AlterLabSync
from datetime import datetime, timedelta

client = AlterLabSync(api_key="YOUR_API_KEY")

# Get current usage
usage = client.get_usage()
print(f"Balance remaining (cents): {usage['balance']['current_cents']}")
print(f"Requests this period: {usage['requests']['total']}")

# Check the balance before large jobs (here: alert under $1.00)
if usage['balance']['current_cents'] < 100:
    print("Warning: Low balance! Time to top up.")

# Get usage for specific period
start = datetime.now() - timedelta(days=7)
weekly_usage = client.get_usage(
    start_date=start.isoformat(),
    end_date=datetime.now().isoformat()
)
print(f"Weekly requests: {weekly_usage['requests']['total']}")

Monitoring Best Practices

  • Check before large batches: Query /usage before submitting batch jobs to ensure sufficient balance
  • Monitor rate limits: Track rate_limits.current_usage to avoid 429 errors
  • Set up alerts: Monitor balance.current_cents and alert when it drops below a threshold
  • Track by tier: Use by_tier breakdown to optimize tier selection
  • Cache optimization: Monitor requests.cached ratio to measure cache efficiency

Balance Management Tips

  • Balance never expires - deposit and use at your own pace
  • Rate limits scale automatically with your 30-day rolling spend
  • Check spend_tier.next_tier_at_cents to see next rate limit upgrade threshold
  • Deposit more funds anytime from the dashboard billing page
  • Use cost_controls.max_credits to prevent budget overruns

Cost Estimation

POST
/api/v1/scrape/estimate

Estimate the cost of a scrape request without actually scraping.

Parameters

Name       Type              Required   Description
url        string            Required   The URL to estimate
mode       string            Optional   Scraping mode (default: auto)
advanced   AdvancedOptions   Optional   Advanced options to include in the estimate

Request Example

curl -X POST https://api.alterlab.io/api/v1/scrape/estimate \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "mode": "auto"
  }'

Response Example

{
  "url": "https://example.com",
  "estimated_tier": "1",
  "estimated_credits": 1,
  "confidence": "high",
  "max_possible_credits": 10,
  "reasoning": "Known simple site - tier 1 should work"
}
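
An estimate-then-scrape flow can gate on the response above. Using `max_possible_credits` rather than `estimated_credits` as the bound is a deliberately conservative choice here, since tier escalation may exceed the estimate:

```python
def should_scrape(estimate: dict, budget_credits: int) -> bool:
    """Decide whether to proceed based on an estimate response.

    Compares the worst-case cost (max_possible_credits) against the
    caller's budget; a looser policy could use estimated_credits instead.
    """
    return estimate["max_possible_credits"] <= budget_credits

estimate = {"estimated_credits": 1, "max_possible_credits": 10, "confidence": "high"}
print(should_scrape(estimate, budget_credits=5))   # False: worst case is 10 credits
print(should_scrape(estimate, budget_credits=20))  # True
```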

Rate Limits

Rate limits vary by plan. Monitor your usage through the response headers.

Header                  Description                        When Available
X-AlterLab-Credits      Cost charged for this request      Success (200)
X-AlterLab-Tier         Tier used for scraping             Success (200)
X-AlterLab-Savings      Savings vs highest tier            Success (200)
X-AlterLab-Bytes        Response size in bytes             Success (200)
X-AlterLab-Cached       Whether result was cached          Success (200)
X-RateLimit-Limit       Maximum requests per minute        Rate limit (429)
X-RateLimit-Remaining   Remaining requests in window       Rate limit (429)
X-RateLimit-Reset       Unix timestamp when limit resets   Rate limit (429)

Rate Limit Information

Success responses (200, 202) use X-AlterLab-* headers. Rate limit errors (429) use X-RateLimit-* headers.
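
A small helper for pulling the billing headers off a successful response. The header names come from the table above; the value formats (integer credits, a "true"/"false" cached flag) are assumptions to verify against a live response:

```python
def parse_billing_headers(headers: dict) -> dict:
    """Extract the X-AlterLab-* billing headers from a 200 response.

    Value formats (int credits, "true"/"false" cached flag) are
    assumptions; header names match the table above.
    """
    return {
        "credits": int(headers.get("X-AlterLab-Credits", "0")),
        "tier": headers.get("X-AlterLab-Tier"),
        "cached": headers.get("X-AlterLab-Cached", "").lower() == "true",
    }

headers = {"X-AlterLab-Credits": "3", "X-AlterLab-Tier": "2", "X-AlterLab-Cached": "false"}
print(parse_billing_headers(headers))  # {'credits': 3, 'tier': '2', 'cached': False}
```

With `requests`, pass `response.headers` directly; its case-insensitive lookup also works with `.get()`.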

Error Handling

Status Code   Meaning                     Action
200           Success (sync response)     Process content directly
202           Accepted (async response)   Poll job_id for results
400           Bad Request                 Check request parameters
401           Unauthorized                Verify API key
402           Payment Required            Insufficient balance; top up account
404           Not Found                   Job doesn't exist or unauthorized
415           Unsupported Media Type      URL content type not supported
422           Unprocessable Entity        All tiers failed; site may be blocking
429           Too Many Requests           Wait for reset time or upgrade plan
500           Internal Server Error       Retry request; contact support if it persists
502           Bad Gateway                 Worker queue error; retry request

Troubleshooting

Common Errors & Solutions

401 Unauthorized - Invalid API Key

Your API key is missing, invalid, or has been revoked.

  • Verify API key format starts with sk_live_ or sk_test_
  • Check that key is active in dashboard → API Keys
  • Ensure header is X-API-Key (case-sensitive)
  • Generate new API key if compromised

402 Payment Required - Insufficient Balance

Your account balance is too low to cover this request.

  • Check remaining balance: GET /api/v1/usage
  • Add funds from dashboard → Billing (balance never expires, so there is no billing cycle to wait for)

422 Unprocessable Entity - All Tiers Failed

Site actively blocked all scraping attempts across all tier levels.

  • Check if URL requires authentication (login, cookies)
  • Verify URL is publicly accessible
  • Try with mode: "js" explicitly
  • Set max_tier: "5" to enable CAPTCHA solving
  • Contact support if site should be accessible

429 Too Many Requests - Rate Limit Exceeded

You've exceeded your plan's rate limit (requests per minute).

  • Check X-RateLimit-Reset header for reset time
  • Implement exponential backoff in your code
  • Upgrade plan for higher rate limits
  • Spread requests over time instead of bursting

202 Accepted → Polling Never Completes

Job returns 202 but polling /jobs/{job_id} never shows completed status.

  • Check job status shows "running" or "pending" vs "failed"
  • Worker service may be down - check status page
  • Timeout may be too low - increase request timeout
  • Use WebSocket for real-time updates instead
  • Set reasonable polling timeout (5 minutes recommended)
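
The polling advice above can be sketched as a loop with an overall deadline. `get_status` is any callable returning the job payload, e.g. a thin wrapper around `GET /api/v1/jobs/{job_id}` (a hypothetical helper, not part of the SDK), and the "completed"/"failed" terminal statuses follow the status names used above:

```python
import time

def poll_job(get_status, job_id: str, timeout_s: float = 300, interval_s: float = 2):
    """Poll a job until it reaches a terminal status, with a hard deadline.

    get_status(job_id) must return the job payload dict; terminal
    statuses ("completed", "failed") follow the names used above.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = get_status(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"Job {job_id} did not finish within {timeout_s}s")
```

The 5-minute default matches the recommendation above; raise `interval_s` for long-running jobs to avoid burning rate limit headroom on status checks.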

Rate Limit Handling Best Practices

// JavaScript: Retry with exponential backoff
async function scrapeWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch('https://api.alterlab.io/api/v1/scrape', {
        method: 'POST',
        headers: {
          'X-API-Key': process.env.ALTERLAB_API_KEY,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ url })
      });

      if (response.status === 429) {
        const resetTime = response.headers.get('X-RateLimit-Reset');
        const waitTime = resetTime ?
          Math.max(parseInt(resetTime, 10) * 1000 - Date.now(), 0) :
          Math.pow(2, i) * 1000;

        console.log(`Rate limited. Waiting ${waitTime}ms...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }

      if (response.ok) {
        return await response.json();
      }

      throw new Error(`HTTP ${response.status}`);

    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
    }
  }
}

// Python: Retry with backoff
import os
import time
import requests
from requests.exceptions import RequestException

def scrape_with_retry(url: str, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                'https://api.alterlab.io/api/v1/scrape',
                headers={'X-API-Key': os.environ['ALTERLAB_API_KEY']},
                json={'url': url}
            )

            if response.status_code == 429:
                reset_time = response.headers.get('X-RateLimit-Reset')
                wait_time = (
                    int(reset_time) - time.time()
                    if reset_time else 2 ** attempt
                )
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(max(wait_time, 0))
                continue

            response.raise_for_status()
            return response.json()

        except RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

    raise RuntimeError(f"Rate limited on every attempt for {url}")

Cost Management Tips

  • Estimate before scraping: Use POST /api/v1/scrape/estimate to check costs before running expensive requests
  • Set cost controls: Always use max_credits in production to prevent unexpected charges from tier escalation
  • Monitor usage: Check the X-AlterLab-Credits response header and track cumulative usage
  • Enable caching: Pass cache: true to cache responses. Subsequent requests to the same URL return cached results for free (caching is opt-in, disabled by default)
  • Optimize tier usage: Review billing.optimization_suggestion in responses to improve cost efficiency

Still Having Issues?

If you're still experiencing problems after trying these solutions:

  • Check service status: status.alterlab.io
  • Review API audit report: Ensure you're using latest best practices
  • Contact support: [email protected]
  • Join Discord community: Get help from other developers

Include in support requests: job_id, timestamp, full error message, and API request/response