AlterLab
API Reference

REST API

Complete reference for the AlterLab REST API. All endpoints use JSON for requests and responses.

Base URL

https://api.alterlab.io/api/v1

Production-Ready Documentation

This documentation reflects the actual implementation. All code examples are tested and can be copied directly into your application.

Quick Start

Get started with the AlterLab API in under 2 minutes. Here's your first request:

Step 1: Get Your API Key

Sign up at alterlab.io and generate an API key from the dashboard.

Step 2: Make Your First Request

# Using cURL
curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Using Python
import requests

response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={"url": "https://example.com"}
)
print(response.json())

# Using Python SDK (Recommended)
from alterlab import AlterLabSync

client = AlterLabSync(api_key="YOUR_API_KEY")
result = client.scrape("https://example.com")
print(result["content"][:100])  # First 100 chars

Step 3: Handle the Response

Simple requests return 200 with content immediately. Complex requests return 202 with a job_id for polling.

{
  "url": "https://example.com",
  "status_code": 200,
  "content": "<!DOCTYPE html>...",
  "title": "Example Domain",
  "billing": {
    "total_credits": 1,
    "tier_used": "1"
  }
}
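
A response handler can branch on the two status codes; a minimal sketch (the `handle_scrape_response` helper name is ours, not part of the SDK):

```python
def handle_scrape_response(status_code: int, body: dict):
    """Classify a scrape response: 200 carries content, 202 carries a job_id.

    Returns ("done", body) for sync results or ("pending", job_id) for
    async jobs that still need polling.
    """
    if status_code == 200:
        return ("done", body)
    if status_code == 202:
        return ("pending", body["job_id"])
    raise RuntimeError(f"Unexpected status {status_code}: {body}")


# The two documented shapes:
state, payload = handle_scrape_response(
    200, {"url": "https://example.com", "content": "<!DOCTYPE html>..."}
)
state, job_id = handle_scrape_response(
    202, {"job_id": "550e8400-e29b-41d4-a716-446655440000"}
)
```

For the pending case, poll /api/v1/jobs/{job_id} as described under Async Mode & Job Polling.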

What's Next?

Getting Structured JSON

To get structured JSON data (product info, article metadata, etc.) instead of raw HTML, use the formats parameter:

# Get structured JSON from any page
curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B0D5HMLP7S",
    "formats": ["json"]
  }'

The response will include auto-extracted structured data:

{
  "url": "https://www.amazon.com/dp/B0D5HMLP7S",
  "status_code": 200,
  "content": {
    "json": {
      "@type": "Product",
      "name": "Water Brush Pen Set",
      "price": 8.99,
      "currency": "USD",
      "rating": 4.4,
      "reviewCount": 47,
      "availability": "InStock",
      "images": ["https://..."],
      "specifications": {...},
      "reviews": [...]
    }
  }
}

No Schema Required

AlterLab automatically detects the page type (product, article, recipe, etc.) and extracts relevant fields. You only need extraction_schema if you want to filter the JSON to specific fields. See Structured Extraction for filtering options.

Authentication

All API requests require authentication using an API key. Include your API key in the X-API-Key header.

Keep your API key secure

Never expose your API key in client-side code or commit it to version control. Use environment variables to store sensitive credentials.
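
For example, in Python the key can be pulled from the environment at startup (ALTERLAB_API_KEY is also the variable the Python SDK auto-loads; see Configuration below):

```python
import os

# Keep the key out of source control: read it from the environment.
# ALTERLAB_API_KEY is the same variable the Python SDK auto-loads.
api_key = os.environ.get("ALTERLAB_API_KEY", "")

headers = {
    "X-API-Key": api_key,
    "Content-Type": "application/json",
}
```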

Unified Scrape Endpoint
Recommended

The unified endpoint handles all scraping modes through a single, intelligent interface. It automatically selects the optimal tier (1-5) based on site complexity, supports both synchronous (200) and asynchronous (202) execution patterns, and provides detailed billing breakdowns.

POST
/api/v1/scrape

Unified scraping endpoint with intelligent tier escalation, cost controls, and sync/async execution modes.

Parameters

Name | Type | Required | Description
url | string | Required | The URL of the web page to scrape
mode | string | Optional | Scraping mode: auto, html, js, pdf, or ocr (default: auto)
sync | boolean | Optional | Enable blocking mode: API polls internally until complete (60-120s max) and returns 200. Set false for an immediate 202 with job_id. (default: true)
formats | string[] | Optional | Output formats: text, json, html, markdown. Use ["json"] to get structured data extraction. (default: ["markdown", "json"])
advanced | AdvancedOptions | Optional | Advanced options: render_js, screenshot, generate_pdf, ocr, use_proxy, markdown, wait_condition
cost_controls | CostControls | Optional | Cost controls: max_credits, max_tier, prefer_cost, prefer_speed, fail_fast
force_refresh | boolean | Optional | Bypass cache and force a fresh scrape (default: false)
include_raw_html | boolean | Optional | Include raw HTML in the response (default: false)
timeout | integer | Optional | Request timeout in seconds, 1-300 (default: 30)
extraction_schema | object | Optional | JSON Schema defining the structure you want extracted. The response will include filtered_content with data matching your schema. See the JSON Schema Filtering guide.

Request Example

# Simple synchronous scrape (default)
curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "mode": "auto"
  }'

# Async scrape with advanced options
curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://spa-website.com",
    "mode": "js",
    "sync": false,
    "advanced": {
      "render_js": true,
      "screenshot": true,
      "markdown": true,
      "wait_condition": "networkidle"
    },
    "cost_controls": {
      "max_credits": 10,
      "prefer_cost": true
    }
  }'

Response Example

// Sync response (200) - Simple requests with sync=true (default)
{
  "url": "https://example.com",
  "status_code": 200,
  "content": {
    "html": "<!DOCTYPE html>...",
    "text": "Cleaned text content...",
    "json": {
      "title": "Example Domain",
      "description": "Example page",
      "metadata": {...}
    }
  },
  "title": "Example Domain",
  "metadata": {
    "description": "Example page",
    "keywords": ["example"]
  },
  "headers": {
    "content-type": "text/html; charset=UTF-8"
  },
  "cached": false,
  "response_time_ms": 1234,
  "size_bytes": 15234,
  "screenshot_url": null,  // URL if screenshot: true (available 24h)
  "pdf_url": null,         // URL if generate_pdf: true (available 24h)
  "billing": {
    "total_credits": 1,
    "tier_used": "1",
    "escalations": [
      {
        "tier": "1",
        "result": "success",
        "credits": 1,
        "duration_ms": 234
      }
    ],
    "savings": 19
  },
  "extraction_method": "algorithmic",
  "version": "v1"
}

// Async response (202) - Complex requests with sync=false or auto-detected
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000"
}

Execution Modes

  • Sync (sync=true, default): API queues the job to the worker service and polls internally every 100-500ms. When the job completes (within 60-120 seconds), returns 200 with full content. No manual polling needed - simpler request/response pattern. If job exceeds timeout, falls back to 202 with job_id for manual polling.
  • Async (sync=false): Immediately returns 202 with job_id. You must poll /api/v1/jobs/{job_id} or use WebSocket for results. Recommended for long-running scrapes (>60s), batch operations, webhooks, or real-time updates via WebSocket.
  • Key difference: Both modes queue to the same worker service. sync=true just adds automatic polling on the API side, making it simpler for clients but with a timeout constraint. sync=false gives you full control over polling/WebSocket with no timeout limits.

When to Use Async Mode (sync=false)

Use sync: false for:
  • Scrapes expected to take longer than 60 seconds (large PDFs, complex sites)
  • Batch operations where you want to queue multiple jobs and poll together
  • When you need webhook delivery instead of polling
  • Real-time status updates via WebSocket
For quick scrapes (most websites), sync: true is simpler and eliminates manual polling code.

How Sync Mode Works (Under the Hood)

Understanding sync mode helps you choose the right execution pattern for your use case:

1. Job Queueing (Both Modes)

Regardless of sync setting, all scrape requests are queued to the worker service. This ensures consistent anti-bot capabilities, proxy management, and resource pooling.

2. Internal Polling (sync=true)

When sync=true, the API server holds your HTTP connection open and polls Redis every 100-500ms checking job status. When complete, it returns the full result as a 200 response. Maximum wait: 60-120 seconds.

3. Manual Polling (sync=false)

When sync=false, the API immediately returns 202 with job_id and closes the connection. You control polling frequency and timeout. No server-side timeout constraints.

4. Timeout Behavior

If sync=true and job exceeds timeout (60-120s), API returns 202 with job_id as fallback. You can then poll manually. This prevents hung connections while still supporting long-running jobs.

Key Insight: sync=true is purely a convenience feature. Both modes use the same worker infrastructure and have identical scraping capabilities. The only difference is who handles polling: the API server (sync=true) or your client code (sync=false).

Response Format

The content field may be a plain string (simple sync requests) or a structured object when using the formats parameter.

Simple sync (string)
{
  "url": "https://example.com",
  "status_code": 200,
  "content": "<!DOCTYPE html>...",
  "billing": {"total_credits": 1}
}
With formats (object)
{
  "url": "https://example.com",
  "status_code": 200,
  "content": {
    "html": "<!DOCTYPE html>...",
    "text": "Clean text...",
    "markdown": "# Example..."
  },
  "billing": {"total_credits": 1}
}
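
Client code that accepts both shapes can normalize them with a small helper (a sketch; the format preference order is our choice, not mandated by the API):

```python
def content_as_text(content):
    """Return a text view of the `content` field, which may be a plain
    string (simple sync requests) or an object keyed by format name."""
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        # Prefer the lighter formats when several are present.
        for key in ("markdown", "text", "html"):
            if key in content:
                return content[key]
    raise ValueError(f"Unrecognized content shape: {type(content)!r}")


content_as_text("<!DOCTYPE html>...")                 # passes through
content_as_text({"markdown": "# Example", "html": "<p>...</p>"})  # "# Example"
```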

Python SDK
Recommended

The official Python SDK provides a simple, intuitive interface for the AlterLab API with automatic polling, retry logic, and type hints.

Installation

pip install alterlab

Async vs Sync

The SDK provides two clients: AlterLab (async, requires await) and AlterLabSync (sync, no await needed). Choose based on your codebase.

Sync Usage (Recommended for scripts)

from alterlab import AlterLabSync

# Use AlterLabSync for synchronous code (no await needed)
with AlterLabSync(api_key="YOUR_API_KEY") as client:
    result = client.scrape("https://example.com")
    print(result["content"])
    print(f"Credits used: {result['billing']['total_credits']}")

    # With schema filtering
    result = client.scrape(
        url="https://amazon.com/dp/B0123456789",
        extraction_schema={
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
                "in_stock": {"type": "boolean"}
            }
        }
    )
    product = result["filtered_content"]  # Your filtered data
    print(f"Product: {product['name']} - ${product['price']}")

Async Usage (For async applications)

import asyncio
from alterlab import AlterLab, AdvancedOptions

async def main():
    # AlterLab is async - requires await
    async with AlterLab(api_key="YOUR_API_KEY") as client:
        result = await client.scrape("https://example.com")
        print(result["content"])

        # With JS rendering
        result = await client.scrape(
            url="https://spa-app.com",
            mode="js",
            advanced=AdvancedOptions(render_js=True, screenshot=True)
        )

asyncio.run(main())

Advanced Options

from alterlab import AlterLabSync, AdvancedOptions, CostControls

with AlterLabSync(api_key="YOUR_API_KEY") as client:
    # With advanced options
    result = client.scrape(
        url="https://example.com",
        mode="js",
        advanced=AdvancedOptions(
            render_js=True,
            screenshot=True,
            markdown=True,
            wait_condition="networkidle"
        ),
        cost_controls=CostControls(
            max_credits=10,
            max_tier="4",
            prefer_cost=True
        )
    )

    # Screenshot saved to result["screenshot_url"]
    # Markdown in result["content"]["markdown"] if formats specified

Cost Estimation

from alterlab import AlterLabSync, AdvancedOptions

with AlterLabSync(api_key="YOUR_API_KEY") as client:
    # Estimate cost before scraping
    estimate = client.estimate_cost(
        url="https://example.com",
        mode="auto",
        advanced=AdvancedOptions(render_js=True, screenshot=True)
    )

    print(f"Estimated credits: {estimate['estimated_credits']}")
    print(f"Max possible: {estimate['max_possible_credits']}")

    # Only scrape if within budget
    if estimate['estimated_credits'] <= 5:
        result = client.scrape(url="https://example.com")

Error Handling

from alterlab import AlterLabSync, AlterLabAPIError, AlterLabTimeoutError

with AlterLabSync(api_key="YOUR_API_KEY") as client:
    try:
        result = client.scrape("https://example.com")
    except AlterLabAPIError as e:
        if e.status_code == 402:
            print("Insufficient balance!")
        elif e.status_code == 429:
            print(f"Rate limited. Retry after {e.retry_after}s")
        else:
            print(f"API error: {e.detail}")
    except AlterLabTimeoutError:
        print("Request timed out")
    except Exception as e:
        print(f"Unexpected error: {e}")

Manual Job Management

from alterlab import AlterLabSync

client = AlterLabSync(api_key="YOUR_API_KEY")

# Get job status without waiting
job_id = "550e8400-e29b-41d4-a716-446655440000"
status = client.get_job_status(job_id)
print(f"Job status: {status['status']}")

# Poll job with custom settings
result = client.poll_job(
    job_id=job_id,
    poll_interval=2.0,  # Check every 2 seconds
    poll_timeout=300.0   # Give up after 5 minutes
)

Configuration

# Custom configuration (sync client)
client = AlterLabSync(
    api_key="YOUR_API_KEY",
    base_url="https://api.alterlab.io",  # API base (paths include /api/v1)
    timeout=60.0,  # Request timeout
    max_retries=3,  # Retry failed requests
    retry_backoff=2.0  # Exponential backoff multiplier
)

# Or use environment variable
# export ALTERLAB_API_KEY="YOUR_API_KEY"
client = AlterLabSync()  # Auto-loads from env

SDK Benefits

  • Automatic polling: Handles 202 responses and job polling for you
  • Retry logic: Automatically retries failed requests with exponential backoff
  • Type hints: Full type annotations for IDE autocomplete
  • Error handling: Custom exception types for different error scenarios
  • Convenience methods: Mode-specific methods for common use cases

Structured Extraction (Optional Filtering)

Already Getting JSON?

If you just want structured JSON data, use formats: ["json"] as shown in Getting Structured JSON. This section is for filtering that JSON to specific fields.

Filter and restructure extracted data to match your desired output format using JSON Schema. This is pure data transformation - no additional cost.

JSON Schema Filtering

Pass extraction_schema to filter extracted data to your desired structure. The filtered result appears in filtered_content:

# Extract product data with schema
result = client.scrape(
    url="https://amazon.com/dp/B0123456789",
    mode="auto",
    extraction_schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "currency": {"type": "string"},
            "in_stock": {"type": "boolean"},
            "rating": {"type": "number"},
            "reviews_count": {"type": "integer"}
        }
    }
)

# Access filtered data (matches your schema)
product = result["filtered_content"]
print(f"Name: {product['name']}")
print(f"Price: {product['price']}")
print(f"In Stock: {product['in_stock']}")

Field Aliases

Schema filtering automatically maps common field name variations:

in_stock → availability
sku → asin, product_id
image_urls → images
title → name, headline

See JSON Schema Filtering guide for the complete list.
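
The alias behavior can be pictured as a first-match lookup from your schema's field name to candidate keys in the extracted data (an illustrative sketch covering only the aliases listed above; it is not the complete server-side list):

```python
# Candidate source keys tried for each schema field, per the table above.
FIELD_ALIASES = {
    "in_stock": ["in_stock", "availability"],
    "sku": ["sku", "asin", "product_id"],
    "image_urls": ["image_urls", "images"],
    "title": ["title", "name", "headline"],
}

def resolve_field(extracted: dict, field: str):
    """Return the first alias of `field` present in the extracted data."""
    for key in FIELD_ALIASES.get(field, [field]):
        if key in extracted:
            return extracted[key]
    return None

product = {"name": "Water Brush Pen Set", "availability": "InStock"}
resolve_field(product, "title")     # falls through to "name"
resolve_field(product, "in_stock")  # falls through to "availability"
```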

Response Structure

{
  "url": "https://amazon.com/dp/B0123456789",
  "status_code": 200,
  "content": { ... },           // Full extraction (Schema.org, metadata, etc.)
  "filtered_content": {         // YOUR filtered data (only when extraction_schema provided)
    "name": "Product Name",
    "price": 29.99,
    "in_stock": true
  },
  "billing": { "total_credits": 3 }
}

Zero Additional Cost

Schema filtering is pure data transformation - no LLM calls, no extra charges. It filters existing structured data (Schema.org, Open Graph, playbook extractions) to match your schema.

Advanced Options

Advanced options provide fine-grained control over scraping behavior and enable premium features.

Option | Type | Cost | Description
render_js | boolean | +3 | Use headless browser for JavaScript rendering
screenshot | boolean | +1 | Capture full-page screenshot (requires render_js)
generate_pdf | boolean | +2 | Generate PDF of rendered page (requires render_js)
ocr | boolean | +5 | Extract text from images using OCR
use_proxy | boolean | +1 | Route through premium proxy network
markdown | boolean | Free | Convert content to Markdown format
wait_condition | string | Free | Wait condition: domcontentloaded, networkidle, load
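
The surcharges above can be summed client-side to preview add-on cost (a sketch; total billing also depends on the tier used, so treat this as an add-on estimate only):

```python
# Per-option credit surcharges from the table above; markdown and
# wait_condition are free, so they are omitted.
SURCHARGES = {
    "render_js": 3,
    "screenshot": 1,
    "generate_pdf": 2,
    "ocr": 5,
    "use_proxy": 1,
}

def advanced_surcharge(advanced: dict) -> int:
    """Sum the credit add-ons for the enabled advanced options."""
    return sum(cost for opt, cost in SURCHARGES.items() if advanced.get(opt))

advanced_surcharge({"render_js": True, "screenshot": True})  # 3 + 1 = 4
```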

Example with Advanced Options

{
  "url": "https://spa-app.com",
  "mode": "auto",
  "advanced": {
    "render_js": true,
    "screenshot": true,
    "generate_pdf": true,
    "markdown": true,
    "use_proxy": true,
    "wait_condition": "networkidle"
  }
}

// Response includes download URLs (available for 24 hours):
// "screenshot_url": "https://alterlab.io/downloads/screenshots/2025-01-15/job-id.png"
// "pdf_url": "https://alterlab.io/downloads/pdfs/2025-01-15/job-id.pdf"

Cost Controls

{
  "url": "https://example.com",
  "mode": "auto",
  "cost_controls": {
    "max_credits": 5,
    "max_tier": "4",
    "prefer_cost": true,
    "fail_fast": false
  }
}

Tier Escalation & Cost Controls

AlterLab uses an intelligent 5-tier escalation system that automatically tries the cheapest method first and escalates only when needed. Each tier has different capabilities, speeds, and costs.

Tier | Name | Cost | Requests per $1 | Description
1 | Curl | $0.0002 | 5,000 | Ultra-fast curl binary for static sites
2 | HTTP | $0.0003 | 3,333 | HTTPX with TLS fingerprinting and HTTP/2
3 | Stealth | $0.002 | 500 | curl_cffi with Chrome browser impersonation
4 | Browser | $0.004 | 250 | Playwright browser automation for JS sites
5 | Captcha | $0.02 | 50 | Browser with AI-powered captcha solving

How Escalation Works

  1. Start cheapest: By default, starts at Tier 1 (Curl: $0.0002)
  2. Attempt scrape: Tries to scrape with current tier's method
  3. Check success: If successful (status 200, valid content), stop and return result
  4. Escalate if failed: If failed (timeout, blocked, error), move to next tier and retry
  5. Stop at success or max tier: Returns when successful or when max_tier/max_credits reached
  6. Detailed billing: Response includes all attempts, final tier used, and cost saved
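
The escalation loop above can be sketched as follows (illustrative only; `attempt` stands in for the server-side tier execution, and failed attempts are still billed, matching the billing breakdowns shown in the examples):

```python
def scrape_with_escalation(attempt, tier_costs, max_credits=None, max_tier=None):
    """Walk tiers cheapest-first, stopping at the first success or limit.

    `attempt(tier)` returns a dict with a "success" flag; `tier_costs`
    maps tier name -> credits, ordered cheapest first.
    """
    spent = 0.0
    escalations = []
    for tier, cost in tier_costs.items():
        if max_tier is not None and tier > max_tier:
            break  # would exceed the tier ceiling
        if max_credits is not None and spent + cost > max_credits:
            break  # would exceed the budget
        result = attempt(tier)
        spent += cost  # failed attempts are billed too
        escalations.append({
            "tier": tier,
            "result": "success" if result["success"] else "failed",
            "credits": cost,
        })
        if result["success"]:
            return {"tier_used": tier, "total_credits": spent,
                    "escalations": escalations, "content": result.get("content")}
    return {"tier_used": None, "total_credits": spent,
            "escalations": escalations, "content": None}


# Illustration: a site that only succeeds at tier "3".
def fake_attempt(tier):
    return {"success": tier >= "3", "content": "ok" if tier >= "3" else None}

result = scrape_with_escalation(fake_attempt, {"1": 1, "2": 2, "3": 3, "4": 4})
# result["tier_used"] == "3"; total credits 1 + 2 + 3 = 6
```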

Cost Control Parameters

max_credits (float, optional)

Maximum cost to spend on this request. API will not escalate beyond this budget. Example: max_credits: 0.004 stops at Tier 4 (Browser).

max_tier (string, optional)

Maximum tier to escalate to: "1" (Curl), "2" (HTTP), "3" (Stealth), "4" (Browser), or "5" (Captcha). Example: max_tier: "4" stops at Browser.

prefer_cost (boolean, default: false)

Start with cheapest tier (Tier 1) and try each tier sequentially. Best for known simple sites.

prefer_speed (boolean, default: false)

Start with Tier 4 (Browser) for guaranteed success on most sites. Higher cost but faster overall.

fail_fast (boolean, default: false)

Return error instead of escalating to expensive tiers. Useful when you want predictable costs.

Example: Cost-Optimized Request

{
  "url": "https://example.com",
  "mode": "auto",
  "cost_controls": {
    "max_credits": 0.004,
    "max_tier": "4",
    "prefer_cost": true,
    "fail_fast": false
  }
}

// Response billing breakdown:
{
  "billing": {
    "total_cost": 0.002,
    "tier_used": "3",
    "escalations": [
      {"tier": "1", "result": "failed", "cost": 0.0002, "duration_ms": 250, "error": "403 Forbidden"},
      {"tier": "2", "result": "failed", "cost": 0.0003, "duration_ms": 2100, "error": "Blocked by WAF"},
      {"tier": "3", "result": "success", "cost": 0.002, "duration_ms": 4200}
    ],
    "optimization_suggestion": "Site requires Stealth tier. Consider using prefer_speed with Tier 4 for faster results."
  }
}

Cost Control Best Practices

  • Always set max_credits for production to prevent unexpected charges
  • Use prefer_cost: true for known simple sites
  • Use prefer_speed: true for critical scrapers where reliability matters more than cost
  • Set fail_fast: true in testing to avoid unnecessary spending on misconfigured requests

Async Mode & Job Polling

Complex scraping requests return a 202 status with a job_id. Poll the job status endpoint to retrieve results.

GET
/api/v1/jobs/{job_id}

Poll job status and retrieve results when completed.

Parameters

Name | Type | Required | Description
job_id | string | Required | UUID of the job returned from the scrape endpoint

Request Example

curl -X GET https://api.alterlab.io/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
  -H "X-API-Key: sk_live_..."

Response Example

// Status: pending
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "url": "https://example.com",
  "mode": "auto",
  "created_at": "2025-11-05T10:30:00Z"
}

// Status: running
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "running",
  "url": "https://example.com",
  "mode": "auto",
  "progress": 50,
  "created_at": "2025-11-05T10:30:00Z"
}

// Status: completed
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "url": "https://example.com",
  "mode": "auto",
  "result": {
    "url": "https://example.com",
    "status_code": 200,
    "content": {
      "html": "...",
      "text": "...",
      "json": {...}
    },
    "billing": {
      "total_credits": 5,
      "tier_used": "4"
    }
  },
  "created_at": "2025-11-05T10:30:00Z",
  "completed_at": "2025-11-05T10:30:15Z"
}

// Status: failed
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "failed",
  "url": "https://example.com",
  "mode": "auto",
  "error": "Timeout after 30 seconds",
  "created_at": "2025-11-05T10:30:00Z",
  "failed_at": "2025-11-05T10:30:30Z"
}

Polling Best Practices

  • Recommended interval: Poll every 2-5 seconds
  • Timeout: Set a maximum polling duration (5 minutes recommended)
  • Exponential backoff: Increase polling interval if job takes longer
  • Status values: pending → running → completed/failed

Example Polling Loop (JavaScript)

async function pollJobStatus(jobId, apiKey, maxWaitMs = 300000) {
  const startTime = Date.now();
  const pollInterval = 2000; // 2 seconds

  while (Date.now() - startTime < maxWaitMs) {
    const response = await fetch(
      `https://api.alterlab.io/api/v1/jobs/${jobId}`,
      { headers: { 'X-API-Key': apiKey } }
    );

    const job = await response.json();

    if (job.status === 'completed') {
      return job.result;
    } else if (job.status === 'failed') {
      throw new Error(job.error);
    }

    // Wait before next poll
    await new Promise(resolve => setTimeout(resolve, pollInterval));
  }

  throw new Error('Job polling timeout');
}
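
The same loop in Python, this time with the exponential backoff the best practices recommend. `fetch_status` is injected so the polling logic stays testable; wire it to a GET on /api/v1/jobs/{job_id} with your X-API-Key header:

```python
import time

def backoff_intervals(initial=2.0, factor=1.5, cap=30.0):
    """Yield poll intervals that grow geometrically, capped at `cap` seconds."""
    interval = initial
    while True:
        yield interval
        interval = min(interval * factor, cap)

def poll_job(fetch_status, max_wait=300.0, sleep=time.sleep):
    """Poll until the job completes or fails.

    `fetch_status()` should GET /api/v1/jobs/{job_id} and return the
    decoded JSON body (as in the status examples above).
    """
    waited = 0.0
    for interval in backoff_intervals():
        job = fetch_status()
        if job["status"] == "completed":
            return job["result"]
        if job["status"] == "failed":
            raise RuntimeError(job["error"])
        if waited + interval > max_wait:
            raise TimeoutError("Job polling timeout")
        sleep(interval)
        waited += interval
```

For example, `poll_job(lambda: requests.get(url, headers=headers).json())` with the job URL and headers from earlier examples.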

WebSocket Alternative

For real-time updates without polling, use WebSocket connections to receive job status updates as they happen. This is more efficient than polling and provides instant notifications.

WebSocket Endpoint

wss://api.alterlab.io/api/v1/ws/jobs?api_key=YOUR_API_KEY

Protocol Messages

Client → Server (Subscribe):
{"action": "subscribe", "job_id": "<job-uuid>"}
Client → Server (Unsubscribe):
{"action": "unsubscribe", "job_id": "<job-uuid>"}
Client → Server (Ping):
{"action": "ping"}
Server → Client (Job Update):
{"type": "job_update", "job_id": "...", "status": "running|completed|failed", "result": {...}, "error": null, "ts": 1730451136}
Server → Client (Heartbeat):
{"type": "heartbeat", "ts": 1730451136}

Example: JavaScript WebSocket Client

// Connect with API key authentication
const ws = new WebSocket('wss://api.alterlab.io/api/v1/ws/jobs?api_key=sk_live_...');

// Handle connection open
ws.onopen = () => {
  console.log('WebSocket connected');

  // Subscribe to job updates
  ws.send(JSON.stringify({
    action: 'subscribe',
    job_id: '550e8400-e29b-41d4-a716-446655440000'
  }));
};

// Receive real-time updates
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);

  switch (message.type) {
    case 'connected':
      console.log('✓ Connection established');
      break;

    case 'subscribed':
      console.log('✓ Subscribed to job:', message.job_id);
      break;

    case 'job_update':
      console.log('Job status:', message.status);

      if (message.status === 'completed') {
        console.log('✓ Job completed:', message.result);
        ws.close();
      } else if (message.status === 'failed') {
        console.error('✗ Job failed:', message.error);
        ws.close();
      } else {
        console.log('⟳ Job in progress...');
      }
      break;

    case 'heartbeat':
      // Server is alive
      break;

    case 'error':
      console.error('WebSocket error:', message.message);
      break;
  }
};

ws.onerror = (error) => {
  console.error('WebSocket connection error:', error);
};

ws.onclose = () => {
  console.log('WebSocket disconnected');
};

Example: Python WebSocket Client

import asyncio
import json
import websockets

async def watch_job(api_key: str, job_id: str):
    uri = f"wss://api.alterlab.io/api/v1/ws/jobs?api_key={api_key}"

    async with websockets.connect(uri) as ws:
        # Subscribe to job
        await ws.send(json.dumps({
            "action": "subscribe",
            "job_id": job_id
        }))

        # Listen for updates
        async for message in ws:
            data = json.loads(message)

            if data["type"] == "job_update":
                status = data["status"]
                print(f"Job status: {status}")

                if status == "completed":
                    print("Job completed:", data["result"])
                    break
                elif status == "failed":
                    print("Job failed:", data["error"])
                    break

# Usage
asyncio.run(watch_job("sk_live_...", "550e8400-..."))

WebSocket vs Polling

  • WebSocket: Instant updates, lower latency, persistent connection, more efficient for long-running jobs
  • Polling: Simpler implementation, works through proxies/firewalls, no persistent connection needed
  • Recommendation: Use WebSocket for real-time dashboards, polling for simple scripts

Batch Scraping

Submit multiple URLs for scraping in a single request. Batch requests are processed asynchronously, and you can receive results via webhook or by polling individual job statuses.

POST
/api/v1/batch

Submit a batch of URLs for asynchronous processing with optional webhook delivery.

Parameters

Name | Type | Required | Description
urls | string[] | Required | Array of URLs to scrape (max 1,000 per batch)
mode | string | Optional | Scraping mode applied to all URLs (default: auto)
webhook_url | string | Optional | URL to receive results via POST webhook
advanced | AdvancedOptions | Optional | Advanced options applied to all URLs
cost_controls | CostControls | Optional | Cost controls applied to all URLs

Request Example

curl -X POST https://api.alterlab.io/api/v1/batch \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "mode": "auto",
    "webhook_url": "https://your-app.com/webhooks/scraping",
    "cost_controls": {
      "max_credits": 5
    }
  }'

Response Example

{
  "batch_id": "batch_7a3b9c8d-4e2f-1a6b-8c5d-9e0f1a2b3c4d",
  "total_jobs": 3,
  "job_ids": [
    "550e8400-e29b-41d4-a716-446655440001",
    "550e8400-e29b-41d4-a716-446655440002",
    "550e8400-e29b-41d4-a716-446655440003"
  ],
  "estimated_credits": 3,
  "webhook_url": "https://your-app.com/webhooks/scraping",
  "status": "pending"
}

Webhook Payload Format

When a job completes, AlterLab sends a POST request to your webhook URL with this payload:

POST https://your-app.com/webhooks/scraping
Content-Type: application/json
X-AlterLab-Signature: sha256=...  // Webhook signature for verification

{
  "event": "job.completed",
  "batch_id": "batch_7a3b9c8d-4e2f-1a6b-8c5d-9e0f1a2b3c4d",
  "job_id": "550e8400-e29b-41d4-a716-446655440001",
  "url": "https://example.com/page1",
  "status": "completed",
  "result": {
    "url": "https://example.com/page1",
    "status_code": 200,
    "content": "...",
    "billing": {
      "total_credits": 1,
      "tier_used": "1"
    }
  },
  "completed_at": "2025-11-05T10:30:15Z"
}
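
If you verify the X-AlterLab-Signature header, the usual pattern is to recompute an HMAC over the raw request body. The exact signing scheme and where the signing secret comes from are not specified here, so treat this as a sketch assuming HMAC-SHA256 with a dashboard-provided secret:

```python
import hashlib
import hmac

def verify_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Check an `X-AlterLab-Signature: sha256=<hex>` header against the body.

    Assumes HMAC-SHA256 over the raw body; confirm the scheme and the
    secret's location in your dashboard before relying on this.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    provided = header_value.removeprefix("sha256=")
    # Constant-time comparison avoids timing leaks.
    return hmac.compare_digest(expected, provided)
```

Verify against the raw bytes of the request, not a re-serialized JSON object, since key order and whitespace change the digest.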

Python SDK Batch Example

from alterlab import AlterLabSync

client = AlterLabSync(api_key="YOUR_API_KEY")

# Submit batch with webhook
batch = client.batch_scrape(
    urls=[
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3"
    ],
    mode="auto",
    webhook_url="https://your-app.com/webhooks/scraping"
)

print(f"Batch ID: {batch['batch_id']}")
print(f"Total jobs: {batch['total_jobs']}")

# Or poll each job individually
for job_id in batch['job_ids']:
    result = client.poll_job(job_id)
    print(f"Job {job_id}: {result['status']}")

Batch Limits

  • Maximum 1,000 URLs per batch request
  • Webhook URL must be publicly accessible (HTTPS required)
  • Webhook retries: 3 attempts with exponential backoff
  • Batch jobs are processed in parallel (no guaranteed order)
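
Lists larger than the 1,000-URL cap have to be split across multiple batch requests; a small helper:

```python
def chunk_urls(urls, batch_size=1000):
    """Split a URL list into batches no larger than the 1,000-URL cap."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]

# 2,500 URLs -> three batches of 1000, 1000, and 500.
batches = chunk_urls([f"https://example.com/page{i}" for i in range(2500)])
```

Each chunk can then be submitted with `client.batch_scrape(urls=chunk, ...)`.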

Usage & Balance

Monitor your API usage, spending, and account limits.

GET
/api/v1/usage

Get current usage statistics and remaining balance for your account.

Parameters

Name | Type | Required | Description
start_date | string | Optional | Start date for the usage period (ISO 8601 format)
end_date | string | Optional | End date for the usage period (ISO 8601 format)

Request Example

# Current billing period
curl -X GET https://api.alterlab.io/api/v1/usage \
  -H "X-API-Key: sk_live_..."

# Specific date range
curl -X GET "https://api.alterlab.io/api/v1/usage?start_date=2025-11-01&end_date=2025-11-30" \
  -H "X-API-Key: sk_live_..."

Response Example

{
  "period": {
    "start": "2025-11-01T00:00:00Z",
    "end": "2025-11-30T23:59:59Z",
    "current": true
  },
  "balance": {
    "current_cents": 7655,
    "deposited_cents": 10000,
    "used_cents": 2345
  },
  "requests": {
    "total": 2345,
    "successful": 2289,
    "failed": 56,
    "cached": 432
  },
  "by_mode": {
    "html": 1234,
    "js": 856,
    "pdf": 145,
    "ocr": 110
  },
  "by_tier": {
    "1": 1234,
    "2": 856,
    "3": 145,
    "4": 98,
    "5": 12
  },
  "rate_limits": {
    "requests_per_minute": 300,
    "current_usage": 12,
    "reset_at": "2025-11-05T10:31:00Z"
  },
  "spend_tier": {
    "current": "growth",
    "rolling_30d_spend_cents": 8500,
    "next_tier_at_cents": 20000
  }
}

Python SDK Usage Example

from alterlab import AlterLabSync
from datetime import datetime, timedelta

client = AlterLabSync(api_key="YOUR_API_KEY")

# Get current usage
usage = client.get_usage()
print(f"Balance remaining (cents): {usage['balance']['current_cents']}")
print(f"Requests this period: {usage['requests']['total']}")

# Check the balance before large jobs (here: alert under $1.00)
if usage['balance']['current_cents'] < 100:
    print("Warning: Low balance! Time to top up.")

# Get usage for specific period
start = datetime.now() - timedelta(days=7)
weekly_usage = client.get_usage(
    start_date=start.isoformat(),
    end_date=datetime.now().isoformat()
)
print(f"Weekly requests: {weekly_usage['requests']['total']}")

Monitoring Best Practices

  • Check before large batches: Query /usage before submitting batch jobs to ensure sufficient balance
  • Monitor rate limits: Track rate_limits.current_usage to avoid 429 errors
  • Set up alerts: Monitor balance.current_cents and alert when it drops below a threshold
  • Track by tier: Use by_tier breakdown to optimize tier selection
  • Cache optimization: Monitor requests.cached ratio to measure cache efficiency

Balance Management Tips

  • Balance never expires - deposit and use at your own pace
  • Rate limits scale automatically with your 30-day rolling spend
  • Check spend_tier.next_tier_at_cents to see next rate limit upgrade threshold
  • Deposit more funds anytime from the dashboard billing page
  • Use cost_controls.max_credits to prevent budget overruns

Cost Estimation

POST
/api/v1/scrape/estimate

Estimate the cost of a scrape request without actually scraping.

Parameters

Name       Type              Required   Description
url        string            Required   The URL to estimate
mode       string            Optional   Scraping mode (default: auto)
advanced   AdvancedOptions   Optional   Advanced options to include in the estimate

Request Example

curl -X POST https://api.alterlab.io/api/v1/scrape/estimate \
  -H "X-API-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "mode": "auto"
  }'

Response Example

{
  "url": "https://example.com",
  "estimated_tier": "1",
  "estimated_credits": 1,
  "confidence": "high",
  "max_possible_credits": 10,
  "reasoning": "Known simple site - tier 1 should work"
}
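
An estimate-then-scrape flow can gate on the response above. Using `max_possible_credits` rather than `estimated_credits` as the bound is a deliberately conservative choice here, since tier escalation may exceed the estimate:

```python
def should_scrape(estimate: dict, budget_credits: int) -> bool:
    """Decide whether to proceed based on an estimate response.

    Compares the worst-case cost (max_possible_credits) against the
    caller's budget; a looser policy could use estimated_credits instead.
    """
    return estimate["max_possible_credits"] <= budget_credits

estimate = {"estimated_credits": 1, "max_possible_credits": 10, "confidence": "high"}
print(should_scrape(estimate, budget_credits=5))   # False: worst case is 10 credits
print(should_scrape(estimate, budget_credits=20))  # True
```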

Rate Limits

Rate limits vary by plan. Monitor your usage through the response headers.

Header                  Description                        When Available
X-AlterLab-Credits      Cost charged for this request      Success (200)
X-AlterLab-Tier         Tier used for scraping             Success (200)
X-AlterLab-Savings      Savings vs highest tier            Success (200)
X-AlterLab-Bytes        Response size in bytes             Success (200)
X-AlterLab-Cached       Whether result was cached          Success (200)
X-RateLimit-Limit       Maximum requests per minute        Rate limit (429)
X-RateLimit-Remaining   Remaining requests in window       Rate limit (429)
X-RateLimit-Reset       Unix timestamp when limit resets   Rate limit (429)

Rate Limit Information

Success responses (200, 202) use X-AlterLab-* headers. Rate limit errors (429) use X-RateLimit-* headers.
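
A small helper for pulling the billing headers off a successful response. The header names come from the table above; the value formats (integer credits, a "true"/"false" cached flag) are assumptions to verify against a live response:

```python
def parse_billing_headers(headers: dict) -> dict:
    """Extract the X-AlterLab-* billing headers from a 200 response.

    Value formats (int credits, "true"/"false" cached flag) are
    assumptions; header names match the table above.
    """
    return {
        "credits": int(headers.get("X-AlterLab-Credits", "0")),
        "tier": headers.get("X-AlterLab-Tier"),
        "cached": headers.get("X-AlterLab-Cached", "").lower() == "true",
    }

headers = {"X-AlterLab-Credits": "3", "X-AlterLab-Tier": "2", "X-AlterLab-Cached": "false"}
print(parse_billing_headers(headers))  # {'credits': 3, 'tier': '2', 'cached': False}
```

With `requests`, pass `response.headers` directly; its case-insensitive lookup also works with `.get()`.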

Error Handling

Status Code   Meaning                     Action
200           Success (sync response)     Process content directly
202           Accepted (async response)   Poll job_id for results
400           Bad Request                 Check request parameters
401           Unauthorized                Verify API key
402           Payment Required            Insufficient balance; top up account
404           Not Found                   Job doesn't exist or unauthorized
415           Unsupported Media Type      URL content type not supported
422           Unprocessable Entity        All tiers failed; site may be blocking
429           Too Many Requests           Wait for reset time or upgrade plan
500           Internal Server Error       Retry request; contact support if it persists
502           Bad Gateway                 Worker queue error; retry request

Troubleshooting

Common Errors & Solutions

401 Unauthorized - Invalid API Key

Your API key is missing, invalid, or has been revoked.

  • Verify API key format starts with sk_live_ or sk_test_
  • Check that key is active in dashboard → API Keys
  • Ensure header is X-API-Key (case-sensitive)
  • Generate new API key if compromised

402 Payment Required - Insufficient Balance

Your account balance is too low to cover this request.

  • Check remaining balance: GET /api/v1/usage
  • Add funds from dashboard → Billing (balance never expires, so there is no billing cycle to wait for)

422 Unprocessable Entity - All Tiers Failed

Site actively blocked all scraping attempts across all tier levels.

  • Check if URL requires authentication (login, cookies)
  • Verify URL is publicly accessible
  • Try with mode: "js" explicitly
  • Set max_tier: "5" to enable CAPTCHA solving
  • Contact support if site should be accessible

429 Too Many Requests - Rate Limit Exceeded

You've exceeded your plan's rate limit (requests per minute).

  • Check X-RateLimit-Reset header for reset time
  • Implement exponential backoff in your code
  • Upgrade plan for higher rate limits
  • Spread requests over time instead of bursting

202 Accepted → Polling Never Completes

Job returns 202 but polling /jobs/{job_id} never shows completed status.

  • Check job status shows "running" or "pending" vs "failed"
  • Worker service may be down - check status page
  • Timeout may be too low - increase request timeout
  • Use WebSocket for real-time updates instead
  • Set reasonable polling timeout (5 minutes recommended)
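
The polling advice above can be sketched as a loop with an overall deadline. `get_status` is any callable returning the job payload, e.g. a thin wrapper around `GET /api/v1/jobs/{job_id}` (a hypothetical helper, not part of the SDK), and the "completed"/"failed" terminal statuses follow the status names used above:

```python
import time

def poll_job(get_status, job_id: str, timeout_s: float = 300, interval_s: float = 2):
    """Poll a job until it reaches a terminal status, with a hard deadline.

    get_status(job_id) must return the job payload dict; terminal
    statuses ("completed", "failed") follow the names used above.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = get_status(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval_s)
    raise TimeoutError(f"Job {job_id} did not finish within {timeout_s}s")
```

The 5-minute default matches the recommendation above; raise `interval_s` for long-running jobs to avoid burning rate limit headroom on status checks.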

Rate Limit Handling Best Practices

// JavaScript: Retry with exponential backoff
async function scrapeWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch('https://api.alterlab.io/api/v1/scrape', {
        method: 'POST',
        headers: {
          'X-API-Key': process.env.ALTERLAB_API_KEY,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ url })
      });

      if (response.status === 429) {
        const resetTime = response.headers.get('X-RateLimit-Reset');
        const waitTime = resetTime ?
          Math.max(parseInt(resetTime, 10) * 1000 - Date.now(), 0) :
          Math.pow(2, i) * 1000;

        console.log(`Rate limited. Waiting ${waitTime}ms...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }

      if (response.ok) {
        return await response.json();
      }

      throw new Error(`HTTP ${response.status}`);

    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
    }
  }
}

// Python: Retry with backoff
import os
import time
import requests
from requests.exceptions import RequestException

def scrape_with_retry(url: str, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            response = requests.post(
                'https://api.alterlab.io/api/v1/scrape',
                headers={'X-API-Key': os.environ['ALTERLAB_API_KEY']},
                json={'url': url}
            )

            if response.status_code == 429:
                reset_time = response.headers.get('X-RateLimit-Reset')
                wait_time = (
                    int(reset_time) - time.time()
                    if reset_time else 2 ** attempt
                )
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(max(wait_time, 0))
                continue

            response.raise_for_status()
            return response.json()

        except RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

    raise RuntimeError(f"Rate limited on every attempt for {url}")

Cost Management Tips

  • Estimate before scraping: Use POST /api/v1/scrape/estimate to check costs before running expensive requests
  • Set cost controls: Always use max_credits in production to prevent unexpected charges from tier escalation
  • Monitor usage: Check the X-AlterLab-Credits response header and track cumulative usage
  • Enable caching: Pass cache: true to cache responses. Subsequent requests to the same URL return cached results for free (caching is opt-in, disabled by default)
  • Optimize tier usage: Review billing.optimization_suggestion in responses to improve cost efficiency

Still Having Issues?

If you're still experiencing problems after trying these solutions:

  • Check service status: status.alterlab.io
  • Review API audit report: Ensure you're using latest best practices
  • Contact support: [email protected]
  • Join Discord community: Get help from other developers

Include in support requests: job_id, timestamp, full error message, and API request/response