    API Reference

    REST API

    Complete reference for the AlterLab REST API. All endpoints use JSON for requests and responses.

    Base URL

    Bash
    https://api.alterlab.io/api/v1

    Production-Ready Documentation

    This documentation reflects the actual implementation. All code examples are tested and can be copied directly into your application.
    Use the API Playground in your dashboard to test requests and see live responses before writing code.

    Quick Start

    Get started with the AlterLab API in under 2 minutes. Here's your first request:

    Step 1: Get Your API Key

    Sign up at alterlab.io and generate an API key from the dashboard.

    Step 2: Make Your First Request

    Bash
    # Using cURL
    curl -X POST https://api.alterlab.io/api/v1/scrape \
      -H "X-API-Key: YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"url": "https://example.com"}'

    Python
    # Using requests
    import requests

    response = requests.post(
        "https://api.alterlab.io/api/v1/scrape",
        headers={"X-API-Key": "YOUR_API_KEY"},
        json={"url": "https://example.com"}
    )
    print(response.json())

    # Using the Python SDK (recommended)
    from alterlab import AlterLabSync

    client = AlterLabSync(api_key="YOUR_API_KEY")
    result = client.scrape("https://example.com")
    print(result["content"][:100])  # First 100 characters

    Step 3: Handle the Response

    Simple requests return 200 with content immediately. Complex requests return 202 with a job_id for polling.

    JSON
    {
      "url": "https://example.com",
      "status_code": 200,
      "content": "<!DOCTYPE html>...",
      "title": "Example Domain",
      "billing": {
        "total_credits": 1,
        "tier_used": "1"
      }
    }
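    Because the same endpoint can answer with either shape, client code needs to branch on the status code. As an illustration (the helper name below is ours, not part of the API), a minimal dispatcher for the two response shapes might look like this:

```python
def handle_scrape_response(status_code: int, body: dict) -> tuple[str, dict]:
    """Classify a /scrape response: a finished result (200) or a job to poll (202)."""
    if status_code == 200:
        return ("result", body)  # content is ready immediately
    if status_code == 202:
        # Async path: keep the job_id and poll /api/v1/jobs/{job_id}
        return ("job", {"job_id": body["job_id"]})
    raise RuntimeError(f"Unexpected status {status_code}: {body}")
```

    The "job" branch feeds directly into the polling patterns shown later under Async Mode & Job Polling.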

    What's Next?

    • Learn about Python SDK for easier integration
    • Explore advanced options like screenshots and OCR
    • Understand tier escalation and cost controls

    Getting Structured JSON

    To get structured JSON data (product info, article metadata, etc.) instead of raw HTML, use the formats parameter:

    Bash
    # Get structured JSON from any page
    curl -X POST https://api.alterlab.io/api/v1/scrape \
      -H "X-API-Key: YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://www.amazon.com/dp/B0D5HMLP7S",
        "formats": ["json"]
      }'

    The response will include auto-extracted structured data:

    JSON
    {
      "url": "https://www.amazon.com/dp/B0D5HMLP7S",
      "status_code": 200,
      "content": {
        "json": {
          "@type": "Product",
          "name": "Water Brush Pen Set",
          "price": 8.99,
          "currency": "USD",
          "rating": 4.4,
          "reviewCount": 47,
          "availability": "InStock",
          "images": ["https://..."],
          "specifications": {...},
          "reviews": [...]
        }
      }
    }

    No Schema Required

    AlterLab automatically detects the page type (product, article, recipe, etc.) and extracts relevant fields. You only need extraction_schema if you want to filter the JSON to specific fields. See Structured Extraction for filtering options.

    Authentication

    All API requests require authentication using an API key. Include your API key in the X-API-Key header.

    Keep your API key secure

    Never expose your API key in client-side code or commit it to version control. Use environment variables to store sensitive credentials.
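    One common pattern, sketched below, is to read the key from the ALTERLAB_API_KEY environment variable (the same variable the Python SDK auto-loads, as shown in the Configuration section):

```python
import os

def load_api_key() -> str:
    """Read the AlterLab API key from the environment instead of hardcoding it."""
    key = os.environ.get("ALTERLAB_API_KEY")
    if not key:
        raise RuntimeError("Set ALTERLAB_API_KEY before making API requests")
    return key
```

    Failing loudly at startup when the variable is missing is usually preferable to sending unauthenticated requests.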

    Unified Scrape Endpoint
    Recommended

    The unified endpoint handles all scraping modes through a single, intelligent interface. It automatically selects the optimal tier (1-5) based on site complexity, supports both synchronous (200) and asynchronous (202) execution patterns, and provides detailed billing breakdowns.

    POST
    /api/v1/scrape

    Unified scraping endpoint with intelligent tier escalation, cost controls, and sync/async execution modes.

    Parameters

    Name | Type | Required | Description
    url | string | Required | The URL of the web page to scrape
    mode | string | Optional | Scraping mode: auto, html, js, pdf, or ocr. Default: auto
    sync | boolean | Optional | Enable blocking mode: the API polls internally until complete (60-120s max) and returns 200. Set false for an immediate 202 with a job_id. Default: true
    formats | string[] | Optional | Output formats: text, json, html, markdown. Use ["json"] to get structured data extraction. Default: ["markdown", "json"]
    advanced | AdvancedOptions | Optional | Advanced options: render_js, screenshot, generate_pdf, ocr, use_proxy, markdown, wait_condition
    cost_controls | CostControls | Optional | Cost controls: max_credits, force_tier, max_tier, prefer_cost, prefer_speed, fail_fast
    force_refresh | boolean | Optional | Bypass the cache and force a fresh scrape. Default: false
    include_raw_html | boolean | Optional | Include raw HTML in the response. Default: false
    timeout | integer | Optional | Request timeout in seconds (1-300). Default: 30
    extraction_schema | object | Optional | JSON Schema defining the structure you want extracted. The response will include filtered_content with data matching your schema. See the JSON Schema Filtering guide.

    Request Example

    Bash
    # Simple synchronous scrape (default)
    curl -X POST https://api.alterlab.io/api/v1/scrape \
      -H "X-API-Key: sk_live_..." \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://example.com",
        "mode": "auto"
      }'
    
    # Async scrape with advanced options
    curl -X POST https://api.alterlab.io/api/v1/scrape \
      -H "X-API-Key: sk_live_..." \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://spa-website.com",
        "mode": "js",
        "sync": false,
        "advanced": {
          "render_js": true,
          "screenshot": true,
          "markdown": true,
          "wait_condition": "networkidle"
        },
        "cost_controls": {
          "max_credits": 10,
          "prefer_cost": true
        }
      }'

    Response Example

    JSON
    // Sync response (200) - Simple requests with sync=true (default)
    {
      "url": "https://example.com",
      "status_code": 200,
      "content": {
        "html": "<!DOCTYPE html>...",
        "text": "Cleaned text content...",
        "json": {
          "title": "Example Domain",
          "description": "Example page",
          "metadata": {...}
        }
      },
      "title": "Example Domain",
      "metadata": {
        "description": "Example page",
        "keywords": ["example"]
      },
      "headers": {
        "content-type": "text/html; charset=UTF-8"
      },
      "cached": false,
      "response_time_ms": 1234,
      "size_bytes": 15234,
      "screenshot_url": null,  // URL if screenshot: true (available 24h)
      "pdf_url": null,         // URL if generate_pdf: true (available 24h)
      "billing": {
        "total_credits": 1,
        "tier_used": "1",
        "escalations": [
          {
            "tier": "1",
            "result": "success",
            "credits": 1,
            "duration_ms": 234
          }
        ],
        "savings": 19
      },
      "extraction_method": "algorithmic",
      "version": "v1"
    }
    
    // Async response (202) - Complex requests with sync=false or auto-detected
    {
      "job_id": "550e8400-e29b-41d4-a716-446655440000"
    }

    Execution Modes

    • Sync (sync=true, default): The API queues the job to the worker service and polls internally every 100-500ms. When the job completes (within 60-120 seconds), it returns 200 with the full content, so no manual polling code is needed. If the job exceeds the timeout, the API falls back to a 202 with a job_id for manual polling.
    • Async (sync=false): Immediately returns 202 with a job_id. You must poll /api/v1/jobs/{job_id} or use a WebSocket for results. Recommended for long-running scrapes (>60s), batch operations, webhooks, or real-time updates.
    • Key difference: Both modes queue to the same worker service. sync=true simply adds automatic polling on the API side, which is simpler for clients but subject to a timeout; sync=false gives you full control over polling or WebSocket delivery with no timeout limit.

    When to Use Async Mode (sync=false)

    Use sync: false for:
    • Scrapes expected to take longer than 60 seconds (large PDFs, complex sites)
    • Batch operations where you want to queue multiple jobs and poll together
    • When you need webhook delivery instead of polling
    • Real-time status updates via WebSocket
    For quick scrapes (most websites), sync: true is simpler and eliminates manual polling code.

    How Sync Mode Works (Under the Hood)

    Understanding sync mode helps you choose the right execution pattern for your use case:

    1. Job Queueing (Both Modes)

    Regardless of sync setting, all scrape requests are queued to the worker service. This ensures consistent anti-bot capabilities, proxy management, and resource pooling.

    2. Internal Polling (sync=true)

    When sync=true, the API server holds your HTTP connection open and polls Redis every 100-500ms checking job status. When complete, it returns the full result as a 200 response. Maximum wait: 60-120 seconds.

    3. Manual Polling (sync=false)

    When sync=false, the API immediately returns 202 with job_id and closes the connection. You control polling frequency and timeout. No server-side timeout constraints.

    4. Timeout Behavior

    If sync=true and job exceeds timeout (60-120s), API returns 202 with job_id as fallback. You can then poll manually. This prevents hung connections while still supporting long-running jobs.

    Key Insight: sync=true is purely a convenience feature. Both modes use the same worker infrastructure and have identical scraping capabilities. The only difference is who handles polling: the API server (sync=true) or your client code (sync=false).
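    The timeout-fallback flow can be handled in one client-side helper. This is an illustrative sketch, not SDK code: the HTTP calls are injected as plain functions (do_scrape posts to /api/v1/scrape, poll_job fetches /api/v1/jobs/{job_id}), and the job payloads are assumed to match the Job Polling section below.

```python
import time
from typing import Callable

def scrape_with_fallback(
    do_scrape: Callable[[], tuple[int, dict]],
    poll_job: Callable[[str], dict],
    poll_interval: float = 2.0,
    poll_timeout: float = 300.0,
) -> dict:
    """Handle both sync-mode outcomes: a direct 200, or the 202 fallback."""
    status, body = do_scrape()
    if status == 200:
        return body                      # sync mode finished in time
    if status != 202:
        raise RuntimeError(f"Unexpected status: {status}")
    # Sync mode timed out (or the request was async): fall back to polling.
    deadline = time.monotonic() + poll_timeout
    while time.monotonic() < deadline:
        job = poll_job(body["job_id"])
        if job["status"] == "completed":
            return job["result"]
        if job["status"] == "failed":
            raise RuntimeError(job.get("error", "job failed"))
        time.sleep(poll_interval)
    raise TimeoutError("job polling timed out")
```

    With this shape, the same calling code works whether the server answered synchronously or fell back to a job_id.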

    Response Format

    The content field may be a plain string (simple sync requests) or a structured object when using the formats parameter.

    Simple sync (string)
    JSON
    {
      "url": "https://example.com",
      "status_code": 200,
      "content": "<!DOCTYPE html>...",
      "billing": {"total_credits": 1}
    }
    With formats (object)
    JSON
    {
      "url": "https://example.com",
      "status_code": 200,
      "content": {
        "html": "<!DOCTYPE html>...",
        "text": "Clean text...",
        "markdown": "# Example..."
      },
      "billing": {"total_credits": 1}
    }
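    Since content can arrive as either shape, a small normalizer keeps downstream code simple. This is a sketch; the preference order among formats is our assumption, not something the API mandates:

```python
def content_as_text(content) -> str:
    """Normalize the content field, which is either a string or a dict of formats."""
    if isinstance(content, str):
        return content                  # simple sync response
    for key in ("markdown", "text", "html"):
        if key in content:              # structured response: pick a format
            return content[key]
    raise KeyError("no known content format present")
```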

    Python SDK
    Recommended

    The official Python SDK provides a simple, intuitive interface for the AlterLab API with automatic polling, retry logic, and type hints.

    Installation

    Bash
    pip install alterlab

    Async vs Sync

    The SDK provides two clients: AlterLab (async, requires await) and AlterLabSync (sync, no await needed). Choose based on your codebase.

    Sync Usage (Recommended for scripts)

    Python
    from alterlab import AlterLabSync
    
    # Use AlterLabSync for synchronous code (no await needed)
    with AlterLabSync(api_key="YOUR_API_KEY") as client:
        result = client.scrape("https://example.com")
        print(result["content"])
        print(f"Cost: {result['billing']['total_credits']}")
    
        # With schema filtering
        result = client.scrape(
            url="https://amazon.com/dp/B0123456789",
            extraction_schema={
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "in_stock": {"type": "boolean"}
                }
            }
        )
        product = result["filtered_content"]  # Your filtered data
        print(f"Product: {product['name']} - ${product['price']}")

    Async Usage (For async applications)

    Python
    import asyncio
    from alterlab import AlterLab, AdvancedOptions
    
    async def main():
        # AlterLab is async - requires await
        async with AlterLab(api_key="YOUR_API_KEY") as client:
            result = await client.scrape("https://example.com")
            print(result["content"])
    
            # With JS rendering
            result = await client.scrape(
                url="https://spa-app.com",
                mode="js",
                advanced=AdvancedOptions(render_js=True, screenshot=True)
            )
    
    asyncio.run(main())

    Advanced Options

    Python
    from alterlab import AlterLabSync, AdvancedOptions, CostControls
    
    with AlterLabSync(api_key="YOUR_API_KEY") as client:
        # With advanced options
        result = client.scrape(
            url="https://example.com",
            mode="js",
            advanced=AdvancedOptions(
                render_js=True,
                screenshot=True,
                markdown=True,
                wait_condition="networkidle"
            ),
            cost_controls=CostControls(
                max_credits=10,
                max_tier="4",
                prefer_cost=True
            )
        )
    
        # Screenshot saved to result["screenshot_url"]
        # Markdown in result["content"]["markdown"] if formats specified

    Cost Estimation

    Python
    from alterlab import AlterLabSync, AdvancedOptions
    
    with AlterLabSync(api_key="YOUR_API_KEY") as client:
        # Estimate cost before scraping
        estimate = client.estimate_cost(
            url="https://example.com",
            mode="auto",
            advanced=AdvancedOptions(render_js=True, screenshot=True)
        )
    
        print(f"Estimated credits: {estimate['estimated_credits']}")
        print(f"Max possible: {estimate['max_possible_credits']}")
    
        # Only scrape if within budget
        if estimate['estimated_credits'] <= 5:
            result = client.scrape(url="https://example.com")

    Error Handling

    Python
    from alterlab import AlterLabSync, AlterLabAPIError, AlterLabTimeoutError
    
    with AlterLabSync(api_key="YOUR_API_KEY") as client:
        try:
            result = client.scrape("https://example.com")
        except AlterLabAPIError as e:
            if e.status_code == 402:
                print("Insufficient balance!")
            elif e.status_code == 429:
                print(f"Rate limited. Retry after {e.retry_after}s")
            else:
                print(f"API error: {e.detail}")
        except AlterLabTimeoutError:
            print("Request timed out")
        except Exception as e:
            print(f"Unexpected error: {e}")

    Manual Job Management

    Python
    from alterlab import AlterLabSync

    client = AlterLabSync(api_key="YOUR_API_KEY")

    # Get job status without waiting
    job_id = "550e8400-e29b-41d4-a716-446655440000"
    status = client.get_job_status(job_id)
    print(f"Job status: {status['status']}")
    
    # Poll job with custom settings
    result = client.poll_job(
        job_id=job_id,
        poll_interval=2.0,  # Check every 2 seconds
        poll_timeout=300.0   # Give up after 5 minutes
    )

    Configuration

    Python
    # Custom configuration (sync client)
    client = AlterLabSync(
        api_key="YOUR_API_KEY",
        base_url="https://api.alterlab.io",  # API base (paths include /api/v1)
        timeout=60.0,  # Request timeout
        max_retries=3,  # Retry failed requests
        retry_backoff=2.0  # Exponential backoff multiplier
    )
    
    # Or use environment variable
    # export ALTERLAB_API_KEY="YOUR_API_KEY"
    client = AlterLabSync()  # Auto-loads from env

    SDK Benefits

    • Automatic polling: Handles 202 responses and job polling for you
    • Retry logic: Automatically retries failed requests with exponential backoff
    • Type hints: Full type annotations for IDE autocomplete
    • Error handling: Custom exception types for different error scenarios
    • Convenience methods: Mode-specific methods for common use cases

    Structured Extraction (Optional Filtering)

    Already Getting JSON?

    If you just want structured JSON data, use formats: ["json"] as shown in Getting Structured JSON. This section is for filtering that JSON to specific fields.

    Filter and restructure extracted data to match your desired output format using JSON Schema. This is pure data transformation - no additional cost.

    JSON Schema Filtering

    Pass extraction_schema to filter extracted data to your desired structure. The filtered result appears in filtered_content:

    Python
    # Extract product data with schema
    result = client.scrape(
        url="https://amazon.com/dp/B0123456789",
        mode="auto",
        extraction_schema={
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
                "currency": {"type": "string"},
                "in_stock": {"type": "boolean"},
                "rating": {"type": "number"},
                "reviews_count": {"type": "integer"}
            }
        }
    )
    
    # Access filtered data (matches your schema)
    product = result["filtered_content"]
    print(f"Name: {product['name']}")
    print(f"Price: {product['price']}")
    print(f"In Stock: {product['in_stock']}")

    Field Aliases

    Schema filtering automatically maps common field name variations:

    in_stock → availability
    sku → asin, product_id
    image_urls → images
    title → name, headline

    See JSON Schema Filtering guide for the complete list.
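    To make the alias behavior concrete, here is a sketch of how such a lookup could work on your side. The alias table below is copied from the list above; the mapping direction (schema field on the left, extracted source fields on the right) is our assumption for illustration:

```python
# Schema field name -> candidate field names in the extracted data (assumed direction)
FIELD_ALIASES = {
    "in_stock": ["availability"],
    "sku": ["asin", "product_id"],
    "image_urls": ["images"],
    "title": ["name", "headline"],
}

def resolve_field(extracted: dict, field: str):
    """Look up a schema field in extracted data, falling back to its aliases."""
    if field in extracted:
        return extracted[field]
    for alias in FIELD_ALIASES.get(field, []):
        if alias in extracted:
            return extracted[alias]
    return None  # field not present under any known name
```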

    Response Structure

    JSON
    {
      "url": "https://amazon.com/dp/B0123456789",
      "status_code": 200,
      "content": { ... },           // Full extraction (Schema.org, metadata, etc.)
      "filtered_content": {         // YOUR filtered data (only when extraction_schema provided)
        "name": "Product Name",
        "price": 29.99,
        "in_stock": true
      },
      "billing": { "total_credits": 3 }
    }

    Zero Additional Cost

    Schema filtering is pure data transformation - no LLM calls, no extra charges. It filters existing structured data (Schema.org, Open Graph, playbook extractions) to match your schema.

    Advanced Options

    Advanced options provide fine-grained control over scraping behavior and enable premium features.

    Option | Type | Cost | Description
    render_js | boolean | +3 | Use a headless browser for JavaScript rendering
    screenshot | boolean | +1 | Capture a full-page screenshot (requires render_js)
    generate_pdf | boolean | +2 | Generate a PDF of the rendered page (requires render_js)
    ocr | boolean | +5 | Extract text from images using OCR
    use_proxy | boolean | +1 | Route through the premium proxy network
    markdown | boolean | Free | Convert content to Markdown format
    wait_condition | string | Free | Wait condition: domcontentloaded, networkidle, load
    remove_cookie_banners | boolean | Free | Remove cookie consent banners before extraction (default: true)
    use_own_proxy | boolean | $0.0008 | Use your integrated proxy instead of the system proxy
    use_system_proxy | boolean | +1 | Override your default proxy integration and use the AlterLab system proxy
    proxy_integration_id | string | - | Specific proxy integration ID (requires use_own_proxy)
    proxy_country | string | - | Preferred proxy country code for geo-targeting (e.g., US, DE)
    session_id | UUID | - | Stored session ID for authenticated scraping (injects cookies and headers)
    cookies | object | - | Inline cookies for one-off authenticated scraping (max 100; mutually exclusive with session_id)
    session_headers | object | - | Inline auth headers (e.g., Authorization: Bearer). Max 50 headers.

    Example with Advanced Options

    JSON
    {
      "url": "https://spa-app.com",
      "mode": "auto",
      "advanced": {
        "render_js": true,
        "screenshot": true,
        "generate_pdf": true,
        "markdown": true,
        "use_proxy": true,
        "wait_condition": "networkidle"
      }
    }
    
    // Response includes download URLs (available for 24 hours):
    // "screenshot_url": "https://alterlab.io/downloads/screenshots/2025-01-15/job-id.png"
    // "pdf_url": "https://alterlab.io/downloads/pdfs/2025-01-15/job-id.pdf"

    Cost Controls

    The cost_controls object lets you control tier escalation behavior and set budget limits per request.

    Parameter | Type | Description
    max_credits | number | Maximum credits to spend on this request. The request fails if the cost would exceed this.
    force_tier | string | Force a specific tier (1, 2, 3, 3.5, 4). Skips escalation entirely.
    max_tier | string | Maximum tier to escalate to. Prevents escalation beyond this level.
    prefer_cost | boolean | Optimize for cost: try cheaper tiers first before escalating.
    prefer_speed | boolean | Optimize for speed: skip to the most reliable tier immediately.
    fail_fast | boolean | Return an error instead of escalating to expensive tiers.
    JSON
    {
      "url": "https://example.com",
      "mode": "auto",
      "cost_controls": {
        "max_credits": 5,
        "max_tier": "4",
        "prefer_cost": true,
        "fail_fast": false
      }
    }

    Tier Escalation & Cost Controls

    AlterLab uses an intelligent 5-tier escalation system that automatically tries the cheapest method first and escalates only when needed. Each tier has different capabilities, speeds, and costs.

    Tier | Name | Cost | Per $1 | Description
    1 | Curl | $0.0002 | 5,000 | Ultra-fast curl binary for static sites
    2 | HTTP | $0.0003 | 3,333 | HTTPX with TLS fingerprinting and HTTP/2
    3 | Stealth | $0.002 | 500 | curl_cffi with Chrome browser impersonation
    4 | Browser | $0.004 | 250 | Playwright browser automation for JS sites
    5 | Captcha | $0.02 | 50 | Browser with AI-powered captcha solving

    How Escalation Works

    1. Start cheapest: By default, starts at Tier 1 (Curl: $0.0002)
    2. Attempt scrape: Tries to scrape with current tier's method
    3. Check success: If successful (status 200, valid content), stop and return result
    4. Escalate if failed: If failed (timeout, blocked, error), move to next tier and retry
    5. Stop at success or max tier: Returns when successful or when max_tier/max_credits reached
    6. Detailed billing: Response includes all attempts, final tier used, and cost saved
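    The escalation loop can be sketched as a few lines of Python. This is an illustrative simulation, not server code; it assumes the per-attempt charge equals the dollar cost in the tier table, and `attempt(tier)` stands in for a real scrape attempt:

```python
# Per-attempt cost by tier, from the tier table above (assumed charge model)
TIER_COST = {"1": 0.0002, "2": 0.0003, "3": 0.002, "4": 0.004, "5": 0.02}

def escalate(attempt, max_tier="5", max_credits=None):
    """Try tiers cheapest-first; stop on success, max_tier, or budget exhaustion."""
    spent, attempts = 0.0, []
    for tier in ["1", "2", "3", "4", "5"]:
        if tier > max_tier:             # single-digit strings compare like numbers
            break
        cost = TIER_COST[tier]
        if max_credits is not None and spent + cost > max_credits:
            break                       # would exceed the per-request budget
        spent += cost
        ok = attempt(tier)
        attempts.append({"tier": tier, "result": "success" if ok else "failed"})
        if ok:
            return {"tier_used": tier, "total_cost": spent, "escalations": attempts}
    return {"tier_used": None, "total_cost": spent, "escalations": attempts}
```

    A site that only yields at Tier 3 would produce the same escalation trail as the billing example below: two failed attempts, then a Tier 3 success.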

    Cost Control Parameters

    max_credits (float, optional)

    Maximum cost to spend on this request. The API will not escalate beyond this budget. Example: max_credits: 0.004 stops at Tier 4 (Browser).

    max_tier (string, optional)

    Maximum tier to escalate to: "1" (Curl), "2" (HTTP), "3" (Stealth), "4" (Browser), or "5" (Captcha). Example: max_tier: "4" stops at Browser.

    prefer_cost (boolean, default: false)

    Start with the cheapest tier (Tier 1) and try each tier sequentially. Best for known simple sites.

    prefer_speed (boolean, default: false)

    Start with Tier 4 (Browser) for reliable success on most sites. Higher cost but faster overall.

    fail_fast (boolean, default: false)

    Return an error instead of escalating to expensive tiers. Useful when you want predictable costs.

    Example: Cost-Optimized Request

    JSON
    {
      "url": "https://example.com",
      "mode": "auto",
      "cost_controls": {
        "max_credits": 0.004,
        "max_tier": "4",
        "prefer_cost": true,
        "fail_fast": false
      }
    }
    
    // Response billing breakdown:
    {
      "billing": {
        "total_cost": 0.002,
        "tier_used": "3",
        "escalations": [
          {"tier": "1", "result": "failed", "cost": 0.0002, "duration_ms": 250, "error": "403 Forbidden"},
          {"tier": "2", "result": "failed", "cost": 0.0003, "duration_ms": 2100, "error": "Blocked by WAF"},
          {"tier": "3", "result": "success", "cost": 0.002, "duration_ms": 4200}
        ],
        "optimization_suggestion": "Site requires Stealth tier. Consider using prefer_speed with Tier 4 for faster results."
      }
    }

    Cost Control Best Practices

    • Always set max_credits for production to prevent unexpected charges
    • Use prefer_cost: true for known simple sites
    • Use prefer_speed: true for critical scrapers where reliability matters more than cost
    • Set fail_fast: true in testing to avoid unnecessary spending on misconfigured requests

    Async Mode & Job Polling

    Complex scraping requests return a 202 status with a job_id. Poll the job status endpoint to retrieve results.

    GET
    /api/v1/jobs/{job_id}

    Poll job status and retrieve results when completed.

    Parameters

    Name | Type | Required | Description
    job_id | string | Required | UUID of the job returned from the scrape endpoint

    Request Example

    Bash
    curl -X GET https://api.alterlab.io/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
      -H "X-API-Key: sk_live_..."

    Response Example

    JSON
    // Status: pending
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "pending",
      "url": "https://example.com",
      "mode": "auto",
      "created_at": "2025-11-05T10:30:00Z"
    }
    
    // Status: running
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "running",
      "url": "https://example.com",
      "mode": "auto",
      "progress": 50,
      "created_at": "2025-11-05T10:30:00Z"
    }
    
    // Status: completed
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "completed",
      "url": "https://example.com",
      "mode": "auto",
      "result": {
        "url": "https://example.com",
        "status_code": 200,
        "content": {
          "html": "...",
          "text": "...",
          "json": {...}
        },
        "billing": {
          "total_credits": 5,
          "tier_used": "4"
        }
      },
      "created_at": "2025-11-05T10:30:00Z",
      "completed_at": "2025-11-05T10:30:15Z"
    }
    
    // Status: failed
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "status": "failed",
      "url": "https://example.com",
      "mode": "auto",
      "error": "Timeout after 30 seconds",
      "created_at": "2025-11-05T10:30:00Z",
      "failed_at": "2025-11-05T10:30:30Z"
    }

    Polling Best Practices

    • Recommended interval: Poll every 2-5 seconds
    • Timeout: Set a maximum polling duration (5 minutes recommended)
    • Exponential backoff: Increase polling interval if job takes longer
    • Status values: pending → running → completed/failed
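    Putting the interval, timeout, and backoff recommendations together, a poll schedule can be precomputed. This is a sketch with assumed defaults (2s starting interval, 1.5x growth, 30s cap, 5-minute budget); tune the numbers to your workload:

```python
def backoff_intervals(first: float = 2.0, factor: float = 1.5,
                      cap: float = 30.0, budget: float = 300.0):
    """Yield poll delays that grow geometrically, capped, until the budget is spent."""
    delay, elapsed = first, 0.0
    while elapsed + delay <= budget:
        yield delay
        elapsed += delay
        delay = min(delay * factor, cap)
```

    Iterating over this generator and sleeping for each yielded delay between status checks gives you bounded, progressively less chatty polling.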

    Example Polling Loop (JavaScript)

    JavaScript
    async function pollJobStatus(jobId, apiKey, maxWaitMs = 300000) {
      const startTime = Date.now();
      const pollInterval = 2000; // 2 seconds
    
      while (Date.now() - startTime < maxWaitMs) {
        const response = await fetch(
          `https://api.alterlab.io/api/v1/jobs/${jobId}`,
          { headers: { 'X-API-Key': apiKey } }
        );
    
        const job = await response.json();
    
        if (job.status === 'completed') {
          return job.result;
        } else if (job.status === 'failed') {
          throw new Error(job.error);
        }
    
        // Wait before next poll
        await new Promise(resolve => setTimeout(resolve, pollInterval));
      }
    
      throw new Error('Job polling timeout');
    }

    WebSocket Alternative

    For real-time updates without polling, use WebSocket connections to receive job status updates as they happen. This is more efficient than polling and provides instant notifications.

    WebSocket Endpoint

    wss://api.alterlab.io/api/v1/ws/jobs?api_key=YOUR_API_KEY

    Protocol Messages

    Client → Server (Subscribe):
    JSON
    {"action": "subscribe", "job_id": "<job-uuid>"}
    Client → Server (Unsubscribe):
    JSON
    {"action": "unsubscribe", "job_id": "<job-uuid>"}
    Client → Server (Ping):
    JSON
    {"action": "ping"}
    Server → Client (Job Update):
    JSON
    {"type": "job_update", "job_id": "...", "status": "running|completed|failed", "result": {...}, "error": null, "ts": 1730451136}
    Server → Client (Heartbeat):
    JSON
    {"type": "heartbeat", "ts": 1730451136}

    Example: JavaScript WebSocket Client

    JavaScript
    // Connect with API key authentication
    const ws = new WebSocket('wss://api.alterlab.io/api/v1/ws/jobs?api_key=sk_live_...');
    
    // Handle connection open
    ws.onopen = () => {
      console.log('WebSocket connected');
    
      // Subscribe to job updates
      ws.send(JSON.stringify({
        action: 'subscribe',
        job_id: '550e8400-e29b-41d4-a716-446655440000'
      }));
    };
    
    // Receive real-time updates
    ws.onmessage = (event) => {
      const message = JSON.parse(event.data);
    
      switch (message.type) {
        case 'connected':
          console.log('✓ Connection established');
          break;
    
        case 'subscribed':
          console.log('✓ Subscribed to job:', message.job_id);
          break;
    
        case 'job_update':
          console.log('Job status:', message.status);
    
          if (message.status === 'completed') {
            console.log('✓ Job completed:', message.result);
            ws.close();
          } else if (message.status === 'failed') {
            console.error('✗ Job failed:', message.error);
            ws.close();
          } else {
            console.log('⟳ Job in progress...');
          }
          break;
    
        case 'heartbeat':
          // Server is alive
          break;
    
        case 'error':
          console.error('WebSocket error:', message.message);
          break;
      }
    };
    
    ws.onerror = (error) => {
      console.error('WebSocket connection error:', error);
    };
    
    ws.onclose = () => {
      console.log('WebSocket disconnected');
    };

    Example: Python WebSocket Client

    Python
    import asyncio
    import json
    import websockets
    
    async def watch_job(api_key: str, job_id: str):
        uri = f"wss://api.alterlab.io/api/v1/ws/jobs?api_key={api_key}"
    
        async with websockets.connect(uri) as ws:
            # Subscribe to job
            await ws.send(json.dumps({
                "action": "subscribe",
                "job_id": job_id
            }))
    
            # Listen for updates
            async for message in ws:
                data = json.loads(message)
    
                if data["type"] == "job_update":
                    status = data["status"]
                    print(f"Job status: {status}")
    
                    if status == "completed":
                        print("Job completed:", data["result"])
                        break
                    elif status == "failed":
                        print("Job failed:", data["error"])
                        break
    
    # Usage
    asyncio.run(watch_job("sk_live_...", "550e8400-..."))

    WebSocket vs Polling

    • WebSocket: Instant updates, lower latency, persistent connection, more efficient for long-running jobs
    • Polling: Simpler implementation, works through proxies/firewalls, no persistent connection needed
    • Recommendation: Use WebSocket for real-time dashboards, polling for simple scripts

    Batch Scraping

    Submit multiple URLs for scraping in a single request. Batch requests are processed asynchronously, and you can receive results via webhook or by polling individual job statuses.

    POST
    /api/v1/batch

    Submit a batch of URLs for asynchronous processing with optional webhook delivery.

    Parameters

    Name            Type              Required   Description
    urls            string[]          Required   Array of URLs to scrape (max 1,000 per batch)
    mode            string            Optional   Scraping mode applied to all URLs. Default: auto
    webhook_url     string            Optional   URL to receive results via POST webhook
    advanced        AdvancedOptions   Optional   Advanced options applied to all URLs
    cost_controls   CostControls      Optional   Cost controls applied to all URLs

    Request Example

    Bash
    curl -X POST https://api.alterlab.io/api/v1/batch \
      -H "X-API-Key: sk_live_..." \
      -H "Content-Type: application/json" \
      -d '{
        "urls": [
          "https://example.com/page1",
          "https://example.com/page2",
          "https://example.com/page3"
        ],
        "mode": "auto",
        "webhook_url": "https://your-app.com/webhooks/scraping",
        "cost_controls": {
          "max_credits": 5
        }
      }'

    Response Example

    JSON
    {
      "batch_id": "batch_7a3b9c8d-4e2f-1a6b-8c5d-9e0f1a2b3c4d",
      "total_jobs": 3,
      "job_ids": [
        "550e8400-e29b-41d4-a716-446655440001",
        "550e8400-e29b-41d4-a716-446655440002",
        "550e8400-e29b-41d4-a716-446655440003"
      ],
      "estimated_credits": 3,
      "webhook_url": "https://your-app.com/webhooks/scraping",
      "status": "pending"
    }

    Webhook Payload Format

    When a job completes, AlterLab sends a POST request to your webhook URL with this payload:

    HTTP
    POST https://your-app.com/webhooks/scraping
    Content-Type: application/json
    X-AlterLab-Signature: sha256=...   (webhook signature for verification)
    
    {
      "event": "job.completed",
      "batch_id": "batch_7a3b9c8d-4e2f-1a6b-8c5d-9e0f1a2b3c4d",
      "job_id": "550e8400-e29b-41d4-a716-446655440001",
      "url": "https://example.com/page1",
      "status": "completed",
      "result": {
        "url": "https://example.com/page1",
        "status_code": 200,
        "content": "...",
        "billing": {
          "total_credits": 1,
          "tier_used": "1"
        }
      },
      "completed_at": "2025-11-05T10:30:15Z"
    }
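Before acting on a webhook, verify the X-AlterLab-Signature header. The exact signing scheme is not spelled out on this page; the sketch below assumes the common pattern of an HMAC-SHA256 over the raw request body using a webhook secret from your dashboard, hex-encoded after the sha256= prefix. Confirm the details against your dashboard's webhook settings before relying on it.

```python
import hashlib
import hmac

def verify_webhook(secret: str, raw_body: bytes, signature_header: str) -> bool:
    """Return True if the sha256=<hex> header matches our own HMAC of the body.

    Assumes HMAC-SHA256 over the raw body, hex-encoded; this is a common
    convention, not confirmed by this page.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = signature_header.removeprefix("sha256=")
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected, received)
```

In a web framework handler, pass the raw request bytes (e.g. `request.get_data()` in Flask), not the re-serialized parsed JSON, since any whitespace difference changes the digest.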

    Python SDK Batch Example

    Python
    from alterlab import AlterLabSync
    
    client = AlterLabSync(api_key="YOUR_API_KEY")
    
    # Submit batch with webhook
    batch = client.batch_scrape(
        urls=[
            "https://example.com/page1",
            "https://example.com/page2",
            "https://example.com/page3"
        ],
        mode="auto",
        webhook_url="https://your-app.com/webhooks/scraping"
    )
    
    print(f"Batch ID: {batch['batch_id']}")
    print(f"Total jobs: {batch['total_jobs']}")
    
    # Or poll each job individually
    for job_id in batch['job_ids']:
        result = client.poll_job(job_id)
        print(f"Job {job_id}: {result['status']}")

    Batch Limits

    • Maximum 1,000 URLs per batch request
    • Webhook URL must be publicly accessible (HTTPS required)
    • Webhook retries: 3 attempts with exponential backoff
    • Batch jobs are processed in parallel (no guaranteed order)
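Because a batch accepts at most 1,000 URLs, larger URL lists must be split client-side into multiple batch requests. A minimal sketch; the `submit_batch` call in the usage comment is a hypothetical placeholder for your own POST /api/v1/batch wrapper:

```python
from typing import Iterator

def chunk_urls(urls: list[str], batch_size: int = 1000) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size URLs (the documented limit)."""
    for i in range(0, len(urls), batch_size):
        yield urls[i:i + batch_size]

# Hypothetical usage: one POST /api/v1/batch per chunk
# for batch in chunk_urls(all_urls):
#     submit_batch(batch)  # e.g. requests.post(".../api/v1/batch", json={"urls": batch})
```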

    Usage & Balance

    Monitor your API usage, spending, and account limits.

    GET
    /api/v1/usage

    Get current usage statistics and remaining balance for your account.

    Parameters

    Name         Type     Required   Description
    start_date   string   Optional   Start date for usage period (ISO 8601 format)
    end_date     string   Optional   End date for usage period (ISO 8601 format)

    Request Example

    Bash
    # Current billing period
    curl -X GET https://api.alterlab.io/api/v1/usage \
      -H "X-API-Key: sk_live_..."
    
    # Specific date range
    curl -X GET "https://api.alterlab.io/api/v1/usage?start_date=2025-11-01&end_date=2025-11-30" \
      -H "X-API-Key: sk_live_..."

    Response Example

    JSON
    {
      "period": {
        "start": "2025-11-01T00:00:00Z",
        "end": "2025-11-30T23:59:59Z",
        "current": true
      },
      "balance": {
        "current_cents": 7655,
        "deposited_cents": 10000,
        "used_cents": 2345
      },
      "requests": {
        "total": 2345,
        "successful": 2289,
        "failed": 56,
        "cached": 432
      },
      "by_mode": {
        "html": 1234,
        "js": 856,
        "pdf": 145,
        "ocr": 110
      },
      "by_tier": {
        "1": 1234,
        "2": 856,
        "3": 145,
        "4": 98,
        "5": 12
      },
      "rate_limits": {
        "requests_per_minute": 300,
        "current_usage": 12,
        "reset_at": "2025-11-05T10:31:00Z"
      },
      "spend_tier": {
        "current": "growth",
        "rolling_30d_spend_cents": 8500,
        "next_tier_at_cents": 20000
      }
    }

    Python SDK Usage Example

    Python
    from alterlab import AlterLabSync
    from datetime import datetime, timedelta
    
    client = AlterLabSync(api_key="YOUR_API_KEY")
    
    # Get current usage
    usage = client.get_usage()
    print(f"Balance remaining: {usage['balance']['current_cents']} cents")
    print(f"Requests this period: {usage['requests']['total']}")
    
    # Check if the balance is running low
    if usage['balance']['current_cents'] < 500:
        print("Warning: Low balance! Time to top up.")
    
    # Get usage for specific period
    start = datetime.now() - timedelta(days=7)
    weekly_usage = client.get_usage(
        start_date=start.isoformat(),
        end_date=datetime.now().isoformat()
    )
    print(f"Weekly requests: {weekly_usage['requests']['total']}")

    Monitoring Best Practices

    • Check before large batches: Query /usage before submitting batch jobs to ensure sufficient balance
    • Monitor rate limits: Track rate_limits.current_usage to avoid 429 errors
    • Set up alerts: Monitor balance.current_cents and alert when it drops below a threshold
    • Track by tier: Use the by_tier breakdown to optimize tier selection
    • Cache optimization: Monitor requests.cached ratio to measure cache efficiency
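The "check before large batches" advice can be implemented as a simple guard. The balance field names follow the /usage response above; CENTS_PER_CREDIT is an assumed placeholder rate, not a documented price, so substitute your plan's actual credit pricing:

```python
# Placeholder conversion rate: replace with your plan's actual cents-per-credit price.
CENTS_PER_CREDIT = 10

def can_afford_batch(usage: dict, num_urls: int, max_credits_per_url: int) -> bool:
    """Return True if the remaining balance covers the batch's worst-case cost.

    usage is the parsed /usage response; the worst case assumes every URL
    consumes max_credits_per_url (e.g. your cost_controls.max_credits value).
    """
    worst_case_cents = num_urls * max_credits_per_url * CENTS_PER_CREDIT
    return usage["balance"]["current_cents"] >= worst_case_cents
```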

    Balance Management Tips

    • Balance never expires - deposit and use at your own pace
    • Rate limits scale automatically with your 30-day rolling spend
    • Check spend_tier.next_tier_at_cents to see next rate limit upgrade threshold
    • Deposit more funds anytime from the dashboard billing page
    • Use cost_controls.max_credits to prevent budget overruns

    Cost Estimation

    POST
    /api/v1/scrape/estimate

    Estimate the cost of a scrape request without actually scraping.

    Parameters

    Name       Type              Required   Description
    url        string            Required   The URL to estimate
    mode       string            Optional   Scraping mode. Default: auto
    advanced   AdvancedOptions   Optional   Advanced options to include in estimate

    Request Example

    Bash
    curl -X POST https://api.alterlab.io/api/v1/scrape/estimate \
      -H "X-API-Key: sk_live_..." \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://example.com",
        "mode": "auto"
      }'

    Response Example

    JSON
    {
      "url": "https://example.com",
      "estimated_tier": "1",
      "estimated_credits": 1,
      "confidence": "high",
      "max_possible_credits": 10,
      "reasoning": "Known simple site - tier 1 should work"
    }
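The estimate response can gate expensive requests client-side before any credits are spent. Below is a sketch of one possible policy; the field names match the response above, but the budget threshold and the confidence rule are this example's choices, not documented behavior:

```python
def within_budget(estimate: dict, budget_credits: int) -> bool:
    """Decide whether to scrape based on an /scrape/estimate response.

    Conservative policy (this example's choice): trust the expected cost only
    when confidence is "high"; otherwise compare the worst case
    (max_possible_credits) against the budget.
    """
    if estimate["confidence"] == "high":
        return estimate["estimated_credits"] <= budget_credits
    return estimate["max_possible_credits"] <= budget_credits
```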

    Rate Limits

    Rate limits vary by plan. Monitor your usage through the response headers.

    Header                  Description                        When Available
    X-AlterLab-Credits      Cost charged for this request      Success (200)
    X-AlterLab-Tier         Tier used for scraping             Success (200)
    X-AlterLab-Savings      Savings vs highest tier            Success (200)
    X-AlterLab-Bytes        Response size in bytes             Success (200)
    X-AlterLab-Cached       Whether result was cached          Success (200)
    X-RateLimit-Limit       Maximum requests per minute        Rate limit (429)
    X-RateLimit-Remaining   Remaining requests in window       Rate limit (429)
    X-RateLimit-Reset       Unix timestamp when limit resets   Rate limit (429)

    Rate Limit Information

    Success responses (200, 202) use X-AlterLab-* headers. Rate limit errors (429) use X-RateLimit-* headers.
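On the client side, a small helper can pull this billing metadata off a successful response. The header names come from the table above; the assumption that X-AlterLab-Cached serializes as the string "true" is this example's, so adjust if the API uses a different encoding:

```python
def billing_info(headers: dict) -> dict:
    """Extract X-AlterLab-* billing metadata from a 200 response's headers.

    Header values arrive as strings; a missing header yields None (or False
    for the cached flag).
    """
    credits = headers.get("X-AlterLab-Credits")
    return {
        "credits": int(credits) if credits is not None else None,
        "tier": headers.get("X-AlterLab-Tier"),
        # Assumption: the cached flag is serialized as "true"/"false"
        "cached": headers.get("X-AlterLab-Cached") == "true",
    }
```

With `requests`, pass `response.headers` directly; its case-insensitive lookup handles any header casing differences.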

    Error Handling

    Status Code   Meaning                     Action
    200           Success (sync response)     Process content directly
    202           Accepted (async response)   Poll job_id for results
    400           Bad Request                 Check request parameters
    401           Unauthorized                Verify API key
    402           Payment Required            Insufficient balance; top up account
    404           Not Found                   Job doesn't exist or unauthorized
    415           Unsupported Media Type      URL content type not supported
    422           Unprocessable Entity        All tiers failed; site may be blocking
    429           Too Many Requests           Wait for reset time or upgrade plan
    500           Internal Server Error       Retry request; contact support if it persists
    502           Bad Gateway                 Worker queue error; retry request

    Troubleshooting

    Common Errors & Solutions

    401 Unauthorized - Invalid API Key

    Your API key is missing, invalid, or has been revoked.

    • Verify API key format starts with sk_live_ or sk_test_
    • Check that key is active in dashboard → API Keys
    • Ensure header is X-API-Key (case-sensitive)
    • Generate new API key if compromised

    402 Payment Required - Insufficient Balance

    Your account balance is too low to cover this request.

    • Check remaining balance: GET /api/v1/usage
    • Add funds from the dashboard billing page (balance never expires)
    • Set cost_controls.max_credits to cap per-request spend

    422 Unprocessable Entity - All Tiers Failed

    Site actively blocked all scraping attempts across all tier levels.

    • Check if URL requires authentication (login, cookies)
    • Verify URL is publicly accessible
    • Try with mode: "js" explicitly
    • Set max_tier: "5" to enable CAPTCHA solving
    • Contact support if site should be accessible

    429 Too Many Requests - Rate Limit Exceeded

    You've exceeded your plan's rate limit (requests per minute).

    • Check X-RateLimit-Reset header for reset time
    • Implement exponential backoff in your code
    • Upgrade plan for higher rate limits
    • Spread requests over time instead of bursting

    202 Accepted → Polling Never Completes

    Job returns 202 but polling /jobs/{job_id} never shows completed status.

    • Check job status shows "running" or "pending" vs "failed"
    • Worker service may be down - check status page
    • Timeout may be too low - increase request timeout
    • Use WebSocket for real-time updates instead
    • Set reasonable polling timeout (5 minutes recommended)
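The polling-timeout advice above can be sketched as a deadline loop. Here `get_status` is a hypothetical injected callable (for example, one wrapping GET /api/v1/jobs/{job_id} and returning the status field), which keeps the loop itself independent of the HTTP layer:

```python
import time

def poll_until_done(get_status, timeout_s: float = 300, interval_s: float = 2,
                    sleep=time.sleep, clock=time.monotonic) -> str:
    """Poll until the job reaches a terminal state or the deadline passes.

    get_status: callable returning the job's current status string.
    Raises TimeoutError if the job is still pending/running at the deadline
    (300 s matches the 5-minute recommendation above).
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` also makes the loop trivially testable without real waiting.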

    Rate Limit Handling Best Practices

    JAVASCRIPT
    // JavaScript: Retry with exponential backoff
    async function scrapeWithRetry(url, maxRetries = 3) {
      for (let i = 0; i < maxRetries; i++) {
        try {
          const response = await fetch('https://api.alterlab.io/api/v1/scrape', {
            method: 'POST',
            headers: {
              'X-API-Key': process.env.ALTERLAB_API_KEY,
              'Content-Type': 'application/json'
            },
            body: JSON.stringify({ url })
          });
    
          if (response.status === 429) {
            const resetTime = response.headers.get('X-RateLimit-Reset');
            const waitTime = resetTime ?
              (parseInt(resetTime) * 1000 - Date.now()) :
              Math.pow(2, i) * 1000;
    
            console.log(`Rate limited. Waiting ${waitTime}ms...`);
            await new Promise(resolve => setTimeout(resolve, waitTime));
            continue;
          }
    
          if (response.ok) {
            return await response.json();
          }
    
          throw new Error(`HTTP ${response.status}`);
    
        } catch (error) {
          if (i === maxRetries - 1) throw error;
          await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
        }
      }
    }
    
    Python
    # Python: Retry with backoff
    import os
    import time
    import requests
    from requests.exceptions import RequestException
    
    def scrape_with_retry(url: str, max_retries: int = 3):
        for attempt in range(max_retries):
            try:
                response = requests.post(
                    'https://api.alterlab.io/api/v1/scrape',
                    headers={'X-API-Key': os.environ['ALTERLAB_API_KEY']},
                    json={'url': url}
                )
    
                if response.status_code == 429:
                    reset_time = response.headers.get('X-RateLimit-Reset')
                    wait_time = (
                        int(reset_time) - time.time()
                        if reset_time else 2 ** attempt
                    )
                    print(f"Rate limited. Waiting {wait_time}s...")
                    time.sleep(max(wait_time, 0))
                    continue
    
                response.raise_for_status()
                return response.json()
    
            except RequestException:
                if attempt == max_retries - 1:
                    raise
                time.sleep(2 ** attempt)
    
        # Every attempt hit the rate limit
        raise RuntimeError("Rate limited on every attempt")

    Cost Management Tips

    • Estimate before scraping: Use POST /api/v1/scrape/estimate to check costs before running expensive requests
    • Set cost controls: Always use max_credits in production to prevent unexpected charges from tier escalation
    • Monitor usage: Check the X-AlterLab-Credits response header and track cumulative usage
    • Enable caching: Pass cache: true to cache responses. Subsequent requests to the same URL return cached results for free (caching is opt-in, disabled by default)
    • Optimize tier usage: Review billing.optimization_suggestion in responses to improve cost efficiency

    Still Having Issues?

    If you're still experiencing problems after trying these solutions:

    • Check service status: status.alterlab.io
    • Review API audit report: Ensure you're using latest best practices
    • Contact support: [email protected]
    • Join Discord community: Get help from other developers

    Include in support requests: job_id, timestamp, full error message, and API request/response

    Last updated: March 2026
