AlterLabAlterLab
Guide

Batch Scraping

Submit up to 100 URLs in a single API call. Jobs are processed in parallel and you can poll for results or receive a webhook when the batch completes.

Async by Design

Batch requests return immediately with a batch_id. Use it to poll status or configure a webhook for delivery.

How It Works

1

Submit

POST an array of URLs to /api/v1/batch. Credits are pre-debited for the estimated total cost.

2

Process

Each URL becomes an individual job processed in parallel. The same tier escalation and anti-bot logic applies to every job.

3

Collect

Poll GET /api/v1/batch/{batch_id} for individual results, or receive a batch.completed webhook when all jobs finish.

Submit a Batch

POST
/api/v1/batch

Submit a batch of URLs for asynchronous processing with optional webhook delivery. Returns 202 Accepted.

curl -X POST https://api.alterlab.io/api/v1/batch \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      { "url": "https://example.com/page-1" },
      { "url": "https://example.com/page-2", "mode": "js" },
      {
        "url": "https://example.com/page-3",
        "formats": ["text", "markdown"],
        "cost_controls": { "max_tier": "3" }
      }
    ],
    "webhook_url": "https://your-server.com/webhook"
  }'

Request Body

ParameterTypeRequiredDescription
urlsarrayYesArray of URL objects (1–100 items)
webhook_urlstringNoURL to receive a batch.completed event when all jobs finish

Per-URL Options

Each item in the urls array accepts the same parameters as a single scrape request:

FieldDefaultDescription
urlTarget URL (required)
mode"auto"auto, html, js, pdf, ocr
formats["html", "json"]Output formats: text, json, html, markdown
cost_controlsnullforce_tier, max_tier, prefer_cost, prefer_speed, fail_fast
extraction_schemanullJSON schema for structured extraction
cachefalseEnable caching for this URL
timeout90Per-URL timeout in seconds (1–300)
wait_fornullCSS selector to wait for before extracting

Response (202 Accepted):

{
  "batch_id": "batch_7a3b9c8d-4e2f-1a6b-8c5d-9e0f1a2b3c4d",
  "total_urls": 3,
  "status": "processing",
  "estimated_credits": 45000,
  "job_ids": [
    "a1b2c3d4-...",
    "e5f6g7h8-...",
    "i9j0k1l2-..."
  ]
}

Poll Batch Status

GET
/api/v1/batch/{batch_id}

Returns the aggregate status and individual job results for a batch.

curl https://api.alterlab.io/api/v1/batch/batch_7a3b9c8d-... \
  -H "X-API-Key: your_api_key"

Status Response

{
  "batch_id": "batch_7a3b9c8d-...",
  "status": "completed",
  "total": 3,
  "completed": 2,
  "failed": 1,
  "pending": 0,
  "items": [
    {
      "job_id": "a1b2c3d4-...",
      "url": "https://example.com/page-1",
      "status": "succeeded",
      "result": { "text": "...", "metadata": { ... } }
    },
    {
      "job_id": "e5f6g7h8-...",
      "url": "https://example.com/page-2",
      "status": "succeeded",
      "result": { "text": "...", "metadata": { ... } }
    },
    {
      "job_id": "i9j0k1l2-...",
      "url": "https://example.com/page-3",
      "status": "failed",
      "error": "BLOCKED_BY_ANTIBOT"
    }
  ],
  "created_at": "2026-03-03T12:00:00Z"
}

Batch status values:

StatusMeaning
processingSome jobs are still running
completedAll jobs succeeded
partialAll jobs done, some failed
failedAll jobs failed

Webhook Delivery

If you provide a webhook_url, AlterLab sends a batch.completed POST when every job in the batch finishes (succeeded or failed).

{
  "event": "batch.completed",
  "timestamp": "2026-03-03T12:01:00Z",
  "data": {
    "batch_id": "batch_7a3b9c8d-...",
    "status": "partial",
    "total": 3,
    "completed": 2,
    "failed": 1
  }
}

Webhook is fire-once

The webhook fires exactly once per batch, the first time all jobs are done. Subsequent polls will not re-trigger it.

Billing & Credits

  • Credits are pre-debited when the batch is submitted, based on estimated per-URL cost.
  • If a job fails, the worker automatically refunds the credits for that URL.
  • BYOP (Bring Your Own Proxy) discounts apply per-URL if you have an active proxy integration.
  • Check estimated_credits in the submit response to see the total debited.

Limits

Batch Limits

  • Maximum 100 URLs per batch request
  • Each URL counts as a separate scrape credit-wise
  • Batch jobs are processed in parallel (no guaranteed order)
  • Batch metadata expires after 24 hours — poll or use webhooks before then

Python Example

import alterlab
import time

client = alterlab.AlterLab(api_key="your_api_key")

# Submit batch
batch = client.batch_scrape(
    urls=[
        {"url": "https://example.com/page-1"},
        {"url": "https://example.com/page-2", "mode": "js"},
        {"url": "https://example.com/page-3", "formats": ["markdown"]},
    ],
    webhook_url="https://your-server.com/webhook",
)

print(f"Batch ID: {batch['batch_id']}")
print(f"Estimated credits: {batch['estimated_credits']}")

# Poll until complete
while True:
    status = client.get_batch_status(batch["batch_id"])
    print(f"Status: {status['status']} — {status['completed']}/{status['total']} done")
    if status["status"] != "processing":
        break
    time.sleep(2)

# Process results
for item in status["items"]:
    if item["status"] == "succeeded":
        print(f"✓ {item['url']}: {len(item['result'].get('text', ''))} chars")
    else:
        print(f"✗ {item['url']}: {item.get('error', 'unknown')}")

Node.js Example

import AlterLab from "alterlab";

const client = new AlterLab({ apiKey: "your_api_key" });

// Submit batch
const batch = await client.batchScrape({
  urls: [
    { url: "https://example.com/page-1" },
    { url: "https://example.com/page-2", mode: "js" },
    { url: "https://example.com/page-3", formats: ["markdown"] },
  ],
  webhookUrl: "https://your-server.com/webhook",
});

console.log(`Batch ID: ${batch.batchId}`);

// Poll until complete
let status;
do {
  await new Promise((r) => setTimeout(r, 2000));
  status = await client.getBatchStatus(batch.batchId);
  console.log(`${status.completed}/${status.total} done`);
} while (status.status === "processing");

// Process results
for (const item of status.items) {
  if (item.status === "succeeded") {
    console.log(`✓ ${item.url}`);
  } else {
    console.log(`✗ ${item.url}: ${item.error}`);
  }
}