Job Polling API
Poll async scrape jobs for status updates and retrieve results when complete. Every async scrape request returns a job ID that you can poll until the job succeeds or fails.
Base URL
https://api.alterlab.io/api/v1
Endpoint Reference
GET /api/v1/jobs/{job_id}

Retrieve the current status and result of an async scrape job. Returns the job object including status, result data (when complete), or error details (when failed).
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| job_id | string (UUID) | Required | The job ID returned from the scrape endpoint (POST /api/v1/scrape with async mode) |
Request Example
curl -X GET https://api.alterlab.io/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
  -H "X-API-Key: sk_live_..."
Response Example
{
"status": "succeeded",
"result": {
"url": "https://example.com",
"status_code": 200,
"content": {
"text": "Example Domain ...",
"html": "<!doctype html>..."
},
"billing": {
"total_credits": 1,
"tier_used": "1"
}
},
"error": null
}
Authentication
Requires the same API key used to create the scrape job. Pass it via the X-API-Key header. The endpoint verifies job ownership — you can only poll jobs created with your API key.
Rate Limiting
Job polling is exempt from the 30 req/min IP-based pre-auth rate limit. This allows you to poll frequently without hitting global limits. Per-key rate limits still apply (100-10,000 req/min depending on your plan).
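As a rough sizing check when polling many jobs in parallel on one key, you can compute the slowest safe interval from the per-key limit. A minimal sketch (the function name is ours; 100 req/min is the lowest per-key tier mentioned above):

```shell
#!/bin/sh
# Given a per-key rate limit (req/min) and the number of jobs being
# polled concurrently, compute the minimum poll interval in seconds
# that keeps total request volume under the limit.
min_poll_interval() {
  limit_per_min=$1   # per-key limit, e.g. 100
  concurrent_jobs=$2 # jobs polled in parallel
  # seconds between polls per job = 60 * jobs / limit, rounded up
  echo $(( (60 * concurrent_jobs + limit_per_min - 1) / limit_per_min ))
}

min_poll_interval 100 50   # 50 jobs on a 100 req/min key -> 30
```

In practice, on higher-tier keys the backoff schedule below is usually the binding constraint, not the rate limit.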
Job Lifecycle
Every async scrape job moves through a sequence of statuses. The lifecycle is linear — a job cannot move backwards.
| Status | Description |
|---|---|
| queued | Job created and waiting in the priority queue. Cost has been reserved. |
| processing | A worker has picked up the job and is actively scraping. May escalate through anti-bot tiers. |
| succeeded | Scrape completed successfully. The result field contains the scraped data. |
| failed | All scraping tiers exhausted or an unrecoverable error occurred. The error field has details. |
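Because succeeded and failed are the only terminal states, a polling loop needs just one check to decide whether to stop. A minimal sketch (the helper name is ours):

```shell
#!/bin/sh
# Return success (0) if a job status is terminal and polling should stop.
is_terminal() {
  case "$1" in
    succeeded|failed) return 0 ;;  # final: result or error is populated
    queued|processing) return 1 ;; # still in flight: keep polling
    *) return 1 ;;                 # unknown status: keep polling defensively
  esac
}

is_terminal "succeeded" && echo "stop polling"
```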
Job Expiry
Completed jobs are retained for a limited time (TTL: 1 hour); after that, polling the job returns 404 Not Found. Retrieve your results promptly or use webhooks for guaranteed delivery.
Response Schemas
The response shape varies based on the job status. Here is the complete schema for each state.
Queued
{
"status": "queued",
"result": null,
"error": null
}
Processing
{
"status": "processing",
"result": null,
"error": null
}
Succeeded
{
"status": "succeeded",
"result": {
"url": "https://example.com",
"status_code": 200,
"content": {
"text": "Extracted text content...",
"html": "<!doctype html>...",
"markdown": "# Page Title\n...",
"json": { "structured": "data" }
},
"billing": {
"total_credits": 5,
"tier_used": "4"
},
"metadata": {
"title": "Page Title",
"description": "Meta description",
"language": "en"
}
},
"error": null
}
The content fields depend on the formats parameter in your original scrape request. Only requested formats are returned.
Failed
{
"status": "failed",
"result": null,
"error": "All scraping tiers exhausted. Target site blocked all attempts."
}
Polling Best Practices
| Practice | Recommendation |
|---|---|
| Initial delay | Wait 1-2 seconds before the first poll. Simple pages complete in under 3 seconds. |
| Poll interval | Start at 2 seconds. Most jobs complete within 5-15 seconds. |
| Exponential backoff | Double the interval after each poll, up to a maximum of 10 seconds. Example: 2s, 4s, 8s, 10s, 10s... |
| Timeout | Set a maximum polling duration of 5 minutes. If the job hasn't completed by then, treat it as failed. |
| Terminal states | Stop polling immediately when status is succeeded or failed. These are final — the job will not change status again. |
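The backoff schedule from the table (2s, 4s, 8s, then capped at 10s) reduces to one line of arithmetic. A sketch (the function name is ours):

```shell
#!/bin/sh
# Double the interval after each poll, capped at a maximum.
next_interval() {
  interval=$1
  max=$2
  doubled=$(( interval * 2 ))
  if [ "$doubled" -gt "$max" ]; then echo "$max"; else echo "$doubled"; fi
}

# Walk the schedule starting from 2s with a 10s cap: 4 8 10 10
i=2
for _ in 1 2 3 4; do
  i=$(next_interval "$i" 10)
  printf '%s ' "$i"
done
echo
```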
Typical Job Duration
- Tier 1 (simple HTTP): 1-3 seconds
- Tier 2 (stealth HTTP): 2-5 seconds
- Tier 3 (headless browser): 5-15 seconds
- Tier 4 (anti-bot bypass): 10-30 seconds
Code Examples
Full polling loop examples with exponential backoff and proper error handling.
# 1. Submit an async scrape
JOB_ID=$(curl -s -X POST https://api.alterlab.io/api/v1/scrape \
-H "X-API-Key: sk_live_..." \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "async": true}' \
| jq -r '.job_id')
echo "Job submitted: $JOB_ID"
# 2. Poll until complete (with backoff)
INTERVAL=2
MAX_INTERVAL=10
TIMEOUT=300
START=$(date +%s)
while true; do
ELAPSED=$(( $(date +%s) - START ))
if [ "$ELAPSED" -ge "$TIMEOUT" ]; then
echo "Polling timeout after ${TIMEOUT}s"
exit 1
fi
RESPONSE=$(curl -s https://api.alterlab.io/api/v1/jobs/$JOB_ID \
-H "X-API-Key: sk_live_...")
STATUS=$(echo "$RESPONSE" | jq -r '.status')
echo "Status: $STATUS (${ELAPSED}s elapsed)"
case "$STATUS" in
succeeded)
echo "$RESPONSE" | jq '.result'
exit 0
;;
failed)
echo "Job failed: $(echo "$RESPONSE" | jq -r '.error')"
exit 1
;;
esac
sleep $INTERVAL
INTERVAL=$(( INTERVAL * 2 > MAX_INTERVAL ? MAX_INTERVAL : INTERVAL * 2 ))
done
Polling vs Webhooks vs WebSocket
AlterLab offers three ways to receive async job results. Choose based on your architecture and latency requirements.
| Method | Latency | Complexity | Best For |
|---|---|---|---|
| Polling (this page) | 2-10s (depends on interval) | Low | Scripts, CLIs, simple integrations, serverless functions |
| Webhooks | Near-instant (push) | Medium | Backend services, event-driven architectures, guaranteed delivery |
| WebSocket | Instant (real-time) | High | Dashboards, real-time UIs, monitoring multiple jobs simultaneously |
Recommendation
Start with polling for scripts and simple integrations. Move to webhooks when a backend service needs push delivery, and to WebSocket when you need real-time updates across many jobs at once.
Job Types
The same polling endpoint works for all async job types. The response format is identical regardless of how the job was created.
| Job Source | Created By | Docs |
|---|---|---|
| Single scrape | POST /api/v1/scrape with async: true | REST API |
| Batch scrape | POST /api/v1/batch — each URL produces a separate job | Batch Guide |
| Scheduled scrape | Jobs created on a cron schedule via POST /api/v1/schedules | Scheduler Guide |
| Crawl job | POST /api/v1/crawl — spawns multiple child scrape jobs | REST API |
Error Handling
| HTTP Status | Meaning | Action |
|---|---|---|
| 200 OK | Job found. Check the status field for current state. | Continue polling if not terminal, or process result. |
| 401 Unauthorized | Missing or invalid API key. | Check your X-API-Key header. |
| 404 Not Found | Job does not exist, has expired (TTL: 1 hour), or belongs to a different user. | Verify the job ID. If the job expired, re-submit the scrape. |
| 429 Too Many Requests | Per-key rate limit exceeded. | Back off and retry. Consider increasing your poll interval. |
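A polling client can branch on the HTTP status before inspecting the job status. A sketch of the dispatch (the action strings and function name are ours; the status code can be captured with curl's `-w '%{http_code}'`):

```shell
#!/bin/sh
# Map an HTTP status from the jobs endpoint to a client action.
poll_action() {
  case "$1" in
    200) echo "inspect-job-status" ;; # body is the job object; check .status
    401) echo "fix-api-key" ;;        # missing or invalid X-API-Key
    404) echo "resubmit" ;;           # bad ID, expired job, or not your job
    429) echo "backoff" ;;            # per-key limit hit: widen the interval
    *)   echo "retry" ;;              # transient server or network error
  esac
}

# Example: capture body and status code in one request
# HTTP_CODE=$(curl -s -o response.json -w '%{http_code}' \
#   https://api.alterlab.io/api/v1/jobs/$JOB_ID -H "X-API-Key: sk_live_...")
# ACTION=$(poll_action "$HTTP_CODE")
```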
Security Note
The endpoint returns 404 for both non-existent and unauthorized jobs. This prevents information disclosure — you cannot determine whether a job exists if it belongs to another user.
Handling Job-Level Errors
When the HTTP status is 200 but the job status is failed, the error field contains a description of what went wrong:
// Common error messages
"All scraping tiers exhausted. Target site blocked all attempts."
"Timeout after 30 seconds"
"DNS resolution failed for target URL"
"Target returned HTTP 403 Forbidden"
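When deciding whether a failed job is worth re-submitting, the error string gives a hint. A rough classifier over the messages above (the retryable/permanent split is our judgment, not an API guarantee):

```shell
#!/bin/sh
# Classify a job-level error message as retryable or permanent.
# Timeouts and transient network issues may succeed on retry;
# exhausted tiers or hard blocks usually will not.
classify_error() {
  case "$1" in
    *"Timeout"*|*"DNS resolution failed"*) echo "retryable" ;;
    *"tiers exhausted"*|*"403 Forbidden"*) echo "permanent" ;;
    *) echo "unknown" ;;
  esac
}

classify_error "Timeout after 30 seconds"   # retryable
```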