Job Polling API
Poll async scrape jobs for status updates and retrieve results when complete. Every async scrape request returns a job ID that you can poll until the job succeeds or fails.
Base URL
https://api.alterlab.io/api/v1
Endpoint Reference
GET /api/v1/jobs/{job_id}

Retrieve the current status and result of an async scrape job. Returns the job object including status, result data (when complete), or error details (when failed).
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| job_id | string (UUID) | Required | The job ID returned from the scrape endpoint (POST /api/v1/scrape with async mode) |
Request Example
curl -X GET https://api.alterlab.io/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000 \
  -H "X-API-Key: sk_live_..."
Response Example
{
"status": "succeeded",
"result": {
"url": "https://example.com",
"status_code": 200,
"content": {
"text": "Example Domain ...",
"html": "<!doctype html>..."
},
"billing": {
"total_credits": 1,
"tier_used": "1"
}
},
"error": null
}
Authentication
Requires the same API key used to create the scrape job. Pass it via the X-API-Key header. The endpoint verifies job ownership — you can only poll jobs created with your API key.
Rate Limiting
Job polling is exempt from the 30 req/min IP-based pre-auth rate limit. This allows you to poll frequently without hitting global limits. Per-key rate limits still apply (100-10,000 req/min depending on your plan).
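As a rough sizing check when polling many jobs in parallel on one key, you can compute the slowest safe interval from the per-key limit. A minimal sketch (the function name is ours; 100 req/min is the lowest per-key tier mentioned above):

```shell
#!/bin/sh
# Given a per-key rate limit (req/min) and the number of jobs being
# polled concurrently, compute the minimum poll interval in seconds
# that keeps total request volume under the limit.
min_poll_interval() {
  limit_per_min=$1   # per-key limit, e.g. 100
  concurrent_jobs=$2 # jobs polled in parallel
  # seconds between polls per job = 60 * jobs / limit, rounded up
  echo $(( (60 * concurrent_jobs + limit_per_min - 1) / limit_per_min ))
}

min_poll_interval 100 50   # 50 jobs on a 100 req/min key -> 30
```

In practice, on higher-tier keys the backoff schedule below is usually the binding constraint, not the rate limit.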
Job Lifecycle
Every async scrape job moves through a sequence of statuses. The lifecycle is linear — a job cannot move backwards.
| Status | Description |
|---|---|
| queued | Job created and waiting in the priority queue. Cost has been reserved. |
| processing | A worker has picked up the job and is actively scraping. May escalate through anti-bot tiers. |
| succeeded | Scrape completed successfully. The result field contains the scraped data. |
| failed | All scraping tiers exhausted or an unrecoverable error occurred. The error field has details. |
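Because succeeded and failed are the only terminal states, a polling loop needs just one check to decide whether to stop. A minimal sketch (the helper name is ours):

```shell
#!/bin/sh
# Return success (0) if a job status is terminal and polling should stop.
is_terminal() {
  case "$1" in
    succeeded|failed) return 0 ;;  # final: result or error is populated
    queued|processing) return 1 ;; # still in flight: keep polling
    *) return 1 ;;                 # unknown status: keep polling defensively
  esac
}

is_terminal "succeeded" && echo "stop polling"
```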
Job Expiry
Completed jobs are retained for a limited time (TTL: 1 hour); after that, polling the job returns 404 Not Found. Retrieve your results promptly or use webhooks for guaranteed delivery.
Response Schemas
The response shape varies based on the job status. Here is the complete schema for each state.
Queued
{
"status": "queued",
"result": null,
"error": null
}
Processing
{
"status": "processing",
"result": null,
"error": null
}
Succeeded
{
"status": "succeeded",
"result": {
"url": "https://example.com",
"status_code": 200,
"content": {
"text": "Extracted text content...",
"html": "<!doctype html>...",
"markdown": "# Page Title\n...",
"json": { "structured": "data" }
},
"billing": {
"total_credits": 5,
"tier_used": "4"
},
"metadata": {
"title": "Page Title",
"description": "Meta description",
"language": "en"
}
},
"error": null
}
The content fields depend on the formats parameter in your original scrape request. Only requested formats are returned.
Failed
{
"status": "failed",
"result": null,
"error": "All scraping tiers exhausted. Target site blocked all attempts."
}
Polling Best Practices
| Practice | Recommendation |
|---|---|
| Initial delay | Wait 1-2 seconds before the first poll. Simple pages complete in under 3 seconds. |
| Poll interval | Start at 2 seconds. Most jobs complete within 5-15 seconds. |
| Exponential backoff | Double the interval after each poll, up to a maximum of 10 seconds. Example: 2s, 4s, 8s, 10s, 10s... |
| Timeout | Set a maximum polling duration of 5 minutes. If the job hasn't completed by then, treat it as failed. |
| Terminal states | Stop polling immediately when status is succeeded or failed. These are final — the job will not change status again. |
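The backoff schedule from the table (2s, 4s, 8s, then capped at 10s) reduces to one line of arithmetic. A sketch (the function name is ours):

```shell
#!/bin/sh
# Double the interval after each poll, capped at a maximum.
next_interval() {
  interval=$1
  max=$2
  doubled=$(( interval * 2 ))
  if [ "$doubled" -gt "$max" ]; then echo "$max"; else echo "$doubled"; fi
}

# Walk the schedule starting from 2s with a 10s cap: 4 8 10 10
i=2
for _ in 1 2 3 4; do
  i=$(next_interval "$i" 10)
  printf '%s ' "$i"
done
echo
```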
Typical Job Duration
- Tier 1 (simple HTTP): 1-3 seconds
- Tier 2 (stealth HTTP): 2-5 seconds
- Tier 3 (headless browser): 5-15 seconds
- Tier 4 (anti-bot bypass): 10-30 seconds
Code Examples
Full polling loop examples with exponential backoff and proper error handling.
# 1. Submit an async scrape
JOB_ID=$(curl -s -X POST https://api.alterlab.io/api/v1/scrape \
-H "X-API-Key: sk_live_..." \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "async": true}' \
| jq -r '.job_id')
echo "Job submitted: $JOB_ID"
# 2. Poll until complete (with backoff)
INTERVAL=2
MAX_INTERVAL=10
TIMEOUT=300
START=$(date +%s)
while true; do
ELAPSED=$(( $(date +%s) - START ))
if [ "$ELAPSED" -ge "$TIMEOUT" ]; then
echo "Polling timeout after ${TIMEOUT}s"
exit 1
fi
RESPONSE=$(curl -s https://api.alterlab.io/api/v1/jobs/$JOB_ID \
-H "X-API-Key: sk_live_...")
STATUS=$(echo "$RESPONSE" | jq -r '.status')
echo "Status: $STATUS (${ELAPSED}s elapsed)"
case "$STATUS" in
succeeded)
echo "$RESPONSE" | jq '.result'
exit 0
;;
failed)
echo "Job failed: $(echo "$RESPONSE" | jq -r '.error')"
exit 1
;;
esac
sleep $INTERVAL
INTERVAL=$(( INTERVAL * 2 > MAX_INTERVAL ? MAX_INTERVAL : INTERVAL * 2 ))
done
Polling vs Webhooks vs WebSocket
AlterLab offers three ways to receive async job results. Choose based on your architecture and latency requirements.
| Method | Latency | Complexity | Best For |
|---|---|---|---|
| Polling (this page) | 2-10s (depends on interval) | Low | Scripts, CLIs, simple integrations, serverless functions |
| Webhooks | Near-instant (push) | Medium | Backend services, event-driven architectures, guaranteed delivery |
| WebSocket | Instant (real-time) | High | Dashboards, real-time UIs, monitoring multiple jobs simultaneously |
Recommendation
Start with polling for scripts and simple integrations. Move to webhooks when a backend service needs push delivery, and to WebSocket when you need real-time updates across many jobs at once.
Job Types
The same polling endpoint works for all async job types. The response format is identical regardless of how the job was created.
| Job Source | Created By | Docs |
|---|---|---|
| Single scrape | POST /api/v1/scrape with async: true | REST API |
| Batch scrape | POST /api/v1/batch — each URL produces a separate job | Batch Guide |
| Scheduled scrape | Jobs created on a cron schedule via POST /api/v1/schedules | Scheduler Guide |
| Crawl job | POST /api/v1/crawl — spawns multiple child scrape jobs | REST API |
Error Handling
| HTTP Status | Meaning | Action |
|---|---|---|
| 200 OK | Job found. Check the status field for current state. | Continue polling if not terminal, or process result. |
| 401 Unauthorized | Missing or invalid API key. | Check your X-API-Key header. |
| 404 Not Found | Job does not exist, has expired (TTL: 1 hour), or belongs to a different user. | Verify the job ID. If the job expired, re-submit the scrape. |
| 429 Too Many Requests | Per-key rate limit exceeded. | Back off and retry. Consider increasing your poll interval. |
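A polling client can branch on the HTTP status before inspecting the job status. A sketch of the dispatch (the action strings and function name are ours; the status code can be captured with curl's `-w '%{http_code}'`):

```shell
#!/bin/sh
# Map an HTTP status from the jobs endpoint to a client action.
poll_action() {
  case "$1" in
    200) echo "inspect-job-status" ;; # body is the job object; check .status
    401) echo "fix-api-key" ;;        # missing or invalid X-API-Key
    404) echo "resubmit" ;;           # bad ID, expired job, or not your job
    429) echo "backoff" ;;            # per-key limit hit: widen the interval
    *)   echo "retry" ;;              # transient server or network error
  esac
}

# Example: capture body and status code in one request
# HTTP_CODE=$(curl -s -o response.json -w '%{http_code}' \
#   https://api.alterlab.io/api/v1/jobs/$JOB_ID -H "X-API-Key: sk_live_...")
# ACTION=$(poll_action "$HTTP_CODE")
```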
Security Note
The endpoint returns 404 for both non-existent and unauthorized jobs. This prevents information disclosure — you cannot determine whether a job exists if it belongs to another user.
Handling Job-Level Errors
When the HTTP status is 200 but the job status is failed, the error field contains a description of what went wrong:
// Common error messages
"All scraping tiers exhausted. Target site blocked all attempts."
"Timeout after 30 seconds"
"DNS resolution failed for target URL"
"Target returned HTTP 403 Forbidden"
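When deciding whether a failed job is worth re-submitting, the error string gives a hint. A rough classifier over the messages above (the retryable/permanent split is our judgment, not an API guarantee):

```shell
#!/bin/sh
# Classify a job-level error message as retryable or permanent.
# Timeouts and transient network issues may succeed on retry;
# exhausted tiers or hard blocks usually will not.
classify_error() {
  case "$1" in
    *"Timeout"*|*"DNS resolution failed"*) echo "retryable" ;;
    *"tiers exhausted"*|*"403 Forbidden"*) echo "permanent" ;;
    *) echo "unknown" ;;
  esac
}

classify_error "Timeout after 30 seconds"   # retryable
```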