Guide

Caching

Reduce costs and improve performance by caching scrape results. Pay once, reuse many times.

Opt-In Caching

Caching is disabled by default. You must explicitly enable it with cache: true. This ensures you always get fresh data unless you specifically want cached results.

How Caching Works

First Request

When you make a request with cache: true, we scrape the page, store the result, and return it. You're charged normally.

Subsequent Requests

Within the TTL window, requests for the same URL with the same options return the cached result almost instantly. Cache hits are free.

Cache Expiry

After TTL expires, the next request fetches fresh data and refreshes the cache.
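The three stages above can be sketched with a small in-memory model. This illustrates TTL caching in general and is not a description of the service's internals; `TTLCache`, `fetch`, and the injectable clock are hypothetical names introduced for the example.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Toy TTL cache: miss -> fetch and store, hit -> reuse, expiry -> refetch."""

    def __init__(self, ttl_seconds: int = 900, clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be simulated without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, content)

    def get(self, url: str, fetch: Callable[[str], str]) -> Tuple[str, bool]:
        """Return (content, cached). `fetch` stands in for a real, billed scrape."""
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and now < entry[0]:
            return entry[1], True  # cache hit: no scrape performed
        content = fetch(url)  # first request or expired entry: scrape again
        self._entries[url] = (now + self.ttl_seconds, content)
        return content, False


def simulate() -> List[Tuple[str, bool]]:
    """Walk through a first request, a repeat request, and a request after expiry."""
    fake_now = [0.0]
    scrapes: List[str] = []

    def fetch(url: str) -> str:
        scrapes.append(url)
        return f"content #{len(scrapes)}"

    cache = TTLCache(ttl_seconds=900, clock=lambda: fake_now[0])
    url = "https://example.com/products"
    results = [cache.get(url, fetch)]  # first request: scraped and stored
    fake_now[0] = 600.0
    results.append(cache.get(url, fetch))  # within TTL: served from cache
    fake_now[0] = 901.0
    results.append(cache.get(url, fetch))  # after expiry: scraped again
    return results
```

Calling `simulate()` yields a miss, then a hit with the same content, then a second miss once the 900-second window has passed.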

Cost Savings

Cache hits are completely free. If you're scraping the same pages repeatedly (e.g., monitoring, testing), caching can reduce your costs by 90%+.
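As a rough illustration of where a figure like 90%+ can come from, the sketch below estimates the share of requests that are free when one URL is polled at a fixed interval. It assumes every request inside the TTL window is a hit and that the entry is refreshed by the first request after expiry; `estimated_savings` is a hypothetical helper, not part of the API.

```python
import math


def estimated_savings(poll_interval_s: int, ttl_s: int, total_requests: int) -> float:
    """Fraction of requests that would be free cache hits for a single URL.

    Assumes requests arrive every `poll_interval_s` seconds and that the first
    request made once the TTL has elapsed triggers a new, billed scrape.
    """
    if poll_interval_s <= 0 or ttl_s <= 0 or total_requests <= 0:
        raise ValueError("all arguments must be positive")
    # Requests covered by one billed scrape: the scrape itself plus the hits
    # that land before the entry expires.
    requests_per_scrape = max(1, math.ceil(ttl_s / poll_interval_s))
    billed = math.ceil(total_requests / requests_per_scrape)
    return 1 - billed / total_requests
```

For example, polling once a minute for a day (1,440 requests) with the default 15-minute TTL bills 96 scrapes under these assumptions, which is a saving of about 93%. Polling less often than the TTL saves nothing.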

Enabling Cache

Add cache: true to your request:

Bash

curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "cache": true
  }'

Cache TTL

Control how long results are cached with cache_ttl (in seconds):

TTL Value	Duration	Use Case
`60`	1 minute	Real-time data, stock prices
`900`	15 minutes (default)	General scraping
`3600`	1 hour	Product pages, articles
`86400`	24 hours (max)	Static content, documentation

Python

import requests

# Cache for 1 hour
response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "url": "https://example.com/blog/article",
        "cache": True,
        "cache_ttl": 3600  # 1 hour in seconds
    }
)

# Cache for 24 hours (maximum)
response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "url": "https://docs.example.com/api-reference",
        "cache": True,
        "cache_ttl": 86400  # 24 hours
    }
)

TTL Limits

Minimum: 60 seconds
Maximum: 86400 seconds (24 hours)
Default: 900 seconds (15 minutes) when not specified
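A client can apply these limits before sending a request. The helper below is a minimal sketch; `resolve_cache_ttl` is a hypothetical name, and clamping out-of-range values is a choice made for this example, since how the API itself handles them is not stated here.

```python
from typing import Optional

MIN_TTL = 60        # seconds, documented minimum
MAX_TTL = 86400     # seconds, documented maximum (24 hours)
DEFAULT_TTL = 900   # seconds, documented default (15 minutes)


def resolve_cache_ttl(requested: Optional[int] = None) -> int:
    """Return a cache_ttl value that falls within the documented limits.

    Clamping is a client-side assumption of this sketch; the API may instead
    reject values outside the range.
    """
    if requested is None:
        return DEFAULT_TTL
    return max(MIN_TTL, min(MAX_TTL, int(requested)))
```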

Cache Key Strategy

Cache keys are generated from the combination of your API key, the target URL, and scraping options. This means:

Same URL + same options = cache hit (free)

Same URL + different options (e.g., different extraction_profile) = separate cache entry

Different API keys = separate cache entries (isolated per user)
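One way to picture these rules is a hash over the three inputs. The actual key format is not documented here, so treat the following as an assumption-laden sketch: `cache_key` and the SHA-256 scheme are hypothetical, and the `extraction_profile` values in the usage note are made up.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(api_key: str, url: str, options: Dict[str, Any]) -> str:
    """Illustrative cache key: a hash over API key, URL, and scraping options."""
    # sort_keys makes the key independent of the order the options were written in.
    canonical = json.dumps(
        {"api_key": api_key, "url": url, "options": options},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this scheme, two calls with the same API key, URL, and options produce the same key, while changing `extraction_profile` from, say, `"article"` to `"product"` or switching API keys produces a different one.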

Force Refresh

Need fresh data and want the cache updated with it? Use force_refresh: true:

Python

import requests

# Force a fresh scrape and update the cache
response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "url": "https://example.com/products",
        "cache": True,
        "force_refresh": True  # Bypass cache, fetch fresh, update cache
    }
)
data = response.json()  # Parse the JSON body before reading billing fields

# This is charged normally (not a cache hit)
print(f"Cost: {data['billing']['total_credits']}")  # Normal pricing

When to Use Force Refresh:

You know the page content has changed
User explicitly requests fresh data
Debugging/testing cache behavior

Cache Response Fields

The response includes information about cache status:

Response

JSON

{
  "content": "...",
  "cached": true,                          // true if served from cache
  "cached_at": "2026-03-24T10:30:00Z",    // when result was cached
  "expires_at": "2026-03-24T10:45:00Z",   // when cache entry expires
  "stale_cache": false,                    // true if stale cache served after scrape failure
  "response_time_ms": 12,                 // much faster for cache hits
  "billing": {
    "total_credits": 0,                   // $0 for cache hits
    "tier_used": "cache"
  }
}
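The fields above can be condensed into a small status summary on the client. This is a sketch under assumptions: `cache_summary` and `parse_timestamp` are hypothetical helpers, timestamps are assumed to always use the `...Z` form shown in the example, and fields are read with `.get()` because their presence on a cache miss is not specified here.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps of the form 2026-03-24T10:30:00Z as aware UTC datetimes."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def cache_summary(data: Dict[str, Any], now: Optional[datetime] = None) -> Dict[str, Any]:
    """Summarize the cache-related fields of a scrape response."""
    now = now or datetime.now(timezone.utc)
    if data.get("stale_cache"):
        status = "stale"  # stale entry served after a failed scrape
    elif data.get("cached"):
        status = "hit"
    else:
        status = "miss"
    expires_at = data.get("expires_at")
    seconds_left = None
    if expires_at:
        remaining = (parse_timestamp(expires_at) - now).total_seconds()
        seconds_left = max(0, int(remaining))
    return {
        "status": status,
        "seconds_until_expiry": seconds_left,
        "credits": data.get("billing", {}).get("total_credits"),
    }
```

Passing the example response with `now` set to 10:40:00 UTC on the same day reports a hit with 300 seconds left and 0 credits.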

Best Practices

1. Match TTL to Content Freshness

Set TTL based on how often the content changes. News sites need shorter TTLs than documentation.

2. Use Consistent URLs

Cache keys are based on the exact URL. example.com/page and example.com/page/ are different cache entries.
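Normalizing URLs on the client before each request is one way to avoid accidental duplicates. The helper below is a sketch with an arbitrary policy (lowercase scheme and host, no fragment, no trailing slash on non-root paths); `normalize_url` is a hypothetical name. Check that the target site serves the same page for the form you pick.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return one consistent spelling of a URL before sending it to the API.

    Assumes the URL carries no user:password@ credentials, since the whole
    network location is lowercased.
    """
    parts = urlsplit(url)
    path = parts.path
    if len(path) > 1 and path.endswith("/"):
        # Strip trailing slashes, but never reduce the path below the root.
        path = path.rstrip("/") or "/"
    # The empty final element drops any #fragment, which servers never see.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))
```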

3. Cache Static Resources Aggressively

Documentation, help pages, and rarely-changing content can use 24-hour TTL.

4. Don't Cache Dynamic Content

Search results, personalized pages, and time-sensitive data should not be cached. Leave caching off (the default) or set cache: false explicitly.

Cache Invalidation

There are three ways to handle a cached entry you no longer want to be served:

Option 1: Force Refresh

Use force_refresh: true to fetch fresh data and update the cache.

Option 2: Disable Cache

Set cache: false to bypass cache entirely (doesn't update cache).

Option 3: Wait for TTL

Cache automatically expires after TTL. No action needed.
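The first two options differ only in the request body, which the sketch below makes explicit. `scrape_payload` and its `mode` names are hypothetical; the sketch also assumes that `cache_ttl` can be sent together with `force_refresh`, which is not confirmed here.

```python
from typing import Any, Dict


def scrape_payload(url: str, mode: str = "cached", cache_ttl: int = 900) -> Dict[str, Any]:
    """Build a request body for one of three caching behaviours.

    mode="cached":  use the cache, scraping only on a miss or after expiry
    mode="refresh": scrape now and overwrite the cached entry (force_refresh)
    mode="bypass":  scrape now and leave any cached entry untouched
    """
    if mode == "cached":
        return {"url": url, "cache": True, "cache_ttl": cache_ttl}
    if mode == "refresh":
        return {"url": url, "cache": True, "cache_ttl": cache_ttl, "force_refresh": True}
    if mode == "bypass":
        return {"url": url, "cache": False}
    raise ValueError(f"unknown mode: {mode!r}")
```

The returned dictionary can be passed as the `json=` argument of `requests.post` in the same way as the earlier examples.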

No Manual Purge

Currently, there's no API to manually purge specific cache entries. Use force_refresh if you need to update a cached page.


Last updated: March 2026
