Caching
Reduce costs and improve performance by caching scrape results. Pay once, reuse many times.
Opt-In Caching
Caching is off by default. To use it, set cache: true on the request. This ensures you always get fresh data unless you specifically want cached results.
How Caching Works
First Request
When you make a request with cache: true, we scrape the page, store the result, and return it. You're charged normally.
Subsequent Requests
Within the TTL window, requests for the same URL return the cached result immediately. Cache hits are free.
Cache Expiry
After TTL expires, the next request fetches fresh data and refreshes the cache.
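The three stages above can be modelled with a small in-memory cache. This is only a toy sketch to make the lifecycle concrete, not the service's implementation: the class name `TTLCacheModel`, the `charged` flag, and the choice to treat an entry as expired once its age reaches the TTL are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCacheModel:
    """Toy model of the first request / cache hit / expiry lifecycle."""

    def __init__(self, scrape: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic):
        self._scrape = scrape  # function that does the real (billable) work
        self._clock = clock    # injectable so the model can be tested without waiting
        # url -> (stored_at, ttl_seconds, content)
        self._entries: Dict[str, Tuple[float, int, str]] = {}

    def fetch(self, url: str, ttl: int = 900) -> dict:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None:
            stored_at, entry_ttl, content = entry
            if now - stored_at < entry_ttl:
                # Subsequent request inside the TTL window: served from cache, free.
                return {"content": content, "cached": True, "charged": False}
        # First request, or the entry has expired: scrape, store, charge normally.
        content = self._scrape(url)
        self._entries[url] = (now, ttl, content)
        return {"content": content, "cached": False, "charged": True}


# Demo with a fake clock and a fake scraper (no network access involved).
fake_now = [0.0]
scrape_calls = []

def fake_scrape(url: str) -> str:
    scrape_calls.append(url)
    return f"page v{len(scrape_calls)}"

model = TTLCacheModel(fake_scrape, clock=lambda: fake_now[0])
first = model.fetch("https://example.com/products")   # miss: scraped and stored
fake_now[0] = 300.0
second = model.fetch("https://example.com/products")  # hit: 300 s < 900 s TTL
fake_now[0] = 900.0
third = model.fetch("https://example.com/products")   # expired: scraped again
```

Injecting the clock keeps the sketch deterministic; a real client never needs this logic, since the service tracks expiry itself.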
Cost Savings
Cache hits are completely free. If you're scraping the same pages repeatedly (e.g., monitoring, testing), caching can reduce your costs by 90%+.
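To see where a figure like 90%+ can come from, here is a rough back-of-the-envelope sketch. It assumes every uncached request costs the same, that all requests target one URL with cache: true, and that an entry counts as expired once its age reaches the TTL; `billable_requests` is a hypothetical helper, not part of the API.

```python
def billable_requests(request_times, ttl):
    """Count requests that would be charged, given request timestamps in seconds."""
    charged = 0
    cached_at = None
    for t in sorted(request_times):
        if cached_at is None or t - cached_at >= ttl:
            # Nothing cached yet, or the entry expired: this request is charged.
            charged += 1
            cached_at = t
    return charged


# Poll one page every 60 seconds for an hour with the default 15-minute TTL.
times = range(0, 3600, 60)  # 60 requests
charged = billable_requests(times, ttl=900)
savings = 1 - charged / len(times)
print(f"{charged} of {len(times)} requests charged, {savings:.0%} saved")
# prints "4 of 60 requests charged, 93% saved"
```

Actual savings depend on how often the same URL is requested relative to the TTL; pages requested less often than once per TTL window see no benefit.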
Enabling Cache
Add cache: true to your request:
curl -X POST https://api.alterlab.io/api/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "cache": true
  }'
Cache TTL
Control how long results are cached with cache_ttl (in seconds):
| TTL Value (seconds) | Duration | Use Case |
|---|---|---|
| 60 | 1 minute | Real-time data, stock prices |
| 900 | 15 minutes (default) | General scraping |
| 3600 | 1 hour | Product pages, articles |
| 86400 | 24 hours (max) | Static content, documentation |
import requests

# Cache for 1 hour
response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "url": "https://example.com/blog/article",
        "cache": True,
        "cache_ttl": 3600,  # 1 hour in seconds
    },
)
# Cache for 24 hours (maximum)
response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "url": "https://docs.example.com/api-reference",
        "cache": True,
        "cache_ttl": 86400,  # 24 hours
    },
)
TTL Limits
- Minimum: 60 seconds
- Maximum: 86400 seconds (24 hours)
- Default: 900 seconds (15 minutes) when not specified
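The limits above can be checked on the client before a request is sent. This is a minimal sketch: the source does not say how the API responds to an out-of-range cache_ttl, so raising an error locally is a design choice made here, and `cache_options` is a hypothetical helper name.

```python
MIN_TTL = 60        # seconds
MAX_TTL = 86_400    # seconds (24 hours)
DEFAULT_TTL = 900   # seconds, applied by the service when cache_ttl is omitted


def cache_options(ttl=None):
    """Return the cache-related fields of a scrape request body."""
    if ttl is None:
        # Omit cache_ttl and rely on the documented 900-second default.
        return {"cache": True}
    if isinstance(ttl, bool) or not isinstance(ttl, int):
        raise TypeError("cache_ttl must be an integer number of seconds")
    if not MIN_TTL <= ttl <= MAX_TTL:
        raise ValueError(
            f"cache_ttl must be between {MIN_TTL} and {MAX_TTL} seconds, got {ttl}"
        )
    return {"cache": True, "cache_ttl": ttl}


# Merge the result into the JSON body of a request.
body = {"url": "https://example.com/blog/article", **cache_options(3600)}
```

Failing early keeps a typo such as `cache_ttl=36000000` from depending on undocumented server behaviour.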
Force Refresh
Need fresh data and want the cache updated at the same time? Use force_refresh: true:
import requests

# Force a fresh scrape and update the cache
response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "url": "https://example.com/products",
        "cache": True,
        "force_refresh": True,  # Bypass cache, fetch fresh, update cache
    },
)
data = response.json()  # Parse the JSON response body
# This is charged normally (not a cache hit)
print(f"Credits used: {data['credits_used']}")  # Normal pricing
When to Use Force Refresh:
- You know the page content has changed
- User explicitly requests fresh data
- Debugging/testing cache behavior
Cache Response Headers
The response body includes fields describing the cache status:
{
  "success": true,
  "content": "...",
  "cached": true,       // true if served from cache
  "cache_age": 342,     // seconds since cached (if cached)
  "cache_ttl": 900,     // original TTL setting
  "credits_used": 0,    // $0 for cache hits
  "timing": {
    "total_ms": 12      // Much faster for cache hits
  }
}
Best Practices
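Before tuning TTLs, it helps to check how each response was actually served. The sketch below reads the cache fields from a parsed response body like the example above. It assumes that cache_ttl minus cache_age gives the time left before expiry, which follows from the field descriptions but is not stated outright; `cache_summary` is a hypothetical helper.

```python
def cache_summary(data: dict) -> dict:
    """Summarise the cache-related fields of a parsed scrape response."""
    cached = bool(data.get("cached", False))
    age = data.get("cache_age", 0) if cached else 0
    ttl = data.get("cache_ttl")
    # Assumed: remaining lifetime = original TTL - seconds since the entry was stored.
    remaining = max(ttl - age, 0) if cached and ttl is not None else None
    return {
        "cached": cached,
        "age_s": age,
        "remaining_s": remaining,
        "credits_used": data.get("credits_used"),
    }


# The example response from above, as a Python dict.
sample = {
    "success": True,
    "content": "...",
    "cached": True,
    "cache_age": 342,
    "cache_ttl": 900,
    "credits_used": 0,
    "timing": {"total_ms": 12},
}
summary = cache_summary(sample)
print(summary)
# prints {'cached': True, 'age_s': 342, 'remaining_s': 558, 'credits_used': 0}
```

Logging this summary per request gives a quick view of the hit rate and of whether the chosen TTL is too short for the polling interval.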
1. Match TTL to Content Freshness
Set TTL based on how often the content changes. News sites need shorter TTLs than documentation.
2. Use Consistent URLs
Cache keys are based on the exact URL. example.com/page and example.com/page/ are different cache entries.
3. Cache Static Resources Aggressively
Documentation, help pages, and rarely-changing content can use 24-hour TTL.
4. Don't Cache Dynamic Content
Search results, personalized pages, and time-sensitive data should use cache: false.
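Because cache keys depend on the exact URL (practice 2), one way to avoid accidental duplicate entries is to pass every URL through a single normalising function before sending it. A minimal sketch using only the standard library; `canonical_url` is a hypothetical helper, and stripping the trailing slash assumes the target site serves the same page with and without it.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return one consistent spelling of a URL to use in every scrape request."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Host names are case-insensitive. Assumes the URL carries no user:password part.
    netloc = parts.netloc.lower()
    # Treat "/page/" and "/page" as the same page; keep a bare "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment (#...), which browsers never send to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


a = canonical_url("https://Example.com/page/")
b = canonical_url("https://example.com/page")
print(a == b)  # prints True
```

Query parameters are left untouched here, since reordering or dropping them can change which page is returned.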
Cache Invalidation
To invalidate a cached entry before TTL expires:
Option 1: Force Refresh
Use force_refresh: true to fetch fresh data and update the cache.
Option 2: Disable Cache
Set cache: false to bypass cache entirely (doesn't update cache).
Option 3: Wait for TTL
Cache automatically expires after TTL. No action needed.
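The three options map directly onto request fields. The sketch below builds the JSON body for each one; the strategy names ("force_refresh", "bypass", "wait") and the function `scrape_body` are labels invented for this example, while the cache and force_refresh fields are the ones documented above.

```python
def scrape_body(url: str, strategy: str = "wait") -> dict:
    """Build a scrape request body for one of the three invalidation options."""
    if strategy == "force_refresh":
        # Option 1: fetch fresh data and overwrite the cached entry.
        return {"url": url, "cache": True, "force_refresh": True}
    if strategy == "bypass":
        # Option 2: skip the cache for this request; the cached entry is left as-is.
        return {"url": url, "cache": False}
    if strategy == "wait":
        # Option 3: keep using the cache and let the TTL expire on its own.
        return {"url": url, "cache": True}
    raise ValueError(f"unknown strategy: {strategy!r}")


body = scrape_body("https://example.com/products", "force_refresh")
# Pass `body` as the json= argument of the POST request.
```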
No Manual Purge