
Web Scraping API Pricing Compared: Cut Costs 90%
Compare web scraping API pricing models and learn how tiered architecture reduces costs by 90% while maintaining 99%+ success rates for production pipelines.
March 28, 2026
The Real Cost of Web Scraping at Scale
Most engineering teams overspend on web scraping by 5-10x because they use the same infrastructure for every request. Scraping a static HTML documentation page shouldn't cost the same as extracting data from a JavaScript-heavy e-commerce site with Cloudflare protection.
The solution: tiered scraping architecture. By matching request complexity to infrastructure level, teams routinely cut scraping costs by 80-90% while maintaining or improving success rates.
This post breaks down scraping API pricing models, shows how tiered systems work, and provides production-ready code for implementing cost-optimized scraping pipelines.
How Scraping API Pricing Actually Works
Scraping APIs charge based on infrastructure cost per request. Understanding these tiers is critical for cost optimization:
T1 — Basic HTTP Requests
- No JavaScript execution
- Standard headers and cookies
- Cost: ~$0.001-0.003 per request
- Use case: Static HTML, documentation sites, simple blogs
T2 — Enhanced HTTP
- Custom headers, cookies, user agents
- Basic anti-detection
- Cost: ~$0.003-0.005 per request
- Use case: Sites with basic bot detection
T3 — Headless Browser
- Full JavaScript execution (Playwright/Puppeteer)
- Browser fingerprint rotation
- Cost: ~$0.01-0.02 per request
- Use case: SPAs, dynamic content, infinite scroll
T4 — Advanced Anti-Bot
- All T3 features, plus:
- Advanced fingerprint spoofing
- Behavioral automation
- Cost: ~$0.02-0.04 per request
- Use case: Cloudflare, PerimeterX, DataDome
T5 — CAPTCHA Solving
- All T4 features, plus:
- Human CAPTCHA solving
- Cost: ~$0.05-0.10 per request
- Use case: Sites with hCaptcha, reCAPTCHA challenges
The cost difference between T1 and T5 is 50-100x. Using T5 for every request when 70% of your targets only need T1 is financial waste.
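To make that gap concrete, here is a quick back-of-the-envelope sketch in plain Python (no scraping SDK required). The per-tier prices are midpoints of the ranges listed above, so treat the numbers as illustrative:

```python
# Midpoint cost per request for each tier, from the ranges above
TIER_COST = {1: 0.002, 2: 0.004, 3: 0.015, 4: 0.03, 5: 0.075}

def blended_cost(mix: dict) -> float:
    """Average cost per request for a workload mix {tier: fraction_of_requests}."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(TIER_COST[tier] * fraction for tier, fraction in mix.items())

# A workload where 70% of targets only need T1
mix = {1: 0.70, 3: 0.15, 4: 0.10, 5: 0.05}
print(f"Blended: ${blended_cost(mix):.4f}/request vs ${TIER_COST[5]}/request at flat T5")
```

Even with 30% of requests on headless or anti-bot tiers, the blended rate works out to roughly a seventh of paying T5 rates for everything.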
Pricing Model Comparison
Most scraping services use one of three pricing models. Here's how they compare for production workloads:
Flat Rate Plans charge a fixed monthly fee for a request quota. Simple to budget, but you pay the same rate regardless of target complexity. Often includes overage charges that spike unexpectedly.
Pay-Per-Success charges only for successful extractions. Transparent, but success rate definitions vary. A 95% success rate means you're paying for 5% failures indirectly through higher per-request pricing.
Tiered Usage (like AlterLab's pricing) charges based on infrastructure tier used. This is where significant savings happen—you control which tier each request uses, optimizing for cost per target.
For teams scraping 50+ different domains with varying complexity, tiered pricing typically costs 60-90% less than flat-rate alternatives.
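The difference between the models is easy to quantify. Here is a minimal sketch; the monthly fee, quota, overage rate, and per-tier prices below are hypothetical plan numbers for illustration, not any provider's real pricing:

```python
def flat_rate_cost(requests: int, monthly_fee: float, quota: int,
                   overage_per_req: float) -> float:
    """Flat plan: fixed fee, plus per-request overage beyond the quota."""
    return monthly_fee + max(0, requests - quota) * overage_per_req

def tiered_cost(requests_by_tier: dict, price_by_tier: dict) -> float:
    """Tiered usage: each request billed at the rate of the tier that served it."""
    return sum(n * price_by_tier[t] for t, n in requests_by_tier.items())

# Illustrative month: 10,000 requests, most of them simple
workload = {1: 4000, 3: 3500, 4: 2000, 5: 500}
rates = {1: 0.002, 3: 0.015, 4: 0.03, 5: 0.08}
print(f"Flat:   ${flat_rate_cost(10_000, 400.0, 8_000, 0.05):.2f}")
print(f"Tiered: ${tiered_cost(workload, rates):.2f}")
```

The flat plan bills the same whether a request hit static HTML or a CAPTCHA wall; the tiered total tracks what each request actually cost to serve.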
Implementing Tiered Scraping in Production
The key to cost optimization is automatic tier escalation: start with the cheapest tier, escalate only when needed. Here's a production-ready implementation:
```python
import alterlab

client = alterlab.Client(
    api_key="YOUR_API_KEY",
    auto_escalate=True  # Auto-escalate on failure
)

def scrape_with_tier_optimization(url: str, min_tier: int = 1) -> dict:
    """
    Scrape URL starting at minimum tier, escalate only if needed.
    Reduces costs by 70-90% compared to always using T5.
    """
    response = client.scrape(
        url=url,
        min_tier=min_tier,  # Start at T1 for static sites
        max_tier=5,         # Escalate up to T5 if needed
        formats=["json"]
    )
    return {
        "url": url,
        "tier_used": response.tier,
        "cost": response.cost,
        "success": response.success,
        "data": response.data
    }

# Example: scrape mixed-complexity sites
urls = [
    "https://docs.python.org/3/library/",    # T1 sufficient
    "https://www.amazon.com/dp/B08N5WRWNW",  # T4 required
    "https://github.com/trending",           # T2-3 needed
]

results = [scrape_with_tier_optimization(url) for url in urls]
total_cost = sum(r["cost"] for r in results)
print(f"Total cost: ${total_cost:.4f} for {len(results)} requests")
```

The `min_tier` parameter is critical. Setting `min_tier=1` tells the API to attempt T1 first, escalating only on failure. For known complex sites, set `min_tier=4` to skip wasted T1-T3 attempts.
For JavaScript-heavy sites, use the Python SDK, which handles tier selection automatically based on response analysis.
Cost Comparison: Before and After Tiered Architecture
Let's compare actual costs for a realistic scraping workload: 10,000 requests/month across mixed-complexity targets.
Scenario: E-commerce Price Monitoring
- 40% static product pages (T1 sufficient)
- 35% JavaScript-rendered prices (T3 required)
- 20% moderate anti-bot (T4 required)
- 5% CAPTCHA-protected (T5 required)
Flat Rate (Always T5):

```
10,000 requests × $0.045 (avg T5) = $450/month
```

Tiered Architecture:

```
4,000 × $0.002 (T1) = $8.00
3,500 × $0.015 (T3) = $52.50
2,000 × $0.030 (T4) = $60.00
  500 × $0.080 (T5) = $40.00
─────────────────────────────────
Total: $160.50/month
```

With Auto-Escalation Optimization: Smart tier selection (starting low, escalating only on failure) typically reduces the T4/T5 portion by 40-50% because many sites that appear complex actually respond to simpler requests.

Optimized Total: ~$87/month (81% savings vs flat rate)

The quickstart guide shows how to configure auto-escalation in under 5 minutes.
Node.js Implementation for High-Volume Pipelines
For teams running scraping jobs in Node.js environments, here's a production pattern with built-in cost tracking:
```javascript
import { AlterLabClient } from '@alterlab/sdk';

const client = new AlterLabClient({
  apiKey: process.env.ALTERLAB_API_KEY,
  autoEscalate: true,
  maxRetries: 3,
  onTierEscalation: (from, to, url) => {
    console.log(`Escalated T${from} → T${to} for ${url}`);
  }
});

async function scrapeWithCostTracking(urls) {
  const results = await Promise.all(
    urls.map(async (url) => {
      const response = await client.scrape(url, {
        minTier: 1,
        formats: ['json'],
        timeout: 30000
      });
      return {
        url,
        tier: response.tier,
        cost: response.cost,
        success: response.success,
        timestamp: new Date().toISOString()
      };
    })
  );

  const totalCost = results.reduce((sum, r) => sum + r.cost, 0);
  const tierDistribution = results.reduce((acc, r) => {
    acc[`T${r.tier}`] = (acc[`T${r.tier}`] || 0) + 1;
    return acc;
  }, {});

  return {
    results,
    summary: {
      totalRequests: results.length,
      totalCost: totalCost.toFixed(4),
      avgCostPerRequest: (totalCost / results.length).toFixed(6),
      tierDistribution
    }
  };
}

// Usage
const urls = [
  'https://example-shop.com/product/123',
  'https://competitor-site.com/pricing',
];

scrapeWithCostTracking(urls).then(({ summary }) => {
  console.log(`Cost: $${summary.totalCost} for ${summary.totalRequests} requests`);
  console.log('Tier distribution:', summary.tierDistribution);
});
```

This pattern gives you visibility into tier distribution, which is critical for identifying optimization opportunities. If 80% of requests escalate to T4+, your `minTier` defaults may be too conservative.
When to Use Each Tier: Decision Framework
Use this framework to set appropriate min_tier values for your targets:
Quick Tier Selection Guide:
| Target Type | Recommended min_tier | Why |
|---|---|---|
| Documentation sites | 1 | Static HTML, no JS |
| News articles | 1-2 | Mostly static, some lazy load |
| E-commerce product pages | 3-4 | JS rendering, anti-bot common |
| Social media profiles | 4-5 | Heavy anti-bot, login walls |
| Government sites | 1-2 | Usually simple, occasional CAPTCHA |
| Job boards | 2-3 | Mix of static and dynamic |
| Real estate listings | 3-4 | Images, maps, dynamic pricing |
Test new targets with min_tier=1 first. Log the tier that succeeds, then set that as your baseline for future scrapes. The API reference documents all tier-specific parameters.
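The "log the tier that succeeds, then reuse it as your baseline" advice can be sketched as a small in-memory cache keyed by domain. This is a sketch: in production you would persist the baselines to a real store rather than a process-local dict:

```python
from urllib.parse import urlparse

class TierBaseline:
    """Remember which tier last succeeded per domain and start there next time."""

    def __init__(self, default: int = 1):
        self.default = default
        self.baseline = {}  # domain -> highest tier seen to succeed

    def get(self, url: str) -> int:
        """Tier to pass as min_tier for this URL's domain."""
        return self.baseline.get(urlparse(url).netloc, self.default)

    def record(self, url: str, tier_used: int) -> None:
        """After a successful scrape, remember the tier that worked."""
        domain = urlparse(url).netloc
        # Keep the highest tier seen, so repeat scrapes skip doomed cheap attempts
        self.baseline[domain] = max(self.baseline.get(domain, 0), tier_used)
```

Feeding `baseline.get(url)` into your scrape call's min_tier means a domain that needed T3 last week is never probed at T1 again, while unknown domains still start cheap.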
Monitoring and Alerting for Cost Optimization
Cost optimization requires visibility. Set up monitoring to catch tier escalation spikes:
```python
import alterlab
from datetime import datetime, timedelta

client = alterlab.Client(api_key="YOUR_API_KEY")

def analyze_tier_distribution(hours: int = 24) -> dict:
    """Analyze tier distribution over a time window."""
    cutoff = datetime.now() - timedelta(hours=hours)
    # Query your scrape logs (implementation depends on your storage)
    scrapes = get_scrapes_since(cutoff)

    tier_counts = {}
    tier_costs = {}
    for scrape in scrapes:
        tier = f"T{scrape.tier}"
        tier_counts[tier] = tier_counts.get(tier, 0) + 1
        tier_costs[tier] = tier_costs.get(tier, 0) + scrape.cost

    total_cost = sum(tier_costs.values())
    return {
        "period_hours": hours,
        "total_requests": len(scrapes),
        "total_cost": total_cost,
        "tier_distribution": tier_counts,
        "cost_by_tier": tier_costs,
        "avg_cost_per_request": total_cost / len(scrapes) if scrapes else 0
    }

# Alert if T5 usage exceeds 10%
def check_tier_alerts():
    analysis = analyze_tier_distribution(hours=1)
    if analysis["total_requests"] == 0:
        return  # nothing scraped in the window; avoid dividing by zero
    t5_ratio = analysis["tier_distribution"].get("T5", 0) / analysis["total_requests"]
    if t5_ratio > 0.10:
        send_alert(f"T5 usage spike: {t5_ratio:.1%} in last hour")  # your alerting hook
```

Set up alerts for:
- T5 usage > 10% of requests (indicates potential blocking)
- Average cost per request increasing > 20% week-over-week
- Success rate dropping below 95% for any tier
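The week-over-week check in the list above reduces to a small pure function, independent of where your scrape logs live. The name `cost_drift_alert` is illustrative, not an SDK call:

```python
def cost_drift_alert(last_week_avg: float, this_week_avg: float,
                     threshold: float = 0.20) -> bool:
    """True if average cost per request rose more than `threshold` week-over-week."""
    if last_week_avg == 0:
        return this_week_avg > 0  # any spend after a zero-cost week is notable
    return (this_week_avg - last_week_avg) / last_week_avg > threshold
```

Wire its output into the same `send_alert` hook used for T5 spikes, feeding it the `avg_cost_per_request` values from two analysis windows.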
Common Cost Optimization Mistakes
Mistake 1: Always Using Headless Browsers
Running every request through Playwright when 60% of targets are static HTML wastes 50-70% of your budget. Start with T1, escalate on failure.
Mistake 2: Not Caching Results
Re-scraping unchanged pages burns budget. Implement ETag-based caching or use monitoring features that only return data when pages change.
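The ETag bookkeeping can be sketched independently of the HTTP client: send the stored validator as `If-None-Match`, and on a `304 Not Modified` serve the cached body without paying for a fresh scrape. These helper names are illustrative:

```python
def conditional_headers(cache: dict, url: str) -> dict:
    """Headers to send so the server can answer 304 for unchanged pages."""
    return {"If-None-Match": cache[url][0]} if url in cache else {}

def update_cache(cache: dict, url: str, status: int, etag, body: bytes) -> bytes:
    """Apply a conditional-GET response to the cache; return the body to use."""
    if status == 304:
        return cache[url][1]  # Not Modified: reuse the cached copy, zero scrape cost
    if etag is not None:
        cache[url] = (etag, body)  # fresh body with a validator: remember it
    return body
```

Plug these into whatever client you use: add `conditional_headers(cache, url)` to the request, then pass the response's status, `ETag` header, and body through `update_cache`.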
Mistake 3: Ignoring Retry Logic
Transient failures happen. Blind retries at the same tier waste money. Implement exponential backoff with tier escalation on repeated failures.
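A minimal sketch of backoff-plus-escalation, assuming a `scrape(url, tier)` callable standing in for your client (returns a truthy result on success, falsy or raises on failure):

```python
import time

def scrape_with_backoff(scrape, url: str, start_tier: int = 1,
                        max_tier: int = 5, base_delay: float = 1.0):
    """Retry with exponential backoff, escalating one tier per failed attempt."""
    tier = start_tier
    for attempt in range(max_tier - start_tier + 1):
        try:
            result = scrape(url, tier)
            if result:
                return tier, result
        except Exception:
            pass  # treat errors the same as empty results: back off and escalate
        time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... between attempts
        tier = min(tier + 1, max_tier)
    raise RuntimeError(f"gave up on {url} at T{tier}")
```

Each retry both waits longer and buys better infrastructure, so transient failures get a cheap second chance while genuinely protected targets climb tiers instead of burning identical requests.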
Mistake 4: No Target Classification
Treating all URLs the same ignores known patterns. Classify targets by domain, set appropriate min_tier per domain, and track success rates.
Takeaway
Tiered scraping architecture is the single most effective cost optimization for production scraping pipelines. Key points:
- Match tier to complexity — T1 for static sites, T5 only when necessary
- Auto-escalate on failure — Start cheap, escalate only when needed
- Monitor tier distribution — Alert on unusual T4/T5 spikes
- Cache aggressively — Don't re-scrape unchanged pages
- Classify targets — Set min_tier per domain based on historical data
Teams implementing these practices typically see 70-90% cost reduction while maintaining 99%+ success rates. The FAQ covers common implementation questions.
For more technical deep-dives, check out the AlterLab blog for posts on anti-bot bypass strategies and large-scale data extraction patterns.