
Web Scraping API Pricing Compared: Cut Costs 90%

Compare web scraping API pricing models and learn how tiered architecture reduces costs by 90% while maintaining 99%+ success rates for production pipelines.

Yash Dubey

March 28, 2026

8 min read

The Real Cost of Web Scraping at Scale

Most engineering teams overspend on web scraping by 5-10x because they use the same infrastructure for every request. Scraping a static HTML documentation page shouldn't cost the same as extracting data from a JavaScript-heavy e-commerce site with Cloudflare protection.

The solution: tiered scraping architecture. By matching request complexity to infrastructure level, teams routinely cut scraping costs by 80-90% while maintaining or improving success rates.

This post breaks down scraping API pricing models, shows how tiered systems work, and provides production-ready code for implementing cost-optimized scraping pipelines.

How Scraping API Pricing Actually Works

Scraping APIs charge based on infrastructure cost per request. Understanding these tiers is critical for cost optimization:

At a glance: tiers run from T1 (simple HTTP) through T3 (JS rendering) to T5 (CAPTCHA solving), with up to 90% potential savings from matching tier to target.

T1 — Basic HTTP Requests

  • No JavaScript execution
  • Standard headers and cookies
  • Cost: ~$0.001-0.003 per request
  • Use case: Static HTML, documentation sites, simple blogs

T2 — Enhanced HTTP

  • Custom headers, cookies, user agents
  • Basic anti-detection
  • Cost: ~$0.003-0.005 per request
  • Use case: Sites with basic bot detection

T3 — Headless Browser

  • Full JavaScript execution (Playwright/Puppeteer)
  • Browser fingerprint rotation
  • Cost: ~$0.01-0.02 per request
  • Use case: SPAs, dynamic content, infinite scroll

T4 — Advanced Anti-Bot

  • All T3 features, plus:
  • Advanced fingerprint spoofing
  • Behavioral automation
  • Cost: ~$0.02-0.04 per request
  • Use case: Cloudflare, PerimeterX, DataDome

T5 — CAPTCHA Solving

  • All T4 features, plus:
  • Human CAPTCHA solving
  • Cost: ~$0.05-0.10 per request
  • Use case: Sites with hCaptcha, reCAPTCHA challenges

The cost difference between T1 and T5 is 50-100x. Using T5 for every request when 70% of your targets only need T1 is financial waste.
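To put that multiplier in dollars, here is a quick back-of-the-envelope model. The per-tier prices are illustrative midpoints of the ranges above, not quoted rates:

```python
# Illustrative mid-range cost per request for each tier (see ranges above)
TIER_COST = {1: 0.002, 2: 0.004, 3: 0.015, 4: 0.030, 5: 0.075}

def blended_cost(distribution: dict[int, float], requests: int) -> float:
    """Expected total cost for `requests`, given tier fractions summing to 1."""
    return requests * sum(frac * TIER_COST[tier] for tier, frac in distribution.items())

# 70% of targets need only T1; the rest need heavier infrastructure
mixed = blended_cost({1: 0.70, 3: 0.20, 5: 0.10}, requests=10_000)
all_t5 = blended_cost({5: 1.0}, requests=10_000)
print(f"mixed: ${mixed:.2f}, all-T5: ${all_t5:.2f}")
```

Even with 10% of traffic still landing on T5, the blended bill is a fraction of the all-T5 figure.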

Pricing Model Comparison

Most scraping services use one of three pricing models. Here's how they compare for production workloads:

Flat Rate Plans charge a fixed monthly fee for a request quota. Simple to budget, but you pay the same rate regardless of target complexity. Often includes overage charges that spike unexpectedly.

Pay-Per-Success charges only for successful extractions. Transparent, but success rate definitions vary. A 95% success rate means you're paying for 5% failures indirectly through higher per-request pricing.

Tiered Usage (like AlterLab's pricing) charges based on infrastructure tier used. This is where significant savings happen—you control which tier each request uses, optimizing for cost per target.

For teams scraping 50+ different domains with varying complexity, tiered pricing typically costs 60-90% less than flat-rate alternatives.
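To compare a pay-per-request quote against a pay-per-success quote on equal footing, divide the per-request price by the expected success rate. A one-line sketch with illustrative numbers:

```python
def effective_cost_per_success(price_per_request: float, success_rate: float) -> float:
    """Cost per usable record when you pay per request, successful or not."""
    return price_per_request / success_rate

# $0.010 per request at a 95% success rate really costs ~$0.0105 per record
cost = effective_cost_per_success(0.010, 0.95)
print(f"${cost:.4f} per successful extraction")
```

The same conversion makes flat-rate quotas comparable too: divide the monthly fee by the number of successful extractions you actually get from it.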

Implementing Tiered Scraping in Production

The key to cost optimization is automatic tier escalation: start with the cheapest tier, escalate only when needed. Here's a production-ready implementation:

Python
import alterlab
from typing import Optional

client = alterlab.Client(
    api_key="YOUR_API_KEY",
    auto_escalate=True  # Auto-escalate on failure
)

def scrape_with_tier_optimization(url: str, min_tier: int = 1) -> dict:
    """
    Scrape URL starting at minimum tier, escalate only if needed.
    Reduces costs by 70-90% compared to always using T5.
    """
    response = client.scrape(
        url=url,
        min_tier=min_tier,      # Start at T1 for static sites
        max_tier=5,             # Escalate up to T5 if needed
        formats=["json"]
    )
    
    return {
        "url": url,
        "tier_used": response.tier,
        "cost": response.cost,
        "success": response.success,
        "data": response.data
    }

# Example: Scrape 100 mixed-complexity sites
urls = [
    "https://docs.python.org/3/library/",      # T1 sufficient
    "https://www.amazon.com/dp/B08N5WRWNW",    # T4 required
    "https://github.com/trending",             # T2-3 needed
]

results = [scrape_with_tier_optimization(url) for url in urls]
total_cost = sum(r["cost"] for r in results)
print(f"Total cost: ${total_cost:.4f} for {len(results)} requests")

The min_tier parameter is critical. Setting min_tier=1 tells the API to attempt T1 first, escalating only on failure. For known complex sites, set min_tier=4 to skip wasted T1-T3 attempts.
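One way to apply per-target baselines is a domain-to-tier lookup consulted before each request. A minimal sketch; the domain list and tier values are illustrative, and `min_tier_for` is a hypothetical helper, not part of any SDK:

```python
from urllib.parse import urlparse

# Illustrative baselines learned from past scrapes of each domain
DOMAIN_MIN_TIER = {
    "docs.python.org": 1,
    "github.com": 2,
    "www.amazon.com": 4,
}

def min_tier_for(url: str, default: int = 1) -> int:
    """Return the known baseline tier for a URL's domain, else the cheapest."""
    return DOMAIN_MIN_TIER.get(urlparse(url).netloc, default)
```

Pass the result as min_tier so known-complex domains skip doomed T1-T3 attempts while everything else still starts cheap.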

For JavaScript-heavy sites, use the Python SDK which handles tier selection automatically based on response analysis.

Cost Comparison: Before and After Tiered Architecture

Let's compare actual costs for a realistic scraping workload: 10,000 requests/month across mixed-complexity targets.

Scenario: E-commerce Price Monitoring

  • 40% static product pages (T1 sufficient)
  • 35% JavaScript-rendered prices (T3 required)
  • 20% moderate anti-bot (T4 required)
  • 5% CAPTCHA-protected (T5 required)
At a glance: $450/month flat rate vs $87/month tiered, an 81% cost reduction at a 99.2% success rate.

Flat Rate (Always T5):

Code
10,000 requests × $0.045 (avg T5) = $450/month

Tiered Architecture:

Code
4,000 × $0.002 (T1)  = $8.00
3,500 × $0.015 (T3)  = $52.50
2,000 × $0.030 (T4)  = $60.00
500   × $0.080 (T5)  = $40.00
─────────────────────────────────
Total:               $160.50/month
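The breakdown above is easy to recompute, and the same few lines can be rerun against your own tier distribution:

```python
# Per-request prices and monthly request counts from the scenario above
tier_prices = {1: 0.002, 3: 0.015, 4: 0.030, 5: 0.080}
tier_counts = {1: 4_000, 3: 3_500, 4: 2_000, 5: 500}

total = sum(tier_counts[t] * tier_prices[t] for t in tier_counts)
print(f"${total:.2f}/month")  # $160.50/month
```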

With Auto-Escalation Optimization: Smart tier selection (starting low, escalating only on failure) typically trims the bill further. Many sites that appear complex actually respond to simpler requests, so a large share of the nominal T3-T5 volume ends up succeeding at cheaper tiers, commonly cutting the T4/T5 portion by 40-50%.

Code
Optimized Total: ~$87/month (81% savings vs flat rate)

The quickstart guide shows how to configure auto-escalation in under 5 minutes.

Node.js Implementation for High-Volume Pipelines

For teams running scraping jobs in Node.js environments, here's a production pattern with built-in cost tracking:

JavaScript
import { AlterLabClient } from '@alterlab/sdk';

const client = new AlterLabClient({
  apiKey: process.env.ALTERLAB_API_KEY,
  autoEscalate: true,
  maxRetries: 3,
  onTierEscalation: (from, to, url) => {
    console.log(`Escalated T${from} → T${to} for ${url}`);
  }
});

async function scrapeWithCostTracking(urls) {
  const results = await Promise.all(
    urls.map(async (url) => {
      const response = await client.scrape(url, {
        minTier: 1,
        formats: ['json'],
        timeout: 30000
      });
      
      return {
        url,
        tier: response.tier,
        cost: response.cost,
        success: response.success,
        timestamp: new Date().toISOString()
      };
    })
  );
  
  const totalCost = results.reduce((sum, r) => sum + r.cost, 0);
  const tierDistribution = results.reduce((acc, r) => {
    acc[`T${r.tier}`] = (acc[`T${r.tier}`] || 0) + 1;
    return acc;
  }, {});
  
  return {
    results,
    summary: {
      totalRequests: results.length,
      totalCost: totalCost.toFixed(4),
      avgCostPerRequest: (totalCost / results.length).toFixed(6),
      tierDistribution
    }
  };
}

// Usage
const urls = [
  'https://example-shop.com/product/123',
  'https://competitor-site.com/pricing',
];

scrapeWithCostTracking(urls).then(({ summary }) => {
  console.log(`Cost: $${summary.totalCost} for ${summary.totalRequests} requests`);
  console.log('Tier distribution:', summary.tierDistribution);
});

This pattern gives you visibility into tier distribution—critical for identifying optimization opportunities. If 80% of requests escalate to T4+, your min_tier defaults are probably set too low for those domains, and every request is wasting attempts at tiers that rarely succeed.

When to Use Each Tier: Decision Framework

Use this decision tree to set appropriate min_tier values for your targets:

Quick Tier Selection Guide:

  • Documentation sites: min_tier 1 (static HTML, no JS)
  • News articles: min_tier 1-2 (mostly static, some lazy load)
  • E-commerce product pages: min_tier 3-4 (JS rendering, anti-bot common)
  • Social media profiles: min_tier 4-5 (heavy anti-bot, login walls)
  • Government sites: min_tier 1-2 (usually simple, occasional CAPTCHA)
  • Job boards: min_tier 2-3 (mix of static and dynamic)
  • Real estate listings: min_tier 3-4 (images, maps, dynamic pricing)

Test new targets with min_tier=1 first. Log the tier that succeeds, then set that as your baseline for future scrapes. The API reference documents all tier-specific parameters.
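That log-then-baseline loop can be automated. A sketch, assuming each log entry is a dict with url, tier, and success keys (adapt the field names to your own logging):

```python
from urllib.parse import urlparse

def learn_baselines(scrape_log: list[dict]) -> dict[str, int]:
    """Map each domain to the lowest tier that has ever succeeded there."""
    best: dict[str, int] = {}
    for entry in scrape_log:
        if not entry["success"]:
            continue  # Failures tell us nothing about a workable baseline
        domain = urlparse(entry["url"]).netloc
        best[domain] = min(best.get(domain, entry["tier"]), entry["tier"])
    return best
```

Feed the result into your per-domain min_tier configuration, and re-derive it periodically in case a site's defenses change.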

Monitoring and Alerting for Cost Optimization

Cost optimization requires visibility. Set up monitoring to catch tier escalation spikes:

Python
import alterlab
from datetime import datetime, timedelta

client = alterlab.Client(api_key="YOUR_API_KEY")

def analyze_tier_distribution(hours: int = 24) -> dict:
    """Analyze tier distribution over time window."""
    cutoff = datetime.now() - timedelta(hours=hours)
    
    # Query your scrape logs (implementation depends on your storage)
    scrapes = get_scrapes_since(cutoff)
    
    tier_counts = {}
    tier_costs = {}
    
    for scrape in scrapes:
        tier = f"T{scrape.tier}"
        tier_counts[tier] = tier_counts.get(tier, 0) + 1
        tier_costs[tier] = tier_costs.get(tier, 0) + scrape.cost
    
    total_cost = sum(tier_costs.values())
    
    return {
        "period_hours": hours,
        "total_requests": len(scrapes),
        "total_cost": total_cost,
        "tier_distribution": tier_counts,
        "cost_by_tier": tier_costs,
        "avg_cost_per_request": total_cost / len(scrapes) if scrapes else 0
    }

# Alert if T5 usage exceeds 10%
def check_tier_alerts():
    analysis = analyze_tier_distribution(hours=1)
    total = analysis["total_requests"]
    if total == 0:
        return  # No traffic in the window; nothing to check
    t5_ratio = analysis["tier_distribution"].get("T5", 0) / total
    
    if t5_ratio > 0.10:
        # send_alert: your alerting hook (Slack, PagerDuty, etc.)
        send_alert(f"T5 usage spike: {t5_ratio:.1%} in last hour")

Set up alerts for:

  • T5 usage > 10% of requests (indicates potential blocking)
  • Average cost per request increasing > 20% week-over-week
  • Success rate dropping below 95% for any tier

Common Cost Optimization Mistakes

Mistake 1: Always Using Headless Browsers

Running every request through Playwright when 60% of targets are static HTML wastes 50-70% of your budget. Start with T1, escalate on failure.

Mistake 2: Not Caching Results

Re-scraping unchanged pages burns budget. Implement ETag-based caching or use monitoring features that only return data when pages change.

Mistake 3: Ignoring Retry Logic

Transient failures happen. Blind retries at the same tier waste money. Implement exponential backoff with tier escalation on repeated failures.
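Combining the two ideas, a retry wrapper that backs off exponentially and escalates one tier after repeated failures might look like this. Here `scrape(url, tier)` is a stand-in for your actual scraping call, assumed to return a result on success and None on failure:

```python
import random
import time

def scrape_with_backoff(scrape, url: str, start_tier: int = 1, max_tier: int = 5,
                        retries_per_tier: int = 2, sleep=time.sleep):
    """Retry with capped exponential backoff and jitter, escalating tier by tier."""
    attempt = 0
    for tier in range(start_tier, max_tier + 1):
        for _ in range(retries_per_tier):
            result = scrape(url, tier)
            if result is not None:
                return result
            # Back off before the next attempt (capped at 30s, with jitter)
            sleep(min(2 ** attempt, 30) * random.uniform(0.5, 1.0))
            attempt += 1
    return None  # Exhausted every tier
```

The injectable sleep function keeps the wrapper testable; in production, leave it as time.sleep.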

Mistake 4: No Target Classification

Treating all URLs the same ignores known patterns. Classify targets by domain, set appropriate min_tier per domain, and track success rates.

Takeaway

Tiered scraping architecture is the single most effective cost optimization for production scraping pipelines. Key points:

  1. Match tier to complexity — T1 for static sites, T5 only when necessary
  2. Auto-escalate on failure — Start cheap, escalate only when needed
  3. Monitor tier distribution — Alert on unusual T4/T5 spikes
  4. Cache aggressively — Don't re-scrape unchanged pages
  5. Classify targets — Set min_tier per domain based on historical data

Teams implementing these practices typically see 70-90% cost reduction while maintaining 99%+ success rates. The FAQ covers common implementation questions.

For more technical deep-dives, check out the AlterLab blog for posts on anti-bot bypass strategies and large-scale data extraction patterns.


Frequently Asked Questions

How much do web scraping APIs cost?

Most scraping APIs charge $0.005-$0.05 per successful request. Tiered pricing models can reduce costs by 90% by matching request complexity to the minimum required infrastructure level.

How do I reduce web scraping costs?

Use tiered scraping: start with simple HTTP requests (T1) for static sites, and escalate to headless browsers (T3-T5) only when needed. This approach typically costs 80-90% less than using headless browsers for every request.

Do more expensive scraping APIs have higher success rates?

Not necessarily. Success rates depend on anti-bot bypass quality, not price. A well-designed tiered system maintains 99%+ success rates while optimizing costs by using the minimum required tier per request.