Web Crawling API
Crawl entire websites with depth control, link discovery, and sitemap-aware traversal. AlterLab's crawling API handles anti-bot protection, JavaScript rendering, and proxy rotation automatically. Extract structured data from thousands of pages in a single job.
Simple API, Powerful Results
Get started in minutes with our intuitive API. One request gives you structured data, screenshots, PDFs, and more. No browser management, no infrastructure headaches.
Multi-Format Output
Markdown, JSON, HTML, text
Adaptive Rendering
JS, SPAs, shadow DOM
3 Lines to Integrate
Any language, any stack
Up to 5,000 free scrapes included. No credit card required.
How Web Crawling Works
Submit a seed URL — AlterLab handles link discovery, anti-bot protection, and structured data extraction automatically.
Seed URL Submission
Submit your starting URL along with depth limits, page count caps, and include/exclude patterns. AlterLab begins crawling immediately — no setup required. Sitemap.xml files are parsed automatically to discover all indexable URLs before the crawl begins.
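To make the sitemap step concrete, here is a minimal sketch of pulling URLs out of a sitemap.xml document with Python's standard library. It is an illustration only, not AlterLab's internal parser; the function name `urls_from_sitemap` and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap.xml document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Tiny in-memory sitemap so the sketch runs without network access.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/docs/intro</loc></url>
  <url><loc>https://docs.example.com/docs/api</loc></url>
</urlset>"""

for url in urls_from_sitemap(sample):
    print(url)
```

Sitemap index files, which point to other sitemaps, are not handled in this sketch.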
Link Discovery & Queueing
Each scraped page is parsed for outbound links. Links matching your include patterns and within the depth limit are queued for crawling. AlterLab deduplicates URLs and respects robots.txt by default — override when needed for competitive intelligence use cases.
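The queueing behavior described above can be sketched as a breadth-first traversal with a seen-set for deduplication. This is a simplified model under stated assumptions: links come from an in-memory dictionary rather than real pages, the seed counts as depth 0, and `crawl_order` is a hypothetical name, not part of any AlterLab SDK.

```python
from collections import deque


def crawl_order(seed, links, max_depth, max_pages):
    """Return the order in which pages would be visited (hypothetical helper).

    `links` maps each URL to the outbound links found on that page.
    """
    seen = {seed}                  # every URL ever queued, for deduplication
    queue = deque([(seed, 0)])     # (url, depth) pairs; the seed is depth 0
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue               # do not follow links past the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Stand-in link graph; "/blog" appears twice on the home page on purpose.
LINKS = {
    "/": ["/blog", "/products", "/blog"],
    "/blog": ["/blog/post-1", "/"],
    "/products": ["/products/widget"],
    "/blog/post-1": ["/blog/post-1/comments"],
}

order = crawl_order("/", LINKS, max_depth=2, max_pages=10)
print(order)
```

With `max_depth=2`, the comments page (three links from the seed) is never queued, and the duplicate `/blog` link is visited only once.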
Automatic Website Compatibility Per Page
Each URL in the crawl queue is scraped through AlterLab's 5-tier pipeline. Basic pages use lightweight TLS fingerprinting at $0.0002/page. JavaScript-heavy pages automatically escalate to Playwright browser rendering. Anti-bot protection is handled automatically per page — no configuration needed.
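The escalation idea can be illustrated with a short loop that tries the cheapest tier first and moves up only when a tier fails. This sketch models just two tiers, using the per-page prices quoted on this page ($0.0002 for simple pages, $0.004 for JS-rendered pages). The tier names, the stub fetchers, and the rule that only the successful tier is charged are assumptions for illustration; the real pipeline has five tiers whose details are not described here.

```python
from typing import Callable, Optional

# (tier name, cost per page in USD, fetch function returning HTML or None)
Tier = tuple[str, float, Callable[[str], Optional[str]]]


def light_fetch(url: str) -> Optional[str]:
    # Stand-in for a lightweight fetch; pretend "/app" pages need JavaScript.
    return None if "/app" in url else f"<html>static {url}</html>"


def browser_fetch(url: str) -> Optional[str]:
    # Stand-in for full browser rendering; always succeeds in this sketch.
    return f"<html>rendered {url}</html>"


TIERS: list[Tier] = [
    ("light", 0.0002, light_fetch),
    ("browser", 0.004, browser_fetch),
]


def scrape_with_escalation(url: str, tiers: list[Tier]):
    """Try tiers from cheapest to most expensive; return the first success."""
    for name, cost, fetch in tiers:
        html = fetch(url)
        if html is not None:
            return name, cost, html
    raise RuntimeError(f"all tiers failed for {url}")


for target in ("https://example.com/about", "https://example.com/app/dashboard"):
    tier, cost, _ = scrape_with_escalation(target, TIERS)
    print(target, "->", tier, f"${cost}")
```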
Structured Results via Webhook
Results are delivered to your webhook when the crawl completes. Each page includes HTML, Markdown, extracted metadata, discovered links, and cost per page. Failed pages are retried automatically — you only pay for successful scrapes.
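On the receiving side, a webhook handler typically parses the delivered JSON and aggregates it. The sketch below assumes a payload shape with a `pages` list whose items carry `url`, `markdown`, `links`, and `cost` fields. Those field names are hypothetical, chosen to mirror the items listed above; check the actual webhook schema before relying on them.

```python
import json


def summarize_crawl(payload_json: str) -> dict:
    """Summarize a crawl webhook body (field names are assumed, not documented)."""
    payload = json.loads(payload_json)
    pages = payload.get("pages", [])
    return {
        "page_count": len(pages),
        # Round to avoid floating-point noise when summing small per-page costs.
        "total_cost": round(sum(p.get("cost", 0.0) for p in pages), 6),
        "discovered_links": sorted(
            {link for p in pages for link in p.get("links", [])}
        ),
    }


# Example body in the assumed shape.
example_body = json.dumps({
    "pages": [
        {"url": "https://example.com/", "markdown": "# Home",
         "links": ["https://example.com/blog"], "cost": 0.0002},
        {"url": "https://example.com/blog", "markdown": "# Blog",
         "links": ["https://example.com/", "https://example.com/blog/post-1"],
         "cost": 0.004},
    ]
})

print(summarize_crawl(example_body))
```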
Built for Production Crawls
Enterprise-grade crawling with automatic anti-bot handling and structured output.
Depth Control
Set crawl depth from 1 to unlimited. Crawl single pages or full site hierarchies.
Sitemap-Aware
Automatically parses sitemap.xml for efficient, complete site coverage.
Automatic Website Compatibility
5-tier escalation handles complex access controls and website protections on every crawled page.
Include/Exclude Patterns
Glob patterns to scope crawls to specific sections — /blog/*, /products/*, etc.
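As a rough illustration of how glob scoping behaves, the sketch below filters URLs by path using Python's `fnmatch` module. The helper name `in_scope` and the rule that exclude patterns win over include patterns are assumptions for this example; AlterLab's own matcher may treat `*` or trailing slashes differently.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def in_scope(url: str, include: list[str], exclude: tuple[str, ...] = ()) -> bool:
    """Return True if the URL path matches an include glob and no exclude glob."""
    path = urlparse(url).path
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False  # assumed precedence: exclude beats include
    return any(fnmatchcase(path, pattern) for pattern in include)


candidates = [
    "https://example.com/blog/2024/launch",
    "https://example.com/blog/drafts/wip",
    "https://example.com/products/widget",
    "https://example.com/about",
]

scoped = [
    u for u in candidates
    if in_scope(u, include=["/blog/*", "/products/*"], exclude=("/blog/drafts/*",))
]
print(scoped)
```

Note that in `fnmatch`, `*` also matches `/`, so `/blog/*` covers nested paths such as `/blog/2024/launch`.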
Crawling Use Cases
From content indexing to competitive intelligence — web crawling at any scale.
Content Indexing
Crawl entire sites to build searchable indexes for internal search engines or AI knowledge bases
Competitive Monitoring
Track competitor sites for pricing changes, product launches, and content updates
Data Pipeline Feeds
Build automated crawl pipelines that feed structured data into databases and analytics platforms
SEO Auditing
Discover broken links, missing metadata, and crawl errors across entire site structures
Part of a Bigger Pipeline
Crawl is most powerful when combined with AlterLab's Search and Map APIs. Discover URLs first, then crawl only what you need.
Map → Crawl: Index an Entire Site
Use Map API ($0.001/call) to discover all URLs, then Crawl only the pages you need — no wasted scraping on irrelevant pages.
```python
import alterlab

client = alterlab.Client(api_key="YOUR_KEY")

# Step 1: Discover all URLs on the site ($0.001 flat)
map_result = client.map(
    "https://docs.example.com",
    max_urls=2000,
    include_patterns=["/docs/*"]
)
doc_urls = [u.url for u in map_result.urls if "/docs/" in u.url]
print(f"Found {len(doc_urls)} documentation pages")

# Step 2: Crawl them all for full content ($0.0002+/page)
crawl_result = client.crawl(
    url="https://docs.example.com",
    urls=doc_urls,  # target only what Map found
    formats=["markdown"],
    max_pages=500
)
for page in crawl_result.pages:
    print(page.url, len(page.markdown), "chars")
```
Crawl Cost Estimator
See how much a full-site crawl costs with AlterLab's map-first approach vs competitors' blind crawl pricing.
AlterLab
1. Map the site
Discover all URLs first
2. Crawl 10K pages
$0.0003/page (Standard)
Map once, crawl selectively. Only pay for pages you actually need.
Same crawl on competitors
Firecrawl
$0.0063/page flat rate
ScrapingBee
$0.0033/page effective rate
Save 95% vs Firecrawl
$60.00 saved on 10K pages
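The savings figure can be reproduced from the per-page rates shown above. This sketch assumes a single Map call at $0.001 (the price quoted in the Map → Crawl section) plus 10,000 pages at the Standard rate; the function name `savings_vs` is made up for the example.

```python
PAGES = 10_000
MAP_CALL = 0.001              # one Map call, USD
ALTERLAB_PER_PAGE = 0.0003    # Standard rate, USD per page
FIRECRAWL_PER_PAGE = 0.0063
SCRAPINGBEE_PER_PAGE = 0.0033

# Map once, then crawl every page at the Standard rate.
alterlab_total = MAP_CALL + PAGES * ALTERLAB_PER_PAGE


def savings_vs(competitor_per_page: float) -> tuple[float, float]:
    """Return (dollars saved, percent saved) against a flat per-page rate."""
    competitor_total = PAGES * competitor_per_page
    saved = competitor_total - alterlab_total
    return saved, saved / competitor_total * 100


for name, rate in [("Firecrawl", FIRECRAWL_PER_PAGE), ("ScrapingBee", SCRAPINGBEE_PER_PAGE)]:
    saved, pct = savings_vs(rate)
    print(f"{name}: ${saved:.2f} saved ({pct:.0f}%)")
```

Under these rates the AlterLab total is about $3 against roughly $63 for Firecrawl, which is where the $60 and 95% figures come from.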
Crawl Pricing Comparison
Feature-by-feature comparison for full-site crawling. AlterLab's map-first approach and tier routing deliver the lowest effective cost.
| Feature | AlterLab | Firecrawl | ScrapingBee | Apify |
|---|---|---|---|---|
| Simple page crawl cost: HTML-only pages, static content | $0.0002/page | $0.0063/page | $0.00066/page | ~$0.0025/page |
| JS-rendered page crawl cost: SPAs, dynamic content requiring a browser | $0.004/page | $0.0063/page | $0.0033/page | ~$0.005/page |
| Map-first recon: discover all URLs before crawling | $0.001/call | Not available | Not available | Not available |
| Selective crawling: crawl only matched URL patterns | Included | URL filters only | Not available | Actor config |
| Smart tier routing: auto-selects the cheapest tier per page | 5-tier auto | Single tier | Manual | Manual |
| Anti-bot handling (crawl): all major anti-bot protections during the crawl | Included | Not available | Extra cost | Extra cost |
| Pricing model: how you pay for crawls | Pay per page | $19-99/month | $49-249/month | $49-499/month |
| Minimum spend: lowest entry point for crawling | $10 one-time | $19/month | $49/month | $49/month |
| Failed pages billed? Whether you pay for pages that fail to scrape | Never | Counted as credits | Counted as credits | Compute time billed |
| Depth control: limit crawl by link depth | Included | Included | Included | Actor config |
| Sitemap-aware crawling: uses sitemap.xml for URL discovery | Included | Included | Not available | Actor-dependent |
Prices sourced from public pricing pages as of April 2026. Apify costs are approximate (compute-time-based billing varies by actor configuration).
Crawling & Scraping Resources
Batch Scraping API
Submit up to 10,000 URLs at once with webhook delivery and auto-retries.
Anti-Bot Handling API
Automatic challenge handling on every crawled page — no extra configuration.
JavaScript Rendering API
Render SPAs and dynamic content with headless Chromium.
View Pricing
From $0.0002/page. No subscription. Pay only for what you scrape.
Your first scrape.
Sixty seconds.
$1 free balance. No credit card. No SDK.
Just a POST request.
No credit card required · Up to 5,000 free scrapes · Balance never expires