# Crawl4AI to AlterLab
Crawl4AI is an open-source Python library for LLM-friendly crawling. AlterLab provides the same LLM-ready output as a managed API, so you skip the infrastructure, proxy management, and browser maintenance.
## Why Switch

### Self-Hosted vs Managed

**Crawl4AI Limitations**
- Self-hosted: you manage browsers, proxies, and scaling
- Proxy rotation requires external providers and configuration
- No built-in anti-bot bypass beyond basic browser automation
- Scaling requires Docker/Kubernetes infrastructure
- No built-in scheduling, webhooks, or batch processing
**AlterLab Advantages**
- Fully managed: no servers, browsers, or proxies to maintain
- Built-in proxy rotation with residential and mobile proxies
- 5-tier anti-bot escalation with fingerprinting and CAPTCHA solving
- Auto-scales to handle any volume
- Scheduling, webhooks, batch scraping, and search built in
## Concept Mapping
Crawl4AI concepts map to AlterLab API parameters:
| Crawl4AI | AlterLab | Notes |
|---|---|---|
| `AsyncWebCrawler` | `POST /v1/scrape` | HTTP request replaces Python class instantiation |
| `BFSDeepCrawlStrategy` | `POST /v1/crawl` with `max_depth` | BFS crawling with depth control |
| `BestFirstCrawlingStrategy` | Content-aware prioritization | AlterLab prioritizes pages by relevance |
| `FilterChain` | `include_patterns` / `exclude_patterns` | Regex patterns to include/exclude URLs |
| `LLMExtractionStrategy` | `extraction_schema` + `extraction_profile` | JSON Schema extraction with optional LLM |
| `JsonCssExtractionStrategy` | `extraction_schema` with CSS selectors | Use selector hints in your JSON Schema |
| `result.markdown` | `content.markdown` | Clean markdown output in both |
| `browser_config` | Automatic | No browser config needed; AlterLab manages browser infrastructure |
| Proxy providers (manual) | Built-in (or BYOP) | Proxies included in tier pricing; BYOP for discount |
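The table above can be read as a mechanical translation. As a rough sketch, a helper like the one below (the function name and defaults are illustrative, not part of either API) shows how Crawl4AI-style settings become an AlterLab request body:

```python
def to_alterlab_payload(url, max_depth=None, url_patterns=None, schema=None):
    """Translate Crawl4AI-style settings into an AlterLab request body.

    Illustrative only: the parameters on the left mirror
    BFSDeepCrawlStrategy / FilterChain / LLMExtractionStrategy concepts;
    the keys on the right are the AlterLab fields from the table above.
    """
    payload = {"url": url, "formats": ["markdown"]}
    if max_depth is not None:          # BFSDeepCrawlStrategy(max_depth=...)
        payload["max_depth"] = max_depth
    if url_patterns:                   # FilterChain -> include_patterns
        payload["include_patterns"] = list(url_patterns)
    if schema:                         # LLMExtractionStrategy(schema=...)
        payload["extraction_schema"] = schema
    return payload
```

The payload then goes straight into the `json=` argument of an HTTP POST, as the examples below show.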
## Code Migration

### Library to API

**Before (Crawl4AI)**
```python
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy


async def main():
    browser_config = BrowserConfig(
        headless=True,
        extra_args=["--disable-gpu", "--no-sandbox"],
    )
    extraction = LLMExtractionStrategy(
        provider="openai/gpt-4o-mini",
        schema={
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "string"},
            },
        },
    )
    run_config = CrawlerRunConfig(
        extraction_strategy=extraction,
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://example.com/product",
            config=run_config,
        )
        print(result.markdown[:200])
        print(result.extracted_content)


asyncio.run(main())
```

**After (AlterLab)**
```python
import requests

response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={
        "X-API-Key": "sk_live_xxx",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://example.com/product",
        "formats": ["markdown"],
        "extraction_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "string"},
            },
        },
    },
)
data = response.json()
print(data["content"]["markdown"][:200])
print(data["content"]["extracted"])
```

## Deep Crawl Migration
**Before (Crawl4AI Deep Crawl)**
```python
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.content_filter_strategy import PruningContentFilter
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy
from crawl4ai.deep_crawling.filters import FilterChain, URLPatternFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

strategy = BFSDeepCrawlStrategy(
    max_depth=3,
    include_external=False,
    filter_chain=FilterChain([URLPatternFilter(patterns=["*/blog/*"])]),
)
config = CrawlerRunConfig(
    deep_crawl_strategy=strategy,
    markdown_generator=DefaultMarkdownGenerator(
        content_filter=PruningContentFilter(threshold=0.5)
    ),
)


async def main():
    async with AsyncWebCrawler(config=BrowserConfig()) as crawler:
        results = await crawler.arun("https://example.com", config=config)
        for result in results:
            print(result.url, result.markdown[:100])


asyncio.run(main())
```

**After (AlterLab Crawl)**
```python
import requests

response = requests.post(
    "https://api.alterlab.io/api/v1/crawl",
    headers={
        "X-API-Key": "sk_live_xxx",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://example.com",
        "max_depth": 3,
        "include_patterns": ["/blog/"],
        "formats": ["markdown"],
    },
)
data = response.json()
for page in data["pages"]:
    print(page["url"], page["content"]["markdown"][:100])
```

### Minimal Scrape Example

```python
import requests

# No Playwright, no browser, no async — just an HTTP request
response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={
        "X-API-Key": "sk_live_xxx",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://example.com",
        "formats": ["markdown", "html"],
    },
)
data = response.json()
if data.get("success"):
    print(f"Markdown: {data['content']['markdown'][:200]}...")
    print(f"Tier: {data['tier_used']}, Cost: {data['credits_charged']}")
```

## Feature Comparison
| Feature | Crawl4AI | AlterLab |
|---|---|---|
| Deployment | Self-hosted | Managed API |
| Language support | Python only | Any (REST API) |
| Deep crawling | Yes | Yes |
| LLM extraction | BYO provider | Built-in (no LLM key needed) |
| CSS extraction | Yes | Yes |
| Anti-bot bypass | Basic (browser only) | 5-tier escalation |
| Proxy rotation | Manual setup | Built-in |
| Markdown output | Yes | Yes |
| URL filtering | Yes | Yes |
| Scheduling | DIY (cron/celery) | Built-in |
| Webhooks | No | Yes |
| Batch processing | DIY (async loops) | Batch API |
| Cost | Free (+ infra costs) | Pay per request |
## Pricing Comparison

### Free vs Managed
| Cost Component | Crawl4AI | AlterLab |
|---|---|---|
| Software license | Free (Apache 2.0) | Pay per request |
| Server hosting | $20-100+/mo | Included |
| Proxy rotation | $50-500+/mo (residential) | Included |
| Browser infra | RAM/CPU costs | Included |
| LLM API calls | Your provider, your cost | Included (Cortex) |
| Simple HTML scrape | Infra cost only | $0.0002/req (Tier 1) |
| JS-rendered scrape | Infra cost only | $0.002/req (Tier 3) |
For small-scale scraping (<10K pages/month), self-hosting Crawl4AI can be cheaper. At scale, AlterLab's per-request pricing eliminates infrastructure management overhead and proxy costs that grow with volume.
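That break-even point can be sanity-checked with a little arithmetic. Using the low-end figures from the table above (a $20/mo server, no paid residential proxies, the Tier 3 JS-rendered rate), a rough sketch:

```python
def break_even_pages(monthly_infra: float, per_request: float) -> float:
    """Pages/month at which managed per-request cost equals fixed infra cost."""
    return monthly_infra / per_request


# $20/mo server, no proxy spend, $0.002/req Tier 3 rate
print(break_even_pages(20.0, 0.002))  # 10000.0 pages/month
```

Add residential proxy spend (or higher hosting) to `monthly_infra` and the break-even moves out proportionally, which is why self-hosting loses its edge as volume and anti-bot requirements grow.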
## Common Gotchas
**1. No async context manager needed**

Crawl4AI requires `async with AsyncWebCrawler()`. AlterLab is a stateless HTTP call with no session management, so you can use synchronous HTTP clients like `requests`.
**2. No browser config**

Crawl4AI needs `BrowserConfig` for headless settings, viewport, and similar options. AlterLab handles all browser configuration automatically; browser rendering activates only when the target site requires it.
**3. LLM extraction does not require your own API key**

Crawl4AI's `LLMExtractionStrategy` needs your OpenAI/Anthropic key. AlterLab uses its own LLM infrastructure (Cortex) at no extra cost for extraction. Just provide `extraction_schema`.
**4. Filter chains become URL patterns**

Crawl4AI filter chains built with `FilterChain` and `PruningContentFilter` become `include_patterns` and `exclude_patterns` (regex arrays) in the crawl request body.
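For example, a filter that keeps blog URLs but skips tag pages becomes two regex arrays in the request body (the patterns themselves are illustrative):

```python
# Crawl request body: FilterChain logic expressed as regex arrays
payload = {
    "url": "https://example.com",
    "max_depth": 2,
    "include_patterns": [r"/blog/"],      # only URLs matching these are crawled
    "exclude_patterns": [r"/blog/tag/"],  # URLs matching these are skipped
    "formats": ["markdown"],
}
```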
**5. Response format is JSON, not Python objects**

Crawl4AI returns `CrawlResult` objects with `.markdown`, `.html`, and `.extracted_content`. AlterLab returns JSON with `content.markdown`, `content.html`, and `content.extracted`.
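If downstream code still expects Crawl4AI-style attributes, a thin adapter can bridge the two during migration. A sketch (the shim class is not part of either library):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlResultShim:
    """Mimics the CrawlResult attributes downstream code may rely on."""
    url: str
    markdown: str
    html: str
    extracted_content: Optional[dict]

    @classmethod
    def from_alterlab(cls, url: str, data: dict) -> "CrawlResultShim":
        """Build a CrawlResult-like object from an AlterLab JSON response."""
        content = data.get("content", {})
        return cls(
            url=url,
            markdown=content.get("markdown", ""),
            html=content.get("html", ""),
            extracted_content=content.get("extracted"),
        )
```

This lets you swap the transport layer first and migrate consumers of `result.markdown` and friends at your own pace.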