# Crawl4AI to AlterLab
Crawl4AI is an open-source Python library for LLM-friendly crawling. AlterLab provides the same LLM-ready output as a managed API, so you skip the infrastructure, proxy management, and browser maintenance.
## Why Switch

### Self-Hosted vs Managed

**Crawl4AI Limitations**
- Self-hosted: you manage browsers, proxies, and scaling
- Proxy rotation requires external providers and configuration
- No built-in anti-bot bypass beyond basic browser automation
- Scaling requires Docker/Kubernetes infrastructure
- No built-in scheduling, webhooks, or batch processing
**AlterLab Advantages**
- Fully managed: no servers, browsers, or proxies to maintain
- Built-in proxy rotation with residential and mobile proxies
- 5-tier anti-bot escalation with fingerprinting and CAPTCHA solving
- Auto-scales to handle any volume
- Scheduling, webhooks, batch scraping, and search built in
## Concept Mapping
Crawl4AI concepts map to AlterLab API parameters:
| Crawl4AI | AlterLab | Notes |
|---|---|---|
| `AsyncWebCrawler` | `POST /v1/scrape` | HTTP request replaces Python class instantiation |
| `BFSDeepCrawlStrategy` | `POST /v1/crawl` with `max_depth` | BFS crawling with depth control |
| `BestFirstCrawlingStrategy` | Content-aware prioritization | AlterLab prioritizes pages by relevance |
| `FilterChain` | `include_patterns` / `exclude_patterns` | Regex patterns to include/exclude URLs |
| `LLMExtractionStrategy` | `extraction_schema` + `extraction_profile` | JSON Schema extraction with optional LLM |
| `JsonCssExtractionStrategy` | `extraction_schema` with CSS selectors | Use selector hints in your JSON Schema |
| `result.markdown` | `content.markdown` | Clean markdown output in both |
| `browser_config` | Automatic | No browser config needed; AlterLab manages browser infrastructure |
| Proxy providers (manual) | Built-in (or BYOP) | Proxies included in tier pricing; BYOP for discount |
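The table above can be read as a mechanical translation. As a rough sketch, a helper like the one below (the function name and defaults are illustrative, not part of either API) shows how Crawl4AI-style settings become an AlterLab request body:

```python
def to_alterlab_payload(url, max_depth=None, url_patterns=None, schema=None):
    """Translate Crawl4AI-style settings into an AlterLab request body.

    Illustrative only: the parameters on the left mirror
    BFSDeepCrawlStrategy / FilterChain / LLMExtractionStrategy concepts;
    the keys on the right are the AlterLab fields from the table above.
    """
    payload = {"url": url, "formats": ["markdown"]}
    if max_depth is not None:          # BFSDeepCrawlStrategy(max_depth=...)
        payload["max_depth"] = max_depth
    if url_patterns:                   # FilterChain -> include_patterns
        payload["include_patterns"] = list(url_patterns)
    if schema:                         # LLMExtractionStrategy(schema=...)
        payload["extraction_schema"] = schema
    return payload
```

The payload then goes straight into the `json=` argument of an HTTP POST, as the examples below show.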
## Code Migration

### Library to API

**Before (Crawl4AI)**
```python
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy


async def main():
    browser_config = BrowserConfig(
        headless=True,
        extra_args=["--disable-gpu", "--no-sandbox"],
    )
    extraction = LLMExtractionStrategy(
        provider="openai/gpt-4o-mini",
        schema={
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "string"},
            },
        },
    )
    run_config = CrawlerRunConfig(
        extraction_strategy=extraction,
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://example.com/product",
            config=run_config,
        )
        print(result.markdown[:200])
        print(result.extracted_content)


asyncio.run(main())
```

**After (AlterLab)**
```python
import requests

response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={
        "X-API-Key": "sk_live_xxx",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://example.com/product",
        "formats": ["markdown"],
        "extraction_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "string"},
            },
        },
    },
)
data = response.json()
print(data["content"]["markdown"][:200])
print(data["content"]["extracted"])
```

## Deep Crawl Migration
**Before (Crawl4AI Deep Crawl)**
```python
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.content_filter_strategy import PruningContentFilter
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy
from crawl4ai.deep_crawling.filters import FilterChain, URLPatternFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator

strategy = BFSDeepCrawlStrategy(
    max_depth=3,
    include_external=False,
    filter_chain=FilterChain([URLPatternFilter(patterns=["*/blog/*"])]),
)
config = CrawlerRunConfig(
    deep_crawl_strategy=strategy,
    markdown_generator=DefaultMarkdownGenerator(
        content_filter=PruningContentFilter(threshold=0.5)
    ),
)


async def main():
    async with AsyncWebCrawler(config=BrowserConfig()) as crawler:
        results = await crawler.arun("https://example.com", config=config)
        for result in results:
            print(result.url, result.markdown[:100])


asyncio.run(main())
```

**After (AlterLab Crawl)**
```python
import requests

response = requests.post(
    "https://api.alterlab.io/api/v1/crawl",
    headers={
        "X-API-Key": "sk_live_xxx",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://example.com",
        "max_depth": 3,
        "include_patterns": ["/blog/"],
        "formats": ["markdown"],
    },
)
data = response.json()
for page in data["pages"]:
    print(page["url"], page["content"]["markdown"][:100])
```

### Minimal Scrape Example

```python
import requests

# No Playwright, no browser, no async — just an HTTP request
response = requests.post(
    "https://api.alterlab.io/api/v1/scrape",
    headers={
        "X-API-Key": "sk_live_xxx",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://example.com",
        "formats": ["markdown", "html"],
    },
)
data = response.json()
if data.get("success"):
    print(f"Markdown: {data['content']['markdown'][:200]}...")
    print(f"Tier: {data['tier_used']}, Cost: {data['credits_charged']}")
```

## Feature Comparison
| Feature | Crawl4AI | AlterLab |
|---|---|---|
| Deployment | Self-hosted | Managed API |
| Language support | Python only | Any (REST API) |
| Deep crawling | Yes | Yes |
| LLM extraction | BYO provider | Built-in (no LLM key needed) |
| CSS extraction | Yes | Yes |
| Anti-bot bypass | Basic (browser only) | 5-tier escalation |
| Proxy rotation | Manual setup | Built-in |
| Markdown output | Yes | Yes |
| URL filtering | Yes | Yes |
| Scheduling | DIY (cron/celery) | Built-in |
| Webhooks | No | Yes |
| Batch processing | DIY (async loops) | Batch API |
| Cost | Free (+ infra costs) | Pay per request |
## Pricing Comparison

### Free vs Managed
| Cost Component | Crawl4AI | AlterLab |
|---|---|---|
| Software license | Free (Apache 2.0) | Pay per request |
| Server hosting | $20-100+/mo | Included |
| Proxy rotation | $50-500+/mo (residential) | Included |
| Browser infra | RAM/CPU costs | Included |
| LLM API calls | Your provider, your cost | Included (Cortex) |
| Simple HTML scrape | Infra cost only | $0.0002/req (Tier 1) |
| JS-rendered scrape | Infra cost only | $0.002/req (Tier 3) |
For small-scale scraping (<10K pages/month), self-hosting Crawl4AI can be cheaper. At scale, AlterLab's per-request pricing eliminates infrastructure management overhead and proxy costs that grow with volume.
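That break-even point can be sanity-checked with a little arithmetic. Using the low-end figures from the table above (a $20/mo server, no paid residential proxies, the Tier 3 JS-rendered rate), a rough sketch:

```python
def break_even_pages(monthly_infra: float, per_request: float) -> float:
    """Pages/month at which managed per-request cost equals fixed infra cost."""
    return monthly_infra / per_request


# $20/mo server, no proxy spend, $0.002/req Tier 3 rate
print(break_even_pages(20.0, 0.002))  # 10000.0 pages/month
```

Add residential proxy spend (or higher hosting) to `monthly_infra` and the break-even moves out proportionally, which is why self-hosting loses its edge as volume and anti-bot requirements grow.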
## Common Gotchas
**1. No async context manager needed**

Crawl4AI requires `async with AsyncWebCrawler()`. AlterLab is a stateless HTTP call with no session management, so you can use synchronous HTTP clients like `requests`.
**2. No browser config**

Crawl4AI needs `BrowserConfig` for headless settings, viewport, and similar options. AlterLab handles all browser configuration automatically; browser rendering activates only when the target site requires it.
**3. LLM extraction does not require your own API key**

Crawl4AI's `LLMExtractionStrategy` needs your OpenAI/Anthropic key. AlterLab uses its own LLM infrastructure (Cortex) at no extra cost for extraction. Just provide `extraction_schema`.
**4. Filter chains become URL patterns**

Crawl4AI filter chains built with `FilterChain` and `PruningContentFilter` become `include_patterns` and `exclude_patterns` (regex arrays) in the crawl request body.
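For example, a filter that keeps blog URLs but skips tag pages becomes two regex arrays in the request body (the patterns themselves are illustrative):

```python
# Crawl request body: FilterChain logic expressed as regex arrays
payload = {
    "url": "https://example.com",
    "max_depth": 2,
    "include_patterns": [r"/blog/"],      # only URLs matching these are crawled
    "exclude_patterns": [r"/blog/tag/"],  # URLs matching these are skipped
    "formats": ["markdown"],
}
```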
**5. Response format is JSON, not Python objects**

Crawl4AI returns `CrawlResult` objects with `.markdown`, `.html`, and `.extracted_content`. AlterLab returns JSON with `content.markdown`, `content.html`, and `content.extracted`.
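If downstream code still expects Crawl4AI-style attributes, a thin adapter can bridge the two during migration. A sketch (the shim class is not part of either library):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlResultShim:
    """Mimics the CrawlResult attributes downstream code may rely on."""
    url: str
    markdown: str
    html: str
    extracted_content: Optional[dict]

    @classmethod
    def from_alterlab(cls, url: str, data: dict) -> "CrawlResultShim":
        """Build a CrawlResult-like object from an AlterLab JSON response."""
        content = data.get("content", {})
        return cls(
            url=url,
            markdown=content.get("markdown", ""),
            html=content.get("html", ""),
            extracted_content=content.get("extracted"),
        )
```

This lets you swap the transport layer first and migrate consumers of `result.markdown` and friends at your own pace.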