
    AI Agent & MCP Integration

    Connect AI agents to the web using AlterLab's MCP server. Give Claude, Cursor, or any MCP-compatible tool the ability to scrape, extract, and screenshot any website.

    What is MCP?

    The Model Context Protocol (MCP) is an open standard for connecting AI models to external tools and data sources. AlterLab's MCP server exposes web scraping as a set of tools that any MCP-compatible client can call.

    Overview

    Traditional scraping requires writing code for every target. MCP integration flips this: your AI agent decides what to scrape, how to extract data, and what to do with results — all through natural language.

    9 Tools

    Scrape, extract, screenshot, estimate costs, check balance, and manage authenticated sessions.

    Zero Code

    Ask your AI agent in plain English. It picks the right tool, parameters, and output format.

    Full Anti-Bot

    Every tool call goes through AlterLab's tier escalation. Protected sites are handled automatically.

    Step 1: Install the MCP Server

    The AlterLab MCP server is published on npm. Install it globally so MCP clients can find the binary:

    npm install -g alterlab-mcp-server

    Requirements

    Node.js 18+ is required. Get your API key from the AlterLab Dashboard under Settings → API Keys.

    Step 2: Configure Your MCP Client

    Add AlterLab to your MCP client's configuration file. Below are examples for popular clients.

    Claude Desktop

    Edit your Claude Desktop config file:

    • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
    {
      "mcpServers": {
        "alterlab": {
          "command": "npx",
          "args": ["-y", "alterlab-mcp-server"],
          "env": {
            "ALTERLAB_API_KEY": "sk_live_your_api_key_here"
          }
        }
      }
    }
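    Cursor

    Cursor reads MCP servers from ~/.cursor/mcp.json (or a project-level .cursor/mcp.json); the entry shape shown below is the same mcpServers structure Claude Desktop uses — adjust the path if your Cursor version differs:

```json
{
  "mcpServers": {
    "alterlab": {
      "command": "npx",
      "args": ["-y", "alterlab-mcp-server"],
      "env": {
        "ALTERLAB_API_KEY": "sk_live_your_api_key_here"
      }
    }
  }
}
```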

    Keep Your API Key Secret

    Never commit your API key to version control. Use environment variables or a secrets manager in production environments.

    Step 3: Available Tools

    Once configured, your AI agent has access to 9 tools. Here's what each one does:

    Core Tools

    alterlab_scrape

    Scrape a URL and return content as markdown, text, HTML, or JSON. Automatically handles anti-bot protection with tier escalation.

    • url (required) — URL to scrape
    • formats (default: ["markdown"]) — Output formats: text, json, html, markdown
    • render_js (default: false) — Render JavaScript with headless browser (+3 credits)
    • use_proxy (default: false) — Route through premium proxy (+1 credit)
    • session_id (optional) — UUID of a stored session for authenticated scraping

    alterlab_extract

    Extract structured data using pre-built profiles, custom JSON schemas, or natural language prompts.

    • url (required) — URL to extract from
    • extraction_profile (default: "auto") — auto, product, article, job_posting, faq, recipe, event
    • extraction_schema (optional) — Custom JSON Schema for precise field extraction
    • extraction_prompt (optional) — Natural language instructions

    alterlab_screenshot

    Take a full-page screenshot of any URL. Returns a PNG image directly in the conversation.

    • url (required) — URL to screenshot
    • wait_for (optional) — CSS selector to wait for before capturing
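    Under the hood, each of these tools receives a plain JSON tool call from the client. For example, a call to alterlab_scrape with JS rendering enabled might carry these arguments (the MCP JSON-RPC envelope is omitted; values are illustrative):

```json
{
  "name": "alterlab_scrape",
  "arguments": {
    "url": "https://example.com/pricing",
    "formats": ["markdown"],
    "render_js": true
  }
}
```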

    Utility Tools

    alterlab_estimate_cost

    Estimate the credit cost of scraping a URL without actually scraping it. Returns predicted tier, cost, and confidence level.

    alterlab_check_balance

    Check your account balance, total deposited, and total spent. No parameters needed.

    Session Management

    Sessions let you scrape authenticated pages by storing cookies across requests.

    alterlab_create_session

    Create a new session with cookies for authenticated scraping.

    alterlab_list_sessions

    List all stored sessions and their domains.

    alterlab_validate_session

    Check if a session's cookies are still valid.

    alterlab_delete_session

    Delete a stored session when no longer needed.

    Step 4: Basic Usage Patterns

    With the MCP server configured, you can ask your AI agent to scrape in natural language. The agent translates your request into the right tool call.

    Simple Scraping

    You say:

    "Scrape https://news.ycombinator.com and give me the top 10 stories."

    The agent calls alterlab_scrape with the URL, gets the markdown content, and parses out the story titles and links.
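    The parsing step the agent performs is roughly the following sketch — real front-page markdown varies, so the sample input here is illustrative:

```python
import re

def parse_links(markdown: str, limit: int = 10) -> list[tuple[str, str]]:
    """Pull the first `limit` [title](url) links out of scraped markdown."""
    pattern = re.compile(r"\[([^\]]+)\]\((https?://[^)]+)\)")
    return pattern.findall(markdown)[:limit]

sample = "1. [Show HN: Foo](https://foo.example)\n2. [Bar 2.0](https://bar.example)"
links = parse_links(sample)
# [('Show HN: Foo', 'https://foo.example'), ('Bar 2.0', 'https://bar.example')]
```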

    Structured Extraction

    You say:

    "Extract the product name, price, and rating from https://example.com/product/123"

    The agent calls alterlab_extract with extraction_profile: "product" and returns structured JSON with the requested fields.

    Cost-Aware Scraping

    You say:

    "How much would it cost to scrape these 50 URLs? Check a few first."

    The agent calls alterlab_estimate_cost on a sample, then alterlab_check_balance to verify you have enough credits before proceeding.

    Example: Autonomous Research Agent

    This example shows how to build an AI agent that autonomously researches a topic by scraping multiple sources, extracting key data, and producing a summary report.

    How It Works

    The agent uses AlterLab tools as "senses" — it decides which URLs to visit, what data to extract, and when it has enough information. You provide the goal; the agent handles execution.

    System Prompt

    Give your AI agent this system prompt to enable autonomous research:

    You are a research agent with web scraping capabilities.
    
    When asked to research a topic:
    
    1. Plan: Identify 3-5 authoritative sources to investigate
    2. Scrape: Use alterlab_scrape to fetch each source as markdown
    3. Extract: If pages have structured data, use alterlab_extract
    4. Verify: Cross-reference facts across multiple sources
    5. Synthesize: Produce a structured report with citations
    
    Rules:
    - Always check costs with alterlab_estimate_cost before large batches
    - Use alterlab_check_balance to monitor credit usage
    - Prefer markdown format for scraping (best for analysis)
    - Use render_js=true only for JavaScript-heavy sites (SPAs, dashboards)
    - Cite sources with URLs in your final report

    Agent Conversation

    Here is what a research session looks like:

    User: Research the current state of web scraping regulations
          in the EU. Include recent court rulings.
    
    Agent: I'll research this topic across multiple sources.
    
    [Calls alterlab_scrape: "https://gdpr.eu/tag/web-scraping/"]
    [Calls alterlab_scrape: "https://techcrunch.com/tag/web-scraping/"]
    [Calls alterlab_scrape: "https://eur-lex.europa.eu/..." with render_js=true]
    [Calls alterlab_extract on legal database with extraction_prompt:
     "Extract case name, date, ruling, and implications for web scraping"]
    
    Agent: Here's my research report:
    
    ## EU Web Scraping Regulations - March 2026
    
    ### Key Findings
    1. GDPR Article 6(1)(f) — legitimate interest basis...
    2. Recent CJEU ruling in Case C-xxx/24...
    3. ...
    
    ### Sources
    - [GDPR.eu](https://gdpr.eu/...) — Retrieved March 2026
    - [TechCrunch](https://techcrunch.com/...) — Retrieved March 2026
    - [EUR-Lex](https://eur-lex.europa.eu/...) — Official text

    Python Agent Implementation

    If you prefer to build your own agent in code, here is a Python implementation using the AlterLab SDK directly:

    import alterlab
    import re
    
    client = alterlab.AlterLab(api_key="sk_live_your_key")
    
    def research_agent(topic: str, max_sources: int = 5) -> dict:
        """Autonomous research agent that scrapes and synthesizes."""
    
        # Step 1: Scrape a search-oriented page for source discovery
        search_result = client.scrape(
            url="https://news.ycombinator.com/",
            formats=["markdown"],
        )
    
        # Pull candidate source URLs out of the discovery page's markdown links
        discovered_urls = re.findall(
            r"\]\((https?://[^)]+)\)", search_result.markdown
        )
    
        # Step 2: Scrape each source and collect content
        sources = []
        for url in discovered_urls[:max_sources]:
            # Check cost first
            estimate = client.estimate(url=url)
            print(f"Estimated cost for {url}: {estimate.credits} credits")
    
            result = client.scrape(
                url=url,
                formats=["markdown"],
                advanced={"render_js": estimate.needs_js},
            )
            sources.append({
                "url": url,
                "content": result.markdown[:5000],  # Trim for context window
                "title": result.metadata.get("title", ""),
            })
    
        # Step 3: Extract structured data where applicable
        for source in sources:
            extraction = client.scrape(
                url=source["url"],
                formats=["json"],
                extraction_prompt=f"Extract key facts about {topic}",
            )
            source["structured_data"] = extraction.json_data
    
        return {
            "topic": topic,
            "source_count": len(sources),
            "sources": sources,
        }

    Example: RAG Pipeline with AlterLab

    Retrieval-Augmented Generation (RAG) combines web scraping with LLM reasoning. Use AlterLab to fetch fresh web content and feed it into your LLM as context for grounded, up-to-date answers.

    Architecture

    1. Query — User asks a question
    2. Retrieve — Scrape relevant pages via AlterLab
    3. Augment — Inject scraped content as LLM context
    4. Generate — LLM answers with cited sources

    MCP-Based RAG

    The simplest RAG pipeline uses MCP directly — no code required. Just instruct your AI agent:

    When I ask a question, follow this process:
    
    1. Identify 2-3 authoritative URLs that would answer the question
    2. Use alterlab_scrape to fetch each URL as markdown
    3. Read the scraped content carefully
    4. Answer my question using ONLY information from the scraped pages
    5. Cite each claim with the source URL
    
    Always scrape fresh content — don't rely on your training data for
    facts that may have changed.

    Code-Based RAG Pipeline

    import alterlab
    from openai import OpenAI
    
    scraper = alterlab.AlterLab(api_key="sk_live_your_key")
    llm = OpenAI()
    
    def rag_answer(question: str, source_urls: list[str]) -> str:
        """Answer a question using fresh web content as context."""
    
        # Step 1: Scrape all source URLs
        context_parts = []
        for url in source_urls:
            result = scraper.scrape(
                url=url,
                formats=["markdown"],
            )
            context_parts.append(
                f"## Source: {url}\n\n{result.markdown[:3000]}"
            )
    
        context = "\n\n---\n\n".join(context_parts)
    
        # Step 2: Send to LLM with scraped context
        response = llm.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Answer questions using ONLY the provided context. "
                        "Cite sources with URLs. If the context doesn't "
                        "contain the answer, say so."
                    ),
                },
                {
                    "role": "user",
                    "content": f"Context:\n{context}\n\nQuestion: {question}",
                },
            ],
        )
    
        return response.choices[0].message.content
    
    # Usage
    answer = rag_answer(
        question="What are the latest changes to robots.txt standards?",
        source_urls=[
            "https://developers.google.com/search/docs/crawling-indexing/robots-txt",
            "https://www.rfc-editor.org/rfc/rfc9309",
        ],
    )
    print(answer)

    Best Practices

    1. Use Markdown Format for LLM Context

    Request formats: ["markdown"] when scraping for AI consumption. Markdown preserves structure (headings, lists, links) while being token-efficient compared to HTML.

    2. Estimate Before Batch Scraping

    Always call alterlab_estimate_cost on a sample of URLs before scraping hundreds of pages. This prevents unexpected credit consumption, especially for sites that require JavaScript rendering.

    3. Trim Content for Context Windows

    Scraped pages can be long. Truncate content to the first 3,000 to 5,000 characters per source, or use alterlab_extract with a focused schema to get only the data you need.
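    A minimal truncation helper — one reasonable policy (an assumption, not part of the SDK) is to cut at the last paragraph break so a source doesn't end mid-sentence:

```python
def trim_for_context(markdown: str, max_chars: int = 4000) -> str:
    """Truncate scraped markdown for an LLM context window.

    Cuts at the last paragraph break before max_chars so the
    trimmed source ends on a complete paragraph.
    """
    if len(markdown) <= max_chars:
        return markdown
    cut = markdown.rfind("\n\n", 0, max_chars)
    if cut == -1:
        cut = max_chars  # no paragraph break found; hard-cut
    return markdown[:cut]
```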

    4. Use Sessions for Authenticated Content

    For gated content, create a session with alterlab_create_session and pass the session_id to subsequent scrape calls. The session persists cookies across requests.
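    A scrape call that reuses a stored session simply passes the session's UUID in the tool arguments (the UUID below is illustrative):

```json
{
  "name": "alterlab_scrape",
  "arguments": {
    "url": "https://example.com/account/orders",
    "formats": ["markdown"],
    "session_id": "123e4567-e89b-12d3-a456-426614174000"
  }
}
```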

    5. Enable JS Rendering Only When Needed

    JavaScript rendering adds 3 credits per request and increases latency. Most news sites, blogs, and documentation pages work fine without it. Reserve render_js: true for SPAs, dashboards, and dynamic content.

    6. Cross-Reference Multiple Sources

    For research agents, scrape at least 3 sources per claim. LLMs can hallucinate when given thin context; more sources mean better fact-checking and higher confidence in the final output.
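    Cross-referencing can be as simple as tallying agreement across sources — a toy sketch (real agents compare claims semantically, not by exact string match; the URLs and values below are illustrative):

```python
from collections import Counter

def consensus(claims_by_source: dict[str, str]) -> tuple[str, float]:
    """Return the value most sources agree on and its support ratio."""
    counts = Counter(claims_by_source.values())
    value, n = counts.most_common(1)[0]
    return value, n / len(claims_by_source)

# Three sources report the year of a ruling; two of three agree
value, support = consensus({
    "https://a.example/ruling": "2016",
    "https://b.example/analysis": "2016",
    "https://c.example/blog": "2015",
})
# value == "2016", support == 2/3
```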

    Last updated: March 2026
