AI Agent & MCP Integration
Connect AI agents to the web using AlterLab's MCP server. Give Claude, Cursor, or any MCP-compatible tool the ability to scrape, extract, and screenshot any website.
What is MCP?
The Model Context Protocol (MCP) is an open standard that lets AI assistants such as Claude call external tools and data sources. AlterLab's MCP server exposes its scraping API as a set of these tools.
Overview
Traditional scraping requires writing code for every target. MCP integration flips this: your AI agent decides what to scrape, how to extract data, and what to do with results — all through natural language.
9 Tools
Scrape, extract, screenshot, estimate costs, check balance, and manage authenticated sessions.
Zero Code
Ask your AI agent in plain English. It picks the right tool, parameters, and output format.
Full Anti-Bot
Every tool call goes through AlterLab's tier escalation. Protected sites are handled automatically.
Step 1: Install the MCP Server
The AlterLab MCP server is published on npm. Install it globally so MCP clients can find the binary:
npm install -g alterlab-mcp-server

Requirements
Node.js and npm must be installed; MCP clients launch the server through npx.
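MCP clients resolve the server binary from your PATH, so you can verify the install before editing any config. A minimal check using only the Python standard library (the binary name matches the npm package above):

```python
import shutil

def mcp_server_installed(binary: str = "alterlab-mcp-server") -> bool:
    """Return True if the given binary is resolvable on PATH."""
    return shutil.which(binary) is not None
```

If this returns False after a global install, check that npm's global bin directory is on your PATH.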
Step 2: Configure Your MCP Client
Add AlterLab to your MCP client's configuration file. Below are examples for popular clients.
Claude Desktop
Edit your Claude Desktop config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"alterlab": {
"command": "npx",
"args": ["-y", "alterlab-mcp-server"],
"env": {
"ALTERLAB_API_KEY": "sk_live_your_api_key_here"
}
}
}
}

Keep Your API Key Secret
Anyone with this key can spend your account's credits. Never commit the config file to a shared repository, and rotate the key if it leaks.
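If you script machine setup, a small helper can merge the AlterLab entry into an existing Claude Desktop config instead of hand-editing JSON. This is a convenience sketch, not part of the AlterLab tooling; the entry it writes matches the config shown above:

```python
import json
from pathlib import Path

def add_alterlab_server(config: dict, api_key: str) -> dict:
    """Merge the AlterLab MCP server entry into a config dict, keeping other servers."""
    servers = config.setdefault("mcpServers", {})
    servers["alterlab"] = {
        "command": "npx",
        "args": ["-y", "alterlab-mcp-server"],
        "env": {"ALTERLAB_API_KEY": api_key},
    }
    return config

def update_claude_config(path: Path, api_key: str) -> None:
    """Read the config file (or start fresh), add AlterLab, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_alterlab_server(config, api_key), indent=2))
```

Pass the key in from an environment variable rather than hard-coding it in your setup script.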
Step 3: Available Tools
Once configured, your AI agent has access to 9 tools. Here's what each one does:
Core Tools
alterlab_scrape
Scrape a URL and return content as markdown, text, HTML, or JSON. Automatically handles anti-bot protection with tier escalation.
- url (required) — URL to scrape
- formats (default: ["markdown"]) — Output formats: text, json, html, markdown
- render_js (default: false) — Render JavaScript with headless browser (+3 credits)
- use_proxy (default: false) — Route through premium proxy (+1 credit)
- session_id (optional) — UUID of a stored session for authenticated scraping

alterlab_extract
Extract structured data using pre-built profiles, custom JSON schemas, or natural language prompts.
- url (required) — URL to extract from
- extraction_profile (default: "auto") — auto, product, article, job_posting, faq, recipe, event
- extraction_schema (optional) — Custom JSON Schema for precise field extraction
- extraction_prompt (optional) — Natural language instructions

alterlab_screenshot
Take a full-page screenshot of any URL. Returns a PNG image directly in the conversation.
- url (required) — URL to screenshot
- wait_for (optional) — CSS selector to wait for before capturing

Utility Tools
alterlab_estimate_cost
Estimate the credit cost of scraping a URL without actually scraping it. Returns predicted tier, cost, and confidence level.
alterlab_check_balance
Check your account balance, total deposited, and total spent. No parameters needed.
Session Management
Sessions let you scrape authenticated pages by storing cookies across requests.
alterlab_create_session
Create a new session with cookies for authenticated scraping.
alterlab_list_sessions
List all stored sessions and their domains.
alterlab_validate_session
Check if a session's cookies are still valid.
alterlab_delete_session
Delete a stored session when no longer needed.
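The four session tools map onto a simple create/list/validate/delete lifecycle. The in-memory sketch below mirrors that lifecycle to show the data each tool works with; the field names and TTL are illustrative, not AlterLab's actual storage schema:

```python
import uuid
from datetime import datetime, timedelta, timezone

class SessionStore:
    """Illustrative in-memory model of the MCP session tools."""

    def __init__(self):
        self._sessions: dict[str, dict] = {}

    def create(self, domain: str, cookies: dict, ttl_hours: int = 24) -> str:
        """alterlab_create_session: store cookies, return a session UUID."""
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = {
            "domain": domain,
            "cookies": cookies,
            "expires_at": datetime.now(timezone.utc) + timedelta(hours=ttl_hours),
        }
        return session_id

    def list(self) -> list[tuple[str, str]]:
        """alterlab_list_sessions: return (session_id, domain) pairs."""
        return [(sid, s["domain"]) for sid, s in self._sessions.items()]

    def validate(self, session_id: str) -> bool:
        """alterlab_validate_session: session exists and has not expired."""
        s = self._sessions.get(session_id)
        return s is not None and s["expires_at"] > datetime.now(timezone.utc)

    def delete(self, session_id: str) -> None:
        """alterlab_delete_session: remove a stored session."""
        self._sessions.pop(session_id, None)
```

The session_id returned by create is what you pass to alterlab_scrape for authenticated pages.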
Step 4: Basic Usage Patterns
With the MCP server configured, you can ask your AI agent to scrape in natural language. The agent translates your request into the right tool call.
Simple Scraping
You say:
"Scrape https://news.ycombinator.com and give me the top 10 stories."
The agent calls alterlab_scrape with the URL, gets the markdown content, and parses out the story titles and links.
Structured Extraction
You say:
"Extract the product name, price, and rating from https://example.com/product/123"
The agent calls alterlab_extract with extraction_profile: "product" and returns structured JSON with the requested fields.
Cost-Aware Scraping
You say:
"How much would it cost to scrape these 50 URLs? Check a few first."
The agent calls alterlab_estimate_cost on a sample, then alterlab_check_balance to verify you have enough credits before proceeding.
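The sampling step the agent performs can be sketched in code: estimate a few URLs, extrapolate to the full batch, and compare against the balance before committing. The per-URL estimates here are plain numbers; in practice they would come from alterlab_estimate_cost:

```python
import math

def projected_batch_cost(sample_estimates: list[int], total_urls: int) -> int:
    """Extrapolate total credit cost from a sampled subset of URLs."""
    if not sample_estimates:
        raise ValueError("need at least one sampled estimate")
    avg = sum(sample_estimates) / len(sample_estimates)
    # Round up: better to over-budget slightly than stall mid-batch.
    return math.ceil(avg * total_urls)

def can_afford(sample_estimates: list[int], total_urls: int, balance: int) -> bool:
    """True if the current balance covers the projected batch cost."""
    return projected_batch_cost(sample_estimates, total_urls) <= balance
```

For example, samples of 1, 4, and 1 credits project to 100 credits across 50 URLs.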
Example: Autonomous Research Agent
This example shows how to build an AI agent that autonomously researches a topic by scraping multiple sources, extracting key data, and producing a summary report.
System Prompt
Give your AI agent this system prompt to enable autonomous research:
You are a research agent with web scraping capabilities.
When asked to research a topic:
1. Plan: Identify 3-5 authoritative sources to investigate
2. Scrape: Use alterlab_scrape to fetch each source as markdown
3. Extract: If pages have structured data, use alterlab_extract
4. Verify: Cross-reference facts across multiple sources
5. Synthesize: Produce a structured report with citations
Rules:
- Always check costs with alterlab_estimate_cost before large batches
- Use alterlab_check_balance to monitor credit usage
- Prefer markdown format for scraping (best for analysis)
- Use render_js=true only for JavaScript-heavy sites (SPAs, dashboards)
- Cite sources with URLs in your final report

Agent Conversation
Here is what a research session looks like:
User: Research the current state of web scraping regulations
in the EU. Include recent court rulings.
Agent: I'll research this topic across multiple sources.
[Calls alterlab_scrape: "https://gdpr.eu/tag/web-scraping/"]
[Calls alterlab_scrape: "https://techcrunch.com/tag/web-scraping/"]
[Calls alterlab_scrape: "https://eur-lex.europa.eu/..." with render_js=true]
[Calls alterlab_extract on legal database with extraction_prompt:
"Extract case name, date, ruling, and implications for web scraping"]
Agent: Here's my research report:
## EU Web Scraping Regulations - March 2026
### Key Findings
1. GDPR Article 6(1)(f) — legitimate interest basis...
2. Recent CJEU ruling in Case C-xxx/24...
3. ...
### Sources
- [GDPR.eu](https://gdpr.eu/...) — Retrieved March 2026
- [TechCrunch](https://techcrunch.com/...) — Retrieved March 2026
- [EUR-Lex](https://eur-lex.europa.eu/...) — Official text

Python Agent Implementation
If you prefer to build your own agent in code, here is a Python implementation using the AlterLab SDK directly:
import re

import alterlab

client = alterlab.AlterLab(api_key="sk_live_your_key")

def research_agent(topic: str, max_sources: int = 5) -> dict:
    """Autonomous research agent that scrapes and synthesizes."""
    # Step 1: Scrape a search-oriented page for source discovery
    # (a real agent would query a search engine for the topic)
    search_result = client.scrape(
        url="https://news.ycombinator.com/",
        formats=["markdown"],
    )

    # Pull candidate source URLs out of the markdown links
    discovered_urls = re.findall(r"\]\((https?://[^)]+)\)", search_result.markdown)

    # Step 2: Scrape each source and collect content
    sources = []
    for url in discovered_urls[:max_sources]:
        # Check cost first
        estimate = client.estimate(url=url)
        print(f"Estimated cost for {url}: {estimate.credits} credits")

        result = client.scrape(
            url=url,
            formats=["markdown"],
            advanced={"render_js": estimate.needs_js},
        )
        sources.append({
            "url": url,
            "content": result.markdown[:5000],  # Trim for context window
            "title": result.metadata.get("title", ""),
        })

    # Step 3: Extract structured data where applicable
    for source in sources:
        extraction = client.scrape(
            url=source["url"],
            formats=["json"],
            extraction_prompt=f"Extract key facts about {topic}",
        )
        source["structured_data"] = extraction.json_data

    return {
        "topic": topic,
        "source_count": len(sources),
        "sources": sources,
    }

Example: RAG Pipeline with AlterLab
Retrieval-Augmented Generation (RAG) combines web scraping with LLM reasoning. Use AlterLab to fetch fresh web content and feed it into your LLM as context for grounded, up-to-date answers.
Architecture
1. Query: User asks a question
2. Retrieve: Scrape relevant pages via AlterLab
3. Augment: Inject scraped content as LLM context
4. Generate: LLM answers with cited sources
MCP-Based RAG
The simplest RAG pipeline uses MCP directly — no code required. Just instruct your AI agent:
When I ask a question, follow this process:
1. Identify 2-3 authoritative URLs that would answer the question
2. Use alterlab_scrape to fetch each URL as markdown
3. Read the scraped content carefully
4. Answer my question using ONLY information from the scraped pages
5. Cite each claim with the source URL
Always scrape fresh content — don't rely on your training data for
facts that may have changed.

Code-Based RAG Pipeline
import alterlab
from openai import OpenAI

scraper = alterlab.AlterLab(api_key="sk_live_your_key")
llm = OpenAI()

def rag_answer(question: str, source_urls: list[str]) -> str:
    """Answer a question using fresh web content as context."""
    # Step 1: Scrape all source URLs
    context_parts = []
    for url in source_urls:
        result = scraper.scrape(
            url=url,
            formats=["markdown"],
        )
        context_parts.append(
            f"## Source: {url}\n\n{result.markdown[:3000]}"
        )
    context = "\n\n---\n\n".join(context_parts)

    # Step 2: Send to LLM with scraped context
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer questions using ONLY the provided context. "
                    "Cite sources with URLs. If the context doesn't "
                    "contain the answer, say so."
                ),
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}",
            },
        ],
    )
    return response.choices[0].message.content

# Usage
answer = rag_answer(
    question="What are the latest changes to robots.txt standards?",
    source_urls=[
        "https://developers.google.com/search/docs/crawling-indexing/robots-txt",
        "https://www.rfc-editor.org/rfc/rfc9309",
    ],
)
print(answer)

Best Practices
1. Use Markdown Format for LLM Context
Request formats: ["markdown"] when scraping for AI consumption. Markdown preserves structure (headings, lists, links) while being token-efficient compared to HTML.
2. Estimate Before Batch Scraping
Always call alterlab_estimate_cost on a sample of URLs before scraping hundreds of pages. This prevents unexpected credit consumption, especially for sites that require JavaScript rendering.
3. Trim Content for Context Windows
Scraped pages can be long. Truncate content to the first 3,000 to 5,000 characters per source, or use alterlab_extract with a focused schema to get only the data you need.
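A plain string slice can cut mid-sentence. A slightly gentler trim breaks at the last paragraph boundary within the budget; this is a sketch, and max_chars should be tuned to your model's context window:

```python
def trim_for_context(markdown: str, max_chars: int = 3000) -> str:
    """Truncate scraped markdown at a paragraph boundary within max_chars."""
    if len(markdown) <= max_chars:
        return markdown
    cut = markdown[:max_chars]
    # Prefer to break at the last blank line; fall back to a hard cut
    # if no paragraph break lands in the second half of the budget.
    boundary = cut.rfind("\n\n")
    if boundary > max_chars // 2:
        cut = cut[:boundary]
    return cut.rstrip() + "\n\n[truncated]"
```

Apply it to each source's content before building the LLM context string.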
4. Use Sessions for Authenticated Content
For gated content, create a session with alterlab_create_session and pass the session_id to subsequent scrape calls. The session persists cookies across requests.
5. Enable JS Rendering Only When Needed
JavaScript rendering adds 3 credits per request and increases latency. Most news sites, blogs, and documentation pages work fine without it. Reserve render_js: true for SPAs, dashboards, and dynamic content.
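One way to decide is to fetch the raw HTML first with render_js off and check whether it already contains readable text. This heuristic is my own assumption, not an AlterLab feature, and the SPA markers are common but not exhaustive:

```python
import re

# Markers commonly left in the raw HTML shell of single-page apps
# (illustrative, not exhaustive).
SPA_MARKERS = ('id="root"', 'id="app"', "ng-app", "data-reactroot")

def probably_needs_js(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a page is a JS-rendered shell from its raw HTML."""
    # Drop script bodies, then strip tags crudely and measure visible text.
    text = re.sub(r"<script.*?</script>", "", html, flags=re.S)
    text = re.sub(r"<[^>]+>", " ", text)
    visible = len(" ".join(text.split()))
    has_marker = any(marker in html for marker in SPA_MARKERS)
    return visible < min_text_chars and has_marker
```

An empty root div plus almost no visible text is a strong hint the page hydrates client-side.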
6. Cross-Reference Multiple Sources
For research agents, scrape at least 3 sources per claim. LLMs can hallucinate when given thin context. More sources means better fact-checking and higher confidence in the final output.
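Cross-referencing can be made mechanical: track which source URLs support each claim and keep only claims above a support threshold. A sketch of that filter (the claim-to-sources mapping is assumed to come from the agent's extraction step):

```python
def well_supported_claims(claims: dict[str, set[str]], min_sources: int = 3) -> list[str]:
    """Return claims backed by at least min_sources distinct source URLs."""
    return sorted(
        claim for claim, urls in claims.items() if len(urls) >= min_sources
    )
```

Claims that fall below the threshold can be flagged for another round of scraping rather than dropped outright.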