Build an MCP Server for Real-Time Web Data Extraction

Q: What is the Model Context Protocol (MCP)?

MCP is an open standard that allows AI agents to securely access data and tools from external services through a unified interface. It uses a JSON-RPC 2.0 based protocol typically implemented over stdio or HTTP.

Q: Why use an MCP server for web scraping instead of direct API calls?

An MCP server provides a standardized schema that LLMs can understand natively, enabling agents to discover scraping tools and execute them within a structured context. This reduces integration overhead and improves the reliability of tool-use in agentic workflows.

Q: Can I run an MCP server locally for my AI agent?

Yes. MCP servers can run as local processes that communicate with AI clients like Claude Desktop or custom agent frameworks via standard input and output.

Learn to build a Model Context Protocol (MCP) server using Python and AlterLab to give AI agents real-time, reliable access to live web data.

Yash Dubey · May 20, 2026

TL;DR

Build an MCP server to give AI agents real-time web access by wrapping the AlterLab API in a standardized tool schema. This setup allows agents to fetch live content, bypass anti-bot measures automatically, and process structured web data without hardcoding selectors for every new site.

AI agents are limited by their training data cutoffs and cannot reach the public web on their own. While Retrieval-Augmented Generation (RAG) helps with static data, agents often need live information from e-commerce sites, news portals, or technical documentation.

The Model Context Protocol (MCP) is the emerging standard for bridging this gap. By building a custom MCP server, you can expose web scraping capabilities as "tools" that an LLM can invoke dynamically. This tutorial shows how to build a production-ready MCP server using Python and AlterLab.

Understanding the MCP Architecture

MCP operates on a client-server model. The Client (such as a developer IDE or an AI agent framework) initiates the connection. The Server provides resources (data), tools (executable functions), and prompts (predefined templates).

For web data extraction, we primarily use Tools. A tool is a function that an LLM can decide to call based on its description. When the agent needs live data, it sends a JSON-RPC request to your MCP server, which then calls the AlterLab API to retrieve and clean the requested page.
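As a rough illustration of the request described above, the sketch below builds the JSON-RPC 2.0 message a client would send to invoke a tool. The tools/call method and the name/arguments parameters follow the MCP specification; the tool name scrape_website is borrowed from Step 3, and the helper build_tool_call and the request id are made up for this example.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,  # arbitrary id used to match the response
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical call to the scrape_website tool defined later in Step 3
request = build_tool_call(1, "scrape_website", {"url": "https://example.com"})
print(request)
```

In practice the MCP SDK builds and parses these messages for you; the sketch only shows what travels over the wire.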

MCP protocol overhead: < 200 ms

Tool invocation success: 99.9%

Stateless execution: 100%

Prerequisites

To follow this guide, you need:

Python 3.10 or higher.
An AlterLab API key. You can sign up to get started.
The mcp Python SDK and the AlterLab Python SDK.

Step 1: Initialize the Project

Create a new directory and install the dependencies. We use the official mcp package, which provides the base classes for building servers.

Bash

mkdir alterlab-mcp-server
cd alterlab-mcp-server
python -m venv venv
source venv/bin/activate
pip install mcp alterlab

Step 2: Configure AlterLab Integration

Before building the server, verify you can connect to the scraping API. AlterLab handles proxy rotation and anti-bot challenges automatically.

Python

import os

import alterlab

# Prefer the ALTERLAB_API_KEY environment variable; fall back to a placeholder
client = alterlab.Client(api_key=os.getenv("ALTERLAB_API_KEY", "YOUR_API_KEY"))
response = client.scrape("https://example.com")
print(f"Status: {response.status_code}")

You can also verify this via cURL to ensure your environment can reach the API:

Bash

curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'

Step 3: Implementing the MCP Server

The server needs to define a tool that takes a URL as input and returns the page content. We will use formats=['markdown'] to ensure the agent receives clean, LLM-friendly text rather than raw HTML.

Python

from mcp.server.fastmcp import FastMCP
import alterlab
import os

# Initialize FastMCP server
mcp = FastMCP("AlterLab Web Scraper")

# Initialize AlterLab client
# Read the key from the environment and fail fast if it is missing
api_key = os.getenv("ALTERLAB_API_KEY")
if not api_key:
    raise RuntimeError("ALTERLAB_API_KEY is not set")
client = alterlab.Client(api_key=api_key)

@mcp.tool()
def scrape_website(url: str) -> str:
    """
    Scrapes a website and returns the content in Markdown format.
    Use this tool to get real-time data from any public website.
    """
    try:
        # Requesting markdown format for better LLM context
        result = client.scrape(
            url=url,
            formats=["markdown"],
            wait_for_network_idle=True
        )
        
        if result.success:
            return result.markdown
        else:
            return f"Error: {result.error_message}"
            
    except Exception as e:
        return f"An unexpected error occurred: {str(e)}"

if __name__ == "__main__":
    mcp.run(transport="stdio")

Why Markdown?

LLMs process Markdown much more efficiently than HTML. HTML contains significant noise (tags, scripts, styles) that consumes tokens and distracts the model. By using AlterLab's markdown conversion, you provide the agent with the core semantic content of the page, improving extraction accuracy.
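To make the noise argument concrete, here is a minimal sketch that strips markup from a small, invented HTML snippet using only the standard library and compares character counts. This is not AlterLab's Markdown converter, and character count is only a rough proxy for tokens; the sample page and the TextExtractor class are assumptions for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


# Invented sample page: most of its characters are markup, not content
SAMPLE_HTML = """<html><head><style>.price{color:red}</style>
<script>window.track = function () {};</script></head>
<body><div class="product"><h1>Blue Widget</h1>
<span class="price">$19.99</span></div></body></html>"""


def visible_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


text = visible_text(SAMPLE_HTML)
print(f"raw HTML: {len(SAMPLE_HTML)} chars, visible text: {len(text)} chars")
print(text)
```

Even on this tiny page, the visible text is a small fraction of the raw markup, which is the overhead the agent avoids when it receives Markdown.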

Step 4: Connecting the Server to an Agent

MCP servers typically communicate over stdio. This means the agent launches your script as a subprocess and sends commands via standard input.

To use this with a client like Claude Desktop, save the Step 3 code as server.py and add the following to the client's configuration file (claude_desktop_config.json for Claude Desktop):

JSON

{
  "mcpServers": {
    "alterlab": {
      "command": "python",
      "args": ["/path/to/alterlab-mcp-server/server.py"],
      "env": {
        "ALTERLAB_API_KEY": "YOUR_ACTUAL_KEY"
      }
    }
  }
}
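If the configuration file already lists other servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on an in-memory dict; the helper add_mcp_server and the pre-existing "filesystem" entry are hypothetical, and reading or writing the actual file is left out.

```python
import json


def add_mcp_server(config: dict, name: str, script_path: str, api_key: str) -> dict:
    """Return a copy of a client config with one more entry under "mcpServers"."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # copy so the input is untouched
    servers[name] = {
        "command": "python",
        "args": [script_path],
        "env": {"ALTERLAB_API_KEY": api_key},
    }
    updated["mcpServers"] = servers
    return updated


# Hypothetical existing config with one unrelated server
existing = {"mcpServers": {"filesystem": {"command": "npx", "args": ["some-server"]}}}
merged = add_mcp_server(
    existing,
    "alterlab",
    "/path/to/alterlab-mcp-server/server.py",
    "YOUR_ACTUAL_KEY",
)
print(json.dumps(merged, indent=2))
```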

Step 5: Advanced Tooling & Structured Data

While simple scraping is useful, agents often need specific data points. You can add a more advanced tool that utilizes AlterLab's "Cortex" engine for AI-powered extraction directly at the source.

Python

@mcp.tool()
def extract_structured_data(url: str, schema_description: str) -> str:
    """
    Extracts specific data from a page based on a description.
    Example schema_description: 'Extract the product price, name, and availability status.'
    """
    result = client.scrape(
        url=url,
        formats=["json"],
        extract={
            "description": schema_description
        }
    )
    
    if result.success:
        return str(result.json_data)
    return f"Failed to extract data: {result.error_message}"

This second tool allows the agent to specify exactly what it wants. Instead of reading 2000 words of Markdown to find a price, the agent receives a small JSON object from the MCP server, which cuts token usage substantially.

Deployment Flow

Follow these steps to move your MCP server from a local script to a tool accessible by your agentic workflows.

Handling Technical Challenges

Rate Limiting and Concurrency

AI agents can be aggressive. If an agent loops and tries to scrape the same URL 50 times, it will consume your balance quickly. Implement simple caching or rate limiting within your MCP server to prevent runaway agent behavior. Refer to the documentation for best practices on managing high-volume requests.
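One possible shape for that safeguard is a small wrapper combining a time-to-live cache with a per-URL call budget. The sketch below is an assumption-laden illustration, not an AlterLab feature: the class CachedLimiter, its defaults, and the fake_fetch stand-in are all hypothetical, and the budget never resets for the life of the process.

```python
import time
from typing import Callable


class CachedLimiter:
    """Wraps a fetch function with a TTL cache and a per-URL call budget."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        max_calls_per_url: int = 5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._max_calls = max_calls_per_url
        self._clock = clock
        self._cache: dict[str, tuple[float, str]] = {}
        self._calls: dict[str, int] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh cache hit: no upstream call
        used = self._calls.get(url, 0)
        if used >= self._max_calls:
            return f"Error: call budget exhausted for {url}"
        self._calls[url] = used + 1
        content = self._fetch(url)
        self._cache[url] = (now, content)
        return content


# Stand-in for the real scrape call, so the sketch runs without an API key
calls: list[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return f"content of {url}"


limiter = CachedLimiter(fake_fetch, ttl_seconds=60.0, max_calls_per_url=2)
for _ in range(50):
    limiter.get("https://example.com")
print(f"upstream calls: {len(calls)}")
```

Inside the server, the tool function would call limiter.get(url) instead of hitting the API directly, so a looping agent is served from the cache and then refused once the budget runs out.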

Bot Detection

Some sites use advanced challenges. By default, AlterLab's anti-bot handling manages most of these. If an agent reports it cannot see the content, you can modify your MCP tool to increase the min_tier parameter, which triggers more sophisticated browser emulation and CAPTCHA solving.
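The escalation idea can be sketched as a retry loop that raises the tier until content comes back. The loop takes an injected scrape callable so it runs without the SDK; in a real tool that callable would presumably wrap client.scrape with min_tier passed as a keyword argument, which is an assumption here, as are the function name scrape_with_escalation and the tier range of 1 to 5.

```python
from typing import Callable, Optional


def scrape_with_escalation(
    scrape: Callable[[str, int], Optional[str]],
    url: str,
    start_tier: int = 1,
    max_tier: int = 5,  # assumed upper bound, adjust to your plan
) -> str:
    """Retry a scrape at increasing tiers until content comes back."""
    for tier in range(start_tier, max_tier + 1):
        content = scrape(url, tier)
        if content:
            return content
    return f"Error: no content for {url} up to tier {max_tier}"


# Stand-in scraper: pretends the site only yields content at tier 3 or higher
tried: list[int] = []


def fake_scrape(url: str, tier: int) -> Optional[str]:
    tried.append(tier)
    return "page body" if tier >= 3 else None


result = scrape_with_escalation(fake_scrape, "https://example.com")
print(result, tried)
```

Starting low and escalating only on failure keeps easy pages cheap, at the cost of extra round trips on protected ones.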

Comparison: Direct Scraper vs. MCP Server

Takeaway

Building an MCP server for web extraction transforms a "blind" LLM into an agent capable of interacting with the live web. By wrapping AlterLab's reliable infrastructure in the MCP standard, you solve two problems at once: the technical difficulty of bypassing bot detection and the architectural difficulty of giving agents tool-use capabilities.

For more details on advanced extraction parameters, check our API reference or explore our engineering blog for more agentic automation patterns.
