
Build an MCP Server for Real-Time Web Data Extraction
Learn to build a Model Context Protocol (MCP) server using Python and AlterLab to give AI agents real-time, reliable access to live web data.
May 20, 2026
TL;DR
Build an MCP server to give AI agents real-time web access by wrapping the AlterLab API in a standardized tool schema. This setup allows agents to fetch live content, bypass anti-bot measures automatically, and process structured web data without hardcoding selectors for every new site.
AI agents are limited by their training data cutoffs and by their lack of direct access to the public web. While Retrieval-Augmented Generation (RAG) helps with static data, agents often need live information from e-commerce sites, news portals, or technical documentation.
The Model Context Protocol (MCP) is the emerging standard for bridging this gap. By building a custom MCP server, you can expose web scraping capabilities as "tools" that an LLM can invoke dynamically. This tutorial shows how to build a production-ready MCP server using Python and AlterLab.
Understanding the MCP Architecture
MCP operates on a client-server model. The Client (such as a developer IDE or an AI agent framework) initiates the connection. The Server provides resources (data), tools (executable functions), and prompts (predefined templates).
For web data extraction, we primarily use Tools. A tool is a function that an LLM can decide to call based on its description. When the agent needs live data, it sends a JSON-RPC request to your MCP server, which then calls the AlterLab API to retrieve and clean the requested page.
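To make the request flow concrete, here is a minimal sketch of the kind of JSON-RPC 2.0 message an MCP client sends when the model decides to call a tool. The `tools/call` method and the `name`/`arguments` parameters follow the MCP specification. The tool name `scrape_website` matches the tool defined later in this tutorial, the helper `build_tool_call` is a hypothetical name used only for illustration, and the `id` value is arbitrary.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,  # the server echoes this id in its response
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


request = build_tool_call(1, "scrape_website", {"url": "https://example.com"})
print(request)
```

In practice the MCP SDK builds and parses these messages for you; the sketch only shows what travels over the wire.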
Prerequisites
To follow this guide, you need:
- Python 3.10 or higher.
- An AlterLab API key. You can sign up to get started.
- The mcp Python SDK and the AlterLab Python SDK.
Step 1: Initialize the Project
Create a new directory and install the necessary dependencies. We use the official mcp package which provides the base classes for building servers.
mkdir alterlab-mcp-server
cd alterlab-mcp-server
python -m venv venv
source venv/bin/activate
pip install mcp alterlab
Step 2: Configure AlterLab Integration
Before building the server, verify you can connect to the scraping API. AlterLab handles proxy rotation and anti-bot challenges automatically.
import alterlab
import os
client = alterlab.Client(api_key="YOUR_API_KEY")
response = client.scrape("https://example.com")
print(f"Status: {response.status_code}")
You can also verify this via cURL to ensure your environment can reach the API:
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'
Step 3: Implementing the MCP Server
The server needs to define a tool that takes a URL as input and returns the page content. We will use formats=['markdown'] to ensure the agent receives clean, LLM-friendly text rather than raw HTML.
from mcp.server.fastmcp import FastMCP
import alterlab
import os

# Initialize FastMCP server
mcp = FastMCP("AlterLab Web Scraper")

# Initialize AlterLab client
# In production, use environment variables for keys
api_key = os.getenv("ALTERLAB_API_KEY")
client = alterlab.Client(api_key=api_key)

@mcp.tool()
def scrape_website(url: str) -> str:
    """
    Scrapes a website and returns the content in Markdown format.
    Use this tool to get real-time data from any public website.
    """
    try:
        # Requesting markdown format for better LLM context
        result = client.scrape(
            url=url,
            formats=["markdown"],
            wait_for_network_idle=True
        )
        if result.success:
            return result.markdown
        else:
            return f"Error: {result.error_message}"
    except Exception as e:
        return f"An unexpected error occurred: {str(e)}"

if __name__ == "__main__":
    mcp.run(transport="stdio")
Why Markdown?
LLMs process Markdown much more efficiently than HTML. HTML contains significant noise (tags, scripts, styles) that consumes tokens and distracts the model. By using AlterLab's markdown conversion, you provide the agent with the core semantic content of the page, improving extraction accuracy.
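The size difference is easy to see on a toy page. The sketch below uses only the Python standard library to strip tags, scripts, and styles from a small hand-written HTML sample and compare character counts. It is not AlterLab's Markdown conversion, the sample page is invented, and character count is only a rough proxy for token count.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style element
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


SAMPLE_HTML = """<html><head><style>.price{color:red}</style>
<script>window.track('view');</script></head>
<body><div class="card"><h1 class="title">Blue Widget</h1>
<span class="price">$19.99</span></div></body></html>"""

text = visible_text(SAMPLE_HTML)
print(text)
print(f"HTML chars: {len(SAMPLE_HTML)}, text chars: {len(text)}")
```

Even on this tiny page, most of the characters are markup rather than content, and real pages usually carry far more script and style payload.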
Try scraping a page with AlterLab to see the markdown output format.
Step 4: Connecting the Server to an Agent
MCP servers typically communicate over stdio. This means the agent launches your script as a subprocess, sends requests via standard input, and reads responses from standard output.
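The stdio mechanics can be sketched without any MCP dependency. In the example below, a parent process plays the client and starts a tiny child process that stands in for the server; the two exchange newline-delimited JSON-RPC messages over the child's stdin and stdout. The child script and the `call_over_stdio` helper are illustrative assumptions, and the real protocol also includes an initialization handshake that is omitted here.

```python
import json
import subprocess
import sys

# Stand-in for an MCP server: reads one JSON-RPC message per line from stdin
# and answers on stdout. A real server would dispatch on `method`.
CHILD_CODE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    response = {"jsonrpc": "2.0", "id": request["id"],
                "result": {"echo": request["method"]}}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
"""


def call_over_stdio(message: dict) -> dict:
    """Start the child, send one newline-delimited message, read one reply."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        proc.stdin.write(json.dumps(message) + "\n")
        proc.stdin.flush()
        reply = json.loads(proc.stdout.readline())
    # Leaving the `with` block closes the pipes, so the child sees EOF and exits
    return reply


print(call_over_stdio({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}))
```

Because stdout carries protocol messages in this transport, a server should send its own logging to stderr rather than printing to stdout.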
To use this with a client like Claude Desktop, you would add the following to your configuration file:
{
  "mcpServers": {
    "alterlab": {
      "command": "python",
      "args": ["/path/to/alterlab-mcp-server/server.py"],
      "env": {
        "ALTERLAB_API_KEY": "YOUR_ACTUAL_KEY"
      }
    }
  }
}
Step 5: Advanced Tooling & Structured Data
While simple scraping is useful, agents often need specific data points. You can add a more advanced tool that utilizes AlterLab's "Cortex" engine for AI-powered extraction directly at the source.
import json

@mcp.tool()
def extract_structured_data(url: str, schema_description: str) -> str:
    """
    Extracts specific data from a page based on a description.
    Example schema_description: 'Extract the product price, name, and availability status.'
    """
    result = client.scrape(
        url=url,
        formats=["json"],
        extract={
            "description": schema_description
        }
    )
    if result.success:
        # Serialize as real JSON; str() on a dict would emit a Python repr
        return json.dumps(result.json_data)
    return f"Failed to extract data: {result.error_message}"
This second tool lets the agent specify exactly what it wants. Instead of reading 2,000 words of Markdown to find a price, the agent receives a small JSON object, which substantially reduces token usage.
Deployment Flow
Follow these steps to move your MCP server from a local script to a tool accessible by your agentic workflows.
Handling Technical Challenges
Rate Limiting and Concurrency
AI agents can be aggressive. If an agent loops and tries to scrape the same URL 50 times, it will consume your balance quickly. Implement simple caching or rate limiting within your MCP server to prevent runaway agent behavior. Refer to the documentation for best practices on managing high-volume requests.
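One way to implement this is a small wrapper that sits between the tool function and the scraping call. The sketch below combines a per-URL time-to-live cache with a sliding-window call budget. The class name `CachedFetcher`, its parameters, and the default limits are assumptions chosen for illustration, and `fake_fetch` stands in for a function that would call `client.scrape`.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachedFetcher:
    """Wraps a fetch function with a per-URL TTL cache and a call budget."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        max_calls_per_minute: int = 30,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._max_calls = max_calls_per_minute
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}
        self._call_times: List[float] = []

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh cache hit: no upstream call

        # Keep only upstream calls made within the last 60 seconds
        self._call_times = [t for t in self._call_times if now - t < 60.0]
        if len(self._call_times) >= self._max_calls:
            return "Error: rate limit reached, retry later."

        self._call_times.append(now)
        content = self._fetch(url)
        self._cache[url] = (now, content)
        return content


upstream_calls = []


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a function that calls the scraping API."""
    upstream_calls.append(url)
    return f"content of {url}"


fetcher = CachedFetcher(fake_fetch, ttl_seconds=60, max_calls_per_minute=2)
for _ in range(50):
    fetcher.get("https://example.com")
print(len(upstream_calls))  # 1: the other 49 requests were served from cache
```

Returning the rate-limit message as a string, rather than raising, matches how the tools in this tutorial report errors back to the agent.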
Bot Detection
Some sites use advanced challenges. By default, AlterLab's anti-bot handling manages most of these. If an agent reports it cannot see the content, you can modify your MCP tool to increase the min_tier parameter, which triggers more sophisticated browser emulation and CAPTCHA solving.
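A retry loop that escalates the tier only when needed might look like the sketch below. The `fetch` callable is a hypothetical wrapper that would pass its second argument to the scraping call as `min_tier` and return the page content, or `None` when the page comes back blocked or empty. The tier values `(1, 2, 3)` are placeholders; check the API reference for the values AlterLab actually accepts.

```python
from typing import Callable, Iterable, Optional


def scrape_with_escalation(
    fetch: Callable[[str, int], Optional[str]],
    url: str,
    tiers: Iterable[int] = (1, 2, 3),  # placeholder values, not confirmed
) -> str:
    """Try each tier in order and return the first non-empty result."""
    for tier in tiers:
        content = fetch(url, tier)
        if content:  # None or "" means the page was blocked or empty
            return content
    return "Error: content unavailable at every configured tier."


def fake_fetch(url: str, tier: int) -> Optional[str]:
    """Hypothetical stand-in: pretends the site only yields at tier 2+."""
    return "<page content>" if tier >= 2 else None


print(scrape_with_escalation(fake_fetch, "https://example.com"))
```

Starting at the lowest tier and escalating on failure keeps the more sophisticated emulation reserved for the sites that need it.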
Comparison: Direct Scraper vs. MCP Server
Takeaway
Building an MCP server for web extraction transforms a "blind" LLM into an agent capable of interacting with the live web. By wrapping AlterLab's reliable infrastructure in the MCP standard, you solve two problems at once: the technical difficulty of bypassing bot detection and the architectural difficulty of giving agents tool-use capabilities.
For more details on advanced extraction parameters, check our API reference or explore our engineering blog for more agentic automation patterns.