Connect Ollama to Live Web Data Using Markdown Extraction
Tutorials

Connect Ollama to Live Web Data Using Markdown Extraction

Feed live web data to local LLMs via Ollama using headless browser extraction and token-efficient Markdown conversion for robust RAG pipelines.

5 min read
8 views

TL;DR

Connecting Ollama to live web data requires fetching JavaScript-rendered pages and converting the raw HTML into token-efficient Markdown. Using a managed scraping environment handles the browser execution, while Markdown conversion reduces context window usage by up to 90%. This architecture enables local LLMs to process live data effectively without overwhelming their token limits.

The Context Window Problem

Local LLMs like Llama 3 or Mistral typically operate with an 8k to 32k token context window. Raw HTML is hostile to LLMs. A standard e-commerce product page or financial dashboard can easily exceed 150,000 characters of raw source code.

The DOM is packed with structural noise: tracking scripts, inline CSS, SVG paths, base64 images, and deep <div> nesting. Feeding raw HTML into a prompt dilutes the model's attention. The model wastes computation parsing layout tags instead of reasoning about the actual text.

Markdown solves this. Converting the rendered DOM to Markdown strips the layout markup while preserving the semantic hierarchy: headers, lists, links, and text formatting. A 100k-token HTML document typically reduces to a dense 500-token Markdown string. This keeps inference fast, stays well within local context limits, and drastically improves extraction accuracy.

The Data Pipeline

Fetching modern web data requires three phases: executing JavaScript to render the single-page application, extracting the rendered DOM, and cleaning the output for the LLM.

Handling Browser Fingerprinting

Using standard HTTP libraries like requests or plain curl fails on modern sites. Single-page applications return empty shell HTML until JavaScript executes. You need a browser.

Basic headless browsers (like standard Playwright or Puppeteer) leak technical signals. Default user agents, missing plugins, exposed navigator.webdriver flags, and specific WebGL rendering signatures flag the session as automated. Web Application Firewalls (WAFs) detect these anomalies and block the connection before the DOM even loads.

Instead of continuously patching Playwright stealth plugins and managing residential proxy pools manually, you can outsource the execution layer. Using a managed bot detection handling solution ensures the page renders correctly, bypassing interstitials and CAPTCHAs, allowing you to focus purely on the LLM integration.

Requesting Markdown Data

We need to instruct our scraping layer to return Markdown natively. This avoids running heavy DOM parsing libraries locally. Here is how to request pre-converted Markdown using AlterLab.

cURL Implementation

This terminal command requests the target URL and specifically asks the API to format the output as Markdown.

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"url": "https://example.com/data", "formats": ["markdown"]}'

Python SDK Implementation

For integration into a Python application, the Python SDK handles the request formatting and provides typed responses.

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

# Request Markdown format directly
response = client.scrape(
    url="https://example.com/data",
    formats=["markdown"]
)

markdown_content = response.markdown
print(f"Retrieved {len(markdown_content)} characters of Markdown.")
Try it yourself

Test Markdown extraction on a live URL

Connecting the Pipeline to Ollama

With clean Markdown ready, the final step is piping it into Ollama. Ollama runs the model locally, ensuring your prompts and extracted data remain private.

You need the ollama Python package installed (pip install ollama). Ensure the Ollama daemon is running locally and you have pulled a model, for example: ollama run llama3.

The integration script combines the scraping fetch with the LLM query.

Python
import alterlab
import ollama

def analyze_web_page(url: str, query: str) -> str:
    # 1. Fetch live data
    client = alterlab.Client("YOUR_API_KEY")
    scrape_response = client.scrape(
        url=url,
        formats=["markdown"]
    )
    
    context = scrape_response.markdown
    
    system_prompt = (
        "You are a data extraction assistant. "
        "Answer the user's query using ONLY the provided Markdown context."
    )
    
    # 2. Query Ollama locally
    llm_response = ollama.chat(model='llama3', messages=[
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': f"Context:\n{context}\n\nQuery: {query}"}
    ])
    
    return llm_response['message']['content']

# Execute the pipeline
if __name__ == "__main__":
    target = "https://example.com/financial-report"
    question = "Extract the Q3 revenue figures and list the risk factors."
    
    answer = analyze_web_page(target, question)
    print("LLM Analysis:")
    print(answer)

Prompt Architecture

The success of your extraction depends heavily on how you instruct the model. Local models benefit from strict bounding instructions.

Structure your prompt to clearly separate the system instructions, the raw data context, and the actual user query. Notice in the code block above how the context is injected directly into the user message, preceded by the system prompt enforcing strict adherence to the provided text.

If you need structured data out of Ollama, append schema instructions to the prompt:

Python
format_instructions = """
Format your response as a valid JSON object matching this schema:
{
  "revenue_q3": "string",
  "risk_factors": ["string"]
}
Do not include markdown code blocks or conversational text.
"""

Scaling the Architecture

This architecture scales horizontally. Because Ollama runs locally, your only external dependency is the scraping layer. You can queue thousands of URLs, fetch them asynchronously, and process the resulting Markdown through your local GPU hardware with zero additional API inference costs.

By shifting the burden of DOM rendering and bot evasion to an external service, and shifting the burden of LLM inference to your local machine, you achieve a highly resilient, cost-effective data pipeline.

For advanced configuration options on scheduling these fetches or handling specific HTTP methods, review the documentation to fine-tune the ingestion layer.

Share

Was this article helpful?

Frequently Asked Questions

Raw HTML contains massive amounts of noise like inline styles, scripts, and nested layout tags that consume context window tokens. Converting to Markdown strips this structural noise while preserving semantic meaning, reducing token usage by up to 90%.
You must use a headless browser like Playwright or Puppeteer to execute the JavaScript and render the DOM before extraction. For reliable extraction at scale, automated rendering environments handle the browser lifecycle and bot evasion automatically.
Yes. Ollama runs the LLM locally on your hardware, ensuring data privacy and zero inference costs. Only the scraper component requires an external network connection to fetch the target web page.