
How to Reduce LLM Token Consumption in RAG Pipelines Using Markdown and Clean JSON
Learn practical techniques to cut token usage in Retrieval-Augmented Generation pipelines by requesting Markdown-formatted, clean JSON outputs from AlterLab's scraping API.
TL;DR
Use AlterLab's scraping API to request pages in Markdown format wrapped in clean JSON. This strips HTML boilerplate, reduces prompt size, and lowers LLM token consumption in RAG pipelines by up to 70%.
Introduction
Large language models charge per token, and raw HTML is notoriously inefficient. A typical product page can exceed 50 KB of markup, most of which is irrelevant to the semantic content you need for retrieval or generation. By asking for Markdown and receiving it inside a minimal JSON envelope, you keep only the meaningful structure—headings, lists, code blocks—while discarding tags, attributes, and scripts. The result is smaller prompts, faster inference, and lower cost.
AlterLab’s API supports multiple output formats via the formats parameter. When you request formats=['markdown','json'], you get a JSON payload whose markdown field holds the converted text. This approach works for any publicly accessible page and requires no HTML parser for post‑processing.
Why Token Consumption Matters in RAG
In a Retrieval‑Augmented Generation pipeline, each retrieved document is inserted into the LLM context window. If documents are bloated with HTML, you waste tokens on:
- Tag angle brackets (< and >)
- CSS classes and IDs
- JavaScript snippets
- Whitespace and comments
These tokens do not contribute to understanding but still count toward limits and cost. Reducing the token count per document lets you either:
- Fit more relevant passages into the context window, improving answer quality
- Lower the number of API calls to the LLM provider
- Decrease latency because less data must be transferred and processed
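To make the context-window point concrete, here is a minimal Python sketch that greedily packs retrieved passages into a fixed token budget. The approx_tokens helper is an assumption: a crude regex proxy (word runs plus individual punctuation marks) rather than any real tokenizer, so absolute numbers will differ from what your LLM provider bills. The relative gap between the HTML and Markdown versions of the same product card is the point.

```python
import re


def approx_tokens(text: str) -> int:
    """Crude token estimate: word runs plus individual punctuation marks.

    Assumption: this is only a proxy; real counts depend on the tokenizer
    of the embedding model or LLM you use.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def pack_context(passages: list[str], budget: int) -> list[str]:
    """Add passages in retrieval order until the next one would exceed the budget."""
    packed, used = [], 0
    for passage in passages:
        cost = approx_tokens(passage)
        if used + cost > budget:
            break
        packed.append(passage)
        used += cost
    return packed


# The same product card, once as raw HTML and once as Markdown.
html_passage = (
    '<div class="product-card" data-id="123"><h2 class="title">Widget Pro</h2>'
    '<span class="price">$29.99</span></div>'
)
md_passage = "# Widget Pro\n**Price:** $29.99"

budget = 100
print("HTML passages that fit:", len(pack_context([html_passage] * 20, budget)))
print("Markdown passages that fit:", len(pack_context([md_passage] * 20, budget)))
```

With a 100-token budget under this proxy, one HTML card fits but seven Markdown cards do.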
The Problem with Raw HTML
Consider a typical e‑commerce listing page. Raw HTML might look like:
```html
<div class="product-card" data-id="123">
  <h2 class="title">Widget Pro</h2>
  <span class="price">$29.99</span>
  <div class="description">...</div>
</div>
```

Only the text “Widget Pro”, “$29.99”, and the description are useful. The surrounding markup adds dozens of tokens per item. When you retrieve dozens or hundreds of such items, the overhead compounds.
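The overhead is easy to measure. This sketch uses Python's standard-library html.parser to keep only text nodes (dropping tags, attributes, and script/style content) and compares the result with the original markup. It is a rough illustration, not how AlterLab performs its conversion, and it counts characters rather than tokens.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text nodes; ignores tags, attributes, scripts, and styles."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a script or style element.
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


card = """<div class="product-card" data-id="123">
<h2 class="title">Widget Pro</h2>
<span class="price">$29.99</span>
<div class="description">...</div>
</div>"""

text = visible_text(card)
print(text)
print(f"{len(text)} of {len(card)} characters are content")
```

For the card above, the extracted text is "Widget Pro $29.99 ...", a small fraction of the snippet's length.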
Using Markdown for LLM-Friendly Context
Converting semantic HTML to Markdown keeps the text and expresses its structure with lightweight syntax:

```markdown
# Widget Pro
**Price:** $29.99
Description: ...
```

The same information now occupies far fewer characters. Crucially, Markdown retains hierarchy (headings, lists, blockquotes), so LLMs can still follow the document structure without the noise of tags.
Leveraging Clean JSON Outputs
Asking AlterLab for Markdown alone would return a plain text body, which forces you to parse the format yourself. Instead, request both Markdown and JSON:
```python
formats=['markdown','json']
```

The API returns:

```json
{
  "markdown": "# Widget Pro\n**Price:** $29.99\nDescription: ...",
  "url": "https://example.com/product/123",
  "status": 200
}
```

Now you have a predictable JSON envelope and a ready‑to‑use Markdown string. You can feed the markdown field directly into your embedding model or LLM prompt.
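Because the envelope is predictable, extracting the text takes only a few lines. This sketch assumes the field names shown in the sample response (markdown, url, status); the extract_markdown helper is hypothetical, and treating any status other than 200 as a failure is a design choice you may want to relax.

```python
import json


def extract_markdown(raw_body: str) -> str:
    """Return the trimmed markdown field of a scrape response, or raise ValueError.

    Assumes the envelope has markdown, url, and status keys as in the sample.
    """
    payload = json.loads(raw_body)
    status = payload.get("status")
    if status != 200:
        raise ValueError(f"scrape of {payload.get('url')} returned status {status}")
    markdown = payload.get("markdown")
    if not isinstance(markdown, str) or not markdown.strip():
        raise ValueError("response has no markdown content")
    return markdown.strip()  # drop stray leading/trailing newlines before embedding


sample = (
    '{"markdown": "# Widget Pro\\n**Price:** $29.99\\nDescription: ...", '
    '"url": "https://example.com/product/123", "status": 200}'
)
print(extract_markdown(sample))
```

Failing loudly on a non-200 status or an empty field keeps error pages and blank documents out of your vector store.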
Practical Example: Scraping Product Listings
Below is a complete workflow that fetches a product listing page, requests Markdown output, and prepares the text for a RAG pipeline.
Code Examples
The following snippets show the same request in Python (using the official SDK) and in cURL. Both ask for Markdown and JSON formats.
```python
import alterlab

# Initialize the client with your API key
client = alterlab.Client("YOUR_API_KEY")

# Scrape a URL and request Markdown plus the JSON envelope
response = client.scrape(
    url="https://example.com/product",
    formats=["markdown", "json"],
)

# The response object already exposes the parsed JSON body
data = response.json
markdown_text = data["markdown"]

print("Markdown preview:")
print(markdown_text[:200])
```

```bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "formats": ["markdown", "json"]
  }'
```

Both examples return a JSON body whose markdown key holds the cleaned, LLM‑ready text.
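If you would rather not install the SDK, the cURL request translates directly to Python's standard-library urllib. This sketch assumes the endpoint, X-API-Key header, and JSON body exactly as in the cURL example, plus a Content-Type: application/json header; build_scrape_request and scrape_markdown are hypothetical helper names, not part of any AlterLab SDK.

```python
import json
import urllib.request

API_URL = "https://api.alterlab.io/v1/scrape"  # endpoint from the cURL example


def build_scrape_request(url: str, api_key: str,
                         formats=("markdown", "json")) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the cURL example."""
    body = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def scrape_markdown(url: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the markdown field of the JSON response."""
    req = build_scrape_request(url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["markdown"]


if __name__ == "__main__":
    # Inspect the request without hitting the network.
    req = build_scrape_request("https://example.com/product", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it easy to log or unit-test the exact payload before spending any API credits.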
Best Practices for Token Optimization
- Always request the minimal format set – If you only need Markdown, do not ask for html or png unless required.
- Trim whitespace – Some Markdown converters add trailing newlines; strip them before embedding.
- Chunk wisely – Split large Markdown documents at heading boundaries to keep each chunk under your embedding model’s token limit.
- Cache results – Store the Markdown string and re‑scrape only when the source page changes (use AlterLab’s monitoring feature to detect updates).
- Monitor usage – Check the tokens_used field in AlterLab’s response (if enabled) to see exactly how much data you transferred.
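The trimming and chunking advice can be sketched in a few lines of Python. This version strips the document, splits it at Markdown headings (lines starting with one to six # characters followed by a space), then merges consecutive sections while they stay under a budget. The token count is a crude regex proxy, an assumption you should replace with your embedding model's tokenizer, and the heading regex will also match # lines inside fenced code blocks.

```python
import re

# Markdown ATX heading at the start of a line, e.g. "# Title" or "## Specs".
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def approx_tokens(text: str) -> int:
    # Crude proxy (assumption): word runs plus punctuation marks.
    return len(re.findall(r"\w+|[^\w\s]", text))


def chunk_markdown(markdown: str, max_tokens: int) -> list[str]:
    """Split at headings, then merge consecutive sections while they fit max_tokens."""
    text = markdown.strip()  # drop leading/trailing blank lines
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    bounds = zip(starts, starts[1:] + [len(text)])
    sections = [text[a:b].strip() for a, b in bounds]

    chunks, current = [], ""
    for section in sections:
        candidate = f"{current}\n\n{section}" if current else section
        if current and approx_tokens(candidate) > max_tokens:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = section
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = (
    "# Widget Pro\nA sturdy widget.\n\n"
    "## Specs\n- Aluminium body\n- 2-year warranty\n\n"
    "## Shipping\nShips in 2 days.\n"
)
for chunk in chunk_markdown(doc, max_tokens=20):
    print(chunk, end="\n---\n")
```

A single section that exceeds the budget on its own is kept whole; split it further (for example, by paragraph) if your embedding model enforces a hard limit.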
Takeaway
By switching from raw HTML to Markdown delivered in clean JSON, you remove meaningless markup from your LLM prompts. This simple change cuts token usage, reduces cost, and lets you fit more relevant context into each generation request. Implement the pattern shown above—request formats=["markdown","json"], extract the markdown field, and feed it straight into your RAG pipeline. You’ll see immediate savings on every scrape.