
Reducing LLM Token Usage in RAG via Structured Extraction
Learn how to optimize RAG pipelines by converting raw HTML into clean Markdown and structured JSON to significantly reduce LLM token consumption and costs.
TL;DR
To reduce LLM token usage in RAG pipelines, replace raw HTML with clean Markdown or structured JSON. This removes non-semantic noise like <script> and <div> tags, lowering costs and improving retrieval accuracy.
In Retrieval-Augmented Generation (RAG) workflows, the quality of your context is directly tied to the density of semantic information. Most developers make the mistake of feeding raw HTML directly into their embedding models or LLMs. This is inefficient. HTML is noisy, filled with boilerplate, and eats heavily into your token budget.
By implementing a transformation layer that converts web content into Markdown or structured JSON, you can achieve higher accuracy with significantly lower latency and cost.
The Problem: HTML Token Bloat
When you scrape a page and pass the source code to an LLM, you are paying for characters that carry zero semantic meaning. A single <div> nested deep within a complex layout, carrying class names, inline styles, and data attributes, can consume dozens of tokens.
Consider the following comparison:
- Raw HTML: Contains tags, attributes, scripts, and styles. Often 10x larger than the visible text.
- Markdown: Retains semantic structure (headers, lists, links) using minimal characters.
- JSON: Extracts only the specific data points required for your application.
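The size gap is easy to check on your own pages. The sketch below uses only Python's standard library to drop tags, scripts, and styles from an HTML string, then compares rough token estimates before and after. The sample HTML, the helper names, and the four-characters-per-token heuristic are illustrative assumptions; a real tokenizer for your model will give different absolute numbers.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    # Rough heuristic only; this is not a real tokenizer
    return max(1, len(text) // chars_per_token)


# Hypothetical page fragment used purely for illustration
sample_html = (
    '<div class="layout-wrapper"><script>var tracking = {"id": 123};</script>'
    '<div class="article-body" data-id="42"><h1>Quarterly Results</h1>'
    "<p>Revenue grew this quarter.</p></div></div>"
)

text = visible_text(sample_html)
print("raw HTML tokens (approx):", estimate_tokens(sample_html))
print("visible text tokens (approx):", estimate_tokens(text))
```

Running this against a handful of representative pages gives a quick sense of how much of your token budget goes to markup rather than content.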
Strategy 1: Markdown for Semantic Context
Markdown is the "Goldilocks" format for LLMs. It preserves the hierarchy of a page (H1, H2, lists), which helps the model understand the relationship between different pieces of text, but it strips away the overhead of HTML tags and attributes.
If you are building a knowledge base where the LLM needs to understand the relationship between a heading and a paragraph, Markdown is your best choice.
You can automate this by using a Python web scraping API that handles the heavy lifting of rendering JavaScript before you perform the conversion.
Implementation Example
Here is how you can fetch a page and prepare it for an LLM using a Python client.
```python
import alterlab
import markdownify  # Library to convert HTML to Markdown

client = alterlab.Client("YOUR_API_KEY")

# Fetch the page content
response = client.scrape("https://example-news-site.com/article")
html_content = response.text

# Convert to clean Markdown
md_content = markdownify.markdownify(html_content)

print(md_content[:500])  # View the first 500 characters
```

For high-scale production environments, you should use an extraction tool that performs this conversion server-side to minimize local processing.
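Once a page is in Markdown, its heading hierarchy can drive chunking, so that each chunk carries the trail of headings it belongs to. The sketch below is one possible approach, not part of any AlterLab API: `chunk_by_heading` is a hypothetical helper, and it assumes ATX-style (`#`) headings. markdownify underlines H1 and H2 headings by default, so the conversion would likely need `heading_style=markdownify.ATX` for this to apply.

```python
import re

# Matches ATX headings such as "## Setup"
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into chunks, one per heading, keeping the heading trail.

    Simplification: lines starting with '#' inside fenced code blocks
    are also treated as headings.
    """
    chunks = []
    path = []  # stack of (level, title) for the current heading trail
    body = []  # lines collected under the current heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            trail = " > ".join(title for _, title in path)
            chunks.append({"path": trail, "text": text})
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level before descending
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro text.\n## Setup\nInstall it.\n### Linux\nUse apt.\n## Usage\nRun it."
for chunk in chunk_by_heading(doc):
    print(chunk["path"], "|", chunk["text"])
```

Prepending the `path` value to each chunk before embedding keeps the heading context attached to the paragraph, even when the chunk is retrieved on its own.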
Strategy 2: Structured JSON for Targeted Extraction
When your RAG pipeline doesn't need the entire article, only specific data points like prices, product names, or dates, do not use Markdown. Use structured extraction.
Instead of asking an LLM to "Read this HTML and tell me the price," you should use an extraction engine to turn the HTML into a JSON object. This moves the complexity from the LLM to the scraping layer, which is significantly cheaper.
Automating Extraction with cURL
You can define your desired schema directly in your request. This ensures that what enters your database is already clean, structured, and token-optimized.
```bash
curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example-store.com/product/123",
    "schema": {
      "product_name": "string",
      "price": "number",
      "availability": "boolean"
    }
  }'
```

By requesting JSON directly, you bypass the need for a separate "Cleanup LLM" pass. This single architectural change can reduce your LLM-related costs by 60-80%.
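Before an extracted record reaches your database or a prompt, it can be worth checking that it actually matches the schema you asked for. The sketch below assumes the extraction result is a flat JSON object with the three fields from the request; the response shape is not specified in this article, and `validate_record` and `to_context` are hypothetical helpers, not AlterLab functions.

```python
import json

# Maps the schema's type names (as used in the request) to Python types
TYPE_CHECKS = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
}


def validate_record(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for field, type_name in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it for "number"
        if type_name == "number" and isinstance(value, bool):
            problems.append(f"{field}: expected number, got bool")
        elif not isinstance(value, TYPE_CHECKS[type_name]):
            problems.append(
                f"{field}: expected {type_name}, got {type(value).__name__}"
            )
    return problems


def to_context(record: dict) -> str:
    """Serialize compactly so the record costs as few tokens as possible."""
    return json.dumps(record, separators=(",", ":"), sort_keys=True)


schema = {"product_name": "string", "price": "number", "availability": "boolean"}

# Hypothetical extraction result, for illustration only
record = {"product_name": "Example Widget", "price": 19.99, "availability": True}

if not validate_record(record, schema):
    print(to_context(record))
```

Rejecting malformed records at this stage keeps bad data out of the index, and the compact serialization avoids spending tokens on indentation and whitespace.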
Comparing Approaches
To decide which method to use, consider your end-use case:
| Feature | Raw HTML | Markdown | Structured JSON |
| :--- | :--- | :--- | :--- |
| Token Usage | Extremely High | Low | Minimal |
| Semantic Value | High (but noisy) | High | Targeted |
| LLM Latency | High | Low | Minimal |
| Implementation | Easy | Moderate | Advanced |
When dealing with complex, dynamic sites, ensure your pipeline includes robust anti-bot handling to prevent scraping failures from breaking your RAG ingestion.
Summary of Best Practices
- Never embed raw HTML in prompts: It is a waste of money and increases the chance of hallucinations.
- Use Markdown for unstructured text: If the content is long-form (blogs, news), Markdown preserves the structure LLMs need.
- Use JSON for structured data: For data-driven RAG (e.g., product catalogs), always extract via a JSON schema.
- Pre-process before embedding: Clean your text (remove extra whitespace, boilerplate footers) before sending it to your embedding model.
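The pre-processing step from the list above can be as small as a whitespace normalizer with a boilerplate filter. This is a minimal sketch under stated assumptions: `clean_for_embedding` is a hypothetical helper, and the boilerplate phrases are examples you would replace with ones observed on your own sources.

```python
import re


def clean_for_embedding(text: str, boilerplate_markers: tuple[str, ...] = ()) -> str:
    """Normalize whitespace and drop lines containing known boilerplate phrases."""
    kept = []
    for line in text.splitlines():
        # Collapse runs of spaces and tabs, then trim the line
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Skip lines that contain a boilerplate phrase (case-insensitive)
        if any(marker.lower() in line.lower() for marker in boilerplate_markers):
            continue
        kept.append(line)
    cleaned = "\n".join(kept)
    # Collapse runs of blank lines to a single blank line
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip()


raw = (
    "Title\n\n\n\nSome   text   here.  \n\tMore text.\n\n"
    "Subscribe to our newsletter\nAll rights reserved, Example Corp"
)
markers = ("subscribe to our newsletter", "all rights reserved")
print(clean_for_embedding(raw, markers))
```

Substring matching is deliberately blunt here; if a phrase can also appear in real content, a stricter rule (for example, matching whole lines only) would be safer.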
For more advanced implementation details, check our API documentation or read our recent posts on the AlterLab blog.
Hit reply if you have questions.
AlterLab // Web Data, Simplified.