How to Reduce LLM Token Consumption in RAG Pipelines Using Markdown and Clean JSON

Learn practical techniques to cut token usage in Retrieval-Augmented Generation pipelines by requesting Markdown-formatted, clean JSON outputs from AlterLab's scraping API.


TL;DR

Use AlterLab's scraping API to request pages in Markdown format wrapped in clean JSON. This strips HTML boilerplate, reduces prompt size, and lowers LLM token consumption in RAG pipelines by up to 70%.

Introduction

LLM providers charge per token, and raw HTML is a notoriously inefficient way to spend them. A typical product page can exceed 50 KB of markup, most of it irrelevant to the semantic content you need for retrieval or generation. By asking for Markdown and receiving it inside a minimal JSON envelope, you keep only the meaningful structure (headings, lists, code blocks) while discarding tags, attributes, and scripts. The result is smaller prompts, faster inference, and lower cost.

AlterLab’s API supports multiple output formats via the formats parameter. When you specify ['markdown'], you get a JSON payload whose markdown field holds the converted text. This approach works for any publicly accessible page and requires no HTML parsing or post‑processing on your side.

Why Token Consumption Matters in RAG

In a Retrieval‑Augmented Generation pipeline, each retrieved document is inserted into the LLM context window. If documents are bloated with HTML, you waste tokens on:

  • Tag angle brackets (<, >)
  • CSS classes and IDs
  • JavaScript snippets
  • Whitespace and comments

These tokens do not contribute to understanding but still count toward limits and cost. Reducing the token count per document lets you:

  • Fit more relevant passages into the context window, improving answer quality
  • Lower the number of API calls to the LLM provider
  • Decrease latency because less data must be transferred and processed
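
To make the first point concrete, here is a minimal sketch of greedy context packing under a fixed token budget. The helper names (estimate_tokens, pack_context), the sample passages, and the four-characters-per-token rule of thumb are all assumptions for illustration; a real pipeline would count tokens with its model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def pack_context(documents: list[str], budget: int) -> list[str]:
    """Add documents in ranked order until the next one would exceed the budget."""
    packed: list[str] = []
    used = 0
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        packed.append(doc)
        used += cost
    return packed


# Hypothetical retrieved passages: identical text, with and without markup.
clean_docs = ["Widget Pro costs $29.99 and ships in two days."] * 10
bloated_docs = [
    '<div class="product-card"><span class="t">' + doc + "</span></div>"
    for doc in clean_docs
]

print("clean passages that fit:  ", len(pack_context(clean_docs, budget=60)))
print("bloated passages that fit:", len(pack_context(bloated_docs, budget=60)))
```

With the same budget, the markup-free passages fit in greater number, which is the effect the list above describes.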

The Problem with Raw HTML

Consider a typical e‑commerce listing page. Raw HTML might look like:

HTML
<div class="product-card" data-id="123">
  <h2 class="title">Widget Pro</h2>
  <span class="price">$29.99</span>
  <div class="description">...</div>
</div>

Only the text “Widget Pro”, “$29.99”, and the description are useful. The surrounding markup adds dozens of tokens per item. When you retrieve dozens or hundreds of such items, the overhead compounds.

Using Markdown for LLM-Friendly Context

Markdown converts semantic HTML into plain text with lightweight syntax:

Code
# Widget Pro
**Price:** $29.99
Description: ...

The same information now occupies far fewer characters. Crucially, Markdown retains hierarchy (headings, lists, blockquotes), which helps LLMs understand document structure without the noise of tags.
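
You can check the difference on the two snippets above with a rough comparison like the one below. The rough_token_count helper is a hypothetical stand-in that counts word runs and punctuation marks; it only approximates how a real tokenizer behaves, so treat the numbers as indicative rather than exact.

```python
import re


def rough_token_count(text: str) -> int:
    """Approximate tokens as word runs plus individual punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


# The raw HTML product card from the earlier example.
html = """<div class="product-card" data-id="123">
  <h2 class="title">Widget Pro</h2>
  <span class="price">$29.99</span>
  <div class="description">...</div>
</div>"""

# The Markdown equivalent of the same card.
markdown = "# Widget Pro\n**Price:** $29.99\nDescription: ..."

html_tokens = rough_token_count(html)
md_tokens = rough_token_count(markdown)

print(f"HTML: {html_tokens} rough tokens")
print(f"Markdown: {md_tokens} rough tokens")
print(f"Reduction: {1 - md_tokens / html_tokens:.0%}")
```

For billing-accurate figures, swap the helper for the tokenizer your LLM provider publishes.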

Leveraging Clean JSON Outputs

Asking AlterLab for Markdown alone would return a plain text body, which forces you to parse the format yourself. Instead, request both Markdown and JSON:

Code
formats=['markdown','json']

The API returns:

JSON
{
  "markdown": "# Widget Pro\n**Price:** $29.99\nDescription: ...",
  "url": "https://example.com/product/123",
  "status": 200
}

Now you have a predictable JSON envelope and a ready‑to‑use Markdown string. You can feed the markdown field directly into your embedding model or LLM prompt.
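
Before passing the text downstream, it can help to validate the envelope. The sketch below uses only the field names from the sample payload above; the extract_markdown helper is hypothetical, and treating status 200 as the only success value is an assumption inferred from that sample, not documented API behavior.

```python
import json


def extract_markdown(raw_body: str) -> str:
    """Return the Markdown string from a scrape response body, or raise ValueError."""
    payload = json.loads(raw_body)

    # Assumption: a status of 200 means the scrape succeeded.
    status = payload.get("status")
    if status != 200:
        raise ValueError(f"scrape failed with status {status!r}")

    markdown = payload.get("markdown")
    if not isinstance(markdown, str) or not markdown.strip():
        raise ValueError("response has no usable 'markdown' field")

    # Strip stray leading and trailing whitespace before embedding or prompting.
    return markdown.strip()


# The sample payload from above, serialized as it would arrive over the wire.
raw_body = json.dumps({
    "markdown": "# Widget Pro\n**Price:** $29.99\nDescription: ...",
    "url": "https://example.com/product/123",
    "status": 200,
})

print(extract_markdown(raw_body))
```

Failing loudly on a missing or empty field keeps blank documents out of your vector store.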

Practical Example: Scraping Product Listings

Below is a complete workflow that fetches a product listing page, requests Markdown output, and prepares the text for a RAG pipeline.


Code Examples

The following snippets show the same request in Python (using the official SDK) and in cURL. Both ask for Markdown and JSON formats.

Python
import alterlab

# Initialize the client with your API key
client = alterlab.Client("YOUR_API_KEY")

# Scrape a URL and request both the Markdown and JSON formats
response = client.scrape(
    url="https://example.com/product",
    formats=["markdown", "json"],
)

# The response object exposes the parsed JSON body
data = response.json
markdown_text = data["markdown"]

# Print only the first 200 characters as a quick sanity check
print("Markdown preview:")
print(markdown_text[:200])

Bash
# Declare the JSON body explicitly; curl -d defaults to form encoding
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "url": "https://example.com/product",
        "formats": ["markdown", "json"]
      }'

Both examples return a JSON body where the markdown key holds the cleaned, LLM‑ready text.

Best Practices for Token Optimization

  1. Always request the minimal format set – If you only need Markdown, do not ask for html or png unless required.
  2. Trim whitespace – Some Markdown converters add trailing newlines; strip them before embedding.
  3. Chunk wisely – Split large Markdown documents at heading boundaries to keep each chunk under your embedding model’s token limit.
  4. Cache results – Store the Markdown string and re‑scrape only when the source URL changes (use AlterLab’s monitoring feature to detect updates).
  5. Monitor usage – Check the tokens_used field in AlterLab’s response (if enabled) to see exactly how much data you transferred.
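
Practices 2 and 3 can be sketched together as follows. This is a minimal illustration, not a production chunker: chunk_markdown is a hypothetical helper, it measures size in characters rather than tokens, it leaves a single section that exceeds max_chars as one oversized chunk, and it does not special-case lines starting with # inside code blocks.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 2000) -> list[str]:
    """Split Markdown at heading lines, merging sections while they fit max_chars."""
    # Zero-width split just before any line that starts with 1-6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown.strip())

    chunks: list[str] = []
    current = ""
    for section in sections:
        section = section.strip()  # trim trailing newlines the converter added
        if not section:
            continue
        candidate = f"{current}\n\n{section}" if current else section
        if current and len(candidate) > max_chars:
            # Adding this section would overflow, so close the current chunk.
            chunks.append(current)
            current = section
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Hypothetical scraped document with trailing whitespace.
doc = (
    "# Widget Pro\n\nIntro text.\n\n"
    "## Specs\n\n- Weight: 1 kg\n\n"
    "## Reviews\n\nGreat product.\n\n\n"
)

for i, chunk in enumerate(chunk_markdown(doc, max_chars=40), start=1):
    print(f"--- chunk {i} ---")
    print(chunk)
```

Splitting at headings keeps each chunk topically coherent, which tends to help retrieval more than cutting at fixed character offsets.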

Takeaway

By switching from raw HTML to Markdown delivered in clean JSON, you remove meaningless markup from your LLM prompts. This simple change cuts token usage, reduces cost, and lets you fit more relevant context into each generation request. Implement the pattern shown above—request formats=["markdown","json"], extract the markdown field, and feed it straight into your RAG pipeline. You’ll see immediate savings on every scrape.


Frequently Asked Questions

Markdown strips unnecessary HTML tags and whitespace, delivering only semantic structure and content. This yields shorter, more meaningful prompts for LLMs.
Clean JSON provides predictable fields and eliminates boilerplate, letting you extract exactly the data you need without paying tokens for markup or scripts.
Yes, AlterLab lets you request multiple formats; you receive a JSON object containing a Markdown string, giving you both structure and readability.