
How to Reduce LLM Token Consumption in RAG Pipelines Using Markdown and Clean JSON

Learn practical techniques to cut token usage in Retrieval-Augmented Generation pipelines by requesting Markdown-formatted, clean JSON outputs from AlterLab's scraping API.

Herald Blog Service · June 28, 2026

4 min read



TL;DR

Use AlterLab's scraping API to request pages in Markdown format wrapped in clean JSON. This strips HTML boilerplate, reduces prompt size, and lowers LLM token consumption in RAG pipelines by up to 70%.

Introduction

LLM providers bill per token, and raw HTML is notoriously token-inefficient. A typical product page can exceed 50 KB of markup, most of which is irrelevant to the semantic content you need for retrieval or generation. By requesting Markdown delivered inside a minimal JSON envelope, you keep only the meaningful structure (headings, lists, code blocks) and discard tags, attributes, and scripts. The result is smaller prompts, faster inference, and lower cost.

AlterLab's API lets you choose output formats via the formats parameter. When you specify ['markdown'], the converted text comes back in the markdown field of the response. This works for any publicly accessible page and requires no HTML parsing on your side.

Why Token Consumption Matters in RAG

In a Retrieval‑Augmented Generation pipeline, each retrieved document is inserted into the LLM context window. If documents are bloated with HTML, you waste tokens on:

Tag names and angle brackets (<, >)
CSS classes and IDs
JavaScript snippets
Whitespace and comments

These tokens do not contribute to understanding but still count toward context limits and cost. Reducing the token count per document lets you:

Fit more relevant passages into the context window, improving answer quality
Lower your LLM bill by sending fewer input tokens per call, and potentially fewer calls overall
Decrease latency because less data must be transferred and processed
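To make the first point concrete, here is a minimal sketch of greedy context packing: passages, assumed to be sorted by relevance, are added until a token budget runs out. The token counter is a crude stand-in (word runs plus punctuation marks), not a real tokenizer, and `pack_context`, the sample passages, and the 60-token budget are all hypothetical.

```python
import re

# Crude token estimate (assumption, not a real tokenizer): every run of word
# characters and every single punctuation mark counts as one token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def approx_tokens(text: str) -> int:
    return len(TOKEN_RE.findall(text))


def pack_context(passages: list[str], budget: int) -> list[str]:
    """Add passages in order until the next one would exceed the token budget.

    Assumes `passages` is already sorted by retrieval relevance.
    """
    packed: list[str] = []
    used = 0
    for passage in passages:
        cost = approx_tokens(passage)
        if used + cost > budget:
            break
        packed.append(passage)
        used += cost
    return packed


# Hypothetical retrieval results: the same item as raw HTML and as Markdown.
raw = ['<div class="item"><h2 class="t">Widget Pro</h2></div>'] * 10
clean = ["## Widget Pro"] * 10
budget = 60

print(len(pack_context(raw, budget)), "raw HTML passages fit")
print(len(pack_context(clean, budget)), "Markdown passages fit")
```

Under these assumptions only two of the HTML passages fit in the 60-token budget, while all ten Markdown passages do.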

The Problem with Raw HTML

Consider a typical e‑commerce listing page. Raw HTML might look like:

HTML

<div class="product-card" data-id="123">
  <h2 class="title">Widget Pro</h2>
  <span class="price">$29.99</span>
  <div class="description">...</div>
</div>

Only the text “Widget Pro”, “$29.99”, and the description are useful. The surrounding markup adds dozens of tokens per item. When you retrieve dozens or hundreds of such items, the overhead compounds.
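You can get a rough feel for that overhead by counting tokens in the snippet above and in its Markdown equivalent from the next section. This sketch uses a crude approximation (each word run and each punctuation mark counts as one token), not a real tokenizer, so treat the numbers as indicative only; for exact counts, use the tokenizer that matches your model.

```python
import re

# Crude token estimate (assumption, not a real tokenizer): every run of word
# characters and every single punctuation mark counts as one token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def approx_tokens(text: str) -> int:
    return len(TOKEN_RE.findall(text))


# The HTML snippet above and its Markdown equivalent.
HTML_SNIPPET = """<div class="product-card" data-id="123">
  <h2 class="title">Widget Pro</h2>
  <span class="price">$29.99</span>
  <div class="description">...</div>
</div>"""

MARKDOWN_SNIPPET = "# Widget Pro\n**Price:** $29.99\nDescription: ..."

html_tokens = approx_tokens(HTML_SNIPPET)
md_tokens = approx_tokens(MARKDOWN_SNIPPET)
print(f"HTML: {html_tokens} tokens, Markdown: {md_tokens} tokens")
print(f"Estimated reduction: {1 - md_tokens / html_tokens:.0%}")
```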

Using Markdown for LLM-Friendly Context

Converting semantic HTML to Markdown leaves plain text with lightweight syntax:

Code

# Widget Pro
**Price:** $29.99
Description: ...

The same information now occupies far fewer characters. Crucially, Markdown retains hierarchy (headings, lists, blockquotes), so an LLM can still follow the document's structure without the noise of tags.
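AlterLab performs this conversion for you, so you never need to write it yourself. To make the idea concrete, here is a toy converter of our own (not AlterLab's) built on Python's standard-library `html.parser`. It maps headings, bold text, and list items to Markdown and drops every tag, attribute, and script body; `MarkdownSketch` and `html_to_markdown` are hypothetical names.

```python
import re
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MarkdownSketch(HTMLParser):
    """Toy HTML-to-Markdown converter (illustration only).

    Handles headings, bold, list items and block breaks; drops all tags,
    attributes and <script>/<style> contents. No links, tables or code blocks.
    """

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in HEADINGS:
            self.parts.append("\n" + "#" * int(tag[1]) + " ")
        elif tag in {"strong", "b"}:
            self.parts.append("**")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in {"p", "div", "br"}:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in {"strong", "b"}:
            self.parts.append("**")
        elif tag in HEADINGS or tag in {"p", "div"}:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace (including source indentation).
            self.parts.append(re.sub(r"\s+", " ", data))

    def markdown(self) -> str:
        lines = "".join(self.parts).splitlines()
        return "\n".join(line.strip() for line in lines if line.strip())


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.markdown()


PRODUCT_CARD = """<div class="product-card" data-id="123">
  <h2 class="title">Widget Pro</h2>
  <span class="price">$29.99</span>
  <div class="description">...</div>
</div>"""

print(html_to_markdown(PRODUCT_CARD))
```

On the product card this yields `## Widget Pro`, `$29.99`, and `...` on three lines: the heading level survives, while the classes and the `data-id` attribute disappear. A toy converter like this cannot add a label such as `Price:` that the markup does not contain.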

Leveraging Clean JSON Outputs

To receive the Markdown inside a predictable JSON envelope, rather than parsing a bare text body yourself, request both Markdown and JSON:

Code

formats=['markdown','json']

The API returns:

JSON

{
  "markdown": "# Widget Pro\n**Price:** $29.99\nDescription: ...",
  "url": "https://example.com/product/123",
  "status": 200
}

Now you have a predictable JSON envelope and a ready‑to‑use Markdown string. You can feed the markdown field directly into your embedding model or LLM prompt.
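A small helper makes the envelope handling explicit: it parses the response body, checks the status, and returns the trimmed Markdown for prompt assembly. The field names come from the example response above; `markdown_from_envelope`, `build_prompt`, and the prompt wording are hypothetical.

```python
import json


def markdown_from_envelope(raw_body: str) -> str:
    """Return the trimmed Markdown from a scrape response envelope.

    Field names (markdown, url, status) follow the example response above;
    adjust them if your actual responses differ.
    """
    payload = json.loads(raw_body)
    if payload.get("status") != 200:
        raise ValueError(f"scrape of {payload.get('url')} returned status {payload.get('status')}")
    markdown = payload.get("markdown")
    if not isinstance(markdown, str) or not markdown.strip():
        raise ValueError("response has no usable markdown field")
    return markdown.strip()


def build_prompt(question: str, url: str, markdown: str) -> str:
    # Hypothetical prompt layout: label the context with its source URL.
    return (
        "Answer using only the context below.\n\n"
        f"Source: {url}\n{markdown}\n\n"
        f"Question: {question}"
    )


body = json.dumps({
    "markdown": "# Widget Pro\n**Price:** $29.99\nDescription: ...",
    "url": "https://example.com/product/123",
    "status": 200,
})
md = markdown_from_envelope(body)
prompt = build_prompt("How much does Widget Pro cost?", "https://example.com/product/123", md)
print(prompt)
```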

Practical Example: Scraping Product Listings

The following workflow fetches a product page, requests Markdown output, and extracts the text you would pass to a RAG pipeline.


Code Examples

The following snippets show the same request in Python (using the official SDK) and in cURL. Both ask for Markdown and JSON formats.

Python

import alterlab

# Initialize the client with your API key
client = alterlab.Client("YOUR_API_KEY")

# Scrape a URL and request Markdown + JSON
response = client.scrape(
    url="https://example.com/product",
    formats=["markdown", "json"],
)

# The response object exposes the already-parsed JSON payload
data = response.json
markdown_text = data["markdown"]

# Show the first 200 characters of the cleaned text
print("Markdown preview:")
print(markdown_text[:200])

Bash

curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "url": "https://example.com/product",
        "formats": ["markdown", "json"]
      }'

Both examples return a JSON body where the markdown key holds the cleaned, LLM‑ready text.
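If you would rather not depend on the SDK, the same request can be made with Python's standard library. This sketch mirrors the endpoint, `X-API-Key` header, and request body of the cURL example and sets `Content-Type: application/json` explicitly; the helper names are hypothetical, and it assumes the response is the JSON envelope with a `markdown` field described in this article.

```python
import json
import urllib.request

# Endpoint from the cURL example.
API_URL = "https://api.alterlab.io/v1/scrape"


def build_scrape_request(api_key: str, url: str, formats: list[str]) -> urllib.request.Request:
    """Build (without sending) the same POST request the cURL example makes."""
    body = json.dumps({"url": url, "formats": formats}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def scrape_markdown(api_key: str, url: str) -> str:
    """Send the request and return the markdown field of the JSON response."""
    request = build_scrape_request(api_key, url, ["markdown", "json"])
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return payload["markdown"]  # field name as described in this article


# Usage (makes a real network call, so it is left commented out):
# text = scrape_markdown("YOUR_API_KEY", "https://example.com/product")
```

Separating request construction from sending keeps the request shape easy to inspect and test without network access.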

Best Practices for Token Optimization

Request the minimal format set – if you only need Markdown, don't also ask for html or png.
Trim whitespace – some Markdown converters add trailing newlines; strip them before embedding.
Chunk wisely – split large Markdown documents at heading boundaries to keep each chunk under your embedding model’s token limit.
Cache results – store the Markdown string and re-scrape only when the content at the source URL changes (use AlterLab’s monitoring feature to detect updates).
Monitor usage – check the tokens_used field in AlterLab’s response (if enabled) to see exactly how much data you transferred.
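The trimming and chunking tips can be combined into one preprocessing step. This sketch normalizes whitespace, splits at Markdown headings, and merges neighboring sections while they stay under a token budget. Assumptions: the token count is a crude approximation (swap in your embedding model's tokenizer), `#` lines inside fenced code blocks would be mistaken for headings, and a single section larger than the budget is kept whole rather than split further. The function names are hypothetical.

```python
import re

HEADING_RE = re.compile(r"^#{1,6}\s", re.MULTILINE)
# Crude token estimate (assumption): word runs and punctuation marks count
# as one token each. Replace with your embedding model's tokenizer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def approx_tokens(text: str) -> int:
    return len(TOKEN_RE.findall(text))


def normalize_markdown(markdown: str) -> str:
    """Strip trailing spaces, collapse 3+ newlines to one blank line, trim ends."""
    text = "\n".join(line.rstrip() for line in markdown.splitlines())
    return re.sub(r"\n{3,}", "\n\n", text).strip()


def chunk_by_headings(markdown: str, max_tokens: int) -> list[str]:
    """Split at headings, then merge neighboring sections while under budget.

    Caveats: '#' lines inside fenced code blocks are treated as headings, and
    a single section larger than max_tokens is kept whole, not split further.
    """
    text = normalize_markdown(markdown)
    starts = [m.start() for m in HEADING_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    ends = starts[1:] + [len(text)]
    sections = [text[a:b].strip() for a, b in zip(starts, ends)]

    chunks: list[str] = []
    current = ""
    for section in sections:
        if not section:
            continue
        candidate = f"{current}\n\n{section}" if current else section
        if current and approx_tokens(candidate) > max_tokens:
            chunks.append(current)
            current = section
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "# A\nalpha one\n\n\n\n## B\nbeta two   \n## C\ngamma three\n"
for chunk in chunk_by_headings(doc, max_tokens=10):
    print(repr(chunk))
```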

Takeaway

By switching from raw HTML to Markdown delivered in clean JSON, you remove meaningless markup from your LLM prompts. This simple change cuts token usage, reduces cost, and lets you fit more relevant context into each generation request. Implement the pattern shown above: request formats=["markdown","json"], extract the markdown field, and feed it straight into your RAG pipeline. The savings apply to every document you retrieve and every prompt you build from it.


Frequently Asked Questions

Markdown strips unnecessary HTML tags and whitespace, delivering only semantic structure and content. This yields shorter, more meaningful prompts for LLMs.

Clean JSON provides predictable fields and eliminates boilerplate, letting you extract exactly the data you need without paying tokens for markup or scripts.

AlterLab lets you request multiple formats in a single call; you receive a JSON object containing a Markdown string, giving you both structure and readability.
