
LLM (Large Language Model)

A Large Language Model is a neural network trained on vast text corpora that can generate, summarise, translate, and reason over natural language, often with near-human fluency.

Large Language Models (LLMs) are transformer-based neural networks with billions of parameters, trained by predicting the next token in massive text datasets. The training process imparts broad world knowledge, language understanding, and reasoning capability. Leading models include GPT-4o (OpenAI), Claude 4 (Anthropic), Gemini 2 (Google), and open-weight models like Llama 3 and Mistral.
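The next-token objective can be illustrated with a deliberately tiny stand-in. The sketch below is not a transformer: it is a bigram counter that predicts the most frequent follower of a word, and the function names and sample sentence are made up for illustration. It only shows the shape of the task (given the text so far, guess what comes next) that LLMs learn at vastly larger scale.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for every token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, split on whitespace (real models use subword tokenizers).
tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```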

LLMs can be prompted to perform a wide range of tasks without fine-tuning: summarisation, translation, question answering, code generation, classification, and data extraction. In the context of web scraping, LLMs are used to extract structured fields from messy HTML, classify scraped content, summarise long articles, and generate queries for subsequent scraping steps.
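As a rough sketch of prompting without fine-tuning, classifying scraped content can be reduced to building a prompt that lists the allowed labels and then validating the reply. The label set, function names, and prompt wording below are hypothetical, and the model call itself is left out so the helpers can be used with any provider.

```python
# Hypothetical label set for scraped pages; adjust to your own taxonomy.
LABELS = ["product", "article", "forum", "other"]


def build_classification_prompt(text, labels):
    """Build a prompt asking the model to reply with exactly one label."""
    options = ", ".join(labels)
    return (
        f"Classify the following page text as one of: {options}.\n"
        "Reply with the label only.\n\n"
        f"{text}"
    )


def parse_label(reply, labels, default="other"):
    """Normalise the model's reply and fall back to `default` if it is not a known label."""
    cleaned = reply.strip().lower().strip(".\"'")
    return cleaned if cleaned in labels else default


prompt = build_classification_prompt("Widget, $9.99, in stock", LABELS)
# `prompt` would be sent as the user message; the reply is then validated:
print(parse_label(" Product.\n", LABELS))
```

Validating the reply against a fixed label list keeps unexpected model output from leaking into downstream data.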

Because LLMs have a finite context window (the maximum number of tokens they can process in one call), large scraped documents must be chunked before being passed to the model. Inference is typically billed per token, so it pays to pre-filter content and pass only the relevant sections rather than the entire page HTML.
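A minimal sketch of pre-filtering and chunking, using only the Python standard library. It assumes that dropping tags, scripts, and styles is an acceptable pre-filter, and it measures chunk size in characters as a crude proxy for tokens (a real pipeline would count with the model's tokenizer). The names `html_to_text` and `chunk_text` and the default sizes are illustrative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Pre-filter: reduce raw HTML to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_text(text, max_chars=8000, overlap=200):
    """Split text into chunks of at most max_chars, each overlapping the previous one."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    if not text:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap
    return chunks


page = (
    "<html><head><style>p{}</style><script>var x=1;</script></head>"
    "<body><h1>Widget</h1><p>Price: $9.99</p></body></html>"
)
text = html_to_text(page)
print(text)
print(chunk_text("abcdefghij", max_chars=4, overlap=1))
```

The overlap means a field that straddles a chunk boundary still appears whole in at least one chunk, at the cost of sending a few duplicate characters.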

Examples

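# Use an LLM to extract structured data from scraped HTML
import json

import anthropic

# Example input; in practice this is a pre-filtered fragment of the scraped page
html_snippet = '<div class="product"><h1>Widget</h1><span>$9.99</span><span>SKU: W-100</span></div>'

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": f"Extract product name, price, and SKU from:\n{html_snippet}\nReturn JSON only."}]
)
# Assumes the reply is bare JSON; json.loads raises ValueError if the model adds prose or code fences
data = json.loads(response.content[0].text)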
