Beginner · 4 steps

How to Extract Data from a Website Without an API

Most websites don't expose their data through a formal API. When you need structured data from a site that has no API, web scraping is the standard approach — fetching the publicly visible pages and extracting the data programmatically.

Step-by-Step Guide

1

Inspect the page to find where the data lives

Open the page in your browser and use DevTools (F12) to inspect the HTML around the data you want. Identify CSS selectors, data attributes, or JSON-LD blocks that contain the data.
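Once DevTools shows you some candidate selectors, you can test them against a saved copy of the HTML before you write the full scraper. This is a minimal sketch, not part of AlterLab. The markup, class names, and the count_matches helper are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML copied from the browser's Elements panel.
SAMPLE_HTML = """
<div class="product" data-sku="A-100">
  <h1 class="product-title">Desk Lamp</h1>
  <span class="price">$24.99</span>
  <span class="price price--old">$29.99</span>
</div>
"""


def count_matches(html: str, selectors: list) -> dict:
    """Return how many elements each candidate CSS selector matches."""
    soup = BeautifulSoup(html, "html.parser")
    return {selector: len(soup.select(selector)) for selector in selectors}


if __name__ == "__main__":
    candidates = [".product-title", ".price", ".price:not(.price--old)", "[data-sku]"]
    for selector, n in count_matches(SAMPLE_HTML, candidates).items():
        print(f"{selector!r}: {n} match(es)")
```

A selector that matches more than one element, like .price here, usually needs narrowing before you rely on it.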

2

Check for embedded JSON first

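Many sites embed structured data in <script type="application/ld+json"> tags (JSON-LD) or assign it to a global JavaScript variable such as window.__INITIAL_STATE__. The exact variable name varies by site and framework. Embedded JSON is usually easier to parse than raw HTML, and it tends to survive visual redesigns better than CSS selectors do.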
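A regular expression that captures everything up to the first "};" works for the global-variable case, but it has two weaknesses. It can cut the object short if a string value contains "};", and it misses assignments that omit the trailing semicolon. Below is a minimal sketch of a sturdier alternative, assuming the assigned value is strict JSON: locate the assignment, then let Python's json.JSONDecoder.raw_decode consume exactly one JSON value. The sample markup and the extract_js_state helper are hypothetical.

```python
import json
import re
from typing import Any

# Hypothetical page source; the variable name differs from site to site.
SAMPLE_HTML = """
<script>
  window.__INITIAL_STATE__ = {"product": {"name": "Desk Lamp", "tags": ["home", "{lighting}"]}};
</script>
"""


def extract_js_state(html: str, var_name: str = "__INITIAL_STATE__") -> Any:
    """Return the JSON value assigned to window.<var_name>, or None if absent or invalid."""
    # Find the assignment and skip past "=" and any whitespace after it.
    match = re.search(r"window\." + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses one complete JSON value starting at the given index
        # and ignores whatever follows it (here, the trailing ";").
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        # Common when the site assigns a JavaScript object literal that is not
        # strict JSON (unquoted keys, single quotes, undefined, ...).
        return None
    return value


if __name__ == "__main__":
    state = extract_js_state(SAMPLE_HTML)
    print(state["product"]["name"])
```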

3

Fetch and parse the page

Use AlterLab to fetch the page HTML, then parse it with BeautifulSoup (Python) or Cheerio (Node.js) to extract the fields you identified in step 1.
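The code example further down covers fetching. The sketch here covers only the parsing half, assuming you already have the HTML string (for example, the "html" field of the scrape response). It drives extraction from a small field map so that every selector lives in one place. The markup, field names, and the extract_fields helper are hypothetical.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the HTML returned by the scrape request.
SAMPLE_HTML = """
<div class="product" data-sku="A-100">
  <h1 class="product-title">Desk Lamp</h1>
  <span class="price">$24.99</span>
  <span class="price price--old">$29.99</span>
</div>
"""

# Hypothetical field map: name -> (CSS selector, attribute to read, or None for text).
FIELDS = {
    "title": ("h1.product-title", None),
    "price": (".price:not(.price--old)", None),
    "sku": ("[data-sku]", "data-sku"),
}


def extract_fields(html: str, fields: dict) -> dict:
    """Return one record holding the first match for each field, or None if missing."""
    soup = BeautifulSoup(html, "html.parser")
    record: dict = {}
    for name, (selector, attr) in fields.items():
        element = soup.select_one(selector)
        value: Optional[str]
        if element is None:
            value = None
        elif attr is not None:
            value = element.get(attr)  # attribute value
        else:
            value = element.get_text(strip=True)  # visible text, whitespace trimmed
        record[name] = value
    return record


if __name__ == "__main__":
    print(extract_fields(SAMPLE_HTML, FIELDS))
```

Returning None for a missing field, rather than raising an error, lets one odd page through without stopping a batch run. The validation step can then flag it.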

4

Validate your extraction

Compare a sample of extracted values against the source page to verify accuracy. Check that your selectors work across multiple pages, not just the one you inspected.
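Running every extracted record through a few automated checks is one way to catch selectors that silently break on other pages. This is a minimal sketch, assuming each record is a dictionary of field names to strings. The required fields and the dollar-price pattern are hypothetical, so adapt them to your own data.

```python
import re
from typing import Any

# Hypothetical rules: fields every record must have, and a "$24.99"-style price.
REQUIRED_FIELDS = ("title", "price", "sku")
PRICE_PATTERN = re.compile(r"^\$\d+(?:\.\d{2})?$")


def validate_records(records: list) -> list:
    """Return human-readable problems found across the extracted records."""
    problems = []
    for i, record in enumerate(records):
        record: dict[str, Any]
        # Missing or empty fields often mean a selector did not match on that page.
        for field in REQUIRED_FIELDS:
            if not record.get(field):
                problems.append(f"record {i}: missing {field}")
        # Format checks catch selectors that matched the wrong element.
        price = record.get("price")
        if price and not PRICE_PATTERN.match(price):
            problems.append(f"record {i}: unexpected price format {price!r}")
    return problems


if __name__ == "__main__":
    sample = [
        {"title": "Desk Lamp", "price": "$24.99", "sku": "A-100"},
        {"title": "", "price": "24,99 EUR", "sku": "B-200"},
    ]
    for problem in validate_records(sample):
        print(problem)
```

An empty list means every record passed. A field that is missing on many pages usually points to a selector that only worked on the page you inspected.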

Code Example

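import json
import re

import requests
from bs4 import BeautifulSoup

def extract_embedded_json(url: str, api_key: str) -> dict | list | None:
    """Fetch a page through AlterLab and return the first embedded JSON found."""
    response = requests.post(
        "https://alterlab.io/api/v1/scrape",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json={"url": url, "render_js": True},
        timeout=60,  # don't hang forever on a slow render
    )
    response.raise_for_status()  # fail loudly on auth or quota errors
    html = response.json().get("html", "")
    soup = BeautifulSoup(html, "html.parser")

    # Try JSON-LD first: <script type="application/ld+json"> blocks
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            return json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # empty or malformed block; try the next one

    # Fall back to a global state variable such as window.__INITIAL_STATE__
    # or window.__DATA__ (names vary by site). The non-greedy match stops at
    # the first "};", so a string value containing "};" will cut it short.
    match = re.search(
        r"window\.(?:__INITIAL_STATE__|__DATA__)\s*=\s*(\{.+?\});", html, re.DOTALL
    )
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            return None  # e.g. a JavaScript object literal that is not strict JSON

    return None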

Pass your key from the AlterLab dashboard as the api_key argument. No credit card required.



Responsible Use

AlterLab is designed for extracting publicly available data. Always review the terms of service for any website you access, respect robots.txt directives, and ensure your use case complies with applicable laws in your jurisdiction.
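Python's standard library can check robots.txt rules before you request a page. This is a minimal sketch using urllib.robotparser. The robots.txt content and the "my-scraper" user agent are hypothetical. In practice you would point the parser at the site's real file with set_url() and read().

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; normally fetched from https://<site>/robots.txt.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = make_parser(ROBOTS_TXT)
    print(rules.can_fetch("my-scraper", "https://example.com/products/1"))
    print(rules.can_fetch("my-scraper", "https://example.com/private/report"))
    print(rules.crawl_delay("my-scraper"))  # seconds to wait between requests
```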


Your first scrape.
Sixty seconds.

$1 free balance. No credit card. No SDK. Just a POST request.

curl -X POST https://api.alterlab.io/v1/scrape \
-H "X-API-Key: YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "formats": ["markdown"]}'

No credit card required · Up to 5,000 free scrapes · Balance never expires
