Beginner · 4 steps

How to Extract Data from a Website Without an API

Most websites don't expose their data through a formal API. When you need structured data from a site that has no API, web scraping is the standard approach — fetching the publicly visible pages and extracting the data programmatically.

Step-by-Step Guide

Inspect the page for data location

Open the page in your browser and use DevTools (F12) to inspect the HTML around the data you want. Identify CSS selectors, data attributes, or JSON-LD blocks that contain the data.
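The same inspection can be roughed out in code once you have a copy of the page's HTML. The sketch below uses only Python's standard library to count which `data-*` attributes and `<script>` types appear in a document. The `PageSurvey` class, the `survey` helper, and the sample HTML are illustrative assumptions, not part of any AlterLab tooling.

```python
from collections import Counter
from html.parser import HTMLParser


class PageSurvey(HTMLParser):
    """Counts data-* attributes and <script type=...> values in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.data_attrs = Counter()
        self.script_types = Counter()

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            if name.startswith("data-"):
                self.data_attrs[name] += 1
        if tag == "script":
            # Scripts without a type attribute are ordinary JavaScript
            self.script_types[dict(attrs).get("type") or "text/javascript"] += 1


def survey(html: str) -> PageSurvey:
    parser = PageSurvey()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical page fragment used only for illustration
SAMPLE_HTML = """
<div class="product" data-sku="A-100" data-price="19.99">Widget</div>
<div class="product" data-sku="B-200" data-price="5.00">Gadget</div>
<script type="application/ld+json">{"@type": "Product", "name": "Widget"}</script>
<script>window.__INITIAL_STATE__ = {"items": []};</script>
"""

result = survey(SAMPLE_HTML)
print(dict(result.data_attrs))
print(dict(result.script_types))
```

A page that reports `application/ld+json` scripts is a good candidate for the embedded-JSON route. Frequent `data-*` attributes suggest stable hooks for selectors.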

Check for embedded JSON first

Many sites embed structured data in script tags (JSON-LD) or in a window.__INITIAL_STATE__ JavaScript variable. These are easier to parse than raw HTML and more stable under redesigns.
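As a minimal sketch of reading a JavaScript state variable, the helper below locates the assignment and lets `json.JSONDecoder.raw_decode` parse exactly one JSON value starting at the opening brace, which avoids guessing where the object ends. The function name `extract_state` and the sample markup are assumptions for illustration. It also assumes the assigned value is strict JSON, which is not true of every site.

```python
import json
import re


def extract_state(html: str, variable: str = "__INITIAL_STATE__"):
    """Return the JSON value assigned to window.<variable>, or None."""
    # Find "window.<variable> =" and stop right before the assigned value
    assignment = re.search(r"window\." + re.escape(variable) + r"\s*=\s*", html)
    if assignment is None:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it
        value, _end = json.JSONDecoder().raw_decode(html, assignment.end())
    except json.JSONDecodeError:
        return None  # the value is JavaScript, not strict JSON
    return value


# Hypothetical markup; note the "};" inside a string value
SAMPLE = '<script>window.__INITIAL_STATE__ = {"title": "a};b", "items": [{"id": 1}]};</script>'

state = extract_state(SAMPLE)
print(state)
```

Because the decoder tracks nesting and string boundaries itself, a `};` inside a string value does not cut the object short, which is a common failure of regex-only extraction.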

Fetch the page and parse

Use AlterLab to fetch the page HTML, then parse it with BeautifulSoup (Python) or Cheerio (Node.js) to extract the fields you identified in steps 1 and 2.
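A minimal sketch of the parsing half of this step, assuming the HTML has already been fetched: it maps field names to CSS selectors and pulls each one out with BeautifulSoup. The selectors, field names, and sample markup are hypothetical and would come from your own inspection of the page.

```python
from bs4 import BeautifulSoup

# Hypothetical selectors identified while inspecting the page
SELECTORS = {
    "name": "h1.product-title",
    "price": "span[data-price]",
    "rating": "span.rating",  # not present in the sample below
}


def extract_fields(html: str, selectors: dict) -> dict:
    """Return the stripped text of the first match per selector (None if absent)."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for field, selector in selectors.items():
        node = soup.select_one(selector)
        fields[field] = node.get_text(strip=True) if node else None
    return fields


SAMPLE_HTML = """
<div class="product" data-sku="A-100">
  <h1 class="product-title"> Widget </h1>
  <span data-price="19.99">$19.99</span>
</div>
"""

fields = extract_fields(SAMPLE_HTML, SELECTORS)
print(fields)
```

Returning `None` for a selector that matches nothing, rather than raising, makes it easier to spot pages where the template differs.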

Validate your extraction

Compare a sample of extracted values against the source page to verify accuracy. Check that your selectors work across multiple pages, not just the one you inspected.
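One way to check selectors across several pages is to run the extraction on each and report which expected fields came back empty. The sketch below assumes you already have one dictionary of extracted fields per URL. The function name, field names, and URLs are hypothetical.

```python
def find_missing_fields(results: dict, required: list) -> dict:
    """Map each URL to the required fields that are missing or empty."""
    problems = {}
    for url, fields in results.items():
        missing = [name for name in required if not fields.get(name)]
        if missing:
            problems[url] = missing
    return problems


# Hypothetical extraction results for three pages built from the same template
RESULTS = {
    "https://example.com/p/1": {"name": "Widget", "price": "$19.99"},
    "https://example.com/p/2": {"name": "Gadget", "price": ""},
    "https://example.com/p/3": {"price": "$5.00"},
}

problems = find_missing_fields(RESULTS, ["name", "price"])
for url, missing in problems.items():
    print(f"{url}: missing {', '.join(missing)}")
```

This check treats any falsy value (empty string, `None`, `0`) as missing, which suits text fields. Numeric fields where zero is valid would need a stricter test. It catches broken selectors but not wrong values, so a manual spot check against the source page is still needed.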

Code Example

Python

import json
import re

import requests
from bs4 import BeautifulSoup

def extract_embedded_json(url: str, api_key: str) -> dict | list | None:
    # Fetch the rendered HTML through the AlterLab scrape endpoint
    response = requests.post(
        "https://alterlab.io/api/v1/scrape",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json={"url": url, "render_js": True},
        timeout=60,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    html = response.json().get("html", "")
    soup = BeautifulSoup(html, "html.parser")

    # Try JSON-LD first
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            return json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # malformed or empty block; try the next one

    # Try window.__INITIAL_STATE__ or window.__DATA__
    match = re.search(
        r'window\.__(?:INITIAL_STATE|DATA)__\s*=\s*({.+?});', html, re.DOTALL
    )
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass  # the assigned value was not plain JSON

    return None

Pass your AlterLab API key as the api_key argument. No credit card required.
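JSON-LD blocks do not always decode to a single object: the decoded value may be a list, or an object that wraps its items in an `@graph` array. As a hedged sketch, the helpers below search such a value for the first item of a given `@type`. The names `iter_items` and `find_by_type` and the sample data are illustrative assumptions.

```python
def iter_items(data):
    """Yield every JSON-LD object in a decoded value, flattening lists and @graph."""
    if isinstance(data, list):
        for entry in data:
            yield from iter_items(entry)
    elif isinstance(data, dict):
        yield data
        yield from iter_items(data.get("@graph", []))


def find_by_type(data, wanted: str):
    """Return the first object whose @type equals or contains `wanted`, else None."""
    for item in iter_items(data):
        types = item.get("@type", [])
        if isinstance(types, str):
            types = [types]  # @type may be a single string or a list of strings
        if wanted in types:
            return item
    return None


# Hypothetical decoded JSON-LD using the @graph wrapper
SAMPLE = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "WebSite", "name": "Example Shop"},
        {"@type": ["Product", "Thing"], "name": "Widget"},
    ],
}

product = find_by_type(SAMPLE, "Product")
print(product["name"] if product else None)
```

Passing the result of the extraction function through a helper like this keeps the rest of your code independent of how a particular site structures its JSON-LD.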

Try this yourself with AlterLab

Run this tutorial on live websites with AlterLab's API. Free tier includes 5,000 requests — no credit card required.


Frequently Asked Questions

Is it always legal to scrape a website that has no API?

The legality of scraping publicly accessible information varies by jurisdiction and depends on the site's terms of service, the nature of the data, and how it is used. Always review the target site's terms of service and applicable data protection regulations.

Responsible Use

AlterLab is designed for extracting publicly available data. Always review the terms of service for any website you access, respect robots.txt directives, and ensure your use case complies with applicable laws in your jurisdiction.
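Checking robots.txt can be automated with Python's standard `urllib.robotparser` module. The sketch below parses rules from an in-memory string so it runs offline. The rules, the user agent name `my-scraper`, and the URLs are hypothetical. Against a live site you would typically call `set_url()` and `read()` instead of `parse()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given user agent may fetch each URL
allowed = parser.can_fetch("my-scraper", "https://example.com/products/1")
blocked = parser.can_fetch("my-scraper", "https://example.com/private/report")
print(allowed, blocked)
```

A robots.txt check covers only crawler directives. It does not replace reading the site's terms of service.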

Your first scrape. Sixty seconds.

$1 free credit — up to 5,000 scrapes. No credit card.
Just a POST request.

terminal

curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}'


No credit card required · $1 free credit, up to 5,000 scrapes · Balance never expires
