How to Extract Data from a Website Without an API
Most websites don't expose their data through a formal API. When you need structured data from such a site, web scraping is the standard approach: fetch the publicly visible pages and extract the data programmatically.
Step-by-Step Guide
Inspect the page for data location
Open the page in your browser and use DevTools (F12) to inspect the HTML around the data you want. Identify CSS selectors, data attributes, or JSON-LD blocks that contain the data.
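As a minimal sketch of how those three locations translate from DevTools into code, the snippet below runs BeautifulSoup against a small made-up product page. The HTML, class names, data attribute, and the locate_fields helper are all hypothetical; substitute whatever you find on your target page.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a real page.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">{"@type": "Product", "name": "Blue Mug", "sku": "MUG-1"}</script>
</head><body>
<div class="product" data-product-id="1234">
  <h1 class="product-title">Blue Mug</h1>
  <span class="price">$12.50</span>
</div>
</body></html>
"""


def locate_fields(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: text inside an element matched by tag and class.
    title = soup.select_one("div.product h1.product-title").get_text(strip=True)
    price = soup.select_one("div.product span.price").get_text(strip=True)
    # Data attribute: the value is stored on the element itself.
    product_id = soup.select_one("div[data-product-id]")["data-product-id"]
    # JSON-LD block: structured data inside a script tag.
    ld = json.loads(soup.find("script", type="application/ld+json").string)
    return {"title": title, "price": price, "product_id": product_id, "sku": ld["sku"]}


print(locate_fields(SAMPLE_HTML))
```

Writing the selectors down this way early makes it easier to re-test them later against other pages of the same site.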
Check for embedded JSON first
Many sites embed structured data in script tags (JSON-LD) or in a window.__INITIAL_STATE__ JavaScript variable. These are easier to parse than raw HTML and more stable under redesigns.
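One practical wrinkle: a JSON-LD block is not always a single object. It can be a list of objects, or an object that wraps several nodes in an "@graph" array. The sketch below, which assumes the JSON has already been parsed, walks those shapes and returns the first node of a given "@type". The helper names and the sample payload are illustrative, not part of any library.

```python
import json


def iter_jsonld_nodes(data):
    """Yield every dict node from a parsed JSON-LD value (dict, list, or @graph wrapper)."""
    if isinstance(data, list):
        for item in data:
            yield from iter_jsonld_nodes(item)
    elif isinstance(data, dict):
        yield data
        yield from iter_jsonld_nodes(data.get("@graph", []))


def find_by_type(data, wanted: str):
    """Return the first node whose @type equals or includes `wanted`, else None."""
    for node in iter_jsonld_nodes(data):
        node_type = node.get("@type")
        types = node_type if isinstance(node_type, list) else [node_type]
        if wanted in types:
            return node
    return None


# Hypothetical JSON-LD payload using the @graph form.
raw = (
    '{"@context": "https://schema.org", "@graph": ['
    '{"@type": "WebSite", "name": "Shop"}, '
    '{"@type": "Product", "name": "Blue Mug"}]}'
)
product = find_by_type(json.loads(raw), "Product")
print(product["name"])  # Blue Mug
```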
Fetch the page and parse
Use AlterLab to fetch the page HTML, then parse it with BeautifulSoup (Python) or Cheerio (Node.js) to extract the fields you identified when inspecting the page.
Validate your extraction
Compare a sample of extracted values against the source page to verify accuracy. Check that your selectors work across multiple pages, not just the one you inspected.
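Both checks can be partly automated. The sketch below assumes you have already extracted one record (a dict of field names to values) per page; the function names, field names, and example.com URLs are hypothetical. The first helper flags pages where a required field came back empty, which usually means a selector did not match there; the second compares one record against values you copied by hand from the source page.

```python
def find_missing_fields(records: dict[str, dict], required: list[str]) -> dict[str, list[str]]:
    """Map each URL to the required fields that are absent or empty in its record."""
    problems = {}
    for url, record in records.items():
        missing = [field for field in required if record.get(field) in (None, "")]
        if missing:
            problems[url] = missing
    return problems


def spot_check(record: dict, expected: dict) -> list[str]:
    """Return the fields whose extracted value differs from the hand-checked value."""
    return [field for field, value in expected.items() if record.get(field) != value]


# Hypothetical extraction results for two pages.
records = {
    "https://example.com/p/1": {"title": "Blue Mug", "price": "$12.50"},
    "https://example.com/p/2": {"title": "Red Mug", "price": ""},
}

print(find_missing_fields(records, ["title", "price"]))
# {'https://example.com/p/2': ['price']}
print(spot_check(records["https://example.com/p/1"], {"title": "Blue Mug", "price": "$12.00"}))
# ['price']
```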
Code Example
import json
import re

import requests
from bs4 import BeautifulSoup


def extract_embedded_json(url: str, api_key: str) -> dict | None:
    # Fetch the rendered HTML through the AlterLab scrape endpoint
    response = requests.post(
        "https://alterlab.io/api/v1/scrape",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        json={"url": url, "render_js": True},
        timeout=60,
    )
    response.raise_for_status()
    html = response.json().get("html", "")
    soup = BeautifulSoup(html, "html.parser")

    # Try JSON-LD first
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            return json.loads(script.string or "")
        except json.JSONDecodeError:
            continue

    # Try window.__INITIAL_STATE__ or a similar variable such as window.__DATA__
    match = re.search(
        r"window\.__(?:INITIAL_STATE|DATA)__\s*=\s*(\{.+?\});", html, re.DOTALL
    )
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            # The non-greedy match can cut a nested object short
            return None
    return None

Pass your key from the dashboard as api_key. No credit card required.
Responsible Use
AlterLab is designed for extracting publicly available data. Always review the terms of service for any website you access, respect robots.txt directives, and ensure your use case complies with applicable laws in your jurisdiction.