
How to Scrape Zillow: Complete Guide for 2026
Learn how to scrape Zillow property listings with Python in 2026. Beat Cloudflare protection, handle JS rendering, and extract real estate data at scale.
March 28, 2026
Zillow blocks most scrapers within seconds. It runs Cloudflare's Enterprise Bot Management, renders all listing data client-side via React, and fingerprints TLS connections to identify non-browser clients. Standard tooling—requests, basic Selenium, unpatched Playwright—fails before the first listing loads.
This guide covers everything you need to extract property listings, prices, and details from Zillow reliably in 2026: what protections you're dealing with, how to bypass them, where the data actually lives in the page, and how to scale to thousands of requests without hitting rate limits.
Why Scrape Zillow?
Three high-value use cases drive most Zillow scraping pipelines:
Real estate price monitoring. Track listing prices, days on market, and price reductions across specific ZIP codes or neighborhoods. Feed this into dashboards or alerting systems that fire when a property hits a target price point or reduces by more than a threshold percentage.
Lead generation for agents and investors. Pull new listings as they appear, including seller context, listing agent details, and price history. Build automated CRM workflows or outreach pipelines that act on fresh inventory before it gets competitive.
Market research and academic analysis. Zillow covers over 100 million US properties with historical price data, Zestimate valuations, and tax records. This dataset underpins housing market studies, investment underwriting models, and economic research that would otherwise require expensive licensed data feeds.
Anti-Bot Challenges on zillow.com
Understanding the protection stack is necessary before writing a single line of scraping code.
Cloudflare Enterprise Bot Management. Every request passes through Cloudflare's bot score evaluation. Suspicious clients—those with mismatched TLS fingerprints, missing browser APIs, or mechanical request timing—receive JavaScript challenges or managed CAPTCHAs. This happens before any Zillow application code runs.
TLS and HTTP/2 fingerprinting. Cloudflare inspects the TLS handshake: cipher suite ordering, extension presence and order, ALPN negotiation values. Python's requests library (backed by urllib3) produces a fingerprint that differs measurably from Chrome or Firefox. Cloudflare maintains fingerprint databases and blocks known non-browser patterns.
JavaScript-rendered content. Zillow's search and detail pages are Next.js applications. The raw HTML from a basic HTTP fetch contains scaffolding and metadata but virtually no listing data. The actual property information is either embedded in a <script id="__NEXT_DATA__"> tag after JS execution or injected into the DOM during React hydration. You need a real browser context to get populated HTML.
Behavioral fingerprinting. Request velocity, scroll events, mouse movement patterns, and time-between-clicks are analyzed. Pipelines that hit pages too fast or with perfectly uniform intervals trigger soft blocks—you'll see 429 responses or silently empty result sets.
IP reputation. Datacenter IP ranges are blocked at the edge. Residential or ISP proxies, rotated per-request or per-session, are required for consistent access.
Building this stack yourself—custom TLS fingerprints, maintained residential proxy pools, behavioral simulation, and Cloudflare rule updates—is a months-long engineering project with ongoing maintenance overhead. The AlterLab anti-bot bypass API handles all of it transparently, including headless browser execution on demand.
Quick Start with AlterLab API
Install the SDK and make your first Zillow request in under two minutes. Full environment setup is in the getting started guide.
```shell
pip install alterlab beautifulsoup4
```

```python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://www.zillow.com/homes/for_sale/Seattle-WA/",
    render_js=True,
    country="us"
)

print(response.status_code)  # 200
print(len(response.text))    # ~800KB rendered HTML
```

The render_js=True parameter routes the request through a headless browser that executes JavaScript and waits for the React application to hydrate before returning HTML. This is required for every Zillow page—search results and detail pages alike. country="us" ensures a US residential proxy is used; Zillow geo-blocks non-US IPs at the application layer.
For cURL:
```shell
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.zillow.com/homes/for_sale/Seattle-WA/",
    "render_js": true,
    "country": "us"
  }'
```

The response body is the fully rendered HTML. Status 200 with populated __NEXT_DATA__ means you have usable listing data. Status 403 or an empty listResults array usually indicates a session issue or incorrect country routing.
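Both conditions can be checked programmatically before you hand the HTML to a parser. The helper below is a sketch: the function name and regex are ours, and the JSON path mirrors the search-page path used later in this guide.

```python
import json
import re

def response_looks_usable(status_code: int, html: str) -> bool:
    """Heuristic check that a scraped Zillow search page contains usable data."""
    if status_code != 200:
        return False
    # Pull the embedded JSON out of the __NEXT_DATA__ script tag
    match = re.search(
        r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>', html, re.DOTALL
    )
    if not match:
        return False  # JS never rendered, or a challenge page came back
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return False
    # Same search-results path as the extraction code below
    results = (
        data.get("props", {})
        .get("pageProps", {})
        .get("searchPageState", {})
        .get("cat1", {})
        .get("searchResults", {})
        .get("listResults", [])
    )
    return bool(results)
```

Gate your parsing on this check so a soft block (valid HTML, empty listResults) gets retried instead of silently producing zero rows.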
Extracting Structured Data
Zillow embeds all listing and property data in a <script id="__NEXT_DATA__"> tag. Parsing this JSON is more reliable than targeting CSS selectors, which change with every React component update.
Search Results Pages
```python
import alterlab
import json
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

def get_zillow_listings(search_url: str) -> list[dict]:
    response = client.scrape(search_url, render_js=True, country="us")
    soup = BeautifulSoup(response.text, "html.parser")

    next_data_tag = soup.find("script", {"id": "__NEXT_DATA__"})
    if not next_data_tag:
        raise ValueError("__NEXT_DATA__ not found — JS may not have rendered")

    next_data = json.loads(next_data_tag.string)

    # Path for search result pages as of March 2026
    search_results = (
        next_data
        .get("props", {})
        .get("pageProps", {})
        .get("searchPageState", {})
        .get("cat1", {})
        .get("searchResults", {})
        .get("listResults", [])
    )

    listings = []
    for result in search_results:
        listings.append({
            "zpid": result.get("zpid"),
            "address": result.get("address"),
            "price": result.get("price"),
            "beds": result.get("beds"),
            "baths": result.get("baths"),
            "area_sqft": result.get("area"),
            "status": result.get("statusType"),
            "days_on_zillow": result.get("daysOnZillow"),
            "detail_url": result.get("detailUrl"),
            "latitude": result.get("latLong", {}).get("latitude"),
            "longitude": result.get("latLong", {}).get("longitude"),
        })
    return listings

listings = get_zillow_listings("https://www.zillow.com/homes/for_sale/Seattle-WA/")
print(f"Found {len(listings)} listings")
print(json.dumps(listings[0], indent=2))
```

Property Detail Pages
The detail page JSON uses a different path via gdpClientCache:
```python
import alterlab
import json
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

def get_property_detail(detail_url: str) -> dict:
    response = client.scrape(detail_url, render_js=True, country="us")
    soup = BeautifulSoup(response.text, "html.parser")
    next_data = json.loads(
        soup.find("script", {"id": "__NEXT_DATA__"}).string
    )

    # gdpClientCache is keyed by a composite ID; grab the first value
    gdp_cache = (
        next_data
        .get("props", {})
        .get("pageProps", {})
        .get("componentProps", {})
        .get("gdpClientCache", {})
    )
    property_data = next(iter(gdp_cache.values()), {}).get("property", {})

    return {
        "zpid": property_data.get("zpid"),
        "address": property_data.get("streetAddress"),
        "city": property_data.get("city"),
        "state": property_data.get("state"),
        "zip": property_data.get("zipcode"),
        "price": property_data.get("price"),
        "home_type": property_data.get("homeType"),
        "year_built": property_data.get("yearBuilt"),
        "lot_size": property_data.get("lotSize"),
        "zestimate": property_data.get("zestimate"),
        "tax_history": property_data.get("taxHistory", []),
        "price_history": property_data.get("priceHistory", []),
        "description": property_data.get("description"),
    }

detail = get_property_detail(
    "https://www.zillow.com/homedetails/123-Main-St-Seattle-WA-98101/12345678_zpid/"
)
print(json.dumps(detail, indent=2))
```

Common Pitfalls
__NEXT_DATA__ path changes
Zillow ships frontend updates frequently. The JSON path from props.pageProps down to listResults or gdpClientCache can change without notice. The paths in this guide are accurate as of March 2026, but you should build defensive traversal rather than chaining raw .get() calls:
```python
from typing import Any

def safe_get(data: dict, *keys: str, default: Any = None) -> Any:
    """Traverse a nested dict without raising KeyError."""
    for key in keys:
        if not isinstance(data, dict):
            return default
        data = data.get(key)
        if data is None:
            return default
    return data

# Resilient path access
listings = safe_get(
    next_data,
    "props", "pageProps", "searchPageState",
    "cat1", "searchResults", "listResults",
    default=[]
)

if not listings:
    # Log the full structure to diagnose path changes
    import logging
    logging.warning("Empty listResults — dumping keys: %s", list(next_data.keys()))
```

Logging the top-level keys when results are empty is the fastest way to identify a path change after a Zillow frontend deployment.
Pagination and cursor encoding
Zillow returns 20 listings per search page and uses searchQueryState URL parameters for pagination. Manually constructing page 2+ URLs requires modifying the pagination key in that parameter:
```python
import json
import urllib.parse

def build_page_url(base_url: str, page: int) -> str:
    parsed = urllib.parse.urlparse(base_url)
    params = urllib.parse.parse_qs(parsed.query)
    state = json.loads(params.get("searchQueryState", ["{}"])[0])

    state["pagination"] = {"currentPage": page}

    new_query = urllib.parse.urlencode(
        {"searchQueryState": json.dumps(state, separators=(",", ":"))},
        quote_via=urllib.parse.quote
    )
    return urllib.parse.urlunparse(parsed._replace(query=new_query))

page_3_url = build_page_url(
    "https://www.zillow.com/homes/for_sale/Seattle-WA/?searchQueryState=%7B%22pagination%22%3A%7B%7D%7D",
    page=3
)
```

Rate limiting and empty result sets
Zillow doesn't always return an obvious 429 when rate-limiting. Instead, listResults silently returns an empty array. If you're getting valid HTML with __NEXT_DATA__ present but listResults: [], slow your request rate—1 to 3 seconds between search page requests is a safe baseline. Per-request proxy rotation (the default) handles IP-level limits; the inter-request delay handles session-level behavioral analysis.
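A defensive way to apply that baseline is to wrap the search-page fetch with jittered delays and retry-on-empty. This is a sketch under our own naming (scrape_with_backoff and its parameters are not part of any SDK); fetch stands in for any callable that returns parsed listings, such as the get_zillow_listings function above.

```python
import random
import time

def scrape_with_backoff(fetch, url: str, max_attempts: int = 3,
                        base_delay: float = 2.0) -> list:
    """Retry a search-page fetch when the result set comes back empty.

    An empty-but-valid page is treated as a soft block: wait a jittered,
    growing delay before trying again rather than hammering the endpoint.
    """
    for attempt in range(1, max_attempts + 1):
        listings = fetch(url)
        if listings:
            return listings
        if attempt < max_attempts:
            # Grow the delay each attempt; jitter avoids perfectly
            # uniform intervals, which behavioral analysis flags
            time.sleep(attempt * base_delay + random.uniform(0, base_delay / 2))
    return []
```

With the default base_delay of 2.0, the first retry waits roughly 2–3 seconds, squarely inside the 1-to-3-second baseline above.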
Scaling Up
Async batch processing
For large pipelines, use bounded async concurrency rather than sequential requests:
```python
import alterlab
import asyncio
import json
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

async def scrape_detail(url: str) -> dict | None:
    try:
        response = await client.scrape_async(url, render_js=True, country="us")
        soup = BeautifulSoup(response.text, "html.parser")
        tag = soup.find("script", {"id": "__NEXT_DATA__"})
        return json.loads(tag.string) if tag else None
    except Exception as exc:
        print(f"Failed {url}: {exc}")
        return None

async def scrape_batch(urls: list[str], concurrency: int = 5) -> list[dict]:
    sem = asyncio.Semaphore(concurrency)

    async def bounded(url: str):
        async with sem:
            return await scrape_detail(url)

    results = await asyncio.gather(*[bounded(u) for u in urls])
    return [r for r in results if r is not None]

# Run
detail_urls = [
    "https://www.zillow.com/homedetails/...",
    # ... up to thousands of URLs
]
results = asyncio.run(scrape_batch(detail_urls, concurrency=5))
print(f"Successfully scraped {len(results)}/{len(detail_urls)}")
```

Keep concurrency at 3–5 for Zillow. Higher values don't improve throughput meaningfully and increase the probability of triggering behavioral rate limits even with proxy rotation.
Incremental updates for price monitoring
Re-scraping every listing on every run is expensive and unnecessary. Use daysOnZillow and priceHistory to build an incremental update strategy:
```python
from datetime import datetime, timezone

def needs_rescrape(zpid: str, last_scraped_at: datetime, status: str) -> bool:
    age_hours = (datetime.now(timezone.utc) - last_scraped_at).total_seconds() / 3600
    # Active listings: check daily. Off-market: check weekly.
    threshold = 24 if status in ("FOR_SALE", "FOR_RENT") else 168
    return age_hours >= threshold

def extract_price_change(stored_history: list, fresh_history: list) -> dict | None:
    if not fresh_history or not stored_history:
        return None
    latest = fresh_history[0]
    previous = stored_history[0]
    if latest.get("price") != previous.get("price"):
        return {
            "from": previous.get("price"),
            "to": latest.get("price"),
            "date": latest.get("date"),
            "event": latest.get("event"),
        }
    return None
```

Store the raw __NEXT_DATA__ JSON blob alongside your normalized records. When Zillow's JSON schema changes, you can re-parse historical raw payloads without re-hitting the site.
Cost planning
Zillow requires headless browser requests for every page type, which is priced higher than standard fetches. A typical real estate monitoring pipeline looks like:
- Discovery pass: ~500 search result pages (20 listings each = 10,000 listings) per metro area
- Detail enrichment: 10,000 detail page requests for full property data
- Daily delta: ~200–400 requests for price change detection on active inventory
The search-then-detail pattern—collect ZPIDs from search pages, then scrape only the detail pages that match your filter criteria—is the most cost-efficient approach. See AlterLab's pricing page for current per-request rates and volume discount tiers.
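That filter step can be sketched as follows, assuming the listing dicts built by the search-page parser earlier in this guide (fields price, beds, detail_url; the helper name is ours). Search pages report price as a display string like "$750,000", so it is normalized before comparison.

```python
def select_detail_targets(listings: list[dict], max_price: int,
                          min_beds: int) -> list[str]:
    """Pick which detail pages are worth a (more expensive) browser request."""
    targets = []
    for listing in listings:
        price = listing.get("price")
        if isinstance(price, str):
            # Strip "$" and thousands separators from the display string
            digits = "".join(ch for ch in price if ch.isdigit())
            price = int(digits) if digits else None
        if price is None or price > max_price:
            continue  # unpriced or over budget: skip the detail request
        if (listing.get("beds") or 0) < min_beds:
            continue
        targets.append(listing["detail_url"])
    return targets
```

Feeding only these URLs into the async batch scraper above keeps the expensive headless-browser spend proportional to the inventory you actually care about.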
Try scraping a Zillow search results page with AlterLab — see the raw __NEXT_DATA__ JSON in seconds
Key Takeaways
- requests and basic headless Chromium both fail. Zillow's Cloudflare layer blocks non-browser TLS fingerprints before serving any content. You need proper fingerprint spoofing, residential proxies, and JS execution—not just a user-agent header.
- Parse __NEXT_DATA__, not the DOM. The embedded JSON is structured, complete, and far more stable than CSS class selectors on a rapidly-deployed React frontend. Use safe_get wrappers and log raw payloads on empty results.
- Always pass country="us". Non-US IPs get geo-blocked at the application layer, returning a redirect or an empty state rather than listing data.
- Keep async concurrency at 3–5. Higher concurrency doesn't meaningfully improve throughput and risks triggering behavioral rate limits even with per-request proxy rotation.
- Store raw JSON alongside normalized records. Schema paths in __NEXT_DATA__ change with Zillow deployments. Raw payload storage lets you re-parse without re-scraping.
Related Guides
Scraping other real estate platforms or e-commerce sites? These guides cover the same techniques for adjacent targets:
- Selenium Bot Detection: Why You Get Caught and How to Avoid It
- Why Your Headless Browser Gets Detected (and How to Fix It)
- Web Scraping APIs vs DIY Scrapers: When to Stop Building Infrastructure
- Scraping E-Commerce Sites at Scale Without Getting Blocked
- Web Scraping with Node.js and Puppeteer: The Complete 2026 Guide