Tutorials

How to Scrape Realtor.com: Complete Guide for 2026

Step-by-step guide to scraping Realtor.com in 2026. Extract property listings, prices, and agent data with Python while bypassing anti-bot protections at scale.

Yash Dubey

March 29, 2026

8 min read

Realtor.com publishes MLS data refreshed every 15 minutes across 100 million+ active and historical listings. It's one of the most comprehensive real estate data sources publicly accessible — and one of the more aggressively protected ones.

This guide covers how to scrape Realtor.com reliably in 2026: what anti-bot protections you'll hit, how to extract structured listing data, how to handle pagination without losing session state, and how to scale to bulk collection without burning through retries.

Why Scrape Realtor.com?

Three use cases drive the majority of Realtor.com scraping projects:

Price monitoring and market analysis. Tracking median list prices, price reductions, and days-on-market across ZIP codes or metro areas. Fintech companies building Automated Valuation Models (AVMs) and hedge funds building housing supply indexes are the main consumers here — the 15-minute MLS refresh cadence makes Realtor.com one of the freshest public data sources available.

Lead generation for agents and lenders. New listings, FSBO properties, and recently reduced inventory all appear on Realtor.com before they surface on other aggregators. Pulling listing agent contact data alongside property details feeds CRMs and outreach pipelines for real estate teams and mortgage brokers.

Competitive and academic research. iBuyers, proptech companies, and academic researchers track neighborhood price trajectories, inventory levels, and absorption rates at scale. Realtor.com's geographic coverage and data depth make it the preferred source over smaller regional MLS portals.

Anti-Bot Challenges on Realtor.com

Realtor.com is a Next.js application backed by active bot detection. Here's what you'll hit:

JavaScript fingerprinting. The site evaluates browser environment signals on load — canvas fingerprint, WebGL renderer string, navigator properties, and timing anomalies. A plain requests.get() call returns a 403 or silent redirect within seconds. Even many headless browser setups get caught if the browser profile isn't properly configured.

TLS fingerprinting. Realtor.com's edge infrastructure inspects the TLS ClientHello before any HTTP-level logic runs. Python's default ssl module produces a fingerprint that diverges from Chrome's in measurable ways, making requests trivially identifiable at the connection layer.

IP-based rate limiting. Search and listing endpoints aggressively block datacenter IP ranges. Residential proxies are required; even with those, high-frequency requests from a single IP trigger rate limits within minutes.

Session cookie requirements. Several data-heavy endpoints require cookies established by a prior JavaScript page load. Without a valid session, paginated results return empty arrays or redirect to the homepage.

Building reliable bypass for all of this from scratch — patching TLS fingerprints, sourcing residential proxy pools, managing headless browser profiles — takes weeks and requires constant maintenance. AlterLab's Anti-Bot Bypass API handles the entire stack transparently.

  • 99.2% — success rate on Realtor.com
  • 1.4s — average response time
  • 50M+ — monthly requests processed
  • 0 — proxy infrastructure to manage

Quick Start with AlterLab API

Install the SDK:

Bash
pip install alterlab beautifulsoup4

The getting started guide covers API key generation and environment setup if you're starting from scratch.

Scrape a Realtor.com search page

Python
import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://www.realtor.com/realestateandhomes-search/Austin_TX",
    render_js=True,       # required — Realtor.com requires JS execution
    country="us",         # route through US residential proxies
    premium_proxy=True,   # residential pool (datacenter IPs get blocked)
)

soup = BeautifulSoup(response.text, "html.parser")
print(f"Status: {response.status_code}")
print(soup.title.string)

The render_js=True flag provisions a headless Chromium instance with a properly fingerprinted browser environment and valid TLS profile. No local browser setup required.

cURL equivalent

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.realtor.com/realestateandhomes-search/Austin_TX",
    "render_js": true,
    "country": "us",
    "premium_proxy": true
  }'

Extracting Structured Data

Method 1: Parse the __NEXT_DATA__ payload

Because Realtor.com is a Next.js application, it embeds its full data payload inside a <script id="__NEXT_DATA__"> tag on every page. This is far more reliable than scraping rendered HTML: the JSON structure is stable across UI redesigns and doesn't depend on CSS class names that change frequently.

Python
import alterlab
import json
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")


def extract_listings(search_url: str) -> list[dict]:
    response = client.scrape(search_url, render_js=True, country="us", premium_proxy=True)

    soup = BeautifulSoup(response.text, "html.parser")
    next_data_tag = soup.find("script", {"id": "__NEXT_DATA__"})

    if not next_data_tag:
        raise ValueError("__NEXT_DATA__ not found — page may not have fully rendered")

    data = json.loads(next_data_tag.string)

    # Path varies slightly by page type; search results use this structure
    properties = (
        data.get("props", {})
            .get("pageProps", {})
            .get("properties", [])
    )

    results = []
    for prop in properties:
        location = prop.get("location", {}).get("address", {})
        description = prop.get("description", {})
        # "or [{}]" guards against advertisers being present but empty
        agent = (prop.get("advertisers") or [{}])[0]

        results.append({
            "listing_id":   prop.get("property_id"),
            "address":      location.get("line"),
            "city":         location.get("city"),
            "state":        location.get("state_code"),
            "zip":          location.get("postal_code"),
            "price":        prop.get("list_price"),
            "beds":         description.get("beds"),
            "baths":        description.get("baths_consolidated"),
            "sqft":         description.get("sqft"),
            "status":       prop.get("status"),
            "list_date":    prop.get("list_date"),
            "agent_name":   agent.get("name"),
            "agent_phone":  (agent.get("phones") or [{}])[0].get("number"),
        })

    return results


listings = extract_listings(
    "https://www.realtor.com/realestateandhomes-search/Austin_TX"
)
for listing in listings[:3]:
    print(listing)

Method 2: CSS selectors for property cards

If the __NEXT_DATA__ structure shifts (it does occasionally after major deployments), DOM selectors are a useful fallback:

Python
from bs4 import BeautifulSoup


def parse_property_cards(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select('[data-testid="property-card"]')

    results = []
    for card in cards:
        price_el   = card.select_one('[data-testid="card-price"]')
        address_el = card.select_one('[data-testid="card-address-1"]')
        city_el    = card.select_one('[data-testid="card-address-2"]')
        meta_els   = card.select('[data-testid^="property-meta-"]')

        results.append({
            "price":   price_el.get_text(strip=True) if price_el else None,
            "address": address_el.get_text(strip=True) if address_el else None,
            "city":    city_el.get_text(strip=True) if city_el else None,
            "meta":    [m.get_text(strip=True) for m in meta_els],
        })

    return results

Selector reference (as of early 2026):

| Data Point | Selector |
| --- | --- |
| Property card container | `[data-testid="property-card"]` |
| List price | `[data-testid="card-price"]` |
| Street address | `[data-testid="card-address-1"]` |
| City / state / ZIP | `[data-testid="card-address-2"]` |
| Beds | `[data-testid="property-meta-beds"]` |
| Baths | `[data-testid="property-meta-baths"]` |
| Square footage | `[data-testid="property-meta-sqft"]` |
| Listing type badge | `[data-testid="card-description"]` |
Realtor.com rotates data-testid attributes periodically. For pipelines that need to run unattended, the __NEXT_DATA__ path is the right choice.
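In a pipeline, the two methods work best as a fallback cascade: try the JSON payload first and signal failure cleanly so the caller knows to drop down to the CSS-selector parser. A minimal sketch using only the standard library (the regex-based extraction is illustrative; real pages may order the script tag's attributes differently, which the pattern below tolerates):

```python
import json
import re


def next_data_properties(html: str):
    """Extract the search-results property list from the embedded
    __NEXT_DATA__ payload, or return None so the caller can fall
    back to CSS-selector parsing."""
    match = re.search(
        r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
        html,
        re.DOTALL,
    )
    if not match:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    props = data.get("props", {}).get("pageProps", {}).get("properties")
    # Only trust the payload if the expected list is actually there
    return props if isinstance(props, list) else None
```

A caller can then do `listings = next_data_properties(html) or parse_property_cards(html)` and log whenever the fallback fires, which doubles as an early warning that the payload structure has shifted.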

Common Pitfalls

Pagination breaks without session continuity

Realtor.com paginates search results via URL suffixes (/pg-2, /pg-3), but pages beyond the first often require cookies set during the initial page load. Naive parallel fetches across pages will return empty results or redirect responses starting around page 3.

Maintain a session across requests using the session_id parameter:

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")


def scrape_all_pages(base_url: str, max_pages: int = 10) -> list[dict]:
    session_id = client.new_session(country="us", premium_proxy=True)
    all_results = []

    for page in range(1, max_pages + 1):
        url = f"{base_url}/pg-{page}" if page > 1 else base_url
        response = client.scrape(url, session_id=session_id, render_js=True)

        if response.status_code != 200:
            break

        # extract_listings_from_html is the __NEXT_DATA__ parsing from
        # extract_listings above, refactored to take raw HTML instead of a URL
        listings = extract_listings_from_html(response.text)
        if not listings:
            break  # past last page

        all_results.extend(listings)

    return all_results

Lazy-loaded images and deferred JS execution

Property images are lazy-loaded. If your pipeline needs image URLs, pass a wait_for selector to block until images resolve before the HTML snapshot is captured:

Python
response = client.scrape(
    url,
    render_js=True,
    wait_for='[data-testid="card-img-container"] img[src]',
)

Overly aggressive concurrency

Even with residential proxy rotation, flooding search endpoints triggers server-side rate limiting that persists across proxy IPs. Keep concurrent requests in the 5–10 range and use the SDK's rate_limit parameter to enable automatic exponential backoff on 429 responses.
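If you implement the retry loop yourself instead of relying on the SDK's rate_limit parameter, capped exponential backoff is the standard shape. A minimal sketch (the function name and defaults are illustrative, not part of the AlterLab SDK):

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: bool = False) -> float:
    """Delay in seconds before retry number `attempt` (0-based):
    doubles each attempt, capped so a long outage doesn't produce
    hour-long sleeps."""
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # Full jitter spreads retries out so concurrent workers
        # don't all hammer the endpoint at the same instant
        delay = random.uniform(0, delay)
    return delay
```

In a retry loop you would sleep for `backoff_delay(attempt, jitter=True)` after each 429 response and give up after a fixed number of attempts.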

Scaling Up

Batch requests across cities

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

target_cities = [
    "https://www.realtor.com/realestateandhomes-search/Austin_TX",
    "https://www.realtor.com/realestateandhomes-search/Denver_CO",
    "https://www.realtor.com/realestateandhomes-search/Phoenix_AZ",
    "https://www.realtor.com/realestateandhomes-search/Nashville_TN",
    "https://www.realtor.com/realestateandhomes-search/Charlotte_NC",
    "https://www.realtor.com/realestateandhomes-search/Seattle_WA",
]

results = client.batch_scrape(
    urls=target_cities,
    render_js=True,
    country="us",
    premium_proxy=True,
    concurrency=5,
)

for result in results:
    if result.status_code == 200:
        listings = extract_listings_from_html(result.text)
        print(f"{result.url} → {len(listings)} listings")
    else:
        print(f"{result.url} → failed ({result.status_code})")

Cost planning

JS-rendered requests consume more credits than static fetches — account for this in your credit budget. At 50,000 requests/month (a reasonable baseline for monitoring 500 ZIP codes daily), you're within the standard Growth tier on AlterLab's pricing page. For pipelines exceeding 1M monthly requests, dedicated residential proxy pools on an enterprise plan substantially reduce per-request cost and improve throughput consistency.
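The 50,000 figure is back-of-envelope sizing; a quick way to budget your own volume (the ~3 result pages per ZIP code is an illustrative assumption, not a measured average):

```python
def monthly_requests(zip_codes: int, pages_per_zip: int,
                     scrapes_per_day: int = 1, days: int = 30) -> int:
    """Estimate monthly request volume for a recurring search-page crawl."""
    return zip_codes * pages_per_zip * scrapes_per_day * days


# 500 ZIP codes at ~3 result pages each, scraped once daily,
# lands just under the 50k/month baseline cited above
print(monthly_requests(500, 3))   # 45000
# The same coverage on a 4-hour cycle is 6x the volume
print(monthly_requests(500, 3, scrapes_per_day=6))
```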

Key Takeaways

  • requests will not work. Realtor.com uses JavaScript fingerprinting, TLS inspection, and IP-based rate limiting. You need a headless browser with proper browser fingerprinting and residential proxies.
  • Target __NEXT_DATA__ first. The embedded Next.js JSON payload is structured, stable, and doesn't break on UI redesigns. CSS selectors are a useful fallback, not a primary strategy.
  • Use sessions for pagination. Fetching pages 2+ without a valid session cookie returns empty results. Pass a session_id to maintain state across the full result set.
  • Throttle concurrency. 5–10 concurrent requests with automatic backoff on 429s is the right operating envelope for Realtor.com endpoints.
  • Schedule incrementally. MLS data refreshes every 15 minutes, but full re-scrapes are wasteful. Daily cycles for most use cases; 4-hour intervals for real-time price dashboards.
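Incremental scheduling boils down to diffing each fresh snapshot against the previous one and only processing what changed. A minimal sketch, assuming listings shaped like the extract_listings output above:

```python
def diff_listings(previous: dict, current: list[dict]) -> dict:
    """Split a fresh scrape into new listings and price changes.
    `previous` maps listing_id -> last observed price."""
    new, changed = [], []
    for listing in current:
        lid = listing.get("listing_id")
        price = listing.get("price")
        if lid not in previous:
            new.append(listing)
        elif previous[lid] != price:
            changed.append(listing)
    return {"new": new, "price_changed": changed}
```

Persist the `listing_id -> price` map between runs (a single database table or even a JSON file per market) and downstream consumers only ever see the delta.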



Frequently Asked Questions

Is it legal to scrape Realtor.com?

Scraping publicly visible data from Realtor.com — such as listing prices, addresses, and property attributes — generally falls within the scope of public data collection. However, Realtor.com's Terms of Use prohibit automated access, so you should consult legal counsel before building commercial products on this data, avoid scraping personal contact information at scale, and respect robots.txt and rate limits.

Why can't I scrape Realtor.com with plain Python requests?

Realtor.com uses JavaScript fingerprinting, TLS inspection, and IP-based rate limiting that blocks naive scrapers almost immediately. AlterLab's [Anti-Bot Bypass API](/anti-bot-bypass-api) handles all of this automatically — it routes requests through a properly fingerprinted headless browser over rotating residential proxies, so you get the rendered HTML without managing any of the bypass infrastructure yourself.

How much does it cost to scrape Realtor.com at scale?

Cost depends primarily on request volume and whether you need JavaScript rendering. JS-rendered requests (required for Realtor.com) consume more credits than static fetches. At 50,000 requests/month — enough to monitor 500 ZIP codes daily — you're within AlterLab's Growth tier. See the [pricing page](/pricing) for current tier limits and credit rates; enterprise plans with dedicated residential pools are available for 1M+ monthly requests.