Tutorials

How to Scrape Realtor.com: Complete Guide for 2026

Step-by-step guide to scraping Realtor.com in 2026. Extract property listings, prices, and agent data with Python while bypassing anti-bot protections at scale.

Yash Dubey

March 29, 2026

8 min read

Realtor.com publishes MLS data refreshed every 15 minutes across 100 million+ active and historical listings. It's one of the most comprehensive real estate data sources publicly accessible — and one of the more aggressively protected ones.

This guide covers how to scrape Realtor.com reliably in 2026: what anti-bot protections you'll hit, how to extract structured listing data, how to handle pagination without losing session state, and how to scale to bulk collection without burning through retries.

Why Scrape Realtor.com?

Three use cases drive the majority of Realtor.com scraping projects:

Price monitoring and market analysis. Tracking median list prices, price reductions, and days-on-market across ZIP codes or metro areas. Fintech companies building Automated Valuation Models (AVMs) and hedge funds building housing supply indexes are the main consumers here — the 15-minute MLS refresh cadence makes Realtor.com one of the freshest public data sources available.

Lead generation for agents and lenders. New listings, FSBO properties, and recently reduced inventory all appear on Realtor.com before they surface on other aggregators. Pulling listing agent contact data alongside property details feeds CRMs and outreach pipelines for real estate teams and mortgage brokers.

Competitive and academic research. iBuyers, proptech companies, and academic researchers track neighborhood price trajectories, inventory levels, and absorption rates at scale. Realtor.com's geographic coverage and data depth make it the preferred source over smaller regional MLS portals.

Anti-Bot Challenges on Realtor.com

Realtor.com is a Next.js application backed by active bot detection. Here's what you'll hit:

JavaScript fingerprinting. The site evaluates browser environment signals on load — canvas fingerprint, WebGL renderer string, navigator properties, and timing anomalies. A plain requests.get() call returns a 403 or silent redirect within seconds. Even many headless browser setups get caught if the browser profile isn't properly configured.

TLS fingerprinting. Realtor.com's edge infrastructure inspects the TLS ClientHello before any HTTP-level logic runs. Python's default ssl module produces a fingerprint that diverges from Chrome's in measurable ways, making requests trivially identifiable at the connection layer.

IP-based rate limiting. Search and listing endpoints aggressively block datacenter IP ranges. Residential proxies are required; even with those, high-frequency requests from a single IP trigger rate limits within minutes.

Session cookie requirements. Several data-heavy endpoints require cookies established by a prior JavaScript page load. Without a valid session, paginated results return empty arrays or redirect to the homepage.

Building reliable bypass for all of this from scratch — patching TLS fingerprints, sourcing residential proxy pools, managing headless browser profiles — takes weeks and requires constant maintenance. AlterLab's Anti-Bot Bypass API handles the entire stack transparently.

  • 99.2% — success rate on Realtor.com
  • 1.4s — average response time
  • 50M+ — monthly requests processed
  • 0 — proxy infrastructure to manage

Quick Start with AlterLab API

Install the SDK:

Bash
pip install alterlab beautifulsoup4

The getting started guide covers API key generation and environment setup if you're starting from scratch.

Scrape a Realtor.com search page

Python
import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://www.realtor.com/realestateandhomes-search/Austin_TX",
    render_js=True,       # required — Realtor.com requires JS execution
    country="us",         # route through US residential proxies
    premium_proxy=True,   # residential pool (datacenter IPs get blocked)
)

soup = BeautifulSoup(response.text, "html.parser")
print(f"Status: {response.status_code}")
print(soup.title.string)

The render_js=True flag provisions a headless Chromium instance with a properly fingerprinted browser environment and valid TLS profile. No local browser setup required.

cURL equivalent

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.realtor.com/realestateandhomes-search/Austin_TX",
    "render_js": true,
    "country": "us",
    "premium_proxy": true
  }'

Extracting Structured Data

Method 1: Parse the __NEXT_DATA__ payload

Because Realtor.com is a Next.js application, it embeds its full data payload inside a <script id="__NEXT_DATA__"> tag on every page. This is far more reliable than scraping rendered HTML: the JSON structure is stable across UI redesigns and doesn't depend on CSS class names that change frequently.

Python
import alterlab
import json
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")


def extract_listings(search_url: str) -> list[dict]:
    response = client.scrape(search_url, render_js=True, country="us", premium_proxy=True)

    soup = BeautifulSoup(response.text, "html.parser")
    next_data_tag = soup.find("script", {"id": "__NEXT_DATA__"})

    if not next_data_tag:
        raise ValueError("__NEXT_DATA__ not found — page may not have fully rendered")

    data = json.loads(next_data_tag.string)

    # Path varies slightly by page type; search results use this structure
    properties = (
        data.get("props", {})
            .get("pageProps", {})
            .get("properties", [])
    )

    results = []
    for prop in properties:
        location = prop.get("location", {}).get("address", {})
        description = prop.get("description", {})
        # "or [{}]" guards against advertisers being present but empty
        agent = (prop.get("advertisers") or [{}])[0]

        results.append({
            "listing_id":   prop.get("property_id"),
            "address":      location.get("line"),
            "city":         location.get("city"),
            "state":        location.get("state_code"),
            "zip":          location.get("postal_code"),
            "price":        prop.get("list_price"),
            "beds":         description.get("beds"),
            "baths":        description.get("baths_consolidated"),
            "sqft":         description.get("sqft"),
            "status":       prop.get("status"),
            "list_date":    prop.get("list_date"),
            "agent_name":   agent.get("name"),
            "agent_phone":  (agent.get("phones") or [{}])[0].get("number"),
        })

    return results


listings = extract_listings(
    "https://www.realtor.com/realestateandhomes-search/Austin_TX"
)
for listing in listings[:3]:
    print(listing)

Method 2: CSS selectors for property cards

If the __NEXT_DATA__ structure shifts (it does occasionally after major deployments), DOM selectors are a useful fallback:

Python
from bs4 import BeautifulSoup


def parse_property_cards(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select('[data-testid="property-card"]')

    results = []
    for card in cards:
        price_el   = card.select_one('[data-testid="card-price"]')
        address_el = card.select_one('[data-testid="card-address-1"]')
        city_el    = card.select_one('[data-testid="card-address-2"]')
        meta_els   = card.select('[data-testid^="property-meta-"]')

        results.append({
            "price":   price_el.get_text(strip=True) if price_el else None,
            "address": address_el.get_text(strip=True) if address_el else None,
            "city":    city_el.get_text(strip=True) if city_el else None,
            "meta":    [m.get_text(strip=True) for m in meta_els],
        })

    return results

Selector reference (as of early 2026):

| Data Point | Selector |
| --- | --- |
| Property card container | `[data-testid="property-card"]` |
| List price | `[data-testid="card-price"]` |
| Street address | `[data-testid="card-address-1"]` |
| City / state / ZIP | `[data-testid="card-address-2"]` |
| Beds | `[data-testid="property-meta-beds"]` |
| Baths | `[data-testid="property-meta-baths"]` |
| Square footage | `[data-testid="property-meta-sqft"]` |
| Listing type badge | `[data-testid="card-description"]` |
Realtor.com rotates data-testid attributes periodically. For pipelines that need to run unattended, the __NEXT_DATA__ path is the right choice.
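In a pipeline, the two methods work best as a fallback cascade: try the JSON payload first and signal failure cleanly so the caller knows to drop down to the CSS-selector parser. A minimal sketch using only the standard library (the regex-based extraction is illustrative; real pages may order the script tag's attributes differently, which the pattern below tolerates):

```python
import json
import re


def next_data_properties(html: str):
    """Extract the search-results property list from the embedded
    __NEXT_DATA__ payload, or return None so the caller can fall
    back to CSS-selector parsing."""
    match = re.search(
        r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
        html,
        re.DOTALL,
    )
    if not match:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    props = data.get("props", {}).get("pageProps", {}).get("properties")
    # Only trust the payload if the expected list is actually there
    return props if isinstance(props, list) else None
```

A caller can then do `listings = next_data_properties(html) or parse_property_cards(html)` and log whenever the fallback fires, which doubles as an early warning that the payload structure has shifted.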

Common Pitfalls

Pagination breaks without session continuity

Realtor.com paginates search results via URL suffixes (/pg-2, /pg-3), but pages beyond the first often require cookies set during the initial page load. Naive parallel fetches across pages will return empty results or redirect responses starting around page 3.

Maintain a session across requests using the session_id parameter:

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")


def scrape_all_pages(base_url: str, max_pages: int = 10) -> list[dict]:
    session_id = client.new_session(country="us", premium_proxy=True)
    all_results = []

    for page in range(1, max_pages + 1):
        url = f"{base_url}/pg-{page}" if page > 1 else base_url
        response = client.scrape(url, session_id=session_id, render_js=True)

        if response.status_code != 200:
            break

        # extract_listings_from_html is the __NEXT_DATA__ parsing from
        # extract_listings above, refactored to take raw HTML instead of a URL
        listings = extract_listings_from_html(response.text)
        if not listings:
            break  # past last page

        all_results.extend(listings)

    return all_results

Lazy-loaded images and deferred JS execution

Property images are lazy-loaded. If your pipeline needs image URLs, pass a wait_for selector to block until images resolve before the HTML snapshot is captured:

Python
response = client.scrape(
    url,
    render_js=True,
    wait_for='[data-testid="card-img-container"] img[src]',
)

Overly aggressive concurrency

Even with residential proxy rotation, flooding search endpoints triggers server-side rate limiting that persists across proxy IPs. Keep concurrent requests in the 5–10 range and use the SDK's rate_limit parameter to enable automatic exponential backoff on 429 responses.
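If you implement the retry loop yourself instead of relying on the SDK's rate_limit parameter, capped exponential backoff is the standard shape. A minimal sketch (the function name and defaults are illustrative, not part of the AlterLab SDK):

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: bool = False) -> float:
    """Delay in seconds before retry number `attempt` (0-based):
    doubles each attempt, capped so a long outage doesn't produce
    hour-long sleeps."""
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # Full jitter spreads retries out so concurrent workers
        # don't all hammer the endpoint at the same instant
        delay = random.uniform(0, delay)
    return delay
```

In a retry loop you would sleep for `backoff_delay(attempt, jitter=True)` after each 429 response and give up after a fixed number of attempts.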

Scaling Up

Batch requests across cities

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

target_cities = [
    "https://www.realtor.com/realestateandhomes-search/Austin_TX",
    "https://www.realtor.com/realestateandhomes-search/Denver_CO",
    "https://www.realtor.com/realestateandhomes-search/Phoenix_AZ",
    "https://www.realtor.com/realestateandhomes-search/Nashville_TN",
    "https://www.realtor.com/realestateandhomes-search/Charlotte_NC",
    "https://www.realtor.com/realestateandhomes-search/Seattle_WA",
]

results = client.batch_scrape(
    urls=target_cities,
    render_js=True,
    country="us",
    premium_proxy=True,
    concurrency=5,
)

for result in results:
    if result.status_code == 200:
        listings = extract_listings_from_html(result.text)
        print(f"{result.url} → {len(listings)} listings")
    else:
        print(f"{result.url} → failed ({result.status_code})")

Cost planning

JS-rendered requests consume more credits than static fetches — account for this in your credit budget. At 50,000 requests/month (a reasonable baseline for monitoring 500 ZIP codes daily), you're within the standard Growth tier on AlterLab's pricing page. For pipelines exceeding 1M monthly requests, dedicated residential proxy pools on an enterprise plan substantially reduce per-request cost and improve throughput consistency.
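The 50,000 figure is back-of-envelope sizing; a quick way to budget your own volume (the ~3 result pages per ZIP code is an illustrative assumption, not a measured average):

```python
def monthly_requests(zip_codes: int, pages_per_zip: int,
                     scrapes_per_day: int = 1, days: int = 30) -> int:
    """Estimate monthly request volume for a recurring search-page crawl."""
    return zip_codes * pages_per_zip * scrapes_per_day * days


# 500 ZIP codes at ~3 result pages each, scraped once daily,
# lands just under the 50k/month baseline cited above
print(monthly_requests(500, 3))   # 45000
# The same coverage on a 4-hour cycle is 6x the volume
print(monthly_requests(500, 3, scrapes_per_day=6))
```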

Key Takeaways

  • requests will not work. Realtor.com uses JavaScript fingerprinting, TLS inspection, and IP-based rate limiting. You need a headless browser with proper browser fingerprinting and residential proxies.
  • Target __NEXT_DATA__ first. The embedded Next.js JSON payload is structured, stable, and doesn't break on UI redesigns. CSS selectors are a useful fallback, not a primary strategy.
  • Use sessions for pagination. Fetching pages 2+ without a valid session cookie returns empty results. Pass a session_id to maintain state across the full result set.
  • Throttle concurrency. 5–10 concurrent requests with automatic backoff on 429s is the right operating envelope for Realtor.com endpoints.
  • Schedule incrementally. MLS data refreshes every 15 minutes, but full re-scrapes are wasteful. Daily cycles for most use cases; 4-hour intervals for real-time price dashboards.
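Incremental scheduling boils down to diffing each fresh snapshot against the previous one and only processing what changed. A minimal sketch, assuming listings shaped like the extract_listings output above:

```python
def diff_listings(previous: dict, current: list[dict]) -> dict:
    """Split a fresh scrape into new listings and price changes.
    `previous` maps listing_id -> last observed price."""
    new, changed = [], []
    for listing in current:
        lid = listing.get("listing_id")
        price = listing.get("price")
        if lid not in previous:
            new.append(listing)
        elif previous[lid] != price:
            changed.append(listing)
    return {"new": new, "price_changed": changed}
```

Persist the `listing_id -> price` map between runs (a single database table or even a JSON file per market) and downstream consumers only ever see the delta.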



Frequently Asked Questions

Is it legal to scrape Realtor.com?

Scraping publicly visible data from Realtor.com — such as listing prices, addresses, and property attributes — generally falls within the scope of public data collection. However, Realtor.com's Terms of Use prohibit automated access, so you should consult legal counsel before building commercial products on this data, avoid scraping personal contact information at scale, and respect robots.txt and rate limits.

Why can't I scrape Realtor.com with plain Python requests?

Realtor.com uses JavaScript fingerprinting, TLS inspection, and IP-based rate limiting that blocks naive scrapers almost immediately. AlterLab's [Anti-Bot Bypass API](/anti-bot-bypass-api) handles all of this automatically — it routes requests through a properly fingerprinted headless browser over rotating residential proxies, so you get the rendered HTML without managing any of the bypass infrastructure yourself.

How much does it cost to scrape Realtor.com at scale?

Cost depends primarily on request volume and whether you need JavaScript rendering. JS-rendered requests (required for Realtor.com) consume more credits than static fetches. At 50,000 requests/month — enough to monitor 500 ZIP codes daily — you're within AlterLab's Growth tier. See the [pricing page](/pricing) for current tier limits and credit rates; enterprise plans with dedicated residential pools are available for 1M+ monthly requests.