
How to Scrape Realtor.com: Complete Guide for 2026
Step-by-step guide to scraping Realtor.com in 2026. Extract property listings, prices, and agent data with Python while bypassing anti-bot protections at scale.
March 29, 2026
Realtor.com publishes MLS data refreshed every 15 minutes across 100 million+ active and historical listings. It's one of the most comprehensive real estate data sources publicly accessible — and one of the more aggressively protected ones.
This guide covers how to scrape Realtor.com reliably in 2026: what anti-bot protections you'll hit, how to extract structured listing data, how to handle pagination without losing session state, and how to scale to bulk collection without burning through retries.
Why Scrape Realtor.com?
Three use cases drive the majority of Realtor.com scraping projects:
Price monitoring and market analysis. Tracking median list prices, price reductions, and days-on-market across ZIP codes or metro areas. Fintech companies building Automated Valuation Models (AVMs) and hedge funds building housing supply indexes are the main consumers here — the 15-minute MLS refresh cadence makes Realtor.com one of the freshest public data sources available.
Lead generation for agents and lenders. New listings, FSBO properties, and recently reduced inventory all appear on Realtor.com before they surface on other aggregators. Pulling listing agent contact data alongside property details feeds CRMs and outreach pipelines for real estate teams and mortgage brokers.
Competitive and academic research. iBuyers, proptech companies, and academic researchers track neighborhood price trajectories, inventory levels, and absorption rates at scale. Realtor.com's geographic coverage and data depth make it the preferred source over smaller regional MLS portals.
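For the price-monitoring use case, the aggregation itself is simple once listings are in hand. A minimal stdlib sketch; the `zip`/`price` field names mirror the extraction example later in this guide, and the sample rows are made up for illustration:

```python
from collections import defaultdict
from statistics import median

# Sample rows shaped like the listing dicts extracted later in this guide
listings = [
    {"zip": "78704", "price": 650_000},
    {"zip": "78704", "price": 820_000},
    {"zip": "78704", "price": 710_000},
    {"zip": "78745", "price": 450_000},
    {"zip": "78745", "price": 470_000},
]

def median_price_by_zip(rows: list[dict]) -> dict[str, float]:
    """Group list prices by ZIP code and compute the median of each group."""
    by_zip = defaultdict(list)
    for row in rows:
        if row.get("price") is not None:  # skip listings with no price
            by_zip[row["zip"]].append(row["price"])
    return {z: median(prices) for z, prices in by_zip.items()}

print(median_price_by_zip(listings))
# {'78704': 710000, '78745': 460000.0}
```

The same grouping pattern extends to price reductions and days-on-market once you track `list_date` across scrape runs.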
Anti-Bot Challenges on Realtor.com
Realtor.com is a Next.js application backed by active bot detection. Here's what you'll hit:
JavaScript fingerprinting. The site evaluates browser environment signals on load — canvas fingerprint, WebGL renderer string, navigator properties, and timing anomalies. A plain requests.get() call returns a 403 or silent redirect within seconds. Even many headless browser setups get caught if the browser profile isn't properly configured.
TLS fingerprinting. Realtor.com's edge infrastructure inspects the TLS ClientHello before any HTTP-level logic runs. Python's default ssl module produces a fingerprint that diverges from Chrome's in measurable ways, making requests trivially identifiable at the connection layer.
IP-based rate limiting. Search and listing endpoints aggressively block datacenter IP ranges. Residential proxies are required; even with those, high-frequency requests from a single IP trigger rate limits within minutes.
Session cookie requirements. Several data-heavy endpoints require cookies established by a prior JavaScript page load. Without a valid session, paginated results return empty arrays or redirect to the homepage.
Building reliable bypass for all of this from scratch — patching TLS fingerprints, sourcing residential proxy pools, managing headless browser profiles — takes weeks and requires constant maintenance. AlterLab's Anti-Bot Bypass API handles the entire stack transparently.
Quick Start with AlterLab API
Install the SDK:
```shell
pip install alterlab beautifulsoup4
```

The getting started guide covers API key generation and environment setup if you're starting from scratch.
Scrape a Realtor.com search page
```python
import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://www.realtor.com/realestateandhomes-search/Austin_TX",
    render_js=True,      # required — Realtor.com requires JS execution
    country="us",        # route through US residential proxies
    premium_proxy=True,  # residential pool (datacenter IPs get blocked)
)

soup = BeautifulSoup(response.text, "html.parser")
print(f"Status: {response.status_code}")
print(soup.title.string)
```

The `render_js=True` flag provisions a headless Chromium instance with a properly fingerprinted browser environment and valid TLS profile. No local browser setup required.
cURL equivalent
```shell
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.realtor.com/realestateandhomes-search/Austin_TX",
    "render_js": true,
    "country": "us",
    "premium_proxy": true
  }'
```

Try scraping a Realtor.com search results page live with AlterLab.
Extracting Structured Data
Because Realtor.com is a Next.js application, it embeds its full data payload inside a <script id="__NEXT_DATA__"> tag on every page. This is far more reliable than scraping rendered HTML — the JSON structure is stable across UI redesigns and doesn't depend on CSS class names that change frequently.
Method 1: Parse __NEXT_DATA__ (recommended for production)
```python
import alterlab
import json
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

def extract_listings(search_url: str) -> list[dict]:
    response = client.scrape(search_url, render_js=True, country="us", premium_proxy=True)
    soup = BeautifulSoup(response.text, "html.parser")

    next_data_tag = soup.find("script", {"id": "__NEXT_DATA__"})
    if not next_data_tag:
        raise ValueError("__NEXT_DATA__ not found — page may not have fully rendered")

    data = json.loads(next_data_tag.string)

    # Path varies slightly by page type; search results use this structure
    properties = (
        data.get("props", {})
        .get("pageProps", {})
        .get("properties", [])
    )

    results = []
    for prop in properties:
        location = prop.get("location", {}).get("address", {})
        description = prop.get("description", {})
        # "or [{}]" guards against the key being present but empty or null
        agent = (prop.get("advertisers") or [{}])[0]
        results.append({
            "listing_id": prop.get("property_id"),
            "address": location.get("line"),
            "city": location.get("city"),
            "state": location.get("state_code"),
            "zip": location.get("postal_code"),
            "price": prop.get("list_price"),
            "beds": description.get("beds"),
            "baths": description.get("baths_consolidated"),
            "sqft": description.get("sqft"),
            "status": prop.get("status"),
            "list_date": prop.get("list_date"),
            "agent_name": agent.get("name"),
            "agent_phone": (agent.get("phones") or [{}])[0].get("number"),
        })
    return results

listings = extract_listings(
    "https://www.realtor.com/realestateandhomes-search/Austin_TX"
)
for listing in listings[:3]:
    print(listing)
```

Method 2: CSS selectors for property cards
If the __NEXT_DATA__ structure shifts (it does occasionally after major deployments), DOM selectors are a useful fallback:
```python
from bs4 import BeautifulSoup

def parse_property_cards(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select('[data-testid="property-card"]')

    results = []
    for card in cards:
        price_el = card.select_one('[data-testid="card-price"]')
        address_el = card.select_one('[data-testid="card-address-1"]')
        city_el = card.select_one('[data-testid="card-address-2"]')
        meta_els = card.select('[data-testid^="property-meta-"]')
        results.append({
            "price": price_el.get_text(strip=True) if price_el else None,
            "address": address_el.get_text(strip=True) if address_el else None,
            "city": city_el.get_text(strip=True) if city_el else None,
            "meta": [m.get_text(strip=True) for m in meta_els],
        })
    return results
```

Selector reference (as of early 2026):
| Data Point | Selector |
|---|---|
| Property card container | `[data-testid="property-card"]` |
| List price | `[data-testid="card-price"]` |
| Street address | `[data-testid="card-address-1"]` |
| City / state / zip | `[data-testid="card-address-2"]` |
| Beds | `[data-testid="property-meta-beds"]` |
| Baths | `[data-testid="property-meta-baths"]` |
| Square footage | `[data-testid="property-meta-sqft"]` |
| Listing type badge | `[data-testid="card-description"]` |
Realtor.com rotates data-testid attributes periodically. For pipelines that need to run unattended, the __NEXT_DATA__ path is the right choice.
Common Pitfalls
Pagination breaks without session continuity
Realtor.com paginates search results via URL suffixes (/pg-2, /pg-3), but pages beyond the first often require cookies set during the initial page load. Naive parallel fetches across pages will return empty results or redirect responses starting around page 3.
Maintain a session across requests using the `session_id` parameter:

```python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

def scrape_all_pages(base_url: str, max_pages: int = 10) -> list[dict]:
    session_id = client.new_session(country="us", premium_proxy=True)
    all_results = []

    for page in range(1, max_pages + 1):
        url = f"{base_url}/pg-{page}" if page > 1 else base_url
        response = client.scrape(url, session_id=session_id, render_js=True)
        if response.status_code != 200:
            break

        # reuse extract_listings logic from earlier
        listings = extract_listings_from_html(response.text)
        if not listings:
            break  # past last page

        all_results.extend(listings)

    return all_results
```

Lazy-loaded images and deferred JS execution
Property images are lazy-loaded. If your pipeline needs image URLs, pass a `wait_for` selector to block until images resolve before the HTML snapshot is captured:

```python
response = client.scrape(
    url,
    render_js=True,
    wait_for='[data-testid="card-img-container"] img[src]',
)
```

Overly aggressive concurrency
Even with residential proxy rotation, flooding search endpoints triggers server-side rate limiting that persists across proxy IPs. Keep concurrent requests in the 5–10 range and use the SDK's rate_limit parameter to enable automatic exponential backoff on 429 responses.
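If you handle retries yourself rather than relying on the SDK's `rate_limit` option, the standard pattern is exponential backoff with jitter on 429s. A minimal sketch; the `fetch` callable and its response shape are stand-ins for illustration, not the AlterLab SDK:

```python
import random
import time

def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0):
    """Retry fetch(url) on HTTP 429, doubling the wait each attempt plus jitter."""
    for attempt in range(max_retries):
        response = fetch(url)
        if response["status"] != 429:
            return response
        # 1s, 2s, 4s, ... plus a small random offset to desynchronize workers
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
        time.sleep(delay)
    raise RuntimeError(f"still rate-limited after {max_retries} attempts: {url}")

# Simulated endpoint that rate-limits the first two calls
calls = {"n": 0}
def fake_fetch(url):
    calls["n"] += 1
    return {"status": 429 if calls["n"] <= 2 else 200, "url": url}

print(fetch_with_backoff(fake_fetch, "https://example.com", base_delay=0.01))
# {'status': 200, 'url': 'https://example.com'}
```

Capping `max_retries` matters: a request that is still 429ing after five doubled waits is better surfaced as a failure than retried forever.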
Scaling Up
Batch requests across cities
```python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

target_cities = [
    "https://www.realtor.com/realestateandhomes-search/Austin_TX",
    "https://www.realtor.com/realestateandhomes-search/Denver_CO",
    "https://www.realtor.com/realestateandhomes-search/Phoenix_AZ",
    "https://www.realtor.com/realestateandhomes-search/Nashville_TN",
    "https://www.realtor.com/realestateandhomes-search/Charlotte_NC",
    "https://www.realtor.com/realestateandhomes-search/Seattle_WA",
]

results = client.batch_scrape(
    urls=target_cities,
    render_js=True,
    country="us",
    premium_proxy=True,
    concurrency=5,
)

for result in results:
    if result.status_code == 200:
        listings = extract_listings_from_html(result.text)
        print(f"{result.url} → {len(listings)} listings")
    else:
        print(f"{result.url} → failed ({result.status_code})")
```

Cost planning
JS-rendered requests consume more credits than static fetches — account for this in your credit budget. At 50,000 requests/month (a reasonable baseline for monitoring 500 ZIP codes daily), you're within the standard Growth tier on AlterLab's pricing page. For pipelines exceeding 1M monthly requests, dedicated residential proxy pools on an enterprise plan substantially reduce per-request cost and improve throughput consistency.
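The budget math is worth sanity-checking before committing to a tier. A back-of-the-envelope sketch; the pages-per-ZIP count and the JS-rendering credit multiplier are assumptions for illustration, not AlterLab's published pricing:

```python
# Back-of-the-envelope request budget for a daily monitoring pipeline.
zip_codes = 500
pages_per_zip = 3          # assumed: first few result pages cover most active inventory
runs_per_day = 1
days_per_month = 30
js_credit_multiplier = 5   # assumed cost of render_js=True vs a static fetch

requests_per_month = zip_codes * pages_per_zip * runs_per_day * days_per_month
credits_per_month = requests_per_month * js_credit_multiplier

print(f"{requests_per_month:,} requests/month -> {credits_per_month:,} credits/month")
# 45,000 requests/month -> 225,000 credits/month
```

Plugging in your own multiplier from the pricing page turns this into a quick tier check before you build.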
Key Takeaways
- `requests` will not work. Realtor.com uses JavaScript fingerprinting, TLS inspection, and IP-based rate limiting. You need a headless browser with proper browser fingerprinting and residential proxies.
- Target `__NEXT_DATA__` first. The embedded Next.js JSON payload is structured, stable, and doesn't break on UI redesigns. CSS selectors are a useful fallback, not a primary strategy.
- Use sessions for pagination. Fetching pages 2+ without a valid session cookie returns empty results. Pass a `session_id` to maintain state across the full result set.
- Throttle concurrency. 5–10 concurrent requests with automatic backoff on 429s is the right operating envelope for Realtor.com endpoints.
- Schedule incrementally. MLS data refreshes every 15 minutes, but full re-scrapes are wasteful. Daily cycles work for most use cases; 4-hour intervals suit real-time price dashboards.
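The incremental-scheduling point can be sketched as a diff against the previous run's snapshot. Persistence of the snapshot (a file or database) is left out, and the field names follow the extraction example earlier in this guide:

```python
def diff_listings(previous: dict, current: list[dict]) -> tuple[list, list, dict]:
    """Compare a fresh scrape against the last snapshot of {listing_id: price}.

    Returns (new listings, re-priced listings, updated snapshot)."""
    new, changed = [], []
    snapshot = {}
    for item in current:
        lid, price = item["listing_id"], item["price"]
        snapshot[lid] = price
        if lid not in previous:
            new.append(item)
        elif previous[lid] != price:
            changed.append(item)
    return new, changed, snapshot

prev = {"M1": 500_000, "M2": 350_000}
cur = [
    {"listing_id": "M1", "price": 480_000},  # price cut
    {"listing_id": "M2", "price": 350_000},  # unchanged — skip downstream work
    {"listing_id": "M3", "price": 610_000},  # new listing
]
new, changed, snapshot = diff_listings(prev, cur)
print(len(new), len(changed))
# 1 1
```

Only the new and changed listings need downstream processing, which keeps a 4-hour cycle cheap even across hundreds of ZIP codes.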
Related Guides
Scraping other real estate or high-volume e-commerce platforms? These guides follow the same pattern with site-specific details:
- Selenium Bot Detection: Why You Get Caught and How to Avoid It
- Why Your Headless Browser Gets Detected (and How to Fix It)
- Web Scraping APIs vs DIY Scrapers: When to Stop Building Infrastructure
- Scraping E-Commerce Sites at Scale Without Getting Blocked
- Web Scraping with Node.js and Puppeteer: The Complete 2026 Guide