
How to Scrape Redfin: Complete Guide for 2026
Learn to scrape Redfin property listings with Python in 2026. Covers anti-bot bypass, CSS selectors, JSON-LD parsing, and building scalable pipelines.
March 29, 2026
Redfin exposes one of the most complete real estate datasets on the public web: active listings, price history, days on market, agent data, neighborhood stats, and walk scores — all attached to individual property records and updated multiple times per day. Getting that data out programmatically requires dealing with anti-bot protections that are meaningfully stricter than most content sites. This guide covers every layer: what protections exist, how to bypass them reliably, which selectors to target, and how to build a pipeline that scales.
Why Scrape Redfin?
Redfin's value isn't the listings themselves — it's the density of structured, frequently refreshed data attached to each one. Three workflows where this matters in practice:
Real-time price monitoring. Track asking price changes, reductions, and relisting events across specific zip codes or MLS regions. Redfin surfaces price history directly on the listing page with timestamps, giving you a dataset that most public MLS feeds don't expose at this granularity.
Investment lead generation. Investors screening for high days-on-market properties, recently price-reduced listings, or specific lot-size/price-per-sqft ratios can build automated pipelines that surface candidates before a human broker manually compiles a comparable list.
Housing market research and ML. Academics, data journalists, and engineers building price prediction models need labeled historical data with features like square footage, school district scores, walk score, and HOA status. Redfin exposes many of these as structured HTML or embedded JSON, making it one of the cleaner sources for feature engineering.
Anti-Bot Challenges on Redfin.com
Redfin runs several layers of protection that will shut down a naive scraper within minutes:
TLS fingerprinting. Redfin's CDN checks the TLS handshake profile of your HTTP client. Python's requests library produces a fingerprint that's trivially identified and blocked at the network edge — even with correct headers set, the handshake mismatch returns 403s before your request reaches the application server.
IP reputation scoring. Datacenter IP ranges from AWS, GCP, and DigitalOcean are blocked outright. Requests from these ranges return either a CAPTCHA challenge or a silent redirect to a bot detection page. Residential proxies with clean reputation histories are a hard requirement.
Behavioral analytics. Redfin tracks mouse movement, scroll velocity, and interaction timing for browser-based sessions. Headless Chromium without stealth patches triggers detection within a few page loads — well before you've collected anything useful.
Per-region rate limiting. Search result pages for high-demand markets (SF Bay Area, NYC, LA) appear to have tighter per-IP thresholds than lower-traffic markets. Burst patterns on these market endpoints trip limits faster than a naive rotating proxy setup can handle.
Building around all of this from scratch means maintaining residential proxy pools, patching TLS clients with curl_cffi or tls-client, implementing browser fingerprint spoofing, and writing CAPTCHA fallback logic — before you've written a single line of parsing code. The Anti-bot bypass API handles this infrastructure layer, so your code only needs to deal with the HTML.
Quick Start with AlterLab API
Install the SDK and parsing dependencies:
```shell
pip install alterlab beautifulsoup4 lxml
```

The minimal working example — fetch a Redfin search results page and confirm you got real listing HTML back:

```python
import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://www.redfin.com/city/30749/CA/San-Francisco/filter/max-price=1.5M",
    render_js=True,      # Required — search results load via React
    country_code="us",   # Use US residential proxies
)

soup = BeautifulSoup(response.text, "lxml")
cards = soup.select(".HomeCardContainer")
print(f"Found {len(cards)} listing cards")
```

The same request as a cURL call, useful for smoke-testing from a pipeline:
```shell
curl -s -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.redfin.com/city/30749/CA/San-Francisco/filter/max-price=1.5M",
    "render_js": true,
    "country_code": "us"
  }' | jq '.html | length'
```

If you're setting up AlterLab for the first time, the Getting started guide covers API key setup, SDK installation, and your first request in under five minutes.
Extracting Structured Data
Redfin renders listing cards as React components. After JavaScript execution, the DOM exposes consistent class names and data-rf-test-id attributes you can target reliably.
Search Result Listing Cards
```python
import json

import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://www.redfin.com/zipcode/94105",
    render_js=True,
    country_code="us",
)
soup = BeautifulSoup(response.text, "lxml")


def text_of(node, selector):
    """Return stripped text for the first match, or None if absent."""
    el = node.select_one(selector)
    return el.get_text(strip=True) if el else None


listings = []
for card in soup.select(".HomeCardContainer"):
    link = card.select_one("a[href]")
    listings.append({
        "address": text_of(card, ".homeAddressV2"),
        "price": text_of(card, ".homePriceV2"),
        "beds": text_of(card, "[data-rf-test-id='abp-beds']"),
        "baths": text_of(card, "[data-rf-test-id='abp-baths']"),
        "sqft": text_of(card, "[data-rf-test-id='abp-sqft']"),
        "dom": text_of(card, ".daysOnRedfin"),
        "url": "https://www.redfin.com" + link["href"] if link else None,
    })

print(json.dumps(listings[:3], indent=2))
```

JSON-LD from Property Detail Pages
Individual listing pages embed structured data in a <script type="application/ld+json"> block. This follows the schema.org RealEstateListing type and is faster to parse than walking the rendered DOM — and more stable across Redfin frontend deploys:
```python
import json

import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://www.redfin.com/CA/San-Francisco/123-Main-St-94105/home/12345678",
    render_js=True,
    country_code="us",
    wait_for_selector=".price-table",  # Wait until price history loads
)
soup = BeautifulSoup(response.text, "lxml")

# Extract the JSON-LD structured data block
for script in soup.find_all("script", {"type": "application/ld+json"}):
    try:
        data = json.loads(script.string)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers script tags with no text content
        continue
    if isinstance(data, list):
        data = next((d for d in data if d.get("@type") == "RealEstateListing"), None)
    if data and data.get("@type") == "RealEstateListing":
        print("Address:", data.get("address", {}).get("streetAddress"))
        print("Price:  ", data.get("offers", {}).get("price"))
        print("Beds:   ", data.get("numberOfRooms"))
        break

# Supplemental facts from the property detail table
facts = {}
for row in soup.select(".facts-table .table-row"):
    label = row.select_one(".table-label")
    value = row.select_one(".table-value")
    if label and value:
        facts[label.get_text(strip=True)] = value.get_text(strip=True)

print(json.dumps(facts, indent=2))
```

CSS selector reference for common data points:
| Data Point | Selector |
|---|---|
| Listing price | .homePriceV2 |
| Address | .homeAddressV2 |
| Beds | [data-rf-test-id="abp-beds"] |
| Baths | [data-rf-test-id="abp-baths"] |
| Square footage | [data-rf-test-id="abp-sqft"] |
| Days on market | .daysOnRedfin |
| Price history rows | .price-table .price-table-row |
| Property facts table | .facts-table .table-row |
| Walk / Transit score | .walkscore-stats |
| Listing agent name | .agent-basic-details .agent-name |
Prefer data-rf-test-id attributes over class names wherever they exist — test IDs are significantly more stable across Redfin's frontend deployments than utility class names, which have already incremented at least once in the past 12 months.
Common Pitfalls
Pagination varies by market URL structure. Redfin's search pages use ?page=N for city and neighborhood URLs, but map-polygon search endpoints use &start=N. On some market URLs neither parameter exists and the next page loads via an infinite scroll XHR trigger. Always verify that incrementing the page parameter actually returns a different listing set before building a loop around it.
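One way to make that verification concrete is to compare the listing URLs returned by consecutive pages and only continue paginating while the overlap stays low. A minimal sketch using pure set logic; the helper name and threshold are my own choices, not a Redfin or AlterLab convention:

```python
def page_advanced(prev_urls, next_urls, min_new_ratio=0.5):
    """Heuristic check that a pagination step returned a genuinely new
    listing set rather than repeating the previous page's results."""
    prev_set, next_set = set(prev_urls), set(next_urls)
    if not next_set:
        return False  # empty page: we've walked past the end of results
    new_ratio = len(next_set - prev_set) / len(next_set)
    return new_ratio >= min_new_ratio
```

Call it after fetching each page and break the loop as soon as it returns False; that guards against both silent page-parameter no-ops and the empty-page end condition.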
Class names drift with frontend releases. .homePriceV2 superseded .homePrice sometime in 2024. The V2 suffix has appeared on several selectors. Build your parsers with a fallback chain — try the current selector first, then the previous generation:
```python
def get_price(card):
    # Try the current-generation selector first, then fall back to older ones
    for selector in [
        ".homePriceV2",
        ".homePrice",
        "[data-rf-test-id='listing-price']",
    ]:
        el = card.select_one(selector)
        if el:
            return el.get_text(strip=True)
    return None
```

Render timing on detail pages. Even with render_js=True, sections like price history charts and neighborhood score widgets load asynchronously after initial paint. If your parser finds empty containers for data you can see in a browser, add wait_for_selector targeting the last element to appear on the page — typically .price-table for the price history block.
Map polygon result caps. If you're scraping using Redfin's map bounding box URLs (useful for irregular geographic boundaries), the API caps results at 350 homes per query regardless of how many listings exist in the area. For dense urban markets, subdivide your bounding polygon into smaller quadrants and merge the results, deduplicating on the MLS ID extracted from each listing's URL path.
Address encoding edge cases. Redfin address fields occasionally include Unicode characters — directional markers, special apartment symbols, or non-breaking spaces — that cause issues when writing to CSV or comparing records across runs. Normalize with .encode("ascii", "ignore").decode() for ASCII-only pipelines, or store as TEXT in PostgreSQL and handle the full range there.
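A normalization helper along those lines for ASCII-only pipelines, using only the standard library; the exact cleanup steps here are a suggestion rather than anything Redfin-specific:

```python
import unicodedata


def normalize_address(raw: str) -> str:
    """Fold Unicode oddities (non-breaking spaces, compatibility forms),
    drop anything left outside ASCII, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", raw)   # NBSP -> plain space, etc.
    text = text.encode("ascii", "ignore").decode()
    return " ".join(text.split())
```

Note the ASCII strip is lossy (curly apostrophes and accents disappear), which is fine for a dedup key but not for display; keep the raw string in a separate column if you need it back.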
Scaling Up
Once your single-request parser is stable, a production pipeline needs concurrency, scheduling, and deduplication.
Async Batch Scraping
```python
import asyncio

import alterlab
from bs4 import BeautifulSoup

client = alterlab.Client("YOUR_API_KEY")

ZIP_CODES = [
    "94105", "94107", "94109", "94110", "94112",
    "94114", "94115", "94116", "94117", "94118",
]


async def scrape_zip(zip_code: str) -> list[dict]:
    url = f"https://www.redfin.com/zipcode/{zip_code}"
    response = await client.async_scrape(url, render_js=True, country_code="us")
    soup = BeautifulSoup(response.text, "lxml")
    results = []
    for c in soup.select(".HomeCardContainer"):
        address = c.select_one(".homeAddressV2")
        price = c.select_one(".homePriceV2")
        results.append({
            "zip": zip_code,
            "address": address.get_text(strip=True) if address else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return results


async def main():
    tasks = [scrape_zip(z) for z in ZIP_CODES]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Flatten successful batches; gather returns exceptions in-place,
    # so filtering on isinstance(list) silently drops failed zips
    listings = [
        item
        for batch in results
        if isinstance(batch, list)
        for item in batch
    ]
    print(f"Collected {len(listings)} listings across {len(ZIP_CODES)} zip codes")


asyncio.run(main())
```

Storage Schema with Deduplication
The MLS ID is embedded directly in the Redfin URL path: redfin.com/CA/San-Francisco/123-Main-St-94105/home/12345678 — extract the final numeric segment. Use it as the natural deduplication key:
```sql
CREATE TABLE redfin_listings (
    id          BIGSERIAL PRIMARY KEY,
    mls_id      TEXT NOT NULL,
    address     TEXT,
    zip_code    CHAR(5),
    price_usd   INTEGER,
    beds        NUMERIC(3,1),
    baths       NUMERIC(3,1),
    sqft        INTEGER,
    days_on_mkt INTEGER,
    scraped_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- PostgreSQL table-level UNIQUE constraints can't contain expressions,
-- so enforce one record per listing per day with a unique index instead:
CREATE UNIQUE INDEX idx_redfin_mls_day
    ON redfin_listings (mls_id, (scraped_at::DATE));

CREATE INDEX idx_redfin_zip_scraped ON redfin_listings (zip_code, scraped_at DESC);
```

Run daily scrapes with cron or a workflow tool like Prefect, and INSERT ... ON CONFLICT DO UPDATE to update fields like price and days-on-market while preserving the initial scraped-at timestamp for price change calculations.
Cost Planning at Scale
JavaScript-rendered requests consume more credits than static HTML fetches because of the additional proxy bandwidth and browser compute required. For search result pages, test whether the target URL returns useful HTML without JS rendering enabled — many Redfin city and zip-code search URLs do render a server-side content layer. Only enable render_js=True for pages where the extra data justifies the cost. This pattern typically reduces credit consumption by 40–60% on search-heavy pipelines. See AlterLab pricing for current per-request credit rates across render types.
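That test can be as simple as fetching once without rendering and checking the static HTML for listing-card markup before paying for the rendered version. The marker class comes from the selectors above; the helper name and the substring heuristic are my own:

```python
def needs_js_render(html: str) -> bool:
    """Return True when the static HTML lacks listing-card markup,
    meaning the more expensive JS-rendered request is actually needed."""
    return "HomeCardContainer" not in html
```

In the pipeline: call client.scrape with render_js=False first, and only retry the same URL with render_js=True when this returns True. Cache the per-URL-pattern answer so you aren't probing every request twice.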
Key Takeaways
- Redfin blocks datacenter IP ranges and standard Python TLS fingerprints at the network edge. Residential proxies and fingerprint spoofing are not optional.
- render_js=True is required for search result pages and listing detail pages. Add wait_for_selector targeting late-loading sections like .price-table to avoid empty parser results.
- Prefer data-rf-test-id attributes over class names — they survive Redfin frontend deploys more reliably than utility classes.
- JSON-LD <script type="application/ld+json"> blocks on detail pages give you clean schema.org fields without DOM traversal for the most commonly needed listing attributes.
- Map polygon queries cap at 350 results. Subdivide dense-market bounding boxes and deduplicate on MLS ID.
- Enable async batch scraping from the start — sequential requests don't scale past a few hundred zip codes per hour.
- Use static mode for search pages where possible to cut JS-render credit costs by ~40–60%.