
Build a Real-Time Price Monitor with Python
Step-by-step guide to building a production-grade price monitoring system with Python, APScheduler, PostgreSQL, and a scraping API with anti-bot bypass.
March 26, 2026
Price monitoring is straightforward in concept and a maintenance nightmare in practice. Anti-bot measures rotate. CSS selectors drift. IP bans accumulate silently. This guide builds a complete system that handles all three — using Python, APScheduler, PostgreSQL, and a scraping API with anti-bot bypass built in.
By the end you'll have a daemon that:
- Polls any list of product URLs on a per-target schedule
- Survives Cloudflare, PerimeterX, and JavaScript-rendered prices
- Persists a full price history to PostgreSQL
- Fires email alerts with debouncing — no alert storms
Architecture
Prerequisites
pip install requests beautifulsoup4 lxml apscheduler sqlalchemy psycopg2-binary

You'll also need:
- Python 3.11+
- A running PostgreSQL instance (or DATABASE_URL=sqlite:///prices.db for local dev)
- An API key — follow the quickstart guide to get one in under two minutes
Step 1: Target Configuration
Separate configuration from logic. Each target entry specifies everything the monitor needs to run independently:
[
  {
    "name": "Sony WH-1000XM5",
    "url": "https://www.amazon.com/dp/B09XS7JWHH",
    "selector": ".a-price .a-offscreen",
    "threshold": 299.00,
    "currency": "USD",
    "interval_minutes": 30
  },
  {
    "name": "Peak Design Travel Backpack 45L",
    "url": "https://www.peakdesign.com/products/travel-backpack",
    "selector": ".price__current",
    "threshold": 550.00,
    "currency": "USD",
    "interval_minutes": 60
  }
]

Adding a new product means adding one JSON object — no code changes.
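Because the config is plain JSON, it pays to validate it at startup rather than discover a typo at scrape time. Here is a minimal loader sketch; load_targets and REQUIRED_KEYS are hypothetical names, not part of the article's modules:

```python
import json

# Keys every target entry must define ("currency" is optional, defaults to USD)
REQUIRED_KEYS = {"name", "url", "selector", "threshold", "interval_minutes"}

def load_targets(path: str = "targets.json") -> list[dict]:
    """Load target entries and fail fast on malformed config."""
    with open(path) as f:
        targets = json.load(f)
    for i, t in enumerate(targets):
        missing = REQUIRED_KEYS - t.keys()
        if missing:
            raise ValueError(f"target #{i} ({t.get('name', '?')}): missing keys {sorted(missing)}")
        if t["interval_minutes"] < 1:
            raise ValueError(f"target #{i} ({t['name']}): interval_minutes must be >= 1")
    return targets
```

Failing fast here turns a silent misconfiguration into an immediate, readable traceback.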
Step 2: The Scraping Client
Most e-commerce sites render prices via JavaScript and block naive HTTP requests within minutes. The scraping API returns fully rendered HTML — you send a URL, get back a DOM.
cURL (for validating selectors before coding):
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B09XS7JWHH",
    "render": true,
    "wait_for": ".a-price"
  }'

Python client with retry logic:
import os
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

ALTERLAB_URL = "https://api.alterlab.io/v1/scrape"
ALTERLAB_KEY = os.environ["ALTERLAB_API_KEY"]

_session = requests.Session()
_session.headers.update({"X-API-Key": ALTERLAB_KEY, "Content-Type": "application/json"})
_session.mount("https://", HTTPAdapter(max_retries=Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503])))

def scrape(url: str, wait_for: str | None = None, render: bool = True) -> str:
    payload: dict[str, object] = {"url": url, "render": render}
    if wait_for:
        payload["wait_for"] = wait_for
    resp = _session.post(ALTERLAB_URL, json=payload, timeout=45)
    resp.raise_for_status()
    return resp.json()["html"]

The wait_for parameter instructs the headless browser to wait until that CSS selector appears in the DOM before returning HTML. Without it, you'll receive the page skeleton before JavaScript has injected the price.
Step 3: Price Extraction
import re
from decimal import Decimal, InvalidOperation
from bs4 import BeautifulSoup

def extract_price(html: str, selector: str) -> Decimal | None:
    soup = BeautifulSoup(html, "lxml")
    el = soup.select_one(selector)
    if not el:
        return None
    # Strip currency symbols, whitespace, thousands separators
    raw = re.sub(r"[^\d.]", "", el.get_text(strip=True))
    if not raw:
        return None
    try:
        return Decimal(raw)
    except InvalidOperation:
        return None

Two failure modes to instrument:
- el is None: The element didn't render in time, or the site changed its layout. Increase the wait_for timeout or update the selector. Log this as a warning, not an error — it's recoverable.
- InvalidOperation: The text matched but contained non-numeric content like "From $1.299,00" (European locale formatting). If you monitor non-US sites, add locale-aware normalization before passing to Decimal.
Step 4: Persist Price History
Store every sample with a UTC timestamp. You want the full time series — not just current price — for trend queries and alert debouncing.
import os
from datetime import datetime, timezone
from decimal import Decimal
from sqlalchemy import Column, DateTime, Index, Integer, Numeric, String, create_engine
from sqlalchemy.orm import DeclarativeBase, Session

class Base(DeclarativeBase):
    pass

class PriceRecord(Base):
    __tablename__ = "price_records"

    id = Column(Integer, primary_key=True, autoincrement=True)
    product_name = Column(String(256), nullable=False)
    product_url = Column(String(2048), nullable=False)
    selector = Column(String(256), nullable=False)
    price = Column(Numeric(12, 2), nullable=False)
    currency = Column(String(8), default="USD", nullable=False)
    scraped_at = Column(
        DateTime(timezone=True),
        nullable=False,
        default=lambda: datetime.now(timezone.utc),
    )

    __table_args__ = (
        Index("ix_price_records_url_time", "product_url", "scraped_at"),
    )

engine = create_engine(
    os.environ.get("DATABASE_URL", "sqlite:///prices.db"),
    pool_pre_ping=True,
)
Base.metadata.create_all(engine)

def save_price(name: str, url: str, selector: str, price: Decimal, currency: str = "USD") -> None:
    with Session(engine) as session:
        session.add(PriceRecord(
            product_name=name,
            product_url=url,
            selector=selector,
            price=price,
            currency=currency,
        ))
        session.commit()

The composite index on (product_url, scraped_at) keeps time-range queries fast as the table grows. For hundreds of products tracked over months, partition by month on scraped_at.
Step 5: Debounced Alert Logic
Without debouncing, you get an email for every polling cycle the price stays below threshold. The cooldown window suppresses re-alerts within a configurable period.
import os, smtplib
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from email.mime.text import MIMEText
from sqlalchemy import select
from models import PriceRecord, Session, engine

ALERT_COOLDOWN_HOURS = 4

def _earliest_price_in_window(url: str) -> Decimal | None:
    """Return the first price recorded within the cooldown window."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=ALERT_COOLDOWN_HOURS)
    with Session(engine) as session:
        row = session.execute(
            select(PriceRecord.price)
            .where(PriceRecord.product_url == url)
            .where(PriceRecord.scraped_at >= cutoff)
            .order_by(PriceRecord.scraped_at.asc())
            .limit(1)
        ).first()
    return Decimal(str(row[0])) if row else None

def maybe_alert(name: str, url: str, current: Decimal, threshold: Decimal) -> None:
    if current >= threshold:
        return  # Price is not below threshold — nothing to do
    first_in_window = _earliest_price_in_window(url)
    if first_in_window is not None and first_in_window < threshold:
        return  # Already sent an alert during this cooldown window
    _send_email(name, url, current, threshold)

def _send_email(name: str, url: str, price: Decimal, threshold: Decimal) -> None:
    body = (
        f"Price alert for {name}\n\n"
        f"Current price: ${price:.2f}\n"
        f"Your threshold: ${threshold:.2f}\n"
        f"Product URL: {url}"
    )
    msg = MIMEText(body)
    msg["Subject"] = f"Price drop: {name} is now ${price:.2f}"
    msg["From"] = os.environ["SMTP_FROM"]
    msg["To"] = os.environ["ALERT_EMAIL"]
    with smtplib.SMTP(os.environ["SMTP_HOST"], int(os.environ.get("SMTP_PORT", 587))) as srv:
        srv.starttls()
        srv.login(os.environ["SMTP_USER"], os.environ["SMTP_PASS"])
        srv.send_message(msg)

The logic: if the earliest price recorded in the cooldown window was already below threshold, the drop event already triggered an alert — suppress this one. When the price recovers above threshold and drops again, _earliest_price_in_window returns a price above threshold, and a new alert fires.
Step 6: Scheduler
Each target gets its own APScheduler job with its own interval. max_instances=1 prevents a slow scrape from stacking concurrent runs for the same product.
import json, logging, os, signal, sys
from decimal import Decimal
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.interval import IntervalTrigger
from alerts import maybe_alert
from client import scrape
from extractor import extract_price
from models import save_price

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)

def check_price(target: dict) -> None:
    name, url = target["name"], target["url"]
    selector = target["selector"]
    threshold = Decimal(str(target["threshold"]))
    currency = target.get("currency", "USD")
    log.info("Checking %s", name)
    try:
        html = scrape(url, wait_for=selector)
    except Exception as exc:
        log.error("Scrape failed for %s: %s", name, exc)
        return
    price = extract_price(html, selector)
    if price is None:
        log.warning("No price extracted for %s (selector=%r)", name, selector)
        return
    log.info("%s → $%.2f (threshold $%.2f)", name, price, threshold)
    save_price(name, url, selector, price, currency)
    maybe_alert(name, url, price, threshold)

def main() -> None:
    with open("targets.json") as f:
        targets = json.load(f)
    scheduler = BlockingScheduler(timezone="UTC")
    for t in targets:
        scheduler.add_job(
            check_price,
            trigger=IntervalTrigger(minutes=t["interval_minutes"]),
            kwargs={"target": t},
            id=t["name"],
            max_instances=1,
            misfire_grace_time=60,
        )
        log.info("Scheduled %s every %d min", t["name"], t["interval_minutes"])
    signal.signal(signal.SIGTERM, lambda *_: (scheduler.shutdown(wait=False), sys.exit(0)))
    signal.signal(signal.SIGINT, lambda *_: (scheduler.shutdown(wait=False), sys.exit(0)))
    scheduler.start()

if __name__ == "__main__":
    main()

Running It
export ALTERLAB_API_KEY=sk_live_your_key_here
export DATABASE_URL=postgresql://user:pass@localhost:5432/pricedb
export SMTP_HOST=smtp.gmail.com
export SMTP_USER=[email protected]
export SMTP_PASS=your_app_password
export SMTP_FROM=[email protected]
export ALERT_EMAIL=[email protected]
python monitor.py

Expected output:
2026-03-26 09:00:00 INFO Scheduled Sony WH-1000XM5 every 30 min
2026-03-26 09:00:00 INFO Scheduled Peak Design Travel Backpack 45L every 60 min
2026-03-26 09:00:01 INFO Checking Sony WH-1000XM5
2026-03-26 09:00:03 INFO Sony WH-1000XM5 → $279.99 (threshold $299.00)

Production Considerations
Selector drift is the most common failure mode. E-commerce sites A/B test layouts constantly. Add a dead-man's-switch: if any target returns None from extract_price more than three consecutive times, fire a Slack or webhook alert. You want to know your selector broke before you miss a week of price data.
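A dead-man's-switch can be a small in-process counter. Sketch below, assuming you call it from check_price with price is not None; record_extraction, FAILURE_LIMIT, and the webhook hook are hypothetical names:

```python
from collections import defaultdict

FAILURE_LIMIT = 3
_fail_counts: dict[str, int] = defaultdict(int)

def record_extraction(name: str, price_found: bool) -> bool:
    """Track consecutive extraction failures per target.

    Returns True exactly once, when a target crosses FAILURE_LIMIT
    consecutive misses — the caller should fire a Slack/webhook alert then.
    A successful extraction resets the counter."""
    if price_found:
        _fail_counts[name] = 0
        return False
    _fail_counts[name] += 1
    return _fail_counts[name] == FAILURE_LIMIT

# In check_price:
#     if record_extraction(name, price is not None):
#         notify_ops(f"selector likely broken for {name}")  # hypothetical hook
```

Returning True only on the exact crossing (== rather than >=) means one ops alert per outage, not one per failed poll.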
Request cadence: Even with rotating proxies, behavioral analysis will eventually flag tight polling loops. Keep intervals at 30 minutes or above per target. If you need sub-15-minute monitoring, spread targets across multiple scheduled windows — not faster individual polls.
Containerization: Wrap monitor.py in a Docker container with restart: unless-stopped. APScheduler's BlockingScheduler handles SIGTERM cleanly with the signal handlers above. Mount targets.json as a volume so you can update the target list without rebuilding the image.
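One possible docker-compose layout for this setup; service names, image tags, and credentials below are placeholders, not values from the article:

```yaml
services:
  monitor:
    build: .
    restart: unless-stopped
    environment:
      - ALTERLAB_API_KEY=${ALTERLAB_API_KEY}
      - DATABASE_URL=postgresql://user:pass@db:5432/pricedb
    volumes:
      - ./targets.json:/app/targets.json:ro   # edit targets without rebuilding
    depends_on:
      - db
  db:
    image: postgres:16
    restart: unless-stopped
    environment:
      - POSTGRES_DB=pricedb
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```

The read-only bind mount on targets.json is what lets you change the target list with a container restart instead of an image rebuild.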
Approach Comparison
The break-even point is around two hours of debugging time per month. One Saturday spent chasing an IP block and the economics shift decisively toward managed bypass.
Takeaway
The system you've built covers the full lifecycle: configuration, scraping, extraction, persistence, debounced alerting, and scheduling. The design decisions that matter most:
- Stateless job functions: check_price takes a dict and exits cleanly. Unit-testable without mocking a scheduler.
- Debounced alerts: Cooldown windows prevent alert fatigue without missing genuine price-drop events.
- Separated modules: Swapping the scraping backend, database, or alert channel touches exactly one file.
For the full request parameter reference — including session persistence, custom headers, and screenshot capture — see the API docs.