
Build a Real-Time Price Monitor with Python

Step-by-step guide to building a production-grade price monitoring system with Python, APScheduler, PostgreSQL, and a scraping API with anti-bot bypass.

Yash Dubey

March 26, 2026

8 min read

Price monitoring is straightforward in concept and a maintenance nightmare in practice. Anti-bot measures rotate. CSS selectors drift. IP bans accumulate silently. This guide builds a complete system that handles all three — using Python, APScheduler, PostgreSQL, and a scraping API with anti-bot bypass built in.

By the end you'll have a daemon that:

  • Polls any list of product URLs on a per-target schedule
  • Survives Cloudflare, PerimeterX, and JavaScript-rendered prices
  • Persists a full price history to PostgreSQL
  • Fires email alerts with debouncing — no alert storms

Architecture

The system is five small modules wired in a pipeline: a scheduler fires one job per target, each job calls the scraping client for rendered HTML, hands it to the extractor, persists the sample to PostgreSQL, and passes the result to the debounced alerter.

Prerequisites

Bash
pip install requests beautifulsoup4 lxml apscheduler sqlalchemy psycopg2-binary

You'll also need:

  • Python 3.11+
  • A running PostgreSQL instance (or DATABASE_URL=sqlite:///prices.db for local dev)
  • An API key — follow the quickstart guide to get one in under two minutes

Step 1: Target Configuration

Separate configuration from logic. Each target entry specifies everything the monitor needs to run independently:

JSON
[
  {
    "name": "Sony WH-1000XM5",
    "url": "https://www.amazon.com/dp/B09XS7JWHH",
    "selector": ".a-price .a-offscreen",
    "threshold": 299.00,
    "currency": "USD",
    "interval_minutes": 30
  },
  {
    "name": "Peak Design Travel Backpack 45L",
    "url": "https://www.peakdesign.com/products/travel-backpack",
    "selector": ".price__current",
    "threshold": 550.00,
    "currency": "USD",
    "interval_minutes": 60
  }
]

Adding a new product means adding one JSON object — no code changes.
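Since this file is the only thing that changes per product, it pays to validate it at startup rather than fail mid-run. A minimal sketch; `validate_targets` and `load_targets` are assumed helpers, not modules the tutorial defines:

```python
import json

REQUIRED_KEYS = {"name", "url", "selector", "threshold", "interval_minutes"}


def validate_targets(targets: list[dict]) -> list[dict]:
    """Fail fast on malformed entries instead of mid-run."""
    for t in targets:
        missing = REQUIRED_KEYS - t.keys()
        if missing:
            raise ValueError(f"Target {t.get('name', '<unnamed>')!r} missing {sorted(missing)}")
        if t["interval_minutes"] < 1:
            raise ValueError(f"Target {t['name']!r} has a non-positive interval")
    return targets


def load_targets(path: str = "targets.json") -> list[dict]:
    """Load and validate the target list in one step."""
    with open(path) as f:
        return validate_targets(json.load(f))
```

A bad entry then surfaces at boot with a named target, not as a `KeyError` deep inside a scheduled job.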

Step 2: The Scraping Client

Most e-commerce sites render prices via JavaScript and block naive HTTP requests within minutes. The scraping API returns fully rendered HTML — you send a URL, get back a DOM.

cURL (for validating selectors before coding):

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B09XS7JWHH",
    "render": true,
    "wait_for": ".a-price"
  }'

Python client with retry logic:

Python
import os
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

ALTERLAB_URL = "https://api.alterlab.io/v1/scrape"
ALTERLAB_KEY = os.environ["ALTERLAB_API_KEY"]

_session = requests.Session()
_session.headers.update({"X-API-Key": ALTERLAB_KEY, "Content-Type": "application/json"})
_session.mount("https://", HTTPAdapter(max_retries=Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503])))

def scrape(url: str, wait_for: str | None = None, render: bool = True) -> str:
    payload: dict[str, object] = {"url": url, "render": render}
    if wait_for:
        payload["wait_for"] = wait_for

    resp = _session.post(ALTERLAB_URL, json=payload, timeout=45)
    resp.raise_for_status()
    return resp.json()["html"]

The wait_for parameter instructs the headless browser to wait until that CSS selector appears in the DOM before returning HTML. Without it, you'll receive the page skeleton before JavaScript has injected the price.


Step 3: Price Extraction

Python
import re
from decimal import Decimal, InvalidOperation
from bs4 import BeautifulSoup


def extract_price(html: str, selector: str) -> Decimal | None:
    soup = BeautifulSoup(html, "lxml")
    el = soup.select_one(selector)
    if not el:
        return None

    # Strip currency symbols, whitespace, thousands separators
    raw = re.sub(r"[^\d.]", "", el.get_text(strip=True))
    if not raw:
        return None

    try:
        return Decimal(raw)
    except InvalidOperation:
        return None

Two failure modes to instrument:

  1. el is None: The element didn't render in time, or the site changed its layout. Increase the wait_for timeout or update the selector. Log this as a warning, not an error — it's recoverable.
  2. InvalidOperation: The text matched but contained non-numeric content like "From $1.299,00" (European locale formatting). If you monitor non-US sites, add locale-aware normalization before passing to Decimal.

Step 4: Persist Price History

Store every sample with a UTC timestamp. You want the full time series — not just current price — for trend queries and alert debouncing.

Python
import os
from datetime import datetime, timezone
from decimal import Decimal

from sqlalchemy import Column, DateTime, Index, Integer, Numeric, String, create_engine
from sqlalchemy.orm import DeclarativeBase, Session


class Base(DeclarativeBase):
    pass


class PriceRecord(Base):
    __tablename__ = "price_records"

    id = Column(Integer, primary_key=True, autoincrement=True)
    product_name = Column(String(256), nullable=False)
    product_url = Column(String(2048), nullable=False)
    selector = Column(String(256), nullable=False)
    price = Column(Numeric(12, 2), nullable=False)
    currency = Column(String(8), default="USD", nullable=False)
    scraped_at = Column(
        DateTime(timezone=True),
        nullable=False,
        default=lambda: datetime.now(timezone.utc),
    )

    __table_args__ = (
        Index("ix_price_records_url_time", "product_url", "scraped_at"),
    )


engine = create_engine(
    os.environ.get("DATABASE_URL", "sqlite:///prices.db"),
    pool_pre_ping=True,
)
Base.metadata.create_all(engine)


def save_price(name: str, url: str, selector: str, price: Decimal, currency: str = "USD") -> None:
    with Session(engine) as session:
        session.add(PriceRecord(
            product_name=name,
            product_url=url,
            selector=selector,
            price=price,
            currency=currency,
        ))
        session.commit()

The composite index on (product_url, scraped_at) keeps time-range queries fast as the table grows. For hundreds of products tracked over months, partition by month on scraped_at.

Step 5: Debounced Alert Logic

Without debouncing, you get an email for every polling cycle the price stays below threshold. The cooldown window suppresses re-alerts within a configurable period.

Python
import os, smtplib
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from email.mime.text import MIMEText

from sqlalchemy import select
from models import PriceRecord, Session, engine

ALERT_COOLDOWN_HOURS = 4


def _earliest_price_in_window(url: str) -> Decimal | None:
    """Return the first price recorded within the cooldown window."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=ALERT_COOLDOWN_HOURS)
    with Session(engine) as session:
        row = session.execute(
            select(PriceRecord.price)
            .where(PriceRecord.product_url == url)
            .where(PriceRecord.scraped_at >= cutoff)
            .order_by(PriceRecord.scraped_at.asc())
            .limit(1)
        ).first()
        return Decimal(str(row[0])) if row else None


def maybe_alert(name: str, url: str, current: Decimal, threshold: Decimal) -> None:
    if current >= threshold:
        return  # Price is not below threshold — nothing to do

    first_in_window = _earliest_price_in_window(url)
    if first_in_window is not None and first_in_window < threshold:
        return  # Already sent an alert during this cooldown window

    _send_email(name, url, current, threshold)


def _send_email(name: str, url: str, price: Decimal, threshold: Decimal) -> None:
    body = (
        f"Price alert for {name}\n\n"
        f"Current price: ${price:.2f}\n"
        f"Your threshold: ${threshold:.2f}\n"
        f"Product URL: {url}"
    )
    msg = MIMEText(body)
    msg["Subject"] = f"Price drop: {name} is now ${price:.2f}"
    msg["From"] = os.environ["SMTP_FROM"]
    msg["To"] = os.environ["ALERT_EMAIL"]

    with smtplib.SMTP(os.environ["SMTP_HOST"], int(os.environ.get("SMTP_PORT", 587))) as srv:
        srv.starttls()
        srv.login(os.environ["SMTP_USER"], os.environ["SMTP_PASS"])
        srv.send_message(msg)

The logic: if the earliest price recorded in the cooldown window was already below threshold, that drop already triggered an alert, so this one is suppressed. Once the price recovers above threshold and the old below-threshold samples age out of the window, _earliest_price_in_window returns a price above threshold (or nothing at all), and the next drop fires a fresh alert.

Step 6: Scheduler

Each target gets its own APScheduler job with its own interval. max_instances=1 prevents a slow scrape from stacking concurrent runs for the same product.

Python
import json, logging, os, signal, sys
from decimal import Decimal

from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.interval import IntervalTrigger

from alerts import maybe_alert
from client import scrape
from extractor import extract_price
from models import save_price

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)


def check_price(target: dict) -> None:
    name, url = target["name"], target["url"]
    selector = target["selector"]
    threshold = Decimal(str(target["threshold"]))
    currency = target.get("currency", "USD")

    log.info("Checking %s", name)
    try:
        html = scrape(url, wait_for=selector)
    except Exception as exc:
        log.error("Scrape failed for %s: %s", name, exc)
        return

    price = extract_price(html, selector)
    if price is None:
        log.warning("No price extracted for %s (selector=%r)", name, selector)
        return

    log.info("%s → $%.2f (threshold $%.2f)", name, price, threshold)
    # Alert before persisting: if the current sample were saved first, it would
    # land in the cooldown window and suppress its own drop alert.
    maybe_alert(name, url, price, threshold)
    save_price(name, url, selector, price, currency)


def main() -> None:
    with open("targets.json") as f:
        targets = json.load(f)

    scheduler = BlockingScheduler(timezone="UTC")
    for t in targets:
        scheduler.add_job(
            check_price,
            trigger=IntervalTrigger(minutes=t["interval_minutes"]),
            kwargs={"target": t},
            id=t["name"],
            max_instances=1,
            misfire_grace_time=60,
        )
        log.info("Scheduled %s every %d min", t["name"], t["interval_minutes"])

    signal.signal(signal.SIGTERM, lambda *_: (scheduler.shutdown(wait=False), sys.exit(0)))
    signal.signal(signal.SIGINT, lambda *_: (scheduler.shutdown(wait=False), sys.exit(0)))

    scheduler.start()


if __name__ == "__main__":
    main()

Running It

Bash
export ALTERLAB_API_KEY=sk_live_your_key_here
export DATABASE_URL=postgresql://user:pass@localhost:5432/pricedb
export SMTP_HOST=smtp.gmail.com
export SMTP_USER=[email protected]
export SMTP_PASS=your_app_password
export SMTP_FROM=[email protected]
export ALERT_EMAIL=[email protected]

python monitor.py

Expected output:

Bash
2026-03-26 09:00:00 INFO Scheduled Sony WH-1000XM5 every 30 min
2026-03-26 09:00:00 INFO Scheduled Peak Design Travel Backpack 45L every 60 min
2026-03-26 09:00:01 INFO Checking Sony WH-1000XM5
2026-03-26 09:00:03 INFO Sony WH-1000XM5 → $279.99 (threshold $299.00)

Production Considerations

  • Avg scrape latency (rendered): ~2s
  • Safe minimum poll interval: 30 min
  • AlterLab scrape success rate: 99.2%
  • Recommended alert cooldown: 4h

Selector drift is the most common failure mode. E-commerce sites A/B test layouts constantly. Add a dead-man's-switch: if any target returns None from extract_price more than three consecutive times, fire a Slack or webhook alert. You want to know your selector broke before you miss a week of price data.
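That dead-man's-switch is a dozen lines of state in the scheduler process. A sketch, assuming you call it from check_price after every extraction attempt; the notify callable stands in for whatever Slack or webhook client you use (hypothetical hook, not part of the tutorial's modules):

```python
from collections import defaultdict
from typing import Callable

CONSECUTIVE_FAILURE_LIMIT = 3
_failures: dict[str, int] = defaultdict(int)


def record_extraction(name: str, price: object, notify: Callable[[str], None] = print) -> None:
    """Count consecutive extract_price failures per target and fire
    `notify` exactly once when the limit is reached."""
    if price is not None:
        _failures[name] = 0  # any successful extraction resets the counter
        return
    _failures[name] += 1
    if _failures[name] == CONSECUTIVE_FAILURE_LIMIT:
        notify(
            f"Selector likely broken for {name}: "
            f"{CONSECUTIVE_FAILURE_LIMIT} consecutive extraction failures"
        )
```

Firing only on the exact transition to the limit means one notification per breakage, not one per poll.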

Request cadence: Even with rotating proxies, behavioral analysis will eventually flag tight polling loops. Keep intervals at 30 minutes or above per target. If you need sub-15-minute monitoring, spread targets across multiple scheduled windows — not faster individual polls.
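One way to spread targets across windows is to stagger each job's first run, so intervals stay identical but never fire in a burst. A sketch; you would pass each datetime as the start_date of the corresponding IntervalTrigger:

```python
import random
from datetime import datetime, timedelta, timezone


def staggered_start_times(n_targets: int, window_minutes: int = 30) -> list[datetime]:
    """Spread each job's first run evenly across one polling window,
    with a little jitter, so checks never fire in a single burst."""
    now = datetime.now(timezone.utc)
    slot = window_minutes * 60 / max(n_targets, 1)  # seconds of window per target
    return [
        now + timedelta(seconds=i * slot + random.uniform(0, slot * 0.25))
        for i in range(n_targets)
    ]
```

APScheduler's own jitter parameter on the trigger can add per-run randomness on top of the staggered starts.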

Containerization: Wrap monitor.py in a Docker container with restart: unless-stopped. APScheduler's BlockingScheduler handles SIGTERM cleanly with the signal handlers above. Mount targets.json as a volume so you can update the target list without rebuilding the image.

Approach Comparison

DIY scraping with requests and rotating proxy lists is cheaper per call but carries all the maintenance burden; a managed scraping API costs more per request and absorbs the anti-bot work. The break-even point is around two hours of debugging time per month. One Saturday spent chasing an IP block and the economics shift decisively toward managed bypass.

Takeaway

The system you've built covers the full lifecycle: configuration, scraping, extraction, persistence, debounced alerting, and scheduling. The design decisions that matter most:

  • Stateless job functions: check_price takes a dict and exits cleanly. Unit-testable without mocking a scheduler.
  • Debounced alerts: Cooldown windows prevent alert fatigue without missing genuine price-drop events.
  • Separated modules: Swapping the scraping backend, database, or alert channel touches exactly one file.

For the full request parameter reference — including session persistence, custom headers, and screenshot capture — see the API docs.

Frequently Asked Questions

Why does Amazon block my price scraper?

Amazon aggressively fingerprints requests that don't behave like real browsers. Using a scraping API with built-in anti-bot bypass and residential proxy rotation handles Cloudflare, PerimeterX, and similar defenses transparently — your code never touches headers, cookies, or IP rotation logic.

How often can I safely poll a product page?

Thirty minutes per target is the safe floor for most e-commerce sites without triggering behavioral rate limits. For tighter intervals, distribute checks across rotating proxy exit points. Polling faster than 5 minutes per product from a single IP is almost always blocked within hours.

Which CSS selectors work for extracting prices?

Common selectors include `.a-price .a-offscreen` for Amazon, `[data-price]` or `.price__current` for Shopify stores, and `.product__price` for many WooCommerce themes. Inspect the element in DevTools, copy the selector, and validate with BeautifulSoup before adding it to your config. Build a selector map — don't hardcode per-scraper.