
Build a Real-Time Price Monitor with Python

Step-by-step guide to building a production-grade price monitoring system with Python, APScheduler, PostgreSQL, and a scraping API with anti-bot bypass.

Yash Dubey

March 26, 2026

8 min read

Price monitoring is straightforward in concept and a maintenance nightmare in practice. Anti-bot measures rotate. CSS selectors drift. IP bans accumulate silently. This guide builds a complete system that handles all three — using Python, APScheduler, PostgreSQL, and a scraping API with anti-bot bypass built in.

By the end you'll have a daemon that:

  • Polls any list of product URLs on a per-target schedule
  • Survives Cloudflare, PerimeterX, and JavaScript-rendered prices
  • Persists a full price history to PostgreSQL
  • Fires email alerts with debouncing — no alert storms

Architecture

The system is five small modules wired in a pipeline: a scheduler fires one job per target, each job calls the scraping client for rendered HTML, hands it to the extractor, persists the sample to PostgreSQL, and passes the result to the debounced alerter.

Prerequisites

Bash
pip install requests beautifulsoup4 lxml apscheduler sqlalchemy psycopg2-binary

You'll also need:

  • Python 3.11+
  • A running PostgreSQL instance (or DATABASE_URL=sqlite:///prices.db for local dev)
  • An API key — follow the quickstart guide to get one in under two minutes

Step 1: Target Configuration

Separate configuration from logic. Each target entry specifies everything the monitor needs to run independently:

JSON
[
  {
    "name": "Sony WH-1000XM5",
    "url": "https://www.amazon.com/dp/B09XS7JWHH",
    "selector": ".a-price .a-offscreen",
    "threshold": 299.00,
    "currency": "USD",
    "interval_minutes": 30
  },
  {
    "name": "Peak Design Travel Backpack 45L",
    "url": "https://www.peakdesign.com/products/travel-backpack",
    "selector": ".price__current",
    "threshold": 550.00,
    "currency": "USD",
    "interval_minutes": 60
  }
]

Adding a new product means adding one JSON object — no code changes.
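Since this file is the only thing that changes per product, it pays to validate it at startup rather than fail mid-run. A minimal sketch; `validate_targets` and `load_targets` are assumed helpers, not modules the tutorial defines:

```python
import json

REQUIRED_KEYS = {"name", "url", "selector", "threshold", "interval_minutes"}


def validate_targets(targets: list[dict]) -> list[dict]:
    """Fail fast on malformed entries instead of mid-run."""
    for t in targets:
        missing = REQUIRED_KEYS - t.keys()
        if missing:
            raise ValueError(f"Target {t.get('name', '<unnamed>')!r} missing {sorted(missing)}")
        if t["interval_minutes"] < 1:
            raise ValueError(f"Target {t['name']!r} has a non-positive interval")
    return targets


def load_targets(path: str = "targets.json") -> list[dict]:
    """Load and validate the target list in one step."""
    with open(path) as f:
        return validate_targets(json.load(f))
```

A bad entry then surfaces at boot with a named target, not as a `KeyError` deep inside a scheduled job.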

Step 2: The Scraping Client

Most e-commerce sites render prices via JavaScript and block naive HTTP requests within minutes. The scraping API returns fully rendered HTML — you send a URL, get back a DOM.

cURL (for validating selectors before coding):

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B09XS7JWHH",
    "render": true,
    "wait_for": ".a-price"
  }'

Python client with retry logic:

Python
import os
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

ALTERLAB_URL = "https://api.alterlab.io/v1/scrape"
ALTERLAB_KEY = os.environ["ALTERLAB_API_KEY"]

_session = requests.Session()
_session.headers.update({"X-API-Key": ALTERLAB_KEY, "Content-Type": "application/json"})
_session.mount("https://", HTTPAdapter(max_retries=Retry(total=3, backoff_factor=0.5, status_forcelist=[429, 500, 502, 503])))

def scrape(url: str, wait_for: str | None = None, render: bool = True) -> str:
    payload: dict[str, object] = {"url": url, "render": render}
    if wait_for:
        payload["wait_for"] = wait_for

    resp = _session.post(ALTERLAB_URL, json=payload, timeout=45)
    resp.raise_for_status()
    return resp.json()["html"]

The wait_for parameter instructs the headless browser to wait until that CSS selector appears in the DOM before returning HTML. Without it, you'll receive the page skeleton before JavaScript has injected the price.


Step 3: Price Extraction

Python
import re
from decimal import Decimal, InvalidOperation
from bs4 import BeautifulSoup


def extract_price(html: str, selector: str) -> Decimal | None:
    soup = BeautifulSoup(html, "lxml")
    el = soup.select_one(selector)
    if not el:
        return None

    # Strip currency symbols, whitespace, thousands separators
    raw = re.sub(r"[^\d.]", "", el.get_text(strip=True))
    if not raw:
        return None

    try:
        return Decimal(raw)
    except InvalidOperation:
        return None

Two failure modes to instrument:

  1. el is None: The element didn't render in time, or the site changed its layout. Increase the wait_for timeout or update the selector. Log this as a warning, not an error — it's recoverable.
  2. InvalidOperation: The text matched but contained non-numeric content like "From $1.299,00" (European locale formatting). If you monitor non-US sites, add locale-aware normalization before passing to Decimal.

Step 4: Persist Price History

Store every sample with a UTC timestamp. You want the full time series — not just current price — for trend queries and alert debouncing.

Python
import os
from datetime import datetime, timezone
from decimal import Decimal

from sqlalchemy import Column, DateTime, Index, Integer, Numeric, String, create_engine
from sqlalchemy.orm import DeclarativeBase, Session


class Base(DeclarativeBase):
    pass


class PriceRecord(Base):
    __tablename__ = "price_records"

    id = Column(Integer, primary_key=True, autoincrement=True)
    product_name = Column(String(256), nullable=False)
    product_url = Column(String(2048), nullable=False)
    selector = Column(String(256), nullable=False)
    price = Column(Numeric(12, 2), nullable=False)
    currency = Column(String(8), default="USD", nullable=False)
    scraped_at = Column(
        DateTime(timezone=True),
        nullable=False,
        default=lambda: datetime.now(timezone.utc),
    )

    __table_args__ = (
        Index("ix_price_records_url_time", "product_url", "scraped_at"),
    )


engine = create_engine(
    os.environ.get("DATABASE_URL", "sqlite:///prices.db"),
    pool_pre_ping=True,
)
Base.metadata.create_all(engine)


def save_price(name: str, url: str, selector: str, price: Decimal, currency: str = "USD") -> None:
    with Session(engine) as session:
        session.add(PriceRecord(
            product_name=name,
            product_url=url,
            selector=selector,
            price=price,
            currency=currency,
        ))
        session.commit()

The composite index on (product_url, scraped_at) keeps time-range queries fast as the table grows. For hundreds of products tracked over months, partition by month on scraped_at.

Step 5: Debounced Alert Logic

Without debouncing, you get an email for every polling cycle the price stays below threshold. The cooldown window suppresses re-alerts within a configurable period.

Python
import os, smtplib
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from email.mime.text import MIMEText

from sqlalchemy import select
from models import PriceRecord, Session, engine

ALERT_COOLDOWN_HOURS = 4


def _earliest_price_in_window(url: str) -> Decimal | None:
    """Return the first price recorded within the cooldown window."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=ALERT_COOLDOWN_HOURS)
    with Session(engine) as session:
        row = session.execute(
            select(PriceRecord.price)
            .where(PriceRecord.product_url == url)
            .where(PriceRecord.scraped_at >= cutoff)
            .order_by(PriceRecord.scraped_at.asc())
            .limit(1)
        ).first()
        return Decimal(str(row[0])) if row else None


def maybe_alert(name: str, url: str, current: Decimal, threshold: Decimal) -> None:
    if current >= threshold:
        return  # Price is not below threshold — nothing to do

    first_in_window = _earliest_price_in_window(url)
    if first_in_window is not None and first_in_window < threshold:
        return  # Already sent an alert during this cooldown window

    _send_email(name, url, current, threshold)


def _send_email(name: str, url: str, price: Decimal, threshold: Decimal) -> None:
    body = (
        f"Price alert for {name}\n\n"
        f"Current price: ${price:.2f}\n"
        f"Your threshold: ${threshold:.2f}\n"
        f"Product URL: {url}"
    )
    msg = MIMEText(body)
    msg["Subject"] = f"Price drop: {name} is now ${price:.2f}"
    msg["From"] = os.environ["SMTP_FROM"]
    msg["To"] = os.environ["ALERT_EMAIL"]

    with smtplib.SMTP(os.environ["SMTP_HOST"], int(os.environ.get("SMTP_PORT", 587))) as srv:
        srv.starttls()
        srv.login(os.environ["SMTP_USER"], os.environ["SMTP_PASS"])
        srv.send_message(msg)

The logic: if the earliest price recorded in the cooldown window was already below threshold, that drop already triggered an alert, so this one is suppressed. Once the price recovers above threshold and the old below-threshold samples age out of the window, _earliest_price_in_window returns a price above threshold (or nothing at all), and the next drop fires a fresh alert.

Step 6: Scheduler

Each target gets its own APScheduler job with its own interval. max_instances=1 prevents a slow scrape from stacking concurrent runs for the same product.

Python
import json, logging, os, signal, sys
from decimal import Decimal

from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.interval import IntervalTrigger

from alerts import maybe_alert
from client import scrape
from extractor import extract_price
from models import save_price

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)


def check_price(target: dict) -> None:
    name, url = target["name"], target["url"]
    selector = target["selector"]
    threshold = Decimal(str(target["threshold"]))
    currency = target.get("currency", "USD")

    log.info("Checking %s", name)
    try:
        html = scrape(url, wait_for=selector)
    except Exception as exc:
        log.error("Scrape failed for %s: %s", name, exc)
        return

    price = extract_price(html, selector)
    if price is None:
        log.warning("No price extracted for %s (selector=%r)", name, selector)
        return

    log.info("%s → $%.2f (threshold $%.2f)", name, price, threshold)
    # Alert before persisting: if the current sample were saved first, it would
    # land in the cooldown window and suppress its own drop alert.
    maybe_alert(name, url, price, threshold)
    save_price(name, url, selector, price, currency)


def main() -> None:
    with open("targets.json") as f:
        targets = json.load(f)

    scheduler = BlockingScheduler(timezone="UTC")
    for t in targets:
        scheduler.add_job(
            check_price,
            trigger=IntervalTrigger(minutes=t["interval_minutes"]),
            kwargs={"target": t},
            id=t["name"],
            max_instances=1,
            misfire_grace_time=60,
        )
        log.info("Scheduled %s every %d min", t["name"], t["interval_minutes"])

    signal.signal(signal.SIGTERM, lambda *_: (scheduler.shutdown(wait=False), sys.exit(0)))
    signal.signal(signal.SIGINT, lambda *_: (scheduler.shutdown(wait=False), sys.exit(0)))

    scheduler.start()


if __name__ == "__main__":
    main()

Running It

Bash
export ALTERLAB_API_KEY=sk_live_your_key_here
export DATABASE_URL=postgresql://user:pass@localhost:5432/pricedb
export SMTP_HOST=smtp.gmail.com
export SMTP_USER=[email protected]
export SMTP_PASS=your_app_password
export SMTP_FROM=[email protected]
export ALERT_EMAIL=[email protected]

python monitor.py

Expected output:

Bash
2026-03-26 09:00:00 INFO Scheduled Sony WH-1000XM5 every 30 min
2026-03-26 09:00:00 INFO Scheduled Peak Design Travel Backpack 45L every 60 min
2026-03-26 09:00:01 INFO Checking Sony WH-1000XM5
2026-03-26 09:00:03 INFO Sony WH-1000XM5 → $279.99 (threshold $299.00)

Production Considerations

  • Avg scrape latency (rendered): ~2s
  • Safe minimum poll interval: 30 min
  • AlterLab scrape success rate: 99.2%
  • Recommended alert cooldown: 4h

Selector drift is the most common failure mode. E-commerce sites A/B test layouts constantly. Add a dead-man's-switch: if any target returns None from extract_price more than three consecutive times, fire a Slack or webhook alert. You want to know your selector broke before you miss a week of price data.
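That dead-man's-switch is a dozen lines of state in the scheduler process. A sketch, assuming you call it from check_price after every extraction attempt; the notify callable stands in for whatever Slack or webhook client you use (hypothetical hook, not part of the tutorial's modules):

```python
from collections import defaultdict
from typing import Callable

CONSECUTIVE_FAILURE_LIMIT = 3
_failures: dict[str, int] = defaultdict(int)


def record_extraction(name: str, price: object, notify: Callable[[str], None] = print) -> None:
    """Count consecutive extract_price failures per target and fire
    `notify` exactly once when the limit is reached."""
    if price is not None:
        _failures[name] = 0  # any successful extraction resets the counter
        return
    _failures[name] += 1
    if _failures[name] == CONSECUTIVE_FAILURE_LIMIT:
        notify(
            f"Selector likely broken for {name}: "
            f"{CONSECUTIVE_FAILURE_LIMIT} consecutive extraction failures"
        )
```

Firing only on the exact transition to the limit means one notification per breakage, not one per poll.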

Request cadence: Even with rotating proxies, behavioral analysis will eventually flag tight polling loops. Keep intervals at 30 minutes or above per target. If you need sub-15-minute monitoring, spread targets across multiple scheduled windows — not faster individual polls.
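One way to spread targets across windows is to stagger each job's first run, so intervals stay identical but never fire in a burst. A sketch; you would pass each datetime as the start_date of the corresponding IntervalTrigger:

```python
import random
from datetime import datetime, timedelta, timezone


def staggered_start_times(n_targets: int, window_minutes: int = 30) -> list[datetime]:
    """Spread each job's first run evenly across one polling window,
    with a little jitter, so checks never fire in a single burst."""
    now = datetime.now(timezone.utc)
    slot = window_minutes * 60 / max(n_targets, 1)  # seconds of window per target
    return [
        now + timedelta(seconds=i * slot + random.uniform(0, slot * 0.25))
        for i in range(n_targets)
    ]
```

APScheduler's own jitter parameter on the trigger can add per-run randomness on top of the staggered starts.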

Containerization: Wrap monitor.py in a Docker container with restart: unless-stopped. APScheduler's BlockingScheduler handles SIGTERM cleanly with the signal handlers above. Mount targets.json as a volume so you can update the target list without rebuilding the image.

Approach Comparison

DIY scraping with requests and rotating proxy lists is cheaper per call but carries all the maintenance burden; a managed scraping API costs more per request and absorbs the anti-bot work. The break-even point is around two hours of debugging time per month. One Saturday spent chasing an IP block and the economics shift decisively toward managed bypass.

Takeaway

The system you've built covers the full lifecycle: configuration, scraping, extraction, persistence, debounced alerting, and scheduling. The design decisions that matter most:

  • Stateless job functions: check_price takes a dict and exits cleanly. Unit-testable without mocking a scheduler.
  • Debounced alerts: Cooldown windows prevent alert fatigue without missing genuine price-drop events.
  • Separated modules: Swapping the scraping backend, database, or alert channel touches exactly one file.

For the full request parameter reference — including session persistence, custom headers, and screenshot capture — see the API docs.

Frequently Asked Questions

Why does Amazon block my price scraper?

Amazon aggressively fingerprints requests that don't behave like real browsers. Using a scraping API with built-in anti-bot bypass and residential proxy rotation handles Cloudflare, PerimeterX, and similar defenses transparently — your code never touches headers, cookies, or IP rotation logic.

How often can I safely poll a product page?

Thirty minutes per target is the safe floor for most e-commerce sites without triggering behavioral rate limits. For tighter intervals, distribute checks across rotating proxy exit points. Polling faster than 5 minutes per product from a single IP is almost always blocked within hours.

Which CSS selectors work for extracting prices?

Common selectors include `.a-price .a-offscreen` for Amazon, `[data-price]` or `.price__current` for Shopify stores, and `.product__price` for many WooCommerce themes. Inspect the element in DevTools, copy the selector, and validate with BeautifulSoup before adding it to your config. Build a selector map — don't hardcode per-scraper.