How to Scrape AliExpress: Complete Guide for 2026
Learn how to scrape AliExpress in 2026 with Python. Covers anti-bot bypass, MTOP API extraction, geo-targeting, and scaling your scraping pipeline reliably.
March 24, 2026
AliExpress hosts over 100 million product listings across virtually every consumer category, updated continuously by millions of sellers. That combination of breadth and velocity makes it a primary data source for price monitoring, catalog enrichment, and market research — and one of the more technically demanding sites to scrape reliably.
This guide covers exactly what it takes to scrape AliExpress in 2026: the anti-bot stack you're up against, how to get rendered product data via API, extracting structured fields, and scaling to production volume.
Why Scrape AliExpress?
Three use cases drive the majority of AliExpress scraping pipelines:
Price monitoring and margin optimization. Retailers sourcing products from AliExpress suppliers need continuous price feeds to protect margins. A single supplier can reprice multiple times per day across thousands of SKUs. At that volume, manual tracking is not viable — you need a scheduled scraper writing to a time-series store and alerting on threshold changes.
Dropshipping catalog management. Dropshippers and resellers build and maintain product catalogs programmatically — pulling titles, specifications, images, shipping estimates, and variant data at scale rather than copying listings by hand. Keeping catalog data fresh as suppliers update listings requires ongoing incremental scraping.
Market research and trend detection. AliExpress's seller ecosystem serves as a leading indicator of global manufacturing trends. New-arrival detection, category-level price analysis, and seller reputation tracking all require structured historical data that only a scraping pipeline can produce at the required depth.
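The threshold-alerting step in the price-monitoring pipeline reduces to a simple comparison against the last stored price. A minimal sketch (the function name and default threshold are illustrative, not part of any SDK):

```python
def price_alert(previous: float, current: float, threshold_pct: float = 5.0) -> bool:
    """Return True when the price moved more than threshold_pct percent."""
    if previous <= 0:
        # No valid baseline yet; nothing to compare against
        return False
    change_pct = abs(current - previous) / previous * 100
    return change_pct > threshold_pct
```

In a scheduled scraper, this runs once per SKU per cycle, with the previous price read from the time-series store and the current price from the latest scrape.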
Anti-Bot Challenges on aliexpress.com
AliExpress is one of the harder e-commerce targets to scrape reliably. The defenses are layered and actively maintained.
100% Client-Side Rendering, No JSON-LD
A raw HTTP GET to any AliExpress product URL returns a nearly empty HTML shell. There is no product data in the initial response, no JSON-LD structured markup, no server-side-rendered content you can parse directly. Every visible element — title, price, images, reviews, variants — is injected by JavaScript after the page loads.
This immediately rules out requests + BeautifulSoup as a viable approach. You need either a headless browser running full JS execution, or direct access to the API that delivers the data.
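You can confirm this behavior yourself before committing to an approach. A rough heuristic, assuming the markers checked are representative of server-rendered product pages (the check is illustrative, not exhaustive):

```python
from bs4 import BeautifulSoup

def looks_client_side_rendered(html: str) -> bool:
    """Heuristic: no JSON-LD markup and no populated h1 suggests a JS-rendered shell."""
    soup = BeautifulSoup(html, "html.parser")
    has_json_ld = bool(soup.select("script[type='application/ld+json']"))
    h1 = soup.select_one("h1")
    has_title = h1 is not None and bool(h1.get_text(strip=True))
    return not has_json_ld and not has_title

# The kind of empty shell a raw GET to AliExpress returns (simplified):
shell = (
    "<html><head><title>AliExpress</title></head>"
    "<body><div id='root'></div><script src='/app.js'></script></body></html>"
)
```

A server-rendered page from most other e-commerce sites fails this check; an AliExpress shell passes it.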
The MTOP API
AliExpress serves all product data through its internal MTOP (Mobile Taobao Open Platform) gateway:
https://mtop.aliexpress.com/gw/mtop.aliexpress.pcdetail.data.get/

Calls to this endpoint require valid session cookies, a cryptographically signed token parameter, and request headers that precisely match a genuine browser fingerprint. The signing algorithm is obfuscated in minified JavaScript and changes with deployments. Reverse engineering it is a recurring maintenance burden, not a one-time task.
Multi-Layer Bot Detection
Beyond rendering complexity, AliExpress runs active bot detection across several vectors:
- TLS/JA3 fingerprinting — non-browser HTTP clients are identified and blocked at the connection layer, before any request reaches application logic
- Browser fingerprinting — canvas rendering, WebGL parameters, installed font enumeration, and plugin detection are used to distinguish headless browsers from real users
- Behavioral signals — mouse movement patterns, scroll velocity, and click timing are evaluated; headless browsers with default settings fail these checks
- IP reputation scoring — datacenter and cloud provider IP ranges are blocked outright or served degraded/empty responses
- CAPTCHA escalation — triggered on anomalous access patterns, particularly rapid sequential requests from a single session
The anti-bot bypass API at AlterLab handles all of these layers transparently. You send a URL and receive rendered HTML or structured JSON — no fingerprint management, no session maintenance, no CAPTCHA pipeline to operate.
Quick Start with AlterLab API
Full SDK installation and authentication setup is covered in the getting started guide. The short version:
pip install alterlab

The simplest working request — fetch a rendered AliExpress product page:
import alterlab
client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
"https://www.aliexpress.com/item/1005006789012345.html",
render_js=True,
wait_for=".product-title-text"
)
print(response.text) # Full rendered HTML
print(response.status)  # 200

The render_js=True flag triggers AlterLab's headless browser tier. The wait_for parameter accepts a CSS selector — the request blocks until that element is present in the DOM, ensuring the MTOP API has loaded product data before the snapshot is taken.
For quick testing without the SDK:
curl -X POST https://api.alterlab.io/v1/scrape \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.aliexpress.com/item/1005006789012345.html",
"render_js": true,
"wait_for": ".product-title-text"
}'
Extracting Structured Data
With rendered HTML in hand, you have two extraction paths: parsing the MTOP API JSON payload (more reliable), or selecting elements from the rendered DOM (simpler, more fragile).
MTOP JSON Payload (Preferred)
The MTOP response is a structured JSON object containing all product modules. Its schema is significantly more stable than AliExpress's DOM, which changes frequently with A/B tests. Use the extract_json=True option to receive the parsed payload directly:
import alterlab
client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
"https://www.aliexpress.com/item/1005006789012345.html",
render_js=True,
extract_json=True,
wait_for=".product-title-text"
)
data = response.json()
# Navigate the MTOP module structure
title = data["titleModule"]["subject"]
price = data["priceModule"]["minAmount"]["value"]
currency = data["priceModule"]["minAmount"]["currency"]
rating = data["feedbackModule"]["tradeScore"]
reviews = data["feedbackModule"]["tradeCount"]
store = data["storeModule"]["storeName"]
store_id = data["storeModule"]["storeNum"]
sku_info = data["skuModule"]["productSKUPropertyList"] # variants/options
print(f"{title} — {currency}{price} ({reviews} reviews, {rating}★)")

Key MTOP modules and what they contain:
| Module | Fields |
|---|---|
| titleModule | subject (product title) |
| priceModule | minAmount, maxAmount, discount |
| feedbackModule | tradeScore (rating), tradeCount (review count) |
| skuModule | Variant properties, per-SKU pricing |
| storeModule | Store name, ID, follower count, rating |
| shippingModule | Shipping options, estimated delivery |
| imageModule | Full-resolution image URLs |
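Module paths occasionally shift between deployments, so a production parser should not assume every key exists. A small defensive-navigation helper (hypothetical, not part of the AlterLab SDK) avoids scattering try/except around every lookup:

```python
def dig(data: dict, *keys, default=None):
    """Walk a nested dict, returning default if any key along the path is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Usage against an MTOP payload:
# price = dig(data, "priceModule", "minAmount", "value", default=0.0)
# store = dig(data, "storeModule", "storeName")
```

Missing modules then degrade to defaults you can filter downstream instead of raising KeyError mid-batch.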
CSS Selectors (HTML Fallback)
When working with raw rendered HTML, these selectors are stable as of Q1 2026. Always write defensive parsers — AliExpress runs A/B experiments on its UI continuously:
from bs4 import BeautifulSoup
def parse_product_page(html: str) -> dict:
soup = BeautifulSoup(html, "lxml")
title = soup.select_one("h1.product-title-text")
price = soup.select_one("span.product-price-value")
rating = soup.select_one("span[class*='overview-rating-average']")
reviews = soup.select_one("span[class*='product-reviewer-reviews']")
images = soup.select("img[class*='magnifier-image']")
store = soup.select_one("a[class*='store-header-name']")
return {
"title": title.get_text(strip=True) if title else None,
"price": price.get_text(strip=True) if price else None,
"rating": rating.get_text(strip=True) if rating else None,
"reviews": reviews.get_text(strip=True) if reviews else None,
"images": [img["src"] for img in images if img.get("src")],
"store": store.get_text(strip=True) if store else None,
}

Search and Category Pages
Scraping search results requires handling dynamic card loading and pagination. AliExpress search uses JavaScript-driven pagination — the page number is a query parameter but content loads asynchronously. Use scroll_to_bottom=True to trigger lazy-loaded product cards:
import alterlab
from bs4 import BeautifulSoup
client = alterlab.Client("YOUR_API_KEY")
def scrape_search_page(keyword: str, page: int = 1) -> list[dict]:
url = (
f"https://www.aliexpress.com/wholesale"
f"?SearchText={keyword.replace(' ', '+')}&page={page}"
)
response = client.scrape(
url,
render_js=True,
wait_for="[class*='search-item-card']",
scroll_to_bottom=True
)
soup = BeautifulSoup(response.text, "lxml")
cards = soup.select("[class*='search-item-card']")
return [parse_card(card) for card in cards]
def parse_card(card) -> dict:
title = card.select_one("[class*='item-title']")
price = card.select_one("[class*='price-current']")
link = card.select_one("a[href*='/item/']")
return {
"title": title.get_text(strip=True) if title else None,
"price": price.get_text(strip=True) if price else None,
"url": "https:" + link["href"] if link and link.get("href") else None,
}

Common Pitfalls
Skipping JS execution. The single most common scraping failure mode on AliExpress. Without render_js=True, every response is an empty HTML shell. No exceptions.
Snapshotting before MTOP loads. Even with JavaScript running, MTOP API calls are asynchronous. If you snapshot the page immediately after JS execution starts, price and title modules may not yet be populated. Always use wait_for targeting a product-specific selector like .product-title-text rather than a generic layout element.
Ignoring geo-targeting. AliExpress serves different pricing, shipping options, and availability based on visitor country. A product priced at $4.99 for a US visitor may show differently to a DE or AU visitor. Pin your exit country explicitly when building region-specific monitors:
response = client.scrape(
"https://www.aliexpress.com/item/1005006789012345.html",
render_js=True,
country="DE",
wait_for=".product-title-text"
)

Reusing sessions aggressively. AliExpress tracks session-level behavior. A single session making hundreds of product requests in quick succession will trigger behavioral flags. Use fresh sessions per request, or rely on automatic session rotation.
Brittle CSS selectors. AliExpress frequently ships UI changes and A/B test variants. A selector that returns data on one request may return None on the next request for the same URL. Prefer MTOP JSON extraction for production pipelines; write defensive None-checks everywhere when using DOM parsing.
Scaling Up
Async Batch Requests
Sequential scraping does not scale. Use asyncio with the async client to maximize throughput:
import asyncio
import alterlab
from alterlab import AsyncClient
client = AsyncClient("YOUR_API_KEY")
async def scrape_batch(urls: list[str]) -> list[dict]:
tasks = [
client.scrape(url, render_js=True, wait_for=".product-title-text")
for url in urls
]
responses = await asyncio.gather(*tasks, return_exceptions=True)
results = []
for url, resp in zip(urls, responses):
if isinstance(resp, Exception):
print(f"Failed: {url} — {resp}")
continue
results.append({"url": url, "html": resp.text})
return results
async def main():
product_ids = [
"1005001234567890",
"1005009876543210",
"1005005555444333",
]
urls = [f"https://www.aliexpress.com/item/{pid}.html" for pid in product_ids]
data = await scrape_batch(urls)
print(f"Scraped {len(data)} products successfully")
asyncio.run(main())

Concurrent request limits and credit costs per request type vary by plan — see AlterLab pricing for the breakdown by tier.
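One simple way to stay under a plan's concurrency ceiling is to process the URL list in fixed-size chunks rather than gathering everything at once. A sketch, assuming a hypothetical per-plan limit of 10:

```python
def chunked(items: list, size: int) -> list[list]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Applied to the batch scraper above (10 is an assumed limit, not a documented one):
# async def scrape_all(urls: list[str]) -> list[dict]:
#     results = []
#     for batch in chunked(urls, 10):
#         results.extend(await scrape_batch(batch))
#     return results
```

Chunking trades a little throughput for predictable load; a semaphore (covered under backpressure below in spirit) achieves the same cap without idle gaps between batches.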
Scheduled Monitoring with Celery
For continuous price monitoring, wrap scrapes in Celery tasks with beat scheduling:
from celery import Celery
from celery.schedules import crontab
import alterlab
app = Celery("aliexpress_monitor", broker="redis://localhost:6379/0")
client = alterlab.Client("YOUR_API_KEY")
@app.task(bind=True, max_retries=3, default_retry_delay=60)
def monitor_product_price(self, product_id: str):
url = f"https://www.aliexpress.com/item/{product_id}.html"
try:
response = client.scrape(
url,
render_js=True,
extract_json=True,
wait_for=".product-title-text"
)
data = response.json()
price = data["priceModule"]["minAmount"]["value"]
currency = data["priceModule"]["minAmount"]["currency"]
# Write to DB, trigger price alerts, emit to event stream...
return {"product_id": product_id, "price": price, "currency": currency}
except Exception as exc:
raise self.retry(exc=exc)
# Run every 4 hours
app.conf.beat_schedule = {
"price-monitor": {
"task": "tasks.monitor_product_price",
"schedule": crontab(minute=0, hour="*/4"),
"args": ["1005006789012345"],
}
}

Large-Scale Pipeline Considerations
At production volume, per-request optimization matters less than overall pipeline throughput:
- Deduplication before scraping. Hash the URL + a daily timestamp. Skip re-scraping pages that haven't changed. For price monitors, only re-scrape products where the stored price hash changed last cycle.
- Columnar storage. Write parsed JSON records directly to BigQuery, ClickHouse, or DuckDB rather than a row store. Analytical queries on price history and category trends run 10–100x faster against columnar formats.
- Backpressure handling. Size your asyncio worker pool to your plan's concurrency ceiling. Use a semaphore to prevent bursting beyond the limit and accumulating retry debt.
- Error tiering. Distinguish transient failures (timeout, 429) from structural failures (selector not found, schema mismatch). Retry transient failures automatically; dead-letter structural failures for manual inspection.
Key Takeaways
- AliExpress is 100% client-side rendered with zero JSON-LD. Raw HTTP requests return empty HTML. JavaScript execution is not optional.
- All product data flows through the MTOP API. Extracting the JSON payload directly is more reliable than parsing rendered HTML — the schema changes less frequently than the DOM.
- Bot detection covers TLS fingerprinting, browser fingerprinting, behavioral analysis, and IP reputation. Each layer requires independent engineering effort to bypass and ongoing maintenance to keep working.
- Geo-targeting is non-trivial: price and availability data varies by visitor country. Pin your exit country explicitly for region-specific data collection.
- Scale with an async request pool, Redis queue, and columnar storage — not sequential requests writing to a relational database.