
How to Scrape AliExpress: Complete Guide for 2026

Learn how to scrape AliExpress in 2026 with Python. Covers anti-bot bypass, MTOP API extraction, geo-targeting, and scaling your scraping pipeline reliably.

Yash Dubey

March 24, 2026

10 min read

AliExpress hosts over 100 million product listings across virtually every consumer category, updated continuously by millions of sellers. That combination of breadth and velocity makes it a primary data source for price monitoring, catalog enrichment, and market research — and one of the more technically demanding sites to scrape reliably.

This guide covers exactly what it takes to scrape AliExpress in 2026: the anti-bot stack you're up against, how to get rendered product data via API, extracting structured fields, and scaling to production volume.


Why Scrape AliExpress?

Three use cases drive the majority of AliExpress scraping pipelines:

Price monitoring and margin optimization. Retailers sourcing products from AliExpress suppliers need continuous price feeds to protect margins. A single supplier can reprice multiple times per day across thousands of SKUs. At that volume, manual tracking is not viable — you need a scheduled scraper writing to a time-series store and alerting on threshold changes.
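The threshold-alerting step can be sketched as a simple percent-change check against the last stored price. This is a minimal illustration; the threshold value and function names here are assumptions, and the database write and alert delivery are left out:

```python
def price_change_pct(old: float, new: float) -> float:
    """Percent change from the previously stored price."""
    if old == 0:
        return 0.0
    return (new - old) / old * 100

def should_alert(old: float, new: float, threshold_pct: float = 5.0) -> bool:
    """Fire an alert when the price moved more than threshold_pct in either direction."""
    return abs(price_change_pct(old, new)) >= threshold_pct

# Example: supplier repriced a $4.99 SKU to $5.49 (~10% move)
print(should_alert(4.99, 5.49))  # True
print(should_alert(4.99, 5.05))  # False (~1.2% move)
```

In a real monitor this check runs inside the scheduled task, comparing the freshly scraped price against the latest row in the time-series store.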

Dropshipping catalog management. Dropshippers and resellers build and maintain product catalogs programmatically — pulling titles, specifications, images, shipping estimates, and variant data at scale rather than copying listings by hand. Keeping catalog data fresh as suppliers update listings requires ongoing incremental scraping.

Market research and trend detection. AliExpress's seller ecosystem serves as a leading indicator of global manufacturing trends. New-arrival detection, category-level price analysis, and seller reputation tracking all require structured historical data that only a scraping pipeline can produce at the required depth.


Anti-Bot Challenges on aliexpress.com

AliExpress is one of the harder e-commerce targets to scrape reliably. The defenses are layered and actively maintained.

100% Client-Side Rendering, No JSON-LD

A raw HTTP GET to any AliExpress product URL returns a nearly empty HTML shell. There is no product data in the initial response, no JSON-LD structured markup, no server-side-rendered content you can parse directly. Every visible element — title, price, images, reviews, variants — is injected by JavaScript after the page loads.

This immediately rules out requests + BeautifulSoup as a viable approach. You need either a headless browser running full JS execution, or direct access to the API that delivers the data.
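A cheap pipeline safeguard is to verify that a fetched page actually contains product data before handing it to a parser. The marker strings below are illustrative assumptions, not guaranteed AliExpress output; tune them against real responses:

```python
def looks_like_empty_shell(html: str) -> bool:
    """Heuristic: a rendered product page should contain product markers;
    a raw, un-rendered fetch typically will not. The marker list is an
    assumption -- adjust it to match responses you actually observe."""
    markers = ("product-title-text", "priceModule", "skuModule")
    return not any(m in html for m in markers)

shell = "<html><head></head><body><div id='root'></div></body></html>"
rendered = "<html><h1 class='product-title-text'>USB-C Cable</h1></html>"
print(looks_like_empty_shell(shell))     # True -- needs JS rendering
print(looks_like_empty_shell(rendered))  # False -- safe to parse
```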

The MTOP API

AliExpress serves all product data through its internal MTOP (Mobile Top) gateway:

Code
https://mtop.aliexpress.com/gw/mtop.aliexpress.pcdetail.data.get/

Calls to this endpoint require valid session cookies, a cryptographically signed token parameter, and request headers that precisely match a genuine browser fingerprint. The signing algorithm is obfuscated in minified JavaScript and changes with deployments. Reverse engineering it is a recurring maintenance burden, not a one-time task.

Multi-Layer Bot Detection

Beyond rendering complexity, AliExpress runs active bot detection across several vectors:

  • TLS/JA3 fingerprinting — non-browser HTTP clients are identified and blocked at the connection layer, before any request reaches application logic
  • Browser fingerprinting — canvas rendering, WebGL parameters, installed font enumeration, and plugin detection are used to distinguish headless browsers from real users
  • Behavioral signals — mouse movement patterns, scroll velocity, and click timing are evaluated; headless browsers with default settings fail these checks
  • IP reputation scoring — datacenter and cloud provider IP ranges are blocked outright or served degraded/empty responses
  • CAPTCHA escalation — triggered on anomalous access patterns, particularly rapid sequential requests from a single session

The anti-bot bypass API at AlterLab handles all of these layers transparently. You send a URL and receive rendered HTML or structured JSON — no fingerprint management, no session maintenance, no CAPTCHA pipeline to operate.

  • 100M+ AliExpress listings
  • 99.1% scrape success rate
  • 1.4s average JS render time
  • 0 CAPTCHAs to solve

Quick Start with AlterLab API

Full SDK installation and authentication setup is covered in the getting started guide. The short version:

Bash
pip install alterlab

The simplest working request — fetch a rendered AliExpress product page:

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://www.aliexpress.com/item/1005006789012345.html",
    render_js=True,
    wait_for=".product-title-text"
)

print(response.text)    # Full rendered HTML
print(response.status)  # 200

The render_js=True flag triggers AlterLab's headless browser tier. The wait_for parameter accepts a CSS selector — the request blocks until that element is present in the DOM, ensuring the MTOP API has loaded product data before the snapshot is taken.

For quick testing without the SDK:

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.aliexpress.com/item/1005006789012345.html",
    "render_js": true,
    "wait_for": ".product-title-text"
  }'


Extracting Structured Data

With rendered HTML in hand, you have two extraction paths: parsing the MTOP API JSON payload (more reliable), or selecting elements from the rendered DOM (simpler, more fragile).

MTOP JSON Payload (Preferred)

The MTOP response is a structured JSON object containing all product modules. Its schema is significantly more stable than AliExpress's DOM, which changes frequently with A/B tests. Use the extract_json=True option to receive the parsed payload directly:

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://www.aliexpress.com/item/1005006789012345.html",
    render_js=True,
    extract_json=True,
    wait_for=".product-title-text"
)

data = response.json()

# Navigate the MTOP module structure
title    = data["titleModule"]["subject"]
price    = data["priceModule"]["minAmount"]["value"]
currency = data["priceModule"]["minAmount"]["currency"]
rating   = data["feedbackModule"]["tradeScore"]
reviews  = data["feedbackModule"]["tradeCount"]
store    = data["storeModule"]["storeName"]
store_id = data["storeModule"]["storeNum"]
sku_info = data["skuModule"]["productSKUPropertyList"]  # variants/options

print(f"{title}: {currency} {price} ({reviews} reviews, {rating}★)")

Key MTOP modules and what they contain:

Module           Fields
titleModule      subject (product title)
priceModule      minAmount, maxAmount, discount
feedbackModule   tradeScore (rating), tradeCount (review count)
skuModule        Variant properties, per-SKU pricing
storeModule      Store name, ID, follower count, rating
shippingModule   Shipping options, estimated delivery
imageModule      Full-resolution image URLs

CSS Selectors (HTML Fallback)

When working with raw rendered HTML, these selectors are stable as of Q1 2026. Always write defensive parsers — AliExpress runs A/B experiments on its UI continuously:

Python
from bs4 import BeautifulSoup

def parse_product_page(html: str) -> dict:
    soup = BeautifulSoup(html, "lxml")

    title   = soup.select_one("h1.product-title-text")
    price   = soup.select_one("span.product-price-value")
    rating  = soup.select_one("span[class*='overview-rating-average']")
    reviews = soup.select_one("span[class*='product-reviewer-reviews']")
    images  = soup.select("img[class*='magnifier-image']")
    store   = soup.select_one("a[class*='store-header-name']")

    return {
        "title":   title.get_text(strip=True) if title else None,
        "price":   price.get_text(strip=True) if price else None,
        "rating":  rating.get_text(strip=True) if rating else None,
        "reviews": reviews.get_text(strip=True) if reviews else None,
        "images":  [img["src"] for img in images if img.get("src")],
        "store":   store.get_text(strip=True) if store else None,
    }

Search and Category Pages

Scraping search results requires handling dynamic card loading and pagination. AliExpress search uses JavaScript-driven pagination — the page number is a query parameter but content loads asynchronously. Use scroll_to_bottom=True to trigger lazy-loaded product cards:

Python
import alterlab
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

client = alterlab.Client("YOUR_API_KEY")

def scrape_search_page(keyword: str, page: int = 1) -> list[dict]:
    # quote_plus handles spaces and special characters safely
    url = (
        f"https://www.aliexpress.com/wholesale"
        f"?SearchText={quote_plus(keyword)}&page={page}"
    )
    response = client.scrape(
        url,
        render_js=True,
        wait_for="[class*='search-item-card']",
        scroll_to_bottom=True
    )
    soup = BeautifulSoup(response.text, "lxml")
    cards = soup.select("[class*='search-item-card']")
    return [parse_card(card) for card in cards]

def parse_card(card) -> dict:
    title = card.select_one("[class*='item-title']")
    price = card.select_one("[class*='price-current']")
    link  = card.select_one("a[href*='/item/']")
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        "url":   "https:" + link["href"] if link and link.get("href") else None,
    }

Common Pitfalls

Skipping JS execution. The single most common scraping failure mode on AliExpress. Without render_js=True, every response is an empty HTML shell. No exceptions.

Snapshotting before MTOP loads. Even with JavaScript running, MTOP API calls are asynchronous. If you snapshot the page immediately after JS execution starts, price and title modules may not yet be populated. Always use wait_for targeting a product-specific selector like .product-title-text rather than a generic layout element.

Ignoring geo-targeting. AliExpress serves different pricing, shipping options, and availability based on visitor country. A product priced at $4.99 for a US visitor may show differently to a DE or AU visitor. Pin your exit country explicitly when building region-specific monitors:

Python
response = client.scrape(
    "https://www.aliexpress.com/item/1005006789012345.html",
    render_js=True,
    country="DE",
    wait_for=".product-title-text"
)

Reusing sessions aggressively. AliExpress tracks session-level behavior. A single session making hundreds of product requests in quick succession will trigger behavioral flags. Use fresh sessions per request, or rely on automatic session rotation.

Brittle CSS selectors. AliExpress frequently ships UI changes and A/B test variants. A selector that returns data on one request may return None on the next request for the same URL. Prefer MTOP JSON extraction for production pipelines; write defensive None-checks everywhere when using DOM parsing.
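One defensive pattern for A/B-shifting markup is to try several candidate extractors in order and take the first that yields a value. A generic sketch (the commented selector strings are placeholders, not verified AliExpress variants):

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def first_hit(extractors: list[Callable[[], Optional[T]]]) -> Optional[T]:
    """Run candidate extractors in order; return the first non-None result."""
    for extract in extractors:
        try:
            value = extract()
        except Exception:
            value = None  # a broken selector should not kill the whole parse
        if value is not None:
            return value
    return None

# Usage with BeautifulSoup (sketch): current selector first, older variants after
# title = first_hit([
#     lambda: soup.select_one("h1.product-title-text"),
#     lambda: soup.select_one("h1[class*='title--wrap']"),
# ])
print(first_hit([lambda: None, lambda: "Fallback title"]))  # Fallback title
```

When an A/B variant rolls out, you append the new selector to the list instead of hot-patching the parser.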


Scaling Up

Async Batch Requests

Sequential scraping does not scale. Use asyncio with the async client to maximize throughput:

Python
import asyncio
import alterlab
from alterlab import AsyncClient

client = AsyncClient("YOUR_API_KEY")

async def scrape_batch(urls: list[str]) -> list[dict]:
    tasks = [
        client.scrape(url, render_js=True, wait_for=".product-title-text")
        for url in urls
    ]
    responses = await asyncio.gather(*tasks, return_exceptions=True)

    results = []
    for url, resp in zip(urls, responses):
        if isinstance(resp, Exception):
            print(f"Failed: {url} ({resp})")
            continue
        results.append({"url": url, "html": resp.text})
    return results

async def main():
    product_ids = [
        "1005001234567890",
        "1005009876543210",
        "1005005555444333",
    ]
    urls = [f"https://www.aliexpress.com/item/{pid}.html" for pid in product_ids]
    data = await scrape_batch(urls)
    print(f"Scraped {len(data)} products successfully")

asyncio.run(main())

Concurrent request limits and credit costs per request type vary by plan — see AlterLab pricing for the breakdown by tier.
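To stay under that ceiling, bound in-flight requests with an asyncio.Semaphore. The sketch below uses a stub fetch coroutine so it runs standalone; in a real pipeline the stub would be the AsyncClient scrape call:

```python
import asyncio

MAX_CONCURRENCY = 5  # set to your plan's concurrency limit

async def fetch(url: str) -> str:
    """Stub standing in for client.scrape(...); replace in production."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"

async def bounded_scrape(urls: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def worker(url: str) -> str:
        async with sem:  # blocks while MAX_CONCURRENCY requests are in flight
            return await fetch(url)

    return await asyncio.gather(*(worker(u) for u in urls))

results = asyncio.run(bounded_scrape([f"https://example.com/{i}" for i in range(20)]))
print(len(results))  # 20
```

Bounding at the client side avoids bursting past the limit, which would otherwise surface as 429s and accumulate retry debt.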

Scheduled Monitoring with Celery

For continuous price monitoring, wrap scrapes in Celery tasks with beat scheduling:

Python
from celery import Celery
from celery.schedules import crontab
import alterlab

app = Celery("aliexpress_monitor", broker="redis://localhost:6379/0")
client = alterlab.Client("YOUR_API_KEY")

@app.task(bind=True, max_retries=3, default_retry_delay=60)
def monitor_product_price(self, product_id: str):
    url = f"https://www.aliexpress.com/item/{product_id}.html"
    try:
        response = client.scrape(
            url,
            render_js=True,
            extract_json=True,
            wait_for=".product-title-text"
        )
        data = response.json()
        price    = data["priceModule"]["minAmount"]["value"]
        currency = data["priceModule"]["minAmount"]["currency"]
        # Write to DB, trigger price alerts, emit to event stream...
        return {"product_id": product_id, "price": price, "currency": currency}
    except Exception as exc:
        raise self.retry(exc=exc)

# Run every 4 hours
app.conf.beat_schedule = {
    "price-monitor": {
        "task": "tasks.monitor_product_price",
        "schedule": crontab(minute=0, hour="*/4"),
        "args": ["1005006789012345"],
    }
}

Large-Scale Pipeline Considerations

At production volume, per-request optimization matters less than overall pipeline throughput:

  • Deduplication before scraping. Hash the URL + a daily timestamp. Skip re-scraping pages that haven't changed. For price monitors, only re-scrape products where the stored price hash changed last cycle.
  • Columnar storage. Write parsed JSON records directly to BigQuery, ClickHouse, or DuckDB rather than a row store. Analytical queries on price history and category trends run 10–100x faster against columnar formats.
  • Backpressure handling. Size your asyncio worker pool to your plan's concurrency ceiling. Use a semaphore to prevent bursting beyond the limit and accumulating retry debt.
  • Error tiering. Distinguish transient failures (timeout, 429) from structural failures (selector not found, schema mismatch). Retry transient failures automatically; dead-letter structural failures for manual inspection.
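The deduplication step can be as simple as a daily content key: hash the URL together with the current date, and skip any URL whose key has already been scraped today. A minimal in-memory sketch (production would back the seen-set with Redis or a database):

```python
import hashlib
from datetime import date
from typing import Optional

seen: set[str] = set()

def daily_key(url: str, day: Optional[date] = None) -> str:
    """URL + date hash: the same URL maps to a fresh key each day."""
    day = day or date.today()
    return hashlib.sha256(f"{url}|{day.isoformat()}".encode()).hexdigest()

def should_scrape(url: str) -> bool:
    """True the first time a URL is seen today; False on repeats."""
    key = daily_key(url)
    if key in seen:
        return False
    seen.add(key)
    return True

print(should_scrape("https://www.aliexpress.com/item/100500.html"))  # True
print(should_scrape("https://www.aliexpress.com/item/100500.html"))  # False
```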

Key Takeaways

  • AliExpress is 100% client-side rendered with zero JSON-LD. Raw HTTP requests return empty HTML. JavaScript execution is not optional.
  • All product data flows through the MTOP API. Extracting the JSON payload directly is more reliable than parsing rendered HTML — the schema changes less frequently than the DOM.
  • Bot detection covers TLS fingerprinting, browser fingerprinting, behavioral analysis, and IP reputation. Each layer requires independent engineering effort to bypass and ongoing maintenance to keep working.
  • Geo-targeting is non-trivial: price and availability data varies by visitor country. Pin your exit country explicitly for region-specific data collection.
  • Scale with an async request pool, Redis queue, and columnar storage — not sequential requests writing to a relational database.


Frequently Asked Questions

Is it legal to scrape AliExpress?

Scraping publicly accessible data from AliExpress is generally lawful in most jurisdictions, but AliExpress's Terms of Service prohibit automated access. You should review local laws — particularly around data storage and GDPR if operating in the EU — and limit scraping to publicly visible product data rather than user-generated content. Consulting a lawyer for commercial use cases is advisable.

How do you bypass AliExpress's anti-bot protection?

AliExpress uses multi-layered defenses including TLS/JA3 fingerprinting, browser fingerprinting, behavioral analysis, and IP reputation scoring — making DIY bypass stacks expensive to build and maintain. AlterLab's anti-bot bypass API handles all of this transparently: you send a URL, it returns rendered HTML or structured JSON without any fingerprint management or CAPTCHA solving on your end.

How much does it cost to scrape AliExpress?

Cost depends on request volume and whether you need JavaScript rendering. JS-rendered requests consume more credits than plain HTTP fetches due to headless browser overhead. AlterLab's pricing tiers start at pay-as-you-go rates with volume discounts at higher tiers — see the pricing page for current credit costs per request type and concurrency limits per plan.