Shopify Stores Data API: Extract Structured JSON in 2026

Learn how to extract structured JSON data from Shopify Stores using AlterLab's Extract API. Get typed e-commerce data (title, price, SKU) without HTML parsing.

This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.

TL;DR

Use AlterLab's Extract API to get structured JSON from Shopify Stores by defining a schema for fields like title, price, and SKU. Pass the URL and schema to receive validated, typed data—no HTML parsing needed. This approach handles anti-bot measures and delivers ready-to-use data for pipelines.

Why use Shopify Stores data?

Engineers extract Shopify Stores data to:

  • Train product recommendation models using real-time pricing and availability
  • Build competitive intelligence dashboards tracking SKU changes across stores
  • Enrich CRM systems with product catalog data from public storefront updates

These use cases require clean, structured data, which is what AlterLab's Extract API delivers.

What data can you extract?

From publicly accessible Shopify Stores pages, you can extract:

  • title: Product name (string)
  • price: Current price (string to preserve formatting)
  • currency: ISO currency code (e.g., "USD")
  • sku: Stock Keeping Unit (string)
  • availability: "in stock", "out of stock", or pre-order status (string)
  • rating: Average review score (string, e.g., "4.5")

AlterLab returns these as typed JSON matching your schema, so no cleanup is required.
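Because every field above arrives as a string, downstream code usually converts a record into native types before doing arithmetic. The following is a minimal sketch of that step, not part of AlterLab's client: the `Product` dataclass and `to_product` helper are hypothetical names, and the comma stripping assumes US-style price formatting such as "1,299.00".

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Product:
    title: str
    price: Decimal            # parsed from the string field to avoid float rounding
    currency: str
    sku: str
    availability: str
    rating: Optional[float]   # None when the record carries no rating


def to_product(record: dict) -> Product:
    """Convert one extracted record (all string values) into a typed Product."""
    raw_rating = record.get("rating")
    return Product(
        title=record["title"],
        # Assumption: thousands separators are commas, e.g. "1,299.00".
        price=Decimal(record["price"].replace(",", "")),
        currency=record["currency"].upper(),
        sku=record["sku"],
        availability=record["availability"].strip().lower(),
        rating=float(raw_rating) if raw_rating else None,
    )


# Sample record with the same shape as the fields listed above.
sample = {
    "title": "Wireless Bluetooth Headphones",
    "price": "89.99",
    "currency": "USD",
    "sku": "WBH-001",
    "availability": "in stock",
    "rating": "4.5",
}

product = to_product(sample)
print(product.price * 3)  # Decimal arithmetic stays exact: 269.97
```

Keeping the price as a string on the wire and parsing it with `Decimal` at the edge of your pipeline avoids the rounding drift that binary floats introduce when prices are summed or multiplied.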

The extraction approach

Raw HTTP requests + HTML parsing fail on Shopify Stores due to:

  • JavaScript-rendered content requiring headless browsers
  • Anti-bot measures (rate limits, CAPTCHAs) blocking scrapers
  • Frequent frontend changes breaking CSS selectors

AlterLab's Extract API solves this by combining AI-powered data understanding with automated bypass. You define what you want via JSON schema; AlterLab handles how to get it from public pages reliably.

Quick start with AlterLab Extract API

First, install the client: pip install alterlab. See the getting started guide for setup.

Python example

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

schema = {
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "Product title from product page"
    },
    "price": {
      "type": "string",
      "description": "Current price (e.g., '29.99')"
    },
    "currency": {
      "type": "string",
      "description": "3-letter currency code (e.g., 'USD')"
    },
    "sku": {
      "type": "string",
      "description": "Stock Keeping Unit"
    },
    "availability": {
      "type": "string",
      "description": "Availability status"
    },
    "rating": {
      "type": "string",
      "description": "Average rating (e.g., '4.2')"
    }
  }
}

result = client.extract(
    url="https://shopify.com/example-product",
    schema=schema,
    formats=["json"]  # Ensures JSON output
)
print(result.data)

Output:

JSON
{
  "title": "Wireless Bluetooth Headphones",
  "price": "89.99",
  "currency": "USD",
  "sku": "WBH-001",
  "availability": "in stock",
  "rating": "4.5"
}

See full Extract API docs for parameter details.

cURL equivalent

Bash
curl -X POST https://api.alterlab.io/v1/extract \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://shopify.com/example-product",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "string"},
        "currency": {"type": "string"},
        "sku": {"type": "string"},
        "availability": {"type": "string"},
        "rating": {"type": "string"}
      }
    },
    "formats": ["json"]
  }'

Batch processing example

For high-volume extraction (e.g., 10k+ products), use async jobs:

Python
import alterlab
from alterlab import BatchJob

client = alterlab.Client("YOUR_API_KEY")
# `schema` is the dict defined in the Python example above; define it before running this block.

urls = [
    "https://store1.myshopify.com/products/a",
    "https://store2.myshopify.com/products/b",
    # ... 10k more URLs
]

job = BatchJob(
    client=client,
    extract_func=lambda u: client.extract(url=u, schema=schema, formats=["json"]),
    urls=urls,
    max_concurrent=50  # Adjust based on your plan
)

results = []
for result in job.run():
    if result.is_success:
        results.append(result.data)
    else:
        print(f"Failed {result.url}: {result.error}")

print(f"Extracted {len(results)} products")

This handles retries, rate limiting, and progress tracking automatically.
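Once a batch finishes, the collected records typically need to be persisted for the next pipeline stage. One common option is JSON Lines (one JSON object per line), sketched below with the standard library only. The `write_jsonl` and `read_jsonl` helpers and the sample records are hypothetical and not part of AlterLab's client.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line so downstream jobs can stream the file."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Stand-in for the `results` list built by the batch loop above.
results = [
    {"title": "Product A", "price": "19.00", "sku": "A-1"},
    {"title": "Product B", "price": "24.50", "sku": "B-7"},
]

# A temporary directory keeps the sketch self-contained; use a real path in practice.
with tempfile.TemporaryDirectory() as tmp:
    out = write_jsonl(results, Path(tmp) / "products.jsonl")
    loaded = read_jsonl(out)

print(f"Round-tripped {len(loaded)} records")
```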

Define your schema

The schema parameter is JSON Schema draft-07. AlterLab validates output against it:

  • Type safety: Ensures price is string (not number) to avoid float precision issues
  • Required fields: Add "required": ["title", "price"] to enforce critical data
  • Descriptions: Help the AI understand context (e.g., "SKU as shown on product page")

AlterLab returns only validated data; failed validations trigger retries with different extraction strategies. This eliminates post-processing cleanup.
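To make the required-fields and type-safety points concrete, here is a small local sanity check you could run on records before loading them. It is a sketch that covers only `required` and `"type": "string"`, not a full JSON Schema implementation and not AlterLab's own validator; the `check_record` helper is a hypothetical name. For complete draft-07 validation, a dedicated library such as the `jsonschema` package is the usual choice.

```python
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "string"},
        "sku": {"type": "string"},
    },
    # Records without these fields are treated as failures.
    "required": ["title", "price"],
}


def check_record(record, schema):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    # Required fields must be present and non-empty.
    for name in schema.get("required", []):
        if name not in record or record[name] in ("", None):
            problems.append(f"missing required field: {name}")
    # Fields declared as strings must actually be strings.
    for name, spec in schema.get("properties", {}).items():
        value = record.get(name)
        if value is not None and spec.get("type") == "string":
            if not isinstance(value, str):
                problems.append(
                    f"{name} should be a string, got {type(value).__name__}"
                )
    return problems


print(check_record({"title": "Headphones", "price": "89.99"}, schema))
print(check_record({"title": "Headphones", "price": 89.99}, schema))
print(check_record({"price": "89.99"}, schema))
```

The second call flags a price that was delivered as a float, which is the case the string typing is meant to rule out.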

Handle pagination and scale

Shopify Stores often paginate collections. For scale:

  1. Extract pagination links: First scrape collection page to get product URLs
  2. Batch process URLs: Use the async pattern above with concurrency tuned to your pricing tier
  3. Rate limit awareness: AlterLab automatically respects Retry-After headers and backs off exponentially
  4. Cost control: Set max_concurrent based on your credit balance; each successful extraction costs ~$0.002-$0.005

For monitoring changes over time, combine with AlterLab's Monitoring feature to track price/availability deltas.
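Steps 1 and 4 above can be sketched with the standard library. The example assumes you already have the HTML of a collection page (fetching it is out of scope here) and that product links contain `/products/` in their path, as in the example URLs earlier in this guide. `ProductLinkParser`, `product_urls`, and `estimate_cost` are hypothetical helpers, and the cost range is simply the per-extraction figure quoted above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ProductLinkParser(HTMLParser):
    """Collect unique product URLs from anchor tags, in first-seen order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlsplit(urljoin(self.base_url, href))
        # Drop query strings and fragments so variants of one product dedupe.
        clean = f"{parts.scheme}://{parts.netloc}{parts.path}"
        if "/products/" in parts.path and clean not in self.links:
            self.links.append(clean)


def product_urls(collection_html, base_url):
    parser = ProductLinkParser(base_url)
    parser.feed(collection_html)
    return parser.links


def estimate_cost(n_urls, low=0.002, high=0.005):
    """Rough (min, max) spend in dollars for n successful extractions."""
    return round(n_urls * low, 2), round(n_urls * high, 2)


# Hypothetical collection-page markup for illustration.
sample_html = """
<a href="/products/blue-shirt">Blue shirt</a>
<a href="/products/red-hat?variant=1">Red hat (S)</a>
<a href="/products/red-hat?variant=2">Red hat (M)</a>
<a href="/pages/about">About</a>
<a href="/collections/all?page=2">Next</a>
"""

urls = product_urls(sample_html, "https://store1.myshopify.com/collections/all")
print(urls)
print(estimate_cost(10_000))
```

The resulting `urls` list is what you would hand to the batch pattern shown earlier, and the estimate gives a quick budget check before choosing a concurrency level.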

Key takeaways

  • AlterLab's Extract API turns Shopify Stores into a structured data API via schema-driven JSON extraction
  • Focus on defining your data model (schema)—not fighting anti-bot measures or parsing HTML
  • Output is immediately usable in data pipelines, ML training, or analytics tools
  • Always verify public data access complies with target site's policies and robots.txt
  • Start with the Extract API docs to build your first extraction in under 5 minutes

99.2% Extraction Accuracy
1.4s Avg Response Time
100% Typed JSON Output

Frequently Asked Questions

Shopify provides admin APIs for store management but no public API for scraping storefront data. AlterLab fills this gap by extracting structured JSON from publicly accessible storefront pages using AI, respecting robots.txt and rate limits.
You can extract publicly available e-commerce fields like product title, price, currency, SKU, availability, and rating. AlterLab validates output against your JSON schema to ensure typed, structured data without HTML parsing.
AlterLab charges per successful extraction request with pay-as-you-go pricing. Credits never expire and there are no minimums. See the pricing page for volume discounts; typical Shopify Stores extraction costs $0.002-$0.005 per request.