
Automate Web Scraping in n8n with AlterLab API

Learn how to build automated web scraping workflows in n8n using AlterLab's API. Step-by-step tutorial with Python SDK and cURL examples.

Yash Dubey

April 11, 2026



n8n is a workflow automation tool that connects APIs, databases, and services. Pair it with a scraping API that handles anti-bot bypass, proxy rotation, and headless rendering, and you get a pipeline that pulls structured data from any website on a schedule.

This tutorial shows how to build that pipeline. You will configure an n8n workflow that sends scrape requests, receives clean JSON, and routes the data to a database, spreadsheet, or webhook.

Prerequisites

  • An n8n instance (self-hosted or cloud)
  • An API key from alterlab.io/signup
  • Basic familiarity with n8n's node-based workflow editor

Step 1: Configure the HTTP Request Node

Create a new workflow in n8n. Add an HTTP Request node and configure it as follows:

  • Method: POST
  • URL: https://api.alterlab.io/v1/scrape
  • Authentication: Header Auth
  • Header Name: X-API-Key
  • Header Value: Your API key
  • Send Body: JSON

Set the JSON body to:

JSON
{
  "url": "https://example.com/products",
  "formats": ["json"],
  "min_tier": 3
}

The min_tier parameter controls the scraping tier. Tier 3 enables JavaScript rendering. Set it higher for sites with aggressive bot detection. The anti-bot bypass system auto-escalates if the initial tier fails.
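If you prefer to script the call (or run it inside an n8n Code node), the same request can be sketched in Python with the `requests` library. This mirrors the node configuration above; the `build_payload` and `scrape` names are illustrative, not part of any SDK:

```python
import requests

API_URL = "https://api.alterlab.io/v1/scrape"

def build_payload(url: str, min_tier: int = 3) -> dict:
    """Request body matching the HTTP Request node configuration above."""
    return {"url": url, "formats": ["json"], "min_tier": min_tier}

def scrape(url: str, api_key: str, min_tier: int = 3) -> dict:
    """POST a scrape job to the AlterLab API and return the parsed JSON."""
    resp = requests.post(
        API_URL,
        headers={"X-API-Key": api_key},
        json=build_payload(url, min_tier),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```

Keeping the payload builder separate makes it easy to reuse the same body in both the script and the n8n node.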

Step 2: Test with cURL First

Before building the full workflow, verify the endpoint works from your terminal. This isolates API issues from n8n configuration problems.

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/products", "formats": ["json"]}'

A successful response returns structured data:

JSON
{
  "status": "success",
  "data": {
    "products": [
      {"name": "Widget A", "price": 29.99},
      {"name": "Widget B", "price": 49.99}
    ]
  },
  "metadata": {
    "url": "https://example.com/products",
    "timestamp": "2026-04-11T10:30:00Z"
  }
}

Step 3: Build the Full n8n Workflow

A production workflow needs more than a single HTTP request. You need error handling, data transformation, and a destination for the scraped data.

Workflow Structure

Code
[Schedule Trigger] -> [HTTP Request (Scrape)] -> [Code (Parse)] -> [Database/Sheet/Webhook]

Add these nodes in order:

1. Schedule Trigger

Set a cron expression for your scrape frequency. Daily at 6 AM UTC:

Code
0 6 * * *

2. HTTP Request Node

Use the configuration from Step 1. Enable "Continue On Fail" so one failed scrape does not block the entire workflow.

3. Code Node (Data Transformation)

Parse the JSON response and extract the fields you need:

Python
# n8n Python Code node: the HTTP Request output is already a parsed dict
# (note: the Python Code node exposes _input, not the JavaScript $input)
response = _input.first().json

# Extract product data
products = response.get("data", {}).get("products", [])

# Transform to your schema
items = []
for product in products:
    items.append({
        "json": {
            "name": product["name"],
            "price": product["price"],
            "scraped_at": response["metadata"]["timestamp"],
            "source": response["metadata"]["url"]
        }
    })

return items

4. Destination Node

Connect your output node. Common choices:

  • Postgres/MySQL: Use the database node to upsert records
  • Google Sheets: Append rows for lightweight tracking
  • Webhook: Push to your own API or a Slack channel

Step 4: Handle Multiple URLs

Scraping a single page is straightforward. Real pipelines scrape dozens or hundreds of URLs. Use n8n's Split Out node to fan out requests.

Python
# Code node that outputs multiple URLs
urls = [
    "https://example.com/products/page/1",
    "https://example.com/products/page/2",
    "https://example.com/products/page/3"
]

return [{"json": {"url": u}} for u in urls]

Connect this to a Split Out node, then to your HTTP Request node. Each URL becomes a separate execution branch. n8n processes them in parallel up to your concurrency limit.

Add rate limiting between requests if the target site requires it. Use the Wait node between the Split Out and HTTP Request nodes:

Code
Wait: 2 seconds

Step 5: Add Error Handling and Retries

Scraping fails. Pages change structure, sites go down, anti-bot systems update. Your workflow should handle failures gracefully.

Retry Configuration

In the HTTP Request node settings:

  • Retry On Fail: Enable
  • Max Retries: 3
  • Retry Backoff: Exponential

Error Routing

Add an error output branch from the HTTP Request node:

Code
[HTTP Request] --(success)--> [Parse] --> [Database]
       |
       --(error)--> [Error Handler] --> [Alert/Log]

The error handler can log failures to a separate sheet, send a Slack notification, or queue the URL for a retry with a higher tier.

Python
from datetime import datetime, timezone

# Capture the failed request so it can be queued for retry
error_data = _input.first().json

failed_urls = [{
    "url": error_data.get("url"),
    "error": error_data.get("error"),
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "retry_tier": 4  # escalate tier on retry
}]

return [{"json": {"failed": failed_urls}}]

Step 6: Use Cortex AI for Structured Extraction

Some pages do not have clean HTML structures. Product listings buried in JavaScript, unstructured text, or dynamic content require a different approach. Cortex AI extracts structured data using natural language instructions.

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/reviews",
    "formats": ["json"],
    "cortex": {
      "prompt": "Extract reviewer name, rating (1-5), and review text from each review block"
    }
  }'

The response returns data matching your schema:

JSON
{
  "status": "success",
  "data": {
    "reviews": [
      {
        "reviewer_name": "Jane D.",
        "rating": 5,
        "review_text": "Excellent product, fast shipping."
      },
      {
        "reviewer_name": "Mark S.",
        "rating": 4,
        "review_text": "Good quality, slightly overpriced."
      }
    ]
  }
}

In n8n, the Cortex output works identically to standard JSON output. Route it through the same Code and Database nodes.
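The Code-node transformation for Cortex output can be reduced to a small helper that emits one n8n item per review. A minimal sketch, using the field names from the sample response above:

```python
def reviews_to_items(response: dict) -> list:
    """Turn a Cortex scrape response into one n8n item per review."""
    # Missing or empty "data" yields an empty item list rather than an error
    reviews = response.get("data", {}).get("reviews", [])
    return [{"json": review} for review in reviews]
```

Each returned dict follows n8n's item convention (`{"json": {...}}`), so the output feeds directly into a database or spreadsheet node.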

Step 7: Monitor and Alert on Changes

Scraping is not always about collecting new data. Sometimes you need to detect changes on existing pages. Price drops, stock availability, competitor updates, regulatory filings.

Configure monitoring by storing previous scrape results and comparing them on each run:

Python
# Compare current scrape with previous state
current = _input.first().json
previous = get_previous_state(current["url"])  # placeholder: load the last run's result from your database

changes = []
for key in current["data"]:
    if key not in previous:
        changes.append({"field": key, "action": "added", "value": current["data"][key]})
    elif current["data"][key] != previous[key]:
        changes.append({
            "field": key,
            "action": "changed",
            "old": previous[key],
            "new": current["data"][key]
        })

# Only pass through if changes detected
if changes:
    return [{"json": {"url": current["url"], "changes": changes}}]
return []

When changes exist, route to an alert node. When nothing changed, the workflow exits silently.


Cost Considerations

Scraping pipelines can get expensive if you are not careful. A few practices:

  • Cache aggressively: Do not re-scrape pages that have not changed. Store hashes of previous responses and skip identical results.
  • Use the lowest tier that works: Start with min_tier: 1 for static pages. Only escalate to tier 3+ for JavaScript-heavy sites.
  • Batch URLs: Group related URLs into single workflow runs rather than triggering separate workflows per URL.
  • Set spend limits: API keys support spend caps. Set them per workflow to prevent runaway costs.
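The first practice can be sketched with a content hash: store a SHA-256 digest of each response body and skip downstream processing when nothing changed. The in-memory `seen` dict here stands in for whatever persistent store your workflow uses:

```python
import hashlib
import json

# Stand-in for a persistent store (database, Redis, a Google Sheet, ...)
seen: dict[str, str] = {}

def content_hash(data: dict) -> str:
    """Stable SHA-256 of a JSON response body (key order normalized)."""
    return hashlib.sha256(
        json.dumps(data, sort_keys=True).encode()
    ).hexdigest()

def has_changed(url: str, data: dict) -> bool:
    """Return True (and record the new hash) only when the content differs."""
    digest = content_hash(data)
    if seen.get(url) == digest:
        return False  # identical to last run: skip re-processing
    seen[url] = digest
    return True
```

In n8n this check sits in a Code node between the HTTP Request and the destination node, so unchanged pages never reach the database.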

Check pricing for current rates. You pay for what you use with no monthly minimums.

Complete Workflow Example

Here is the full n8n workflow JSON for a daily product price scrape:

JSON
{
  "name": "Daily Price Scraper",
  "nodes": [
    {
      "name": "Schedule",
      "type": "n8n-nodes-base.scheduleTrigger",
      "parameters": {
        "rule": { "interval": ["days"], "triggerAtHour": 6 }
      }
    },
    {
      "name": "Scrape Products",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "method": "POST",
        "url": "https://api.alterlab.io/v1/scrape",
        "authentication": "headerAuth",
        "body": {
          "url": "={{ $json.url }}",
          "formats": ["json"],
          "min_tier": 3
        },
        "options": {
          "retryOnFail": true,
          "maxTries": 3
        }
      }
    },
    {
      "name": "Parse Response",
      "type": "n8n-nodes-base.code",
      "parameters": {
        "jsCode": "const data = $input.first().json;\nreturn data.data.products.map(p => ({ json: p }));"
      }
    },
    {
      "name": "Save to Database",
      "type": "n8n-nodes-base.postgres",
      "parameters": {
        "operation": "upsert",
        "table": "product_prices",
        "columns": "name,price,scraped_at"
      }
    }
  ],
  "connections": {
    "Schedule": { "main": [[{ "node": "Scrape Products", "type": "main" }]] },
    "Scrape Products": { "main": [[{ "node": "Parse Response", "type": "main" }]] },
    "Parse Response": { "main": [[{ "node": "Save to Database", "type": "main" }]] }
  }
}

Import this into n8n via the workflow editor, replace the authentication credentials with your API key, and adjust the URL and database schema to match your use case.

Troubleshooting

Empty responses: The page may require a higher tier. Increase min_tier to 4 or 5. Check the API docs for tier descriptions.

Rate limit errors: Add a Wait node between requests. Start with 1-2 seconds and increase if needed.

CAPTCHA blocks: Set min_tier: 5 to enable CAPTCHA solving. This costs more per request but eliminates manual intervention.

Schema drift: Websites change their HTML structure. Cortex AI handles this better than CSS selectors since it uses semantic understanding. Switch to Cortex if your selectors break frequently.

n8n timeout: Long-running scrapes can exceed n8n's execution timeout. For large batches, use the webhook pattern. Configure AlterLab to push results to an n8n webhook URL instead of polling.
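Several of these fixes combine into one retry helper: start at a low tier and re-submit at a higher one when the response comes back empty. A sketch, where the `scrape` callable is assumed to wrap the POST request from Step 1 and accept a `min_tier` keyword:

```python
import time

def scrape_with_escalation(scrape, url: str, start_tier: int = 1,
                           max_tier: int = 5, delay: float = 2.0) -> dict:
    """Retry a scrape at increasing tiers until data comes back."""
    result = {}
    for tier in range(start_tier, max_tier + 1):
        result = scrape(url, min_tier=tier)
        if result.get("data"):  # non-empty payload: done
            return result
        time.sleep(delay)  # back off before escalating the tier
    return result  # last attempt, possibly still empty
```

Because the API already auto-escalates on anti-bot failures, this wrapper only matters for the "empty but successful" case, where a higher tier (JavaScript rendering, CAPTCHA solving) is needed to see the content at all.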

Takeaway

n8n handles orchestration. AlterLab handles extraction. Together they give you a scraping pipeline that runs on a schedule, handles failures, and delivers clean data to your systems.

Start with a single URL and a basic HTTP Request node. Add error handling, multi-URL support, and change detection as your needs grow. The quickstart guide covers API setup in under five minutes.


Frequently Asked Questions

How do I connect n8n to AlterLab?
Use n8n's HTTP Request node to POST to https://api.alterlab.io/v1/scrape with your API key in the X-API-Key header. You can also use the Python SDK in a Code node for more complex workflows.

Does AlterLab handle anti-bot protection automatically?
Yes. AlterLab automatically handles anti-bot detection, CAPTCHAs, and JavaScript rendering. You set the tier level via the min_tier parameter and the API handles the rest.

Which output format works best with n8n?
AlterLab returns clean JSON, Markdown, or plain text. JSON works best in n8n since it maps directly to node outputs for downstream processing.