Automate Lead Enrichment in n8n with Web Scraping APIs


Build a deterministic n8n workflow to extract structured JSON data from public company websites using automated data pipelines and headless browsers.

Yash Dubey

April 24, 2026

5 min read

Lead enrichment pipelines typically rely on static, outdated third-party databases. By extracting data from public company websites directly, you guarantee data freshness and relevance. n8n provides the orchestration layer to move this data, but extracting structured data from unstructured HTML requires a dedicated scraping layer.

We will build an n8n workflow that takes a raw company URL, processes it through a headless browser, extracts specific firmographic data using LLM-based parsing, and pushes the structured JSON into a database.

The Core Concept: HTML to JSON

Raw HTML is noisy. Writing regex or CSS selectors for hundreds of different company website layouts is brittle and requires constant maintenance. The modern approach offloads the parsing to a scraping API that accepts a URL and a desired JSON schema, returning exactly the data requested.

AlterLab handles this via Cortex AI. You pass a target URL and a schema definition. The API handles the network routing, renders the JavaScript, parses the DOM, and returns the variables matching your schema.

Try it yourself

Extract company data as JSON with AlterLab

Pipeline Architecture

Our automated pipeline consists of four distinct stages inside n8n.

Before building the workflow, you need an active n8n instance (self-hosted or cloud) and a scraping API credential. If you do not have one, create an account to get an API key.

Prototyping the Extraction Request

Before configuring the n8n HTTP Request node, test the extraction logic locally. We want to extract three data points from a target company website: the main support email, the primary product offering, and the physical address.

Here is how to test the extraction using standard tools.

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example-b2b-site.com/contact",
    "formats": ["json"],
    "cortex": {
      "schema": {
        "support_email": "string",
        "primary_product": "string",
        "headquarters_address": "string"
      }
    }
  }'

If you are building custom n8n nodes or prefer writing Python scripts for your data engineering tasks, you can achieve the exact same operation using our Python SDK.

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    url="https://example-b2b-site.com/contact",
    formats=["json"],
    cortex={
        "schema": {
            "support_email": "string",
            "primary_product": "string",
            "headquarters_address": "string"
        }
    }
)

print(response.json)

Both methods return a deterministic JSON object mapping the schema keys to the extracted values. We will use this exact payload structure inside our n8n workflow.
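Before wiring the payload into n8n, it helps to check that every schema field actually came back populated. Here is a minimal validation sketch; the flat field layout mirrors the schema defined above, but the real API envelope may nest these keys differently.

```python
# Minimal sketch: validate an extracted payload before writing it to the
# database. Field names mirror the schema defined above; the actual
# response envelope from the API may nest them differently.
REQUIRED_FIELDS = ("support_email", "primary_product", "headquarters_address")

def is_complete(record: dict) -> bool:
    """Return True when every schema field is present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)

# Example payload mirroring the schema keys from the extraction request.
sample = {
    "support_email": "support@example-b2b-site.com",
    "primary_product": "B2B analytics platform",
    "headquarters_address": "123 Market St, San Francisco, CA",
}

print(is_complete(sample))                  # True
print(is_complete({"support_email": ""}))   # False
```

A check like this maps directly onto the IF node conditions used later in the workflow.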

Step 1: Configuring the n8n Trigger

Start by adding a Postgres node (or your preferred database node) to n8n. Set the operation to Execute Query.

Write a query that selects records lacking enrichment data. Limit the batch size to prevent overwhelming the downstream nodes.

SQL
SELECT id, domain 
FROM leads 
WHERE enrichment_status = 'pending' 
LIMIT 10;

Add a Schedule Trigger to run this query every hour. This creates a steady, predictable throughput for the enrichment pipeline.

Step 2: The Scraping Node

Add an HTTP Request node directly after the database trigger. This node runs once for each domain returned by the database query and calls the scraping API.

Configure the HTTP Request node with these settings:

  • Method: POST
  • URL: https://api.alterlab.io/v1/scrape
  • Authentication: Generic Credential Type (Header Auth)
  • Header Name: X-API-Key

In the Body Parameters section, use n8n expressions to dynamically inject the domain from the previous node.

JSON
{
  "url": "https://{{ $json.domain }}",
  "formats": ["json"],
  "cortex": {
    "schema": {
      "support_email": "string",
      "primary_product": "string",
      "headquarters_address": "string"
    }
  }
}

Managing Execution Tiers

Public B2B directories and heavily trafficked company websites often deploy strict anti-bot measures, so standard HTTP requests frequently fail with 403 Forbidden errors.

Your scraping configuration needs to account for this. By default, AlterLab automatically escalates the request through different proxy and browser tiers until it succeeds. You pay for what you use based on the tier required to access the public data. If you know a target domain requires JavaScript rendering, you can bypass the lower tiers by setting a min_tier parameter in your JSON body. This reduces total latency. Read more about handling complex targets in our anti-bot solution documentation.
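As a sketch, setting a minimum tier only adds one field to the request body from the prototyping step. The min_tier key is described above; the value "browser" here is a hypothetical tier name, so check the AlterLab documentation for the accepted values.

```python
# Sketch of the scrape payload with a minimum execution tier.
# "min_tier" comes from the article; the value "browser" is a
# hypothetical example of a tier name, not a confirmed API value.
payload = {
    "url": "https://example-b2b-site.com/contact",
    "formats": ["json"],
    "min_tier": "browser",  # hypothetical: skip tiers without JS rendering
    "cortex": {
        "schema": {
            "support_email": "string",
            "primary_product": "string",
            "headquarters_address": "string",
        },
    },
}

print(payload["min_tier"])
```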

Step 3: Parsing and Validation

Add an IF node after the HTTP Request. Network requests fail, target sites go down, and domains expire. You must handle these states gracefully.

Configure the IF node to check the HTTP status code. For the status code to appear in the node's output, enable the HTTP Request node's option to include response headers and status (the exact option name varies by n8n version).

  • Condition 1: {{ $json.statusCode }} Equal to 200.
  • Condition 2: {{ $json.data.support_email }} Is Not Empty.

If the conditions are true, route the workflow to the True branch. If false, route to an error handling branch that updates the database record status to failed to prevent infinite retry loops.
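The error branch can be a second Postgres node running an update along these lines. The column names assume the leads table from Step 1, and the id expression is an n8n template placeholder that should reference the record from the original trigger node.

```sql
-- Mark the lead as failed so the hourly trigger stops retrying it.
-- Adjust the expression so it resolves to the id from the trigger node.
UPDATE leads
SET enrichment_status = 'failed'
WHERE id = {{ $json.id }};
```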

Step 4: Storage and Database Updates

On the True branch of your IF node, add another Postgres node. Set the operation to Update.

Map the extracted JSON data to your database columns using n8n expressions:

  • email: {{ $json.data.support_email }}
  • product_focus: {{ $json.data.primary_product }}
  • location: {{ $json.data.headquarters_address }}
  • enrichment_status: completed

Ensure you use the id from the original trigger node as the update key.
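Expressed as raw SQL, the update this node performs looks roughly like the following. The column names mirror the mapping above, the expressions are n8n template placeholders, and the node reference $('Postgres') assumes the trigger node kept its default name.

```sql
-- Write the extracted fields back to the lead record. The WHERE clause
-- pulls the id from the original trigger node ($('Postgres') assumes
-- the default node name) rather than from the HTTP response.
UPDATE leads
SET email = '{{ $json.data.support_email }}',
    product_focus = '{{ $json.data.primary_product }}',
    location = '{{ $json.data.headquarters_address }}',
    enrichment_status = 'completed'
WHERE id = {{ $('Postgres').item.json.id }};
```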

Scaling the Workflow

Once the pipeline runs successfully for small batches, you will need to adjust the configuration for higher volume.

Concurrency and Rate Limits

n8n processes items sequentially by default. To manage throughput, use the Split In Batches node (called Loop Over Items in recent n8n versions) with a batch size of 5 or 10, and use the HTTP Request node's batching options to control how quickly requests are dispatched. Ensure your AlterLab API key has a sufficient concurrency limit to absorb the resulting request rate.

Asynchronous Processing

For complex sites requiring heavy JavaScript execution, the request might take longer than n8n's default HTTP timeout. Instead of keeping the HTTP connection open, switch to asynchronous webhooks.

  1. Create a Webhook node in n8n and copy its test URL.
  2. Update your AlterLab HTTP Request body to include the webhook URL: {"url": "...", "webhook_url": "YOUR_N8N_WEBHOOK"}.
  3. n8n will immediately receive a 202 Accepted response.
  4. The scraping API will process the page in the background and POST the final JSON payload to your n8n Webhook node once complete.
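The steps above only change the request body: the same payload gains a webhook_url field. A minimal sketch, where the webhook path is a placeholder copied from your own Webhook node:

```python
# Sketch: the same scrape payload, plus the n8n webhook URL for
# asynchronous delivery. The host and path below are placeholders for
# the test URL copied from your Webhook node.
async_payload = {
    "url": "https://example-b2b-site.com/contact",
    "formats": ["json"],
    "webhook_url": "https://your-n8n-host/webhook-test/lead-enrichment",
    "cortex": {
        "schema": {
            "support_email": "string",
            "primary_product": "string",
            "headquarters_address": "string",
        },
    },
}

print("webhook_url" in async_payload)  # True
```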

Takeaways

Extracting unstructured web data into structured database records does not require massive custom codebases. By connecting n8n's orchestration with a dedicated scraping API, you build a resilient pipeline that adapts to site changes automatically. Focus your engineering effort on how to use the enriched data, not on maintaining CSS selectors. Ensure you configure error routing, handle status codes, and test your schemas thoroughly before scaling the batch sizes.


Frequently Asked Questions

How do I extract structured data from a website in n8n?
You can use an HTTP Request node to send the target URL to a scraping API equipped with LLM extraction. The API parses the DOM and returns a clean JSON payload directly to your n8n workflow.

Does n8n render JavaScript when scraping?
n8n itself does not render JavaScript. You must route the HTTP request through a headless browser service or a scraping API that executes the JavaScript before returning the response to n8n.

How do I handle scraping jobs that exceed n8n's HTTP timeout?
Use the Webhook node in n8n to receive asynchronous POST requests. Instead of holding the HTTP node open, your scraping API processes the page in the background and pushes the final data to the n8n webhook URL.