
Automated AI Agent Workflows with n8n & JSON Extraction
Build scalable website enrichment and competitor research workflows for AI agents using n8n and structured JSON extraction APIs.
TL;DR
To build automated website enrichment and competitor research workflows for AI agents, use n8n to orchestrate the pipeline and a web scraping API to convert public HTML pages into structured JSON. By passing target URLs from your CRM into an n8n HTTP Request node, requesting JSON format from the scraper, and feeding the output into an AI agent node, you can continuously extract competitor pricing, feature sets, and firmographic data without writing custom parsers.
The Architecture of an Enrichment Workflow
AI agents require structured context. Feeding raw, unparsed HTML into an LLM's context window inflates token costs, degrades reasoning, and invites hallucinated data. To automate competitor research or lead enrichment, the pipeline must standardize the input before it reaches the agent.
An effective n8n enrichment pipeline consists of four stages:
- Triggering: A CRM webhook, database event, or cron schedule initiates the workflow with a target URL.
- Extraction: A request is made to a scraping API to fetch the publicly accessible page and return it as a structured JSON object.
- Reasoning: The AI agent processes the structured JSON against a specific prompt to extract insights (e.g., pricing tiers, feature lists).
- Storage: The structured insights are pushed back to the originating CRM or database.
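The four stages can be sketched as a plain Python function, assuming each stage is supplied as a callable. The names `run_enrichment`, `extract`, `reason`, and `store` are hypothetical stand-ins for the n8n nodes described later, and the usage example wires in canned data rather than calling any real service.

```python
from typing import Any, Callable, Dict

Json = Dict[str, Any]


def run_enrichment(
    target_url: str,
    extract: Callable[[str], Json],
    reason: Callable[[Json], Json],
    store: Callable[[str, Json], None],
) -> Json:
    """Run one pass of the pipeline for a URL supplied by the trigger stage."""
    page = extract(target_url)    # Extraction: URL -> structured page JSON
    insights = reason(page)       # Reasoning: page JSON -> insights
    store(target_url, insights)   # Storage: persist insights keyed by URL
    return insights


# Usage with stand-in stages (hypothetical canned data, no network calls).
saved: Dict[str, Json] = {}
result = run_enrichment(
    "https://example-competitor.com/pricing",
    extract=lambda url: {"url": url, "text": "Pro plan: $49/month"},
    reason=lambda page: {"has_pricing": "$" in page["text"]},
    store=lambda url, insights: saved.update({url: insights}),
)
print(result)
```

Keeping each stage behind its own callable mirrors the way n8n isolates work in separate nodes, so a single stage can be swapped without touching the others.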
Building the n8n Pipeline
n8n is a node-based workflow automation tool that excels at integrating APIs and LLMs. We will build a pipeline that monitors competitor pricing pages and enriches a central database.
Step 1: Triggering the Workflow
Start by adding a Webhook node or a Schedule node in n8n. If you are enriching inbound leads, a Webhook node is optimal. Configure your CRM to send a POST request to the n8n webhook URL containing the lead's company website.
For continuous competitor research, use a Schedule node set to run weekly, followed by a database node (like PostgreSQL or Supabase) that pulls a list of competitor URLs to check.
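Whichever trigger you choose, it helps to normalize the incoming URL before extraction. A minimal sketch, assuming the CRM sends the site in a field named `website` (a hypothetical field name; adjust it to your CRM's payload). The helper `target_url_from_payload` is illustrative only, and it deliberately drops query strings and fragments.

```python
from urllib.parse import urlsplit


def target_url_from_payload(payload: dict) -> str:
    """Return a normalized http(s) URL from a CRM webhook payload.

    Assumes the site lives in a hypothetical `website` field.
    """
    raw = str(payload.get("website", "")).strip()
    if not raw:
        raise ValueError("payload has no 'website' value")
    # Add a scheme when the CRM stores a bare domain such as "example.com".
    if "://" not in raw:
        raw = "https://" + raw
    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a usable web URL: {raw!r}")
    path = parts.path or "/"
    # Host names are case-insensitive, so lower-case the host only.
    return f"{parts.scheme}://{parts.netloc.lower()}{path}"


print(target_url_from_payload({"website": " Example-Competitor.com "}))
```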
Step 2: Structured Data Extraction
Once you have the target URL, you need to extract the data. Traditional scraping requires building brittle CSS selectors. Instead, we use AlterLab to request the page and return a structured JSON representation of the content.
Add an HTTP Request node in n8n. Configure it to make a POST request to the scraping API endpoint.
If you are testing locally outside of n8n, you can achieve the exact same extraction using cURL or Python.
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example-competitor.com/pricing", "formats": ["json"]}'
For custom applications or dedicated orchestration scripts outside of n8n, you can use our Python SDK to handle the extraction synchronously.
import alterlab
import json
client = alterlab.Client("YOUR_API_KEY")
# Requesting 'json' format instructs the API to parse the layout automatically
response = client.scrape("https://example-competitor.com/pricing", formats=["json"])
data = response.json()
print(json.dumps(data, indent=2))
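If you would rather not depend on an SDK, the same request can be assembled with Python's standard library. This sketch mirrors the cURL call (same endpoint, `X-API-Key` header, and body). `build_scrape_request` is a hypothetical helper name, and nothing is assumed here about the shape of the response.

```python
import json
import urllib.request

# Endpoint taken from the cURL example.
SCRAPE_ENDPOINT = "https://api.alterlab.io/v1/scrape"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    body = json.dumps({"url": url, "formats": ["json"]}).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_scrape_request("https://example-competitor.com/pricing", "YOUR_API_KEY")
print(req.get_method(), req.full_url)

# To send it, uncomment the lines below and supply a real key:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     page = json.loads(resp.read().decode("utf-8"))
```

The same three pieces (URL, headers, JSON body) are what you enter into the n8n HTTP Request node.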
Step 3: Handling JavaScript and Anti-Bot Systems
Modern public websites, especially e-commerce platforms and SaaS sites, rely heavily on Single Page Application (SPA) architectures. For these sites, a plain GET request often returns little more than an empty root <div>, because the content is rendered client-side by JavaScript.
Furthermore, data collection systems frequently encounter rate limits or bot detection mechanisms, even when accessing public information at respectful intervals. When building reliable automated workflows, robust anti-bot handling is a requirement, not an optional feature.
By offloading the HTTP request to a dedicated API, your n8n workflow does not need to manage headless browser instances, proxy rotation, or retries. The API renders the JavaScript, handles the network complexities, and returns the final DOM state as structured data.
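To see why a plain GET is often not enough for SPA pages, you can check whether fetched HTML contains any visible text at all. The following is a rough heuristic for illustration, not part of any API: `looks_like_empty_shell` and its 40-character threshold are hypothetical choices.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text that appears outside <script>, <style>, and <noscript>."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_like_empty_shell(html: str, min_chars: int = 40) -> bool:
    """Return True when the markup carries almost no human-readable text."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks)) < min_chars


shell = (
    '<html><head><title>App</title></head>'
    '<body><div id="root"></div><script src="/app.js"></script></body></html>'
)
rendered = (
    "<html><body><h1>Pricing</h1>"
    "<p>Starter plan at $19 per month, Pro plan at $49 per month.</p></body></html>"
)
print(looks_like_empty_shell(shell), looks_like_empty_shell(rendered))
```

A check like this can be useful when debugging a workflow that unexpectedly returns empty results, since it separates rendering problems from extraction problems.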
Step 4: Structuring the Data for AI Agents
With the JSON data in n8n, add an AI Agent node (listed under n8n's Advanced AI nodes). Connect your preferred LLM provider (OpenAI, Anthropic, or a local model via Ollama).
Configure the AI node with a system prompt that enforces strict JSON output. The agent's job is to read the extracted page content and map it to your internal schema.
Example System Prompt for the AI Node:
You are a firmographic data extraction agent.
Analyze the provided JSON representation of a competitor's pricing page.
Extract the pricing tiers, the cost of each tier, and the core features included.
Output your response STRICTLY as a JSON object matching this schema:
{
  "company_name": "string",
  "pricing_tiers": [
    {
      "tier_name": "string",
      "price_monthly": "number",
      "core_features": ["string"]
    }
  ]
}
Do not include markdown formatting or conversational text.
Map the output of the HTTP Request node (the scraped JSON) to the input of the AI agent node. The agent will parse the structured web data and output a clean, standardized object that matches your database schema.
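Even with a strict prompt, a model can occasionally return malformed output, so it is worth validating the reply before writing it anywhere. A minimal sketch that checks the schema from the example prompt using only the standard library; `parse_agent_output` is a hypothetical helper, not an n8n or AlterLab function.

```python
import json


def parse_agent_output(raw: str) -> dict:
    """Parse the agent's reply and check it against the prompt's schema."""
    # json.JSONDecodeError is a ValueError, so non-JSON replies fail here.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("company_name"), str):
        raise ValueError("company_name must be a string")
    tiers = data.get("pricing_tiers")
    if not isinstance(tiers, list):
        raise ValueError("pricing_tiers must be a list")
    for i, tier in enumerate(tiers):
        if not isinstance(tier, dict):
            raise ValueError(f"pricing_tiers[{i}] must be an object")
        if not isinstance(tier.get("tier_name"), str):
            raise ValueError(f"pricing_tiers[{i}].tier_name must be a string")
        price = tier.get("price_monthly")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            raise ValueError(f"pricing_tiers[{i}].price_monthly must be a number")
        features = tier.get("core_features")
        if not isinstance(features, list) or not all(
            isinstance(f, str) for f in features
        ):
            raise ValueError(f"pricing_tiers[{i}].core_features must be a list of strings")
    return data


reply = (
    '{"company_name": "Example Competitor", "pricing_tiers": '
    '[{"tier_name": "Pro", "price_monthly": 49, "core_features": ["SSO"]}]}'
)
print(parse_agent_output(reply)["pricing_tiers"][0]["price_monthly"])
```

Rejecting a bad reply at this point lets the workflow retry or flag the record instead of writing partial data to the CRM.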
Step 5: Routing Data to Your Target System
The final step in n8n is writing the enriched data to its destination. Add a node for your target system (e.g., PostgreSQL, Salesforce, or HubSpot).
Map the strictly formatted JSON output from the AI agent directly into the corresponding fields of your database or CRM. If this workflow runs on a schedule, you can add an intermediate diff-checking node to compare the newly extracted pricing against the last known pricing in your database, only triggering an alert or update if the competitor has changed their tiers.
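The diff check can be sketched as follows, assuming both the stored and the freshly extracted data use the `pricing_tiers` shape from the example schema. `pricing_changes` and `has_changes` are hypothetical helper names, and this version compares prices only, not feature lists.

```python
def pricing_changes(previous: list, current: list) -> dict:
    """Compare two `pricing_tiers` lists and report what changed."""
    old = {t["tier_name"]: t["price_monthly"] for t in previous}
    new = {t["tier_name"]: t["price_monthly"] for t in current}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "repriced": {
            name: {"from": old[name], "to": new[name]}
            for name in sorted(old.keys() & new.keys())
            if old[name] != new[name]
        },
    }


def has_changes(diff: dict) -> bool:
    """True when any tier was added, removed, or repriced."""
    return any(diff.values())


previous = [
    {"tier_name": "Starter", "price_monthly": 19},
    {"tier_name": "Pro", "price_monthly": 49},
]
current = [
    {"tier_name": "Pro", "price_monthly": 59},
    {"tier_name": "Enterprise", "price_monthly": 199},
]
diff = pricing_changes(previous, current)
print(diff)
```

In the workflow, `has_changes(diff)` is the condition that would gate the alert or update branch.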
Extending the Workflow
Once the basic pipeline is operational, you can expand its capabilities:
- Pagination: For e-commerce category pages, use n8n's Loop Over Items node to follow pagination links extracted in the initial request.
- Multi-Page Context: Scrape the target's homepage, /about, and /pricing pages in parallel HTTP nodes. Merge the JSON outputs into a single text block before passing it to the AI agent to provide comprehensive context for lead enrichment.
- Webhook Responses: If using AlterLab, you can configure the API to push results to an n8n Webhook trigger asynchronously. Refer to the documentation for configuring asynchronous webhook deliveries to prevent n8n execution timeouts on heavy pages.
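For the multi-page case, the merge step might look like the sketch below. It assumes each page's scraped JSON is already available as a Python dict keyed by its path. `merge_page_context` and the 20,000-character cap are hypothetical choices, with the cap acting as a crude guard against oversized prompts.

```python
import json


def merge_page_context(pages: dict, max_chars: int = 20000) -> str:
    """Join per-page JSON into one labelled text block for the agent prompt."""
    sections = []
    for path, payload in pages.items():
        body = json.dumps(payload, ensure_ascii=False, sort_keys=True)
        # Label each section so the agent can tell which page a fact came from.
        sections.append(f"### {path}\n{body}")
    merged = "\n\n".join(sections)
    return merged[:max_chars]


context = merge_page_context(
    {
        "/": {"headline": "Example Competitor"},
        "/about": {"founded": "2019"},
        "/pricing": {"tiers": ["Starter", "Pro"]},
    }
)
print(context)
```

Labelling each section by path keeps the source of every fact visible to the agent, which helps when two pages disagree.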
Summary
Automating competitor research and website enrichment requires standardizing unstructured web data. By orchestrating workflows in n8n, offloading the browser rendering and extraction to an API, and using AI agents to map the resulting JSON to your internal schemas, you create a resilient, scalable data pipeline. You avoid writing brittle CSS selectors, eliminate the overhead of managing headless browsers, and ensure your databases are continuously enriched with the latest publicly available information.