
Build an n8n AI Agent Workflow to Scrape Job Boards and Automate Candidate Scoring
Learn how to build an automated n8n pipeline that scrapes public job boards, parses requirements, and uses an AI agent to score roles against your resume.
April 26, 2026
Automating job searches requires extracting structured data from heavily protected job boards and evaluating it against specific, highly subjective criteria. Manual filtering is inefficient, but traditional keyword matching falls short when evaluating complex engineering roles.
By combining n8n (a node-based workflow automation tool) with a reliable scraping engine and a Large Language Model (LLM), you can build an autonomous agent that continuously monitors public tech career sites, extracts new postings, and reads the job descriptions to score them against your exact skillset.
This guide walks through the architecture and configuration of an end-to-end job scoring pipeline.
System Architecture
Our workflow consists of five distinct operational phases:
- Triggering: A cron node in n8n initiates the workflow every 6 hours.
- Data Extraction: An HTTP Request node calls a web scraping API to render the target job board and bypass bot protection.
- Parsing: HTML extraction nodes parse the raw response into an array of individual job listing URLs and titles.
- AI Evaluation: An LLM node processes each job description alongside your resume, outputting a structured JSON match score.
- Routing: Conditional logic filters out low scores and pushes high-matching roles to a Slack or Discord webhook.
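Before wiring nodes together, it helps to see the five phases as one loop. Below is a minimal sketch in plain Python, where `fetch_rendered_html`, `extract_job_links`, `score_job`, and `notify` are hypothetical stand-ins for the nodes described in the rest of this guide:

```python
# Minimal sketch of the five pipeline phases as plain Python.
# The four callables are hypothetical stand-ins for the n8n nodes.

SCORE_THRESHOLD = 80  # only notify on strong matches

def run_pipeline(board_url, fetch_rendered_html, extract_job_links,
                 score_job, notify):
    html = fetch_rendered_html(board_url)       # Phase 2: data extraction
    for link in extract_job_links(html):        # Phase 3: parsing
        result = score_job(link)                # Phase 4: AI evaluation
        if result["score"] >= SCORE_THRESHOLD:  # Phase 5: routing
            notify(link, result)
```

The triggering phase (the cron schedule) is simply whatever invokes `run_pipeline` every 6 hours.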
Step 1: Handling Data Extraction
Modern Applicant Tracking Systems (ATS) and aggregator sites rarely serve static HTML. They rely on client-side JavaScript to hydrate job lists and employ aggressive bot mitigation strategies to block automated traffic.
A standard n8n HTTP Request node will fail to retrieve the actual job content, returning either an empty div or a CAPTCHA challenge. To reliably extract this data, we delegate the fetch operation to an infrastructure layer capable of headless browser execution and anti-bot handling.
Test rendering a JavaScript-heavy career page using AlterLab
API Integration Examples
Before configuring n8n, verify your target payload using a standard HTTP client. We use AlterLab for the extraction layer to ensure JavaScript is fully executed before the HTML is returned.
Here is the cURL command to fetch the target page:
curl -X POST https://api.alterlab.io/v1/scrape \
-H "X-API-Key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://careers.example-startup.com/engineering", "render_js": true}'

If you prefer building this pipeline outside of n8n, you can achieve the same extraction using the Python SDK:
import alterlab
client = alterlab.Client("YOUR_API_KEY")
# render_js ensures the SPA hydrates the job list before returning
response = client.scrape(
url="https://careers.example-startup.com/engineering",
render_js=True
)
print(response.text)

Step 2: Configuring the n8n HTTP Request Node
In your n8n canvas, add an HTTP Request node. This node will execute the POST request defined above.
Configure the node with the following parameters:
- Method: POST
- URL: https://api.alterlab.io/v1/scrape
- Authentication: Generic Credential Type (Header Auth)
  - Name: X-API-Key
  - Value: Your API key
- Body Parameters: Add url (your target job board) and render_js (set to true).
When you execute this node, the output will contain the fully rendered HTML of the job board, complete with all dynamically loaded job titles and links. For detailed schema information, refer to the API reference.
Step 3: Parsing the Job Data
The response from the extraction phase is a single, monolithic HTML string. We need to split this into an array of individual job items so n8n can process them sequentially.
Add an HTML Extract node to your workflow. n8n uses a Cheerio-like syntax for CSS selection.
- Extraction Values: Create a new value.
  - Key: job_links
  - CSS Selector: .job-posting-list .job-item a (adjust this selector based on the DOM structure of your specific target site)
  - Return Value: Attribute
  - Attribute Name: href
  - Return Array: Enable this toggle.
This configuration transforms the raw HTML into a clean JSON array of URLs pointing to specific job descriptions.
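If you are building outside n8n, the node's selector logic can be approximated with the standard library alone. The sketch below mirrors the extraction (checking only for a `.job-item` ancestor rather than the full `.job-posting-list .job-item a` selector); `JobLinkExtractor` is an illustrative name, not part of any library:

```python
from html.parser import HTMLParser

class JobLinkExtractor(HTMLParser):
    """Stdlib-only stand-in for the HTML Extract node: collects the
    href attribute of <a> tags nested inside .job-item elements."""
    VOID = {"br", "img", "hr", "input", "meta", "link"}  # never closed

    def __init__(self):
        super().__init__()
        self.links = []
        self._stack = []  # True for each open tag that is (or sits inside) .job-item

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inside = bool(self._stack and self._stack[-1]) or \
                 "job-item" in attrs.get("class", "").split()
        if tag == "a" and inside and "href" in attrs:
            self.links.append(attrs["href"])
        if tag not in self.VOID:
            self._stack.append(inside)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()
```

Feeding the rendered HTML to `JobLinkExtractor().feed(html)` yields the same clean array of URLs the n8n node produces.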
Next, add an Item Lists node configured to Split Out Items. This takes the array of URLs and outputs individual n8n items, allowing the subsequent nodes to run once per job listing.
For each individual job URL, insert a second HTTP Request node to fetch the specific job description page using the same extraction API configuration as Step 2.
Step 4: The AI Scoring Agent
This is the core of the autonomous workflow. We have the raw text of a job description, and we need a consistent, structured evaluation of how well it matches a candidate's profile.
Add an Advanced LLM node (or the OpenAI/Anthropic specific node) to the canvas.
Prompt Engineering for Deterministic Output
To prevent the LLM from returning conversational text ("Here is your evaluation..."), we must strictly define the output schema. Use the following System Prompt:
You are an expert technical recruiter and engineering manager.
Your task is to evaluate a job description against a candidate's background.
You MUST respond in raw JSON format with exactly two keys:
1. "score": An integer from 0 to 100 representing the match quality.
2. "reasoning": A concise, one-sentence explanation for the score.
Scoring Criteria:
- Start at 50.
- Add 20 points if the primary language is Go or Python.
- Add 15 points if the role involves data pipelines or distributed systems.
- Deduct 40 points if the role requires extensive front-end React work.
- Deduct 50 points if the role requires security clearance.
- Deduct 100 points if the role is purely managerial (no hands-on coding).
Candidate Background:
- Senior Backend Engineer
- 7 years experience
- Expertise in Python, Go, Kubernetes, PostgreSQL, Kafka.
- Prefers backend, infrastructure, and platform engineering.

In the User Message field of the LLM node, map the parsed text output from your job description extraction node:
Evaluate this job description:
{{ $json.extracted_job_text }}

Ensure you toggle JSON Output or Structured Output in the LLM node settings. This forces the API to adhere to the requested schema.
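Even with structured output enabled, it is worth validating the model's response before routing on it. A small sketch of that guard logic, assuming the two-key schema defined in the system prompt (`parse_score` is an illustrative name):

```python
import json

def parse_score(raw: str) -> dict:
    """Validate an LLM response against the prompt's two-key schema.

    Raises ValueError if the payload is malformed; clamps the score to
    the 0-100 range the system prompt promises (the rubric's deductions
    can otherwise push a raw total below zero).
    """
    data = json.loads(raw)
    if not {"score", "reasoning"} <= data.keys():
        raise ValueError(f"missing keys in LLM output: {data}")
    data["score"] = max(0, min(100, int(data["score"])))
    data["reasoning"] = str(data["reasoning"])
    return data
```

Running this in a Code node between the LLM and the If node means a malformed response fails loudly instead of silently scoring zero.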
Step 5: Routing and Notifications
The LLM node will output items that look like this:
{
"score": 85,
"reasoning": "Strong match due to heavy emphasis on Python and Kafka, with zero front-end responsibilities."
}

Add an If node (or Switch node in newer n8n versions) to filter the results.
- Condition: Number
- Value 1: {{ $json.score }}
- Operation: Larger or Equal
- Value 2: 80
Route the True output of the If node to a Slack or Discord node.
Configure the message to include the high-signal data:
🚨 *High Match Job Found!* 🚨
*Score*: {{ $json.score }}/100
*Why*: {{ $json.reasoning }}
*Link*: {{ $node["Split Job Links"].json["url"] }}

Handling Failure States and Scaling
When building autonomous agents that interact with external web resources, failure is a feature of the environment. Job boards change their DOM structures, networks time out, and LLM APIs experience latency.
Implementing Fallbacks
In n8n, utilize the Error Trigger node to catch workflow failures. If the CSS selector in your HTML Extract node suddenly returns an empty array (indicating the target site deployed a frontend update), the Error Trigger can alert you to update the selector.
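Inside a Code node (or any script wrapping the extraction), the empty-array case can be promoted to an explicit error so the Error Trigger fires instead of the workflow quietly processing zero items. A sketch, with `assert_selector_alive` as a hypothetical helper name:

```python
def assert_selector_alive(job_links: list[str], selector: str) -> list[str]:
    """Fail loudly when a CSS selector stops matching, rather than
    letting the workflow continue with zero items."""
    if not job_links:
        raise RuntimeError(
            f"Selector {selector!r} returned no results; "
            "the target site's DOM structure may have changed."
        )
    return job_links
```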
Rate Limiting and Concurrency
When your Item Lists node splits 50 job URLs, the subsequent HTTP Request and LLM nodes will attempt to execute concurrently. If you are scraping a strict target, 50 simultaneous headless browser requests might trigger a temporary IP ban, and 50 simultaneous LLM calls might hit your token rate limit.
In n8n, configure the node settings for your secondary HTTP Request node to process in batches. Select Batching in the node options, set the batch size to 5, and introduce a 2000 millisecond delay between batches. This pacing keeps execution smooth and continuous without overloading the target servers or exhausting your API rate limits.
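The same batching behavior can be sketched in plain Python: chunk the URL list into groups of five and pause between groups (`process_in_batches` is an illustrative helper, not an n8n or library API):

```python
import time

def process_in_batches(urls, handler, batch_size=5, delay_s=2.0):
    """Mirror n8n's Batching options: run `handler` over `batch_size`
    items at a time, sleeping `delay_s` seconds between batches."""
    results = []
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        results.extend(handler(url) for url in batch)
        if start + batch_size < len(urls):  # no pause after the final batch
            time.sleep(delay_s)
    return results
```

With 50 job URLs, this issues at most 5 concurrent-ish requests per window instead of all 50 at once.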
Takeaways
By decoupling the data extraction layer from the evaluation logic, you create a highly resilient automation pipeline.
- Reliable Extraction: Utilizing a managed API to handle JavaScript rendering and proxy rotation prevents your workflow from failing silently due to bot mitigation.
- Structured AI: Forcing LLMs to return strict JSON allows you to build programmatic routing logic based on subjective text evaluation.
- Infinite Customization: Because the evaluation criteria live entirely within the System Prompt, you can duplicate this workflow to hunt for entirely different roles simply by altering a few lines of text.
The combination of n8n's visual routing, robust data extraction, and deterministic LLM outputs transforms unstructured web data into a highly curated, actionable feed.