Build an n8n AI Agent Workflow to Scrape Job Boards and Automate Candidate Scoring

Learn how to build an automated n8n pipeline that scrapes public job boards, parses requirements, and uses an AI agent to score roles against your resume.

Yash Dubey

April 26, 2026

7 min read

Automating job searches requires extracting structured data from heavily protected job boards and evaluating it against specific, highly subjective criteria. Manual filtering is too slow to keep pace with new postings, and traditional keyword matching falls short when evaluating complex engineering roles.

By combining n8n (a node-based workflow automation tool) with a reliable scraping engine and a Large Language Model (LLM), you can build an autonomous agent that continuously monitors public tech career sites, extracts new postings, and reads the job descriptions to score them against your exact skillset.

This guide walks through the architecture and configuration of an end-to-end job scoring pipeline.

System Architecture

Our workflow consists of five distinct operational phases:

  1. Triggering: A cron node in n8n initiates the workflow every 6 hours.
  2. Data Extraction: An HTTP Request node calls a web scraping API to render the target job board and bypass bot protection.
  3. Parsing: HTML extraction nodes parse the raw response into an array of individual job listing URLs and titles.
  4. AI Evaluation: An LLM node processes each job description alongside your resume, outputting a structured JSON match score.
  5. Routing: Conditional logic filters out low scores and pushes high-matching roles to a Slack or Discord webhook.
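
Before wiring up nodes, the control flow of these five phases can be sketched in a few lines of Python (a hedged outline, not n8n code; the callables stand in for the nodes configured in the steps below):

```python
def run_pipeline(fetch, extract, score, notify, board_url, resume, threshold=80):
    """Phases 2-5 of the workflow, with each n8n node modeled as a callable."""
    html = fetch(board_url)                  # Phase 2: render and fetch HTML
    for link in extract(html):               # Phase 3: parse out job links
        result = score(fetch(link), resume)  # Phase 4: LLM match score
        if result["score"] >= threshold:     # Phase 5: conditional routing
            notify(link, result)

# Demo with stubbed-out phases: only the high-scoring job is routed.
matched = []
run_pipeline(
    fetch=lambda url: url,
    extract=lambda html: ["/jobs/1", "/jobs/2"],
    score=lambda desc, resume: {"score": 85 if desc == "/jobs/1" else 30},
    notify=lambda link, result: matched.append(link),
    board_url="https://careers.example-startup.com/engineering",
    resume="(resume text)",
)
print(matched)  # ['/jobs/1']
```

Keeping the phases decoupled like this is exactly what makes the n8n canvas easy to rearrange later.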

Step 1: Handling Data Extraction

Modern Applicant Tracking Systems (ATS) and aggregator sites rarely serve static HTML. They rely on client-side JavaScript to hydrate job lists and employ aggressive bot mitigation strategies to block automated traffic.

A standard n8n HTTP Request node will fail to retrieve the actual job content, returning either an empty div or a CAPTCHA challenge. To reliably extract this data, we delegate the fetch operation to an infrastructure layer capable of headless browser execution and anti-bot handling.

Try it yourself

Test rendering a JavaScript-heavy career page using AlterLab

API Integration Examples

Before configuring n8n, verify your target payload using a standard HTTP client. We use AlterLab for the extraction layer to ensure JavaScript is fully executed before the HTML is returned.

Here is the cURL command to fetch the target page:

Bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://careers.example-startup.com/engineering", "render_js": true}'

If you prefer building this pipeline outside of n8n, you can achieve the same extraction using the Python SDK:

Python
import alterlab

client = alterlab.Client("YOUR_API_KEY")

# render_js ensures the SPA hydrates the job list before returning
response = client.scrape(
    url="https://careers.example-startup.com/engineering",
    render_js=True
)

print(response.text)

Step 2: Configuring the n8n HTTP Request Node

In your n8n canvas, add an HTTP Request node. This node will execute the POST request defined above.

Configure the node with the following parameters:

  • Method: POST
  • URL: https://api.alterlab.io/v1/scrape
  • Authentication: Generic Credential Type (Header Auth)
      • Name: X-API-Key
      • Value: Your API key
  • Body Parameters: Add url (your target job board) and render_js (set to true).

When you execute this node, the output will contain the fully rendered HTML of the job board, complete with all dynamically loaded job titles and links. For detailed schema information, refer to the API reference.

Step 3: Parsing the Job Data

The response from the extraction phase is a single, monolithic HTML string. We need to split this into an array of individual job items so n8n can process them sequentially.

Add an HTML Extract node to your workflow. n8n uses a Cheerio-like syntax for CSS selection.

  1. Extraction Values: Create a new value.
  2. Key: job_links
  3. CSS Selector: .job-posting-list .job-item a (Adjust this selector based on the DOM structure of your specific target site).
  4. Return Value: Attribute
  5. Attribute Name: href
  6. Return Array: Enable this toggle.

This configuration transforms the raw HTML into a clean JSON array of URLs pointing to specific job descriptions.
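
If you are prototyping the parsing step outside n8n, the same extraction can be approximated with Python's standard-library HTMLParser (a sketch that assumes each job item contains no nested div elements; adjust it to your target's DOM exactly as you would the CSS selector):

```python
from html.parser import HTMLParser

class JobLinkExtractor(HTMLParser):
    """Rough stand-in for the HTML Extract node: collects href attributes
    from <a> tags inside elements with class "job-item".
    Simplifying assumption: job items contain no nested <div> elements."""

    def __init__(self):
        super().__init__()
        self.inside_job_item = False
        self.job_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "job-item" in attrs.get("class", "").split():
            self.inside_job_item = True
        elif tag == "a" and self.inside_job_item and "href" in attrs:
            self.job_links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.inside_job_item:
            self.inside_job_item = False

sample = (
    '<div class="job-posting-list">'
    '<div class="job-item"><a href="/jobs/101">Backend Engineer</a></div>'
    '<div class="job-item"><a href="/jobs/102">Platform Engineer</a></div>'
    '</div>'
)
parser = JobLinkExtractor()
parser.feed(sample)
print(parser.job_links)  # ['/jobs/101', '/jobs/102']
```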

Next, add an Item Lists node configured to Split Out Items. This takes the array of URLs and outputs individual n8n items, allowing the subsequent nodes to run once per job listing.

For each individual job URL, insert a second HTTP Request node to fetch the specific job description page using the same extraction API configuration as Step 2.

Step 4: The AI Scoring Agent

This is the core of the autonomous workflow. We have the raw text of a job description, and we need a deterministic evaluation of how well it matches a candidate's profile.

Add an Advanced LLM node (or the OpenAI/Anthropic specific node) to the canvas.

Prompt Engineering for Deterministic Output

To prevent the LLM from returning conversational text ("Here is your evaluation..."), we must strictly define the output schema. Use the following System Prompt:

TEXT
You are an expert technical recruiter and engineering manager. 
Your task is to evaluate a job description against a candidate's background.

You MUST respond in raw JSON format with exactly two keys:
1. "score": An integer from 0 to 100 representing the match quality.
2. "reasoning": A concise, one-sentence explanation for the score.

Scoring Criteria:
- Start at 50.
- Add 20 points if the primary language is Go or Python.
- Add 15 points if the role involves data pipelines or distributed systems.
- Deduct 40 points if the role requires extensive front-end React work.
- Deduct 50 points if the role requires security clearance.
- Deduct 100 points if the role is purely managerial (no hands-on coding).
- Clamp the final score to the 0-100 range.

Candidate Background:
- Senior Backend Engineer
- 7 years experience
- Expertise in Python, Go, Kubernetes, PostgreSQL, Kafka.
- Prefers backend, infrastructure, and platform engineering.

In the User Message field of the LLM node, map the parsed text output from your job description extraction node:

TEXT
Evaluate this job description:
{{ $json.extracted_job_text }}

Ensure you toggle JSON Output or Structured Output in the LLM node settings. This forces the API to adhere to the requested schema.
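
Even with JSON mode enabled, it is prudent to validate the model's output before routing on it. A minimal validator, assuming the two-key schema defined in the System Prompt, might look like:

```python
import json

def parse_llm_score(raw):
    """Validate the model's reply against the two-key schema; raise
    ValueError on malformed output so the workflow can retry or alert."""
    data = json.loads(raw)
    score = data.get("score")
    reasoning = data.get("reasoning")
    if not isinstance(score, int) or isinstance(score, bool) or not 0 <= score <= 100:
        raise ValueError(f"score missing or out of range: {score!r}")
    if not isinstance(reasoning, str) or not reasoning.strip():
        raise ValueError("reasoning missing or empty")
    return {"score": score, "reasoning": reasoning}
```

Failing fast here is what lets the Error Trigger described later catch malformed model responses instead of silently routing garbage.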

Step 5: Routing and Notifications

The LLM node will output items that look like this:

JSON
{
  "score": 85,
  "reasoning": "Strong match due to heavy emphasis on Python and Kafka, with zero front-end responsibilities."
}

Add an If node (or Switch node in newer n8n versions) to filter the results.

  • Condition: Number
  • Value 1: {{ $json.score }}
  • Operation: Larger or Equal
  • Value 2: 80
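
For reference, that condition is equivalent to a one-line filter. A sketch in Python, assuming items shaped like the JSON output above:

```python
def high_matches(items, threshold=80):
    """Keep items whose score passes the 'Larger or Equal' comparison."""
    return [item for item in items if item["score"] >= threshold]

scored = [{"score": 85, "reasoning": "Strong match."},
          {"score": 40, "reasoning": "Heavy front-end focus."}]
print(high_matches(scored))  # only the 85-point item survives
```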

Route the True output of the If node to a Slack or Discord node.

Configure the message to include the high-signal data (the Link expression below assumes the Item Lists node from Step 3 has been renamed to Split Job Links):

TEXT
🚨 *High Match Job Found!* 🚨
*Score*: {{ $json.score }}/100
*Why*: {{ $json.reasoning }}
*Link*: {{ $node["Split Job Links"].json["url"] }}

Handling Failure States and Scaling

When building autonomous agents that interact with external web resources, failure is a feature of the environment. Job boards change their DOM structures, networks time out, and LLM APIs experience latency.

Implementing Fallbacks

In n8n, utilize the Error Trigger node to catch workflow failures. If the CSS selector in your HTML Extract node suddenly returns an empty array (indicating the target site deployed a frontend update), the Error Trigger can alert you to update the selector.
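
Outside n8n, the same guard is a short assertion on the extraction result (a sketch; the function name and error message are illustrative):

```python
def assert_selector_health(job_links, min_expected=1):
    """Fail loudly when extraction returns nothing -- the usual symptom
    of the target site shipping a frontend redesign."""
    if len(job_links) < min_expected:
        raise RuntimeError(
            "HTML Extract returned no job links; the target DOM may have "
            "changed. Review and update the CSS selector."
        )
    return job_links
```

Raising an explicit error turns a silent "zero jobs today" into an actionable alert.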

Rate Limiting and Concurrency

When your Item Lists node splits 50 job URLs, the subsequent HTTP Request and LLM nodes will attempt to execute concurrently. If you are scraping a strict target, 50 simultaneous headless browser requests might trigger a temporary IP ban, and 50 simultaneous LLM calls might hit your token rate limit.

In n8n, open the settings of your secondary HTTP Request node and enable Batching: set the batch size to 5 and introduce a 2000 millisecond delay between batches. This pacing keeps execution smooth and continuous without overloading the target servers or exhausting your API rate limits.
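
If you later port the pipeline out of n8n, the same batching behavior is straightforward to reproduce (a sketch mirroring the batch size and delay above):

```python
import time

def process_in_batches(items, handler, batch_size=5, delay_seconds=2.0):
    """Handle items in groups of `batch_size`, sleeping between groups --
    the same shape as n8n's Batching option with a 2000 ms interval."""
    results = []
    for start in range(0, len(items), batch_size):
        for item in items[start:start + batch_size]:
            results.append(handler(item))
        if start + batch_size < len(items):  # no pause after the last batch
            time.sleep(delay_seconds)
    return results
```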

Takeaways

By decoupling the data extraction layer from the evaluation logic, you create a highly resilient automation pipeline.

  1. Reliable Extraction: Utilizing a managed API to handle JavaScript rendering and proxy rotation prevents your workflow from failing silently due to bot mitigation.
  2. Structured AI: Forcing LLMs to return strict JSON allows you to build programmatic routing logic based on subjective text evaluation.
  3. Infinite Customization: Because the evaluation criteria live entirely within the System Prompt, you can duplicate this workflow to hunt for entirely different roles simply by altering a few lines of text.

The combination of n8n's visual routing, robust data extraction, and deterministic LLM outputs transforms unstructured web data into a highly curated, actionable feed.

Frequently Asked Questions

Does n8n scrape JavaScript-rendered job boards on its own?
n8n relies on external HTTP Request nodes to fetch data. For single-page applications or heavily JavaScript-rendered job boards, you must route your request through an API that provides headless browser rendering before passing the HTML back to n8n.

Can I force the AI to return a consistent, machine-readable score?
Yes. By enabling JSON mode in your n8n LLM node and providing a strict output schema in your system prompt, you can ensure the AI returns a predictable JSON object containing the numerical score and reasoning.

How do I avoid being blocked while scraping job boards?
Schedule your n8n cron triggers to introduce jitter, and utilize a scraping API that handles proxy rotation automatically. This ensures your data extraction remains reliable without triggering automated rate-limiting blocks.