
Handling Infinite Scroll & Pagination in Headless Browsers
Learn how to reliably handle infinite scroll, cursor-based pagination, and dynamic rendering for autonomous AI web scraping agents using headless browsers.
TL;DR
To handle infinite scroll and pagination in headless browsers, you must synchronize programmatic scrolling or button clicking with network idle events and DOM updates to ensure complete data extraction. Intercepting the underlying XHR/Fetch API requests is the most robust approach. When that is not possible, programmatic scrolling paired with explicit wait conditions (network idle, loading indicators, DOM changes) is a reliable fallback for autonomous AI agents.
The Challenge of Dynamic Content Loading
Modern web applications rarely deliver all of their content in the initial HTML response. Instead, they rely on single-page application (SPA) architectures, infinite scrolling, or "Load More" buttons to fetch data asynchronously. For autonomous AI agents tasked with reading public data feeds, e-commerce product grids, or article archives, fetching static HTML is not enough, because the required content only appears after client-side JavaScript runs.
Handling this correctly requires a headless browser capable of executing JavaScript, intercepting network requests, and managing state across multiple asynchronous loads. The two primary strategies for extracting this data are intercepting API requests and simulating user interactions.
Strategy 1: Intercepting XHR and Fetch Requests (The API Route)
The cleanest and most resource-efficient way to handle pagination is to bypass the UI rendering layer entirely. When a user scrolls down an infinite scroll page, the frontend application fires an HTTP request (usually XHR or Fetch) to a backend API to retrieve the next batch of items.
By observing the network tab in your browser's developer tools, you can often identify these requests. They typically return JSON data and include pagination parameters like offset, limit, page, or an opaque cursor.
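Once you have found such a request, the next page is usually a small change to its query string. The sketch below shows this for offset/limit pagination using only the standard library. It assumes the parameters are literally named `offset` and `limit`; the URL and the default page size of 20 are illustrative placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_offset_url(url: str, default_limit: int = 20) -> str:
    """Return the URL of the next page for an offset/limit-paginated endpoint.

    Assumption: the API uses query parameters named "offset" and "limit";
    real endpoints vary (page/per_page, skip/take, ...). Repeated query keys
    are collapsed to their last value.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    limit = int(params.get("limit", default_limit))
    offset = int(params.get("offset", 0))
    # Advance the window by one page while keeping the page size unchanged.
    params["limit"] = str(limit)
    params["offset"] = str(offset + limit)
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical captured request URL.
print(next_offset_url("https://api.example.com/v1/products?offset=0&limit=50"))
# → https://api.example.com/v1/products?offset=50&limit=50
```

Cursor-based APIs cannot be advanced this way, because the next cursor is only known from the previous response.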
If the API is publicly accessible without complex cryptographic signatures, your agent can recreate these requests in a loop, paginating through the dataset directly until an empty response or a has_next: false flag is returned.
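A minimal sketch of that loop for a cursor-based API follows. The response shape (`items`, `has_next`, `next_cursor`) and the `?cursor=` query parameter are hypothetical; inspect the real responses and adjust the field names. Fetching is injected as a function so the loop can be exercised without network access, and a page cap guards against a cursor that never terminates.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

# fetch_page(cursor) -> decoded JSON page; cursor is None for the first page.
Fetcher = Callable[[Optional[str]], Dict[str, Any]]


def iter_cursor_pages(fetch_page: Fetcher, max_pages: int = 1000) -> Iterator[Any]:
    """Yield items page by page until an empty page or has_next is false.

    Assumes each page looks like {"items": [...], "has_next": bool,
    "next_cursor": str}; these field names are hypothetical.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap in case the API never signals the end
        page = fetch_page(cursor)
        items = page.get("items") or []
        if not items:
            return  # empty response: end of the dataset
        yield from items
        cursor = page.get("next_cursor")
        if not page.get("has_next", False) or cursor is None:
            return  # has_next: false (or no cursor to follow)


def http_fetcher(base_url: str) -> Fetcher:
    """Build a fetcher that passes the cursor as a hypothetical ?cursor= parameter."""
    def fetch(cursor: Optional[str]) -> Dict[str, Any]:
        url = base_url
        if cursor is not None:
            sep = "&" if "?" in base_url else "?"
            url = f"{base_url}{sep}{urllib.parse.urlencode({'cursor': cursor})}"
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    return fetch
```

Usage would look like `for item in iter_cursor_pages(http_fetcher("https://api.example.com/v1/products")): ...`, where the endpoint is a placeholder for the request you observed in the Network tab.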
However, many modern sites implement strict bot mitigation that blocks direct API access. In these cases, you must rely on a browser environment to execute the requests natively, allowing the site's own JavaScript to handle token generation and request signing. Using an anti-bot solution helps maintain session validity while extracting this data.
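One way to let the site sign its own requests while still reading the JSON is to listen to network responses inside a real browser. The sketch below uses Playwright's sync API (a swapped-in tool, not AlterLab's API). The `url_marker` filter, scroll distance, and delay are assumptions to tune per site. Responses are stored in the event handler and their bodies are read afterwards, before the browser closes.

```python
from typing import Any, List


class ApiResponseCollector:
    """Keeps XHR/Fetch responses whose URL contains `url_marker` and whose body is JSON."""

    def __init__(self, url_marker: str) -> None:
        self.url_marker = url_marker  # hypothetical substring, e.g. "/api/items"
        self.responses: List[Any] = []

    def wants(self, url: str, resource_type: str, content_type: str) -> bool:
        return (
            resource_type in ("xhr", "fetch")
            and self.url_marker in url
            and "application/json" in content_type
        )

    def on_response(self, response: Any) -> None:
        # Playwright calls this for every response; header names are lower-cased.
        content_type = response.headers.get("content-type", "")
        if self.wants(response.url, response.request.resource_type, content_type):
            self.responses.append(response)  # read the body later, outside the handler

    def payloads(self) -> List[Any]:
        out = []
        for response in self.responses:
            try:
                out.append(response.json())
            except Exception:
                pass  # body no longer available or not valid JSON
        return out


def capture_api_json(url: str, url_marker: str, scrolls: int = 5) -> List[Any]:
    """Load `url` in Chromium, scroll a few times, and return the matching JSON bodies."""
    # Imported lazily; requires `pip install playwright` and `playwright install chromium`.
    from playwright.sync_api import sync_playwright

    collector = ApiResponseCollector(url_marker)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", collector.on_response)
        page.goto(url, wait_until="networkidle")
        for _ in range(scrolls):
            page.mouse.wheel(0, 4000)    # the site's own JS fires (and signs) the next request
            page.wait_for_timeout(1500)  # fixed delay; tune per site
        data = collector.payloads()      # read bodies while the page is still open
        browser.close()
    return data
```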
Strategy 2: Programmatic Scrolling and DOM Extraction
When API interception is impossible, your AI agent must drive the headless browser to behave like a user. This means scrolling the viewport, waiting for loading indicators to disappear, and verifying that new DOM elements have been injected before extracting the data.
The Mechanics of Programmatic Scrolling
Scrolling a headless browser reliably requires more than simply calling window.scrollTo(0, document.body.scrollHeight). Modern infinite scroll implementations often use virtualization, where DOM elements that scroll out of view are removed to save memory, so reading the DOM once at the end of the scroll can miss items that were already unmounted.
To scrape all items, you must extract data incrementally during the scroll process, keeping a running hash set of unique identifiers (like product IDs or URLs) to deduplicate items.
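A minimal sketch of that running set, assuming each extracted item is a dict with a stable identifier such as a product URL (the `key` function is yours to choose). You would call `add_batch` after every scroll step with whatever items are currently rendered.

```python
from typing import Any, Callable, Hashable, Iterable, List, Set


class IncrementalCollector:
    """Accumulates items across scroll steps, keeping the first copy of each unique key.

    Intended for virtualized lists, where each DOM snapshot only contains the
    rows currently near the viewport.
    """

    def __init__(self, key: Callable[[Any], Hashable]) -> None:
        self.key = key  # e.g. lambda item: item["url"]
        self.seen: Set[Hashable] = set()
        self.items: List[Any] = []

    def add_batch(self, batch: Iterable[Any]) -> int:
        """Merge one snapshot of rendered items and return how many were new."""
        new = 0
        for item in batch:
            k = self.key(item)
            if k not in self.seen:
                self.seen.add(k)
                self.items.append(item)
                new += 1
        return new
```

Because overlapping snapshots are expected, duplicates are silently skipped; a return value of 0 over several consecutive steps is also a useful hint that the list is exhausted.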
Implementing the Scroll Loop
A robust scroll loop requires three components:
- Scroll Action: Executing JavaScript to move the viewport down.
- Wait Condition: Pausing execution until the network is idle or a specific DOM element appears/disappears (e.g., waiting for a spinner to vanish).
- Termination Condition: Determining when the end of the list is reached. This is typically detected when the page height stops increasing after consecutive scroll attempts.
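The three components can be sketched as a small driver that takes each step as a callback, so the termination logic is testable without a browser. The thresholds (3 stale rounds, 200 rounds total, 2-second delay) are illustrative assumptions, and `scroll_playwright_page` shows one possible wiring to a Playwright `Page` (sync API), which is a swapped-in tool here.

```python
import time
from typing import Callable


def scroll_until_exhausted(
    scroll: Callable[[], None],
    get_height: Callable[[], int],
    wait: Callable[[], None] = lambda: time.sleep(2.0),
    max_stale_rounds: int = 3,
    max_rounds: int = 200,
) -> int:
    """Run the scroll loop and return how many scroll actions were performed."""
    last_height = get_height()
    stale_rounds = 0
    rounds = 0
    while rounds < max_rounds and stale_rounds < max_stale_rounds:
        scroll()  # 1. scroll action
        wait()    # 2. wait condition (a fixed delay here)
        rounds += 1
        height = get_height()
        # 3. termination condition: count consecutive rounds without growth
        if height > last_height:
            last_height = height
            stale_rounds = 0
        else:
            stale_rounds += 1
    return rounds


def scroll_playwright_page(page) -> int:
    """Wire the loop to a Playwright Page (sync API)."""
    return scroll_until_exhausted(
        scroll=lambda: page.evaluate("window.scrollTo(0, document.body.scrollHeight)"),
        get_height=lambda: page.evaluate("document.body.scrollHeight"),
        wait=lambda: page.wait_for_timeout(2000),
    )
```

Requiring several stale rounds rather than one tolerates a slow API response, and the round cap stops feeds that never end. The fixed delay could be replaced with a spinner check such as `page.locator(".spinner").wait_for(state="hidden")`, where `.spinner` is a hypothetical selector.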
Here is how you can implement this logic using AlterLab.
Python SDK Implementation
We provide a robust Python SDK that allows you to easily dispatch scraping jobs with custom JavaScript execution. The following example demonstrates how to inject a script that handles infinite scrolling before the HTML is returned.
```python
import alterlab

# Initialize the client
client = alterlab.Client("YOUR_API_KEY")

# JavaScript executed inside the browser: scroll to the bottom, wait, and stop
# once the page height has not grown for several consecutive attempts, or after
# a hard cap so an endless feed cannot loop forever.
scroll_script = """
async () => {
  const maxStaleRounds = 3;  // consecutive attempts without growth before stopping
  const maxRounds = 50;      // safety cap for feeds that never end
  let lastHeight = document.body.scrollHeight;
  let staleRounds = 0;
  for (let round = 0; round < maxRounds && staleRounds < maxStaleRounds; round++) {
    window.scrollTo(0, document.body.scrollHeight);
    // Wait for new content to load
    await new Promise(resolve => setTimeout(resolve, 2000));
    const newHeight = document.body.scrollHeight;
    if (newHeight > lastHeight) {
      lastHeight = newHeight;
      staleRounds = 0;
    } else {
      staleRounds++;  // no new content this round
    }
  }
}
"""

print("Starting extraction...")
response = client.scrape(
    url="https://example-ecommerce-site.com/products",
    js_scenario={"evaluate": scroll_script},
    wait_for={"network_idle": True},
)
print(f"Extraction complete. HTML length: {len(response.text)}")
```

cURL Implementation
The same request can be sent directly to the REST API with cURL. This is useful when integrating from environments without the Python SDK or from lightweight AI agents.

```shell
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example-ecommerce-site.com/products",
    "js_scenario": {
      "evaluate": "async () => { let h = document.body.scrollHeight; let stale = 0; for (let i = 0; i < 50 && stale < 3; i++) { window.scrollTo(0, document.body.scrollHeight); await new Promise(r => setTimeout(r, 2000)); const newH = document.body.scrollHeight; if (newH > h) { h = newH; stale = 0; } else { stale++; } } }"
    },
    "wait_for": {"network_idle": true}
  }'
```

Handling Pagination Buttons
Infinite scrolling is common, but traditional "Next Page" buttons are still prevalent, particularly in enterprise directories or search results.
Navigating traditional pagination requires identifying the "Next" button via CSS selectors or XPath, clicking it, and waiting for the new results to render. The challenge is that single-page applications often update the DOM without triggering a full page reload, so waiting for a page-load event does not tell you when the new results have arrived.
Your agent must locate the container holding the results, store a reference to the current items, click "Next", and explicitly wait for the items inside the container to change. If the "Next" button becomes disabled or is removed from the DOM, the pagination sequence is complete.
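A minimal sketch of that sequence, split into a browser-independent driver and a Playwright wiring (sync API, a swapped-in tool). The selectors are hypothetical, and detecting "the items changed" by comparing the first item's text is a simplifying assumption. `is_disabled()` only sees native disabled state, so sites that mark the button with a CSS class need a different check.

```python
from typing import Callable, List


def paginate_by_button(
    read_items: Callable[[], List[str]],
    click_next: Callable[[List[str]], bool],
    max_pages: int = 100,
) -> List[List[str]]:
    """Collect one list of items per page; stop when click_next reports no next page."""
    pages: List[List[str]] = []
    for _ in range(max_pages):
        items = read_items()
        pages.append(items)
        if not click_next(items):
            break
    return pages


def make_playwright_next(page, next_sel: str, item_sel: str, timeout_ms: int = 10000):
    """Return a click_next callback for a Playwright Page (sync API)."""
    def click_next(current_items: List[str]) -> bool:
        button = page.locator(next_sel)
        if button.count() == 0 or button.first.is_disabled():
            return False  # "Next" removed or disabled: pagination is complete
        first_before = current_items[0] if current_items else ""
        button.first.click()
        # No full reload on SPAs, so wait until the first result's text changes.
        # Raises a timeout error if the results never change.
        page.wait_for_function(
            "([sel, before]) => { const el = document.querySelector(sel);"
            " return el !== null && el.innerText !== before; }",
            arg=[item_sel, first_before],
            timeout=timeout_ms,
        )
        return True
    return click_next


def paginate_playwright(page, next_sel: str, item_sel: str, max_pages: int = 100) -> List[List[str]]:
    # next_sel / item_sel are hypothetical, e.g. "button.next" and "#results .row"
    return paginate_by_button(
        read_items=lambda: page.locator(item_sel).all_inner_texts(),
        click_next=make_playwright_next(page, next_sel, item_sel),
        max_pages=max_pages,
    )
```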
Structuring Output for AI Agents
Autonomous AI agents operate best on structured data, not raw HTML. When extracting data across multiple pages or scroll events, it is best to parse the DOM into JSON immediately within the browser context, rather than shipping large volumes of raw HTML back to your agent for post-processing.
You can modify the JavaScript scenario to execute document.querySelectorAll, map the elements to a JSON array, and return that structured object directly. AlterLab handles the underlying browser infrastructure and proxy rotation automatically, letting you focus entirely on the extraction logic and data quality. For a full breakdown of request parameters, consult our API docs.
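As an illustration of in-browser extraction, the sketch below pairs a `querySelectorAll` mapping script with a small Python clean-up step. It runs the script through Playwright's `page.evaluate` (sync API) because that call's return value is well defined; how a returned object surfaces through AlterLab's `js_scenario` is not shown here. The selectors `.product-card`, `h2`, `a`, and `.price` are hypothetical.

```python
from typing import Any, Dict, List

# Runs inside the page: map each card to a plain object so that only compact
# JSON crosses back to Python. All selectors below are hypothetical.
EXTRACT_PRODUCTS_JS = """
(cardSelector) => Array.from(document.querySelectorAll(cardSelector)).map(card => ({
  title: card.querySelector('h2')?.innerText?.trim() ?? null,
  url: card.querySelector('a')?.href ?? null,
  price: card.querySelector('.price')?.innerText?.trim() ?? null,
}))
"""


def normalize_records(raw: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop records without a URL and de-duplicate by URL, preserving order."""
    seen = set()
    records = []
    for rec in raw:
        url = rec.get("url")
        if not url or url in seen:
            continue
        seen.add(url)
        records.append({"title": rec.get("title"), "url": url, "price": rec.get("price")})
    return records


def extract_products(page, card_selector: str = ".product-card") -> List[Dict[str, Any]]:
    """Run the extraction script through a Playwright Page (sync API) and clean the result."""
    raw = page.evaluate(EXTRACT_PRODUCTS_JS, card_selector)
    return normalize_records(raw)
```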
Takeaway
Successfully extracting data from dynamically loaded interfaces requires careful management of browser state. Whether you are reverse-engineering undocumented pagination APIs or simulating complex user scroll behaviors, rely on robust wait conditions targeting network activity and DOM mutations. Intercepting XHR/Fetch requests is generally the preferred method for performance and reliability, but programmatic scrolling in a headless browser serves as an essential fallback when APIs are inaccessible.