Enhanced Scraping Reliability
This release improves JavaScript rendering, image extraction, and content classification to deliver more accurate scrape results. Key fixes address iframe handling, lazy-loaded images, and navigation removal for cleaner data extraction.
New Features
Add content depth events for docs and blog
Wire product engagement events in dashboard
Navigation removal control
Content extraction now supports removing navigation elements from scoped content using the new remove_selectors field. This provides cleaner results when scraping homepage content that includes navigation menus within the main content container.
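The effect of removing navigation from a scoped container can be sketched with the standard library alone. This is a minimal illustration, not the service's implementation: the `remove_elements` helper is hypothetical, the markup is assumed to be well-formed, and plain tag names stand in for whatever selector syntax `remove_selectors` actually accepts.

```python
import xml.etree.ElementTree as ET


def remove_elements(root, tags):
    """Remove every descendant of `root` whose tag name is in `tags`.

    Hypothetical helper: tag names stand in for real selectors.
    Any tail text attached to a removed element is dropped with it.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in tags:
                parent.remove(child)
    return root


# A homepage-style container where the nav sits inside the main content.
html = (
    "<main><nav><a>Home</a><a>Pricing</a></nav>"
    "<h1>Welcome</h1><p>Intro text.</p></main>"
)

scoped = remove_elements(ET.fromstring(html), {"nav"})
text = " ".join(t.strip() for t in scoped.itertext() if t.strip())
print(text)
```

With the nav element stripped, only the heading and paragraph text remain in the scoped output.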
Improvements
SDK scrolling parameters
Python and Node SDKs now support scroll_to_load and scroll_count parameters for JavaScript scraping. This enables automated scrolling for infinite scroll pages and provides better control over dynamic content loading.
General bug fixes and improvements
Plus 1 internal improvement for better reliability and performance.
Bug Fixes
Defer content_only heading-strip until after child promot...
Throw on missing DEMO_TOKEN_SECRET in production
Include scroll params in cache key
Use callback refs in useScrollDepth/useTimeOnPage
Gate FLAG_ICON_PATTERNS on small dimensions
Add classifier-less fallback for json_v2 section tree (#3...
Level-aware pruning guard: only preserve h1/h2 headings ...
Unwrap options wrapper in UnifiedScrapeRequest validator ...
Fallback to original soup when Readability empties sectio...
Scope markdown with playbook content_selectors
Add leaf-container fallback for json_v2 section tree
Prune empty leaf sections from json_v2 section tree
JavaScript rendering stability
JavaScript-based scraping now properly handles iframe inclusion, shadow DOM flattening, and custom actions parameters. Requests that previously failed with a TypeError caused by missing function arguments now execute successfully.
Image URL parsing accuracy
Parsing of srcset attributes now correctly handles URLs that contain commas, such as Cloudinary transformation URLs and Shopify query parameters. This prevents broken image URLs when extracting responsive images.
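To illustrate why commas are a problem, here is a minimal sketch of a comma-tolerant srcset parser. It loosely follows the HTML specification's approach of reading each URL up to the next whitespace instead of splitting the whole string on commas. The `parse_srcset` function is hypothetical and is not the parser shipped in this release; it also ignores parenthesised descriptors for brevity.

```python
_WHITESPACE = " \t\n\r\f"


def parse_srcset(srcset):
    """Return (url, descriptor) pairs from a srcset string.

    A URL runs until the next whitespace, so commas inside it survive.
    A comma only ends a candidate when it trails the URL or follows
    the descriptor.
    """
    candidates = []
    pos, end = 0, len(srcset)
    while pos < end:
        # Skip separators left over from the previous candidate.
        while pos < end and (srcset[pos] in _WHITESPACE or srcset[pos] == ","):
            pos += 1
        if pos >= end:
            break
        # Collect the URL: everything up to the next whitespace.
        start = pos
        while pos < end and srcset[pos] not in _WHITESPACE:
            pos += 1
        url = srcset[start:pos]
        descriptor = ""
        if url.endswith(","):
            # "a.jpg, b.jpg": the comma terminates a descriptor-less candidate.
            url = url.rstrip(",")
        else:
            # Collect the descriptor (e.g. "400w", "2x") up to the next comma.
            start = pos
            while pos < end and srcset[pos] != ",":
                pos += 1
            descriptor = srcset[start:pos].strip(_WHITESPACE)
        candidates.append((url, descriptor))
    return candidates


example = (
    "https://res.cloudinary.com/demo/image/upload/w_400,c_fill/a.jpg 400w, "
    "https://res.cloudinary.com/demo/image/upload/w_800,c_fill/a.jpg 800w"
)
for url, descriptor in parse_srcset(example):
    print(descriptor, url)
```

A naive `srcset.split(",")` would cut each Cloudinary URL in two at `w_400,c_fill`; reading to whitespace keeps the transformation segment intact.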
Content classifier accuracy
Content classification no longer skips valid content within containers rejected by the classifier. Text blocks, lists, and headings inside non-content containers are now properly included in the final extraction results.
Lazy-loaded image support
Image extraction now resolves sources from common lazy-loading patterns including data-lazy-src, data-original, data-image, srcset, data-srcset, and picture/source elements. This ensures images from Shopify themes and other lazy-loading implementations are captured correctly.
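The resolution described above can be sketched with Python's built-in HTML parser. This is only an illustration under stated assumptions: the attribute names come from the list above, but the priority order, the treatment of `data:` URIs as placeholders, and the names `resolve_image_src`, `ImageCollector`, and `collect_images` are hypothetical and not taken from the actual extractor.

```python
from html.parser import HTMLParser

# Attribute names from the release note; the lookup order is an assumption.
LAZY_ATTRS = ("data-lazy-src", "data-original", "data-image")
SRCSET_ATTRS = ("srcset", "data-srcset")


def resolve_image_src(attrs):
    """Pick the most plausible real image URL from a tag's attributes."""
    src = attrs.get("src")
    if src and not src.startswith("data:"):  # data: URIs are often placeholders
        return src
    for name in LAZY_ATTRS:
        if attrs.get(name):
            return attrs[name]
    for name in SRCSET_ATTRS:
        parts = attrs.get(name, "").split()
        if parts:
            return parts[0].rstrip(",")  # first candidate URL in the srcset
    return None


class ImageCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.images = []
        self._sources = None  # <source> attrs while inside a <picture>

    def handle_starttag(self, tag, attrs):
        attrs = {name: value for name, value in attrs if value}
        if tag == "picture":
            self._sources = []
        elif tag == "source" and self._sources is not None:
            self._sources.append(attrs)
        elif tag == "img":
            url = resolve_image_src(attrs)
            if url is None:
                # Fall back to the enclosing <picture>'s <source> elements.
                for source in self._sources or []:
                    url = resolve_image_src(source)
                    if url:
                        break
            if url:
                self.images.append(url)

    def handle_endtag(self, tag):
        if tag == "picture":
            self._sources = None


def collect_images(html):
    parser = ImageCollector()
    parser.feed(html)
    parser.close()
    return parser.images


sample = """
<img src="data:image/gif;base64,R0lGOD" data-lazy-src="https://example.com/a.jpg">
<img data-srcset="https://example.com/b-400.jpg 400w, https://example.com/b-800.jpg 800w">
<picture><source srcset="https://example.com/c.webp 1x"><img alt="c"></picture>
<img src="https://example.com/d.jpg">
"""
print(collect_images(sample))
```

Each of the four images resolves to a usable URL even though only the last one carries a conventional `src`.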
Hero section detection
Hero banner classification now recognizes carousel, slider, and slideshow containers in addition to traditional hero classes. This improves detection of main banner sections on e-commerce sites using modern slideshow implementations.
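A class-name check along these lines might look like the sketch below. The keyword set mirrors the container types named above, but the matching rule (whole tokens split on whitespace, hyphens, and underscores) and the `looks_like_hero` name are assumptions for illustration; the real classifier may use different signals.

```python
import re

# Keywords taken from the release note; the token-matching rule is an assumption.
HERO_KEYWORDS = {"hero", "carousel", "slider", "slideshow"}


def looks_like_hero(class_attr):
    """Return True if any class-name token is a known hero/slideshow keyword."""
    tokens = re.split(r"[\s_-]+", class_attr.lower())
    return any(token in HERO_KEYWORDS for token in tokens)


for classes in ("slideshow-wrapper section", "hero__banner", "product-grid"):
    print(classes, looks_like_hero(classes))
```

Matching whole tokens instead of substrings avoids false positives such as `superhero-card`, at the cost of missing camelCase names like `HeroBanner`.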
Security
SSRF protection enhancement
SSRF protection now blocks IPv4-mapped IPv6 addresses and bracketed IPv6 hosts that could bypass the existing checks. This closes potential bypass vectors in demo routes while maintaining legitimate access.
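The two bypass vectors can be shown with Python's `ipaddress` module. This is a minimal sketch and not the actual check: the `is_blocked_host` name and the set of blocked ranges are assumptions, and hostnames that are not IP literals would still need to be validated after DNS resolution.

```python
import ipaddress


def is_blocked_host(host):
    """Return True if `host` is an IP literal pointing at an internal range.

    Hypothetical helper: the blocked ranges below are an assumption.
    """
    # URL hosts carry IPv6 literals in brackets, e.g. "[::1]".
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; resolve and check elsewhere

    # "::ffff:127.0.0.1" is IPv6 syntax for an IPv4 address; unwrap it
    # so the IPv4 rules apply.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped

    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


for host in ("[::ffff:127.0.0.1]", "::ffff:10.0.0.1", "[::1]", "8.8.8.8"):
    print(host, is_blocked_host(host))
```

A check that only compares the raw string against IPv4 patterns would pass `[::ffff:127.0.0.1]` straight through, which is why the address is normalised before the range test.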
Plus 1 internal change for stability and performance.