Enhanced Scraping Reliability
This release improves data consistency and user experience with better schema alignment, refined extraction logic, and expanded response capabilities. Users will notice more reliable scrape results and clearer interface interactions.
New Features
Expanded Response Data
Scrape responses now include optional warning and content quality fields, providing users with additional context about data reliability and potential issues during extraction.
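Because both fields are optional, client code should tolerate their absence. A minimal sketch, assuming the response is a plain dictionary and that the fields are named `warning` and `content_quality` (both key names are assumptions, not confirmed by these notes):

```python
from typing import Any, Optional


def summarize_reliability(response: dict[str, Any]) -> str:
    """Return a one-line reliability note for a scrape response."""
    # Hypothetical key names; the fields are optional, so read them with .get().
    warning: Optional[str] = response.get("warning")
    quality: Optional[str] = response.get("content_quality")

    parts = []
    if warning:
        parts.append(f"warning: {warning}")
    if quality:
        parts.append(f"quality: {quality}")
    return "; ".join(parts) if parts else "no reliability metadata"
```

Older responses that lack both keys fall through to the default message instead of raising `KeyError`.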
Improvements
General bug fixes and improvements
Plus 9 internal improvements for better reliability and performance.
Revenue tracking expanded
Platform balance information is now included in the Revenue section of monitoring reports, giving you clearer visibility into financial metrics and platform health.
Geolocation security enhanced
IP geolocation services now use HTTPS by default through a more secure provider, ensuring all location data collection meets modern security standards and protects user privacy.
Refined Content Extraction
Banner classification logic has been adjusted to prevent misclassification of promotional images as hero content. This change improves accuracy for e-commerce sites with complex layout patterns.
Bug Fixes
Fixed stuck crawl button
The crawl Run button will no longer remain disabled after consecutive API failures. The system now tracks consecutive poll failures and automatically resets after 3 failed attempts, ensuring you can retry crawls without manual intervention.
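The reset behaviour can be sketched as a small state tracker. This is an illustration only: the class and method names are hypothetical, and the real interface code is not shown in these notes. Only the threshold of 3 comes from the text above.

```python
class PollFailureTracker:
    """Re-enable the Run button after too many consecutive poll failures."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.consecutive_failures = 0
        self.run_disabled = False

    def start_crawl(self) -> None:
        # The button is disabled while a crawl is being polled.
        self.run_disabled = True
        self.consecutive_failures = 0

    def record_poll(self, ok: bool) -> None:
        if ok:
            # Any successful poll breaks the failure streak.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            # Give up on this crawl and let the user retry.
            self.run_disabled = False
            self.consecutive_failures = 0
```

Counting only consecutive failures means a single transient error does not cancel a crawl that is otherwise progressing.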
Improved SDK validation
The Python and Node SDKs now validate section_filter parameters on the client side, catching typos and type errors before they reach the API. This prevents common configuration mistakes and provides immediate feedback during development.
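A client-side check of this kind might look like the sketch below. The set of allowed section names and the function name are assumptions made for illustration; the SDKs' actual validation rules are not listed here.

```python
from difflib import get_close_matches

# Hypothetical set of valid section names, for illustration only.
ALLOWED_SECTIONS = {"hero", "article", "navigation", "footer"}


def validate_section_filter(section_filter):
    """Raise early on type errors and typos; return a clean list otherwise."""
    if not isinstance(section_filter, (list, tuple)):
        raise TypeError("section_filter must be a list of strings")
    for item in section_filter:
        if not isinstance(item, str):
            raise TypeError(f"section names must be strings, got {type(item).__name__}")
        if item not in ALLOWED_SECTIONS:
            # Suggest the closest valid name to make typos easy to fix.
            hint = get_close_matches(item, sorted(ALLOWED_SECTIONS), n=1)
            suffix = f" (did you mean {hint[0]!r}?)" if hint else ""
            raise ValueError(f"unknown section {item!r}{suffix}")
    return list(section_filter)
```

Failing before the request is sent gives feedback without spending an API call.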
Cleaner text extraction
Extracted article text now contains less boilerplate: navigation, headers, footers, and aside elements are removed. The improved cleaning process provides more focused content by stripping residual elements that Readability's heuristics might miss.
Accurate image URL parsing
Image URLs containing commas are now parsed correctly from srcset attributes. The improved parser preserves commas embedded in URLs for services like Cloudinary and Shopify, preventing broken image links in extracted content.
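To see why a naive split on commas breaks these URLs, here is a simplified parser sketch. It is not the platform's parser; it loosely follows the idea in the HTML standard that a URL is a run of non-whitespace characters, so commas inside it are preserved.

```python
def parse_srcset(srcset: str) -> list[tuple[str, str]]:
    """Split a srcset value into (url, descriptor) pairs."""
    candidates = []
    pos, n = 0, len(srcset)
    while pos < n:
        # Skip whitespace and commas that separate candidates.
        while pos < n and (srcset[pos].isspace() or srcset[pos] == ","):
            pos += 1
        if pos >= n:
            break
        # The URL is everything up to the next whitespace, commas included.
        start = pos
        while pos < n and not srcset[pos].isspace():
            pos += 1
        url = srcset[start:pos]
        descriptor = ""
        if url.endswith(","):
            # A trailing comma ends the candidate; there is no descriptor.
            url = url.rstrip(",")
        else:
            # Otherwise the descriptor runs up to the next comma.
            start = pos
            while pos < n and srcset[pos] != ",":
                pos += 1
            descriptor = srcset[start:pos].strip()
        if url:
            candidates.append((url, descriptor))
    return candidates
```

For example, `parse_srcset("https://example.com/w_400,h_300/a.jpg 400w, https://example.com/w_800,h_600/a.jpg 800w")` yields two candidates with the `w_400,h_300` and `w_800,h_600` path segments intact, whereas `srcset.split(",")` would produce four fragments.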
Better content scoping
You can now remove specific elements like navigation from scoped content using the new remove_selectors option. This is particularly useful for homepage scraping where the main content wrapper includes unwanted navigation elements.
Complete content tree traversal
Content containers rejected by the classifier now retain their valid child elements like paragraphs, lists, and headings. This prevents the loss of legitimate content that happens to be inside a container the classifier deemed non-content.
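The traversal idea can be illustrated with a toy tree. The `Node` type, the `is_content` flag standing in for the classifier's verdict, and the exact set of retained tags are all assumptions for this sketch.

```python
from dataclasses import dataclass, field

# Tags treated as valid content even inside a rejected container (assumed set).
KEEP_TAGS = {"p", "ul", "ol", "h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Node:
    tag: str
    is_content: bool = True  # stand-in for the classifier's verdict
    children: list["Node"] = field(default_factory=list)


def collect_content(node: Node) -> list[Node]:
    """Keep accepted nodes whole; salvage valid children of rejected ones."""
    if node.is_content:
        return [node]
    kept: list[Node] = []
    for child in node.children:
        if child.tag in KEEP_TAGS:
            kept.append(child)
        else:
            # Recurse so nested rejected containers are searched as well.
            kept.extend(collect_content(child))
    return kept
```

With this shape, rejecting a wrapper `div` no longer discards the paragraphs and headings it contains.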
Enhanced lazy image loading
The system now resolves image sources from common lazy-loading patterns including data-lazy-src, data-original, data-image, srcset, data-srcset, and picture/source elements. This ensures images from Shopify themes and other lazy-loaded content are properly extracted.
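A fallback chain over these attributes could be sketched as follows. The precedence order and the rule for skipping inline `data:` placeholders are assumptions; `picture`/`source` handling is left out for brevity.

```python
from typing import Optional

# Attribute names come from the list above; the order is an assumption.
LAZY_ATTRS = ("data-lazy-src", "data-original", "data-image")
SRCSET_ATTRS = ("srcset", "data-srcset")


def resolve_image_src(attrs: dict[str, str]) -> Optional[str]:
    """Pick the most likely real image URL from an <img> tag's attributes."""
    for name in LAZY_ATTRS:
        value = attrs.get(name, "").strip()
        if value:
            return value
    for name in SRCSET_ATTRS:
        value = attrs.get(name, "").strip()
        if value:
            # Take the first candidate URL (a run of non-whitespace characters).
            return value.split()[0].rstrip(",")
    src = attrs.get("src", "").strip()
    # Lazy loaders often put an inline data: placeholder in src.
    if src and not src.startswith("data:"):
        return src
    return None
```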
Better hero section detection
Hero classification now recognizes carousel, slider, swiper, and slideshow elements in addition to traditional hero, banner, jumbotron, and masthead classes. This improves detection of main banner sections on e-commerce sites and modern web designs.
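As a rough illustration, keyword matching on the class attribute might look like this. The real classifier likely weighs more signals; only the keyword list is taken from the text above, and the function name is hypothetical.

```python
import re

HERO_KEYWORDS = (
    "hero", "banner", "jumbotron", "masthead",    # traditional classes
    "carousel", "slider", "swiper", "slideshow",  # newly recognized
)
_HERO_RE = re.compile("|".join(HERO_KEYWORDS), re.IGNORECASE)


def looks_like_hero(class_attr: str) -> bool:
    """Return True if any hero keyword appears in the class attribute."""
    return bool(_HERO_RE.search(class_attr))
```

Substring matching catches compound names such as `swiper-container`, at the cost of occasional false positives that a fuller classifier would need to filter out.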
Fixed quickstart documentation
The quickstart hub page is now accessible, resolving the previous 404 error. This provides a proper entry point for new users getting started with the platform.
Enhanced SDK scrolling options
The Python and Node SDKs now support scroll_to_load and scroll_count parameters for handling infinite scroll content. This enables more comprehensive scraping of pages that load additional content as you scroll.
Discord report accuracy fixed
Log-monitor Discord reports now display all sections consistently, including Top Domains, API Activity, Recent Jobs, Active Users, Security, and Issues. Sections are no longer skipped conditionally.
Top-ups reporting corrected
Top-up notifications in Discord now filter by credit type instead of amount thresholds, preventing false alerts from free trial grants and refunds while accurately tracking genuine top-ups.
New signup notifications added
Discord reports now display users_new_1h when there are new signups in the past hour, providing real-time visibility into platform growth and user acquisition trends.
Failure rate calculations corrected
Platform failure rate metrics now exclude user-gated failures caused by tier budget caps, providing more accurate all-time failure statistics that reflect actual platform reliability rather than user limitations.
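One way to read this change is sketched below. It assumes each job record carries a `status` and an optional `failure_reason`, and that budget-cap failures are dropped from both the numerator and the denominator; the field names and the `tier_budget_cap` value are hypothetical.

```python
def platform_failure_rate(jobs: list[dict]) -> float:
    """Failure rate over jobs, ignoring failures caused by tier budget caps."""
    # Drop user-gated failures entirely so they affect neither count.
    counted = [j for j in jobs if j.get("failure_reason") != "tier_budget_cap"]
    if not counted:
        return 0.0
    failed = sum(1 for j in counted if j["status"] == "failed")
    return failed / len(counted)
```

With two successful jobs, one timeout, and one budget-cap failure, the rate is 1/3 rather than 2/4.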
Complete Scrape Response Schema
The scrape response now includes the missing note field, ensuring consistency with the unified response format. This alignment prevents data gaps when processing results through the Node SDK.
Improved Interface Flow
The cookie banner now appears as a floating card instead of a sticky bar, creating a cleaner user experience. The sticky CTA is hidden until the cookie banner is dismissed, preventing interface conflicts.
Security
Strengthened SSRF protection
Security improvements block IPv4-mapped IPv6 addresses and bracketed IPv6 formats that could bypass SSRF protection in demo routes. This closes potential attack vectors while maintaining legitimate functionality.
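The two bypass forms can be illustrated with Python's standard `ipaddress` module. This is a sketch of the general technique, not the platform's actual filter; the function name is hypothetical, and hostname resolution checks are out of scope here.

```python
import ipaddress


def is_blocked_host(host: str) -> bool:
    """Return True if host is an IP literal pointing at a non-public address."""
    # URLs write IPv6 literals in brackets, e.g. "[::1]"; strip them first.
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; DNS-based checks are not covered here
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so IPv4 rules apply to it.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_unspecified
        or addr.is_multicast
    )
```

Without the unwrapping step, a filter that only compares IPv4 ranges could let `[::ffff:127.0.0.1]` through even though it reaches the loopback interface.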
Plus 10 internal changes for stability and performance.