WebSocket upgrades an HTTP connection to a persistent, bidirectional channel over which both client and server can send messages at any time. The client initiates the upgrade with an HTTP GET request carrying an `Upgrade: websocket` header; once the server answers with `101 Switching Protocols`, WebSocket frames replace HTTP request/response cycles. WebSocket is widely used for real-time applications: live stock tickers, sports scores, collaborative editors, and trading platforms.
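To make the upgrade step concrete, here is a minimal sketch of the client side of the opening handshake as described in RFC 6455. The helper names `build_upgrade_request` and `expected_accept` are illustrative, not part of any library, and a real client library normally handles this step for you.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client's key during the handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def build_upgrade_request(host: str, path: str, key: str) -> str:
    """Return the raw HTTP request a client sends to ask for the upgrade."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # Header block ends with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


def expected_accept(key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server should echo back."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

A client can compare `expected_accept(key)` with the `Sec-WebSocket-Accept` header in the `101` response to confirm that the server actually understood the WebSocket handshake.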
For scrapers, WebSocket presents both an opportunity and a challenge. Many real-time data sources stream their updates over WebSocket rather than exposing polling REST endpoints. Subscribing to the WebSocket feed and capturing the message stream is far more efficient than polling, because updates arrive as they happen instead of being requested repeatedly. However, a WebSocket scraper must maintain a persistent session and handle reconnection, ping/pong keepalives, and binary as well as text frames.
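The reconnection requirement can be sketched as a retry loop with exponential backoff and jitter. This is one possible approach, not a prescribed one: `stream_with_reconnect` and its parameters are hypothetical, and `connect` stands in for whatever function opens the feed and yields messages, assumed here to raise `ConnectionError` (or another `OSError`) when the link drops. Keepalive handling is left out.

```python
import random
import time
from typing import Any, Callable, Iterable


def stream_with_reconnect(
    connect: Callable[[], Iterable[Any]],
    handle: Callable[[Any], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Consume messages from connect(), reconnecting after dropped links."""
    failures = 0
    while True:
        try:
            for message in connect():
                handle(message)
                failures = 0  # a delivered message shows the link is healthy
            return  # the feed ended cleanly
        except OSError as exc:  # ConnectionError is a subclass of OSError
            failures += 1
            if failures >= max_attempts:
                raise ConnectionError(
                    f"gave up after {failures} consecutive failures"
                ) from exc
            # Exponential backoff, capped, with random jitter so that many
            # clients do not reconnect at the same instant.
            delay = min(base_delay * 2 ** (failures - 1), max_delay)
            sleep(random.uniform(0, delay))
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting. Resetting the failure count on each delivered message means only consecutive failures count toward `max_attempts`.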
Browser-based scrapers can intercept WebSocket frames using the Chrome DevTools Protocol (CDP) event `Network.webSocketFrameReceived`. Libraries like `websocket-client` (Python) and `ws` (Node.js) provide standalone WebSocket clients that can connect directly to a target site's WebSocket endpoint without a browser.
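As a rough illustration of the browser-based route, the sketch below decodes the parameters of a single `Network.webSocketFrameReceived` event. It assumes the event arrives as a plain dictionary from whichever CDP client is in use; the function name `decode_cdp_frame` is hypothetical. In CDP, text frames (opcode 1) carry `payloadData` as a string, while other frames carry it base64-encoded.

```python
import base64
from typing import Any, Dict, Union

TEXT_OPCODE = 1  # WebSocket opcode for text frames


def decode_cdp_frame(params: Dict[str, Any]) -> Union[str, bytes]:
    """Extract the payload from Network.webSocketFrameReceived parameters."""
    frame = params["response"]
    payload = frame["payloadData"]
    if frame["opcode"] == TEXT_OPCODE:
        return payload  # already a text string
    # Non-text frames are delivered base64-encoded by CDP.
    return base64.b64decode(payload)
```

With a standalone client the decoding step is not needed: in `websocket-client`, for example, `websocket.create_connection(url)` returns a connection whose `recv()` method hands back each message directly.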