browser

Shadow DOM

Shadow DOM is a browser API that encapsulates a component's internal HTML and CSS in an isolated subtree, separate from the main document DOM.

Shadow DOM allows web components to encapsulate their internal markup and styles so that they do not conflict with the rest of the page. The shadow root is attached to a host element. When the root is attached in `open` mode, the DOM subtree inside it is accessible from JavaScript through the host's `shadowRoot` property; for a `closed` root that property is `null`. In either mode the subtree is invisible to standard document-level `querySelector` and `getElementsBy*` calls.

For scrapers, Shadow DOM content is invisible to plain CSS selector or XPath queries run against the main document. Playwright and Puppeteer can traverse open shadow roots: Playwright's CSS and text locators pierce open shadow roots by default (its XPath locators do not), and Puppeteer provides the `>>>` deep combinator. Both allow extraction of content inside custom elements.

Modern web components (navigation menus, product carousels, checkout flows) increasingly use Shadow DOM for encapsulation. Scrapers targeting these sites must detect and traverse shadow roots to access the underlying content.

Examples

// Playwright: CSS locators pierce open shadow roots by default
const text = await page.locator("my-component .product-price").textContent();

// Or via evaluate (open shadow roots only; shadowRoot is null for closed roots):
const textViaEvaluate = await page.evaluate(() => {
  const host = document.querySelector("my-component");
  return host?.shadowRoot?.querySelector(".product-price")?.textContent ?? null;
});
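
When shadow roots are nested or the host elements are not known in advance, a scraper may need to walk the whole tree itself. The following is a minimal sketch of such a traversal, not a library API: `collectDeep` and its `matches` callback are hypothetical names introduced for illustration, and the sketch assumes every node exposes `children` and, for hosts with an open root, `shadowRoot`.

```javascript
// Hypothetical helper: collect all elements accepted by `matches`,
// descending into open shadow roots as well as regular children.
function collectDeep(node, matches, found = []) {
  for (const child of node.children ?? []) {
    if (matches(child)) found.push(child);
    // An open shadow root is exposed as `shadowRoot`; closed roots yield null
    // and are skipped, so their content stays out of reach.
    if (child.shadowRoot) collectDeep(child.shadowRoot, matches, found);
    // Continue through the element's ordinary (light DOM) children.
    collectDeep(child, matches, found);
  }
  return found;
}

// Possible browser usage (assumes a `.product-price` class exists on the page):
// const prices = collectDeep(document, (el) => el.matches(".product-price"));
```

With Playwright or Puppeteer, a helper like this would have to be defined inside the callback passed to `page.evaluate`, because that callback runs in the page rather than in the scraper's own process.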