DOM (Document Object Model)

The tree-structured programmatic representation of a web page that JavaScript manipulates to create dynamic content.

The Document Object Model (DOM) is the programming interface for HTML and XML documents: an in-memory tree representation of the page structure that JavaScript can read and modify. Every HTML element becomes a node in the DOM tree, as do the text and comments inside it; scripts can add, remove, or change nodes to update what the user sees without reloading the page.
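The tree structure and the add, remove, and change operations can be sketched outside a browser with Python's standard-library `xml.dom.minidom`, which implements a subset of the same DOM interface. This is only an illustration: the sample markup is hypothetical, and `minidom` is an XML parser, so it needs well-formed input and does not behave like a real browser on messy HTML.

```python
from xml.dom import minidom

# Hypothetical, well-formed sample page (minidom parses XML, not loose HTML).
PAGE = "<html><body><h1>Products</h1><ul><li>Apple</li><li>Pear</li></ul></body></html>"

doc = minidom.parseString(PAGE)


def outline(node, depth=0):
    """Return one indented line per element node, showing the tree shape."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        depth += 1
    for child in node.childNodes:
        lines.extend(outline(child, depth))
    return lines


before = outline(doc)

# Add a node: create an <li>, give it a text child, attach it to the list.
ul = doc.getElementsByTagName("ul")[0]
new_item = doc.createElement("li")
new_item.appendChild(doc.createTextNode("Plum"))
ul.appendChild(new_item)

# Remove a node: detach the first <li> from its parent.
ul.removeChild(ul.getElementsByTagName("li")[0])

# Change a node: rewrite the text node inside the heading.
h1 = doc.getElementsByTagName("h1")[0]
h1.firstChild.data = "Fruit"

items = [item.firstChild.data for item in ul.getElementsByTagName("li")]

print("\n".join(before))
print(items)  # ['Pear', 'Plum']
```

In a browser the same three operations would go through `document.createElement`, `appendChild`, and `removeChild` on the live page, and the rendered view would update immediately.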

The DOM is central to modern web scraping because JavaScript-rendered pages build most or all of their content by manipulating the DOM after the initial HTML load. A client-side-rendered React or Vue application typically serves a near-empty HTML document; the browser then executes the JavaScript bundle, which fetches data via API calls and uses DOM APIs (`document.createElement`, `element.appendChild`, `element.textContent = ...`) to build the visible page. A headless browser that executes this JavaScript produces a fully populated DOM; a plain HTTP request returns only the empty shell.
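The gap between the empty shell and the populated DOM can be simulated without a browser. In the sketch below, `SHELL`, `PRODUCTS`, `find_by_id`, `render`, and `extract` are all hypothetical names invented for illustration; `render` stands in for the JavaScript bundle, and the standard-library `xml.dom.minidom` stands in for the browser's DOM.

```python
from xml.dom import minidom

# Hypothetical near-empty shell, as a client-rendered app might serve it.
SHELL = '<html><body><div id="root"></div></body></html>'

# Hypothetical data the app would normally fetch from an API.
PRODUCTS = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget", "price": "19.50"},
]


def find_by_id(doc, element_id):
    """Return the first element whose id attribute matches, or None."""
    for element in doc.getElementsByTagName("*"):
        if element.getAttribute("id") == element_id:
            return element
    return None


def render(doc, products):
    """Stand-in for the JavaScript bundle: build product nodes under #root."""
    root = find_by_id(doc, "root")
    for product in products:
        item = doc.createElement("p")
        item.setAttribute("class", "product")
        item.appendChild(doc.createTextNode(f"{product['name']}: {product['price']}"))
        root.appendChild(item)


def extract(doc):
    """Collect the text of every <p class="product"> node."""
    return [
        p.firstChild.data
        for p in doc.getElementsByTagName("p")
        if p.getAttribute("class") == "product"
    ]


plain_http = minidom.parseString(SHELL)  # what a plain HTTP request sees
rendered = minidom.parseString(SHELL)
render(rendered, PRODUCTS)               # the DOM after the "scripts" have run

print(extract(plain_http))  # []
print(extract(rendered))    # ['Widget: 9.99', 'Gadget: 19.50']
```

The same extraction logic returns nothing against the shell and the full data against the rendered tree, which is why JavaScript-heavy sites need a tool that actually executes the page's scripts.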

For scraping, the DOM is the extraction target: it holds the page's data once the relevant JavaScript has run. CSS selectors and XPath expressions operate on the DOM to locate specific nodes. Browser DevTools provide a live DOM inspector that is invaluable for working out which selector or XPath expression reaches the data on a given page.
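As a rough illustration of locating nodes by path, the sketch below uses the limited XPath subset built into Python's standard-library `xml.etree.ElementTree`. The sample markup and class names are hypothetical, and a real scraper would more likely run full XPath or CSS selectors through a dedicated HTML parser or the browser itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; the class names are made up.
PAGE = """
<html><body>
  <div class="product"><span class="name">Widget</span><span class="price">9.99</span></div>
  <div class="product"><span class="name">Gadget</span><span class="price">19.50</span></div>
  <div class="ad"><span class="price">0.00</span></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# XPath: price spans that are direct children of product divs.
# A roughly equivalent CSS selector would be: div.product > span.price
prices = [
    span.text
    for span in root.findall(".//div[@class='product']/span[@class='price']")
]

# Scope a second query to each matched node to keep name and price paired.
rows = [
    (
        div.find("span[@class='name']").text,
        div.find("span[@class='price']").text,
    )
    for div in root.findall(".//div[@class='product']")
]

print(prices)  # ['9.99', '19.50']
print(rows)    # [('Widget', '9.99'), ('Gadget', '19.50')]
```

Anchoring the expression on the `product` container is what keeps the `0.00` price in the `ad` block out of the results; a bare search for every `price` span would have picked it up.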
