
Table Parsing

Table parsing extracts structured rows and columns of data from HTML `<table>` elements, converting them into arrays, dataframes, or other tabular formats.

HTML tables encode information in a nested structure of `<tr>` (row), `<th>` (header cell), and `<td>` (data cell) elements. Libraries like BeautifulSoup and Scrapy selectors let a scraper walk this structure row by row and collect the cell text into a two-dimensional array, while pandas `read_html` performs the traversal itself and returns the result as DataFrames.
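The row-and-cell traversal can be sketched with nothing but Python's standard-library `html.parser`, used here as a stand-in for the libraries named above. The class name `TableRows`, the helper `parse_table`, and the sample markup are all made up for illustration, and the sketch assumes a single, non-nested table with explicit closing tags and no merged cells.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the open cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table in `html` as a list of rows of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup, not taken from any real page.
SAMPLE = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
"""

rows = parse_table(SAMPLE)
header, body = rows[0], rows[1:]
print(header)
print(body)
```

Splitting the first row off as `header` only works when the table has exactly one header row; real pages often need a check on `<th>` versus `<td>` instead.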

Complications arise from tables that use `colspan` and `rowspan` attributes to merge cells across multiple rows or columns. Parsing these correctly requires tracking span state across rows to align data with the right column header. Some pages also nest tables inside other tables for layout purposes, requiring the scraper to target the correct table by position, class, or caption text.
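One way to track span state is to record which grid positions are already occupied by a cell carried over from an earlier row, and to skip past them when placing the next cell. The sketch below assumes the cells have already been read into `(text, rowspan, colspan)` tuples; that intermediate form, the function name `expand_spans`, and the sample data are hypothetical. Repeating the merged cell's text in every position it covers is a design choice, not the only option.

```python
def expand_spans(rows):
    """Expand merged cells into a rectangular grid.

    `rows` is a list of rows in document order; each row is a list of
    (text, rowspan, colspan) tuples. Positions no cell covers become None.
    """
    occupied = {}  # (row index, column index) -> cell text
    for r, cells in enumerate(rows):
        c = 0
        for text, rowspan, colspan in cells:
            # Skip columns already filled by a rowspan from an earlier row.
            while (r, c) in occupied:
                c += 1
            # Copy the text into every position the cell spans.
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied[(r + dr, c + dc)] = text
            c += colspan

    if not occupied:
        return []
    n_rows = len(rows)  # ignore rowspans that run past the last row
    n_cols = max(col for _, col in occupied) + 1
    return [[occupied.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


# Made-up example: "Region" spans two header rows, "Sales" spans two columns.
ROWS = [
    [("Region", 2, 1), ("Sales", 1, 2)],
    [("Q1", 1, 1), ("Q2", 1, 1)],
    [("North", 1, 1), ("10", 1, 1), ("12", 1, 1)],
]

for line in expand_spans(ROWS):
    print(line)
```

After expansion every data row has the same number of columns as the header rows, so each value lines up with the header above it. The sketch does not detect overlapping spans, which some malformed pages contain.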

pandas `pd.read_html(url)` is a common shortcut: a single call downloads a page and returns every table it detects as a list of DataFrames, handling most span logic automatically. It only sees the HTML the server returns, so JavaScript-rendered tables need browser-based extraction first, and irregular tables that `read_html` misreads are better parsed cell by cell with BeautifulSoup.

Examples

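# pandas: extract all tables from a page
# (read_html needs an HTML parser installed: lxml, or beautifulsoup4 with html5lib)
import pandas as pd

# Returns a list of DataFrames, one per table; raises ValueError if none are found
tables = pd.read_html("https://example.com/data")
first_table = tables[0]
print(first_table.head())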



    Table Parsing — Web Scraping Glossary | AlterLab