
Noindex

The `noindex` directive instructs search engine crawlers not to include a page in their search index. It is often used for private, duplicate, or low-value pages.

A page can be excluded from search engine indexes via two mechanisms: a `<meta name='robots' content='noindex'>` tag in the HTML `<head>`, or an `X-Robots-Tag: noindex` HTTP response header. Both signal to compliant crawlers (such as Googlebot and Bingbot) that the page may be fetched but must not be indexed, so it will not appear in search results. Because a crawler has to fetch the page to see either signal, a page that is also blocked in robots.txt may never have its `noindex` read.
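The header form can be checked without parsing any HTML. The following is a minimal sketch, not a complete parser: the function name `header_has_noindex` is hypothetical, it takes the raw `X-Robots-Tag` values as a list of strings, and it deliberately ignores which bot a scoped rule (for example `googlebot: noindex`) is aimed at. It also treats `none` as a match, since search engines document it as shorthand for `noindex, nofollow`.

```python
def header_has_noindex(header_values):
    """Return True if any X-Robots-Tag value carries a noindex rule.

    header_values: list of raw header strings, one per X-Robots-Tag header.
    Simplification (assumption): bot-scoped rules are treated like global ones.
    """
    for value in header_values:
        for token in value.lower().split(","):
            # Drop an optional "botname:" prefix such as "googlebot: noindex".
            rule = token.rsplit(":", 1)[-1].strip()
            if rule in ("noindex", "none"):
                return True
    return False


print(header_has_noindex(["noindex, nofollow"]))   # True
print(header_has_noindex(["googlebot: noindex"]))  # True
print(header_has_noindex(["nofollow"]))            # False
```

With an HTTP client, the input would be whatever the client exposes for repeated `X-Robots-Tag` headers. A stricter version would compare the bot prefix against the crawler's own user agent before applying the rule.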

For web scrapers, `noindex` pages are interesting because they often hold content the site operator wants kept out of search results but still reachable by visitors or logged-in users: admin panels, cart pages, staging previews, or thin content variants. Scrapers are not bound by the `noindex` directive, which is a crawler convention rather than a technical restriction, but responsible scrapers respect it in the same spirit as robots.txt.
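One way to picture how the two conventions fit together: robots.txt governs whether a URL is fetched at all, while `noindex` governs what happens to a page after it has been fetched. The sketch below is an illustrative policy only. The function `crawl_decision`, its return labels, and the `example-scraper` user agent are all hypothetical, and `page_is_noindex` is passed in as a flag even though a real scraper would only learn it after fetching the page.

```python
from urllib.robotparser import RobotFileParser


def crawl_decision(url, robots_txt, page_is_noindex, user_agent="example-scraper"):
    """Return 'skip', 'discard', or 'keep' for a URL (hypothetical policy)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return "skip"      # robots.txt disallows fetching this URL
    if page_is_noindex:
        return "discard"   # fetched, but the operator asked that it not be indexed
    return "keep"


robots_txt = "User-agent: *\nDisallow: /admin/\n"
print(crawl_decision("https://example.com/admin/users", robots_txt, False))  # skip
print(crawl_decision("https://example.com/cart", robots_txt, True))          # discard
print(crawl_decision("https://example.com/blog/post", robots_txt, False))    # keep
```

Whether a `noindex` page is discarded or merely tagged is a design choice for the scraper. The point of the sketch is only that the check is voluntary and sits in the scraper's own code.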

From an SEO standpoint, noindex pages should not be included in a site's canonical content inventory. When auditing a site's crawlability, distinguishing between indexed and noindexed pages helps identify technical SEO issues like canonicalisation conflicts.
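As a rough illustration of such an audit, the sketch below splits crawled pages into indexable and noindexed sets and flags one kind of mixed signal: a noindexed page whose canonical URL points somewhere else. The `PageRecord` structure and the `audit` function are hypothetical, and the conflict rule is an assumed heuristic for manual review rather than an established definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageRecord:
    url: str
    noindex: bool
    canonical: Optional[str] = None  # rel="canonical" target, if the page declares one


def audit(pages):
    """Split pages into indexable and noindexed URLs, and flag mixed signals."""
    indexable = [p.url for p in pages if not p.noindex]
    noindexed = [p.url for p in pages if p.noindex]
    # Assumed heuristic: noindex plus a canonical pointing at a different URL
    # sends two competing hints, so the page is listed for manual review.
    conflicts = [
        p.url for p in pages
        if p.noindex and p.canonical and p.canonical != p.url
    ]
    return indexable, noindexed, conflicts


pages = [
    PageRecord("https://example.com/", noindex=False),
    PageRecord("https://example.com/cart", noindex=True),
    PageRecord("https://example.com/shoes?sort=price", noindex=True,
               canonical="https://example.com/shoes"),
]
indexable, noindexed, conflicts = audit(pages)
print(conflicts)
```

Only the `indexable` list would feed the canonical content inventory. The other two lists are what an auditor would review.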

Examples

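# Detect a noindex directive in a page's robots meta tag while crawling
import re
from bs4 import BeautifulSoup

# Sample markup for illustration; in a crawler this is the fetched response body
html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

soup = BeautifulSoup(html, "html.parser")
# Match the name attribute case-insensitively so "ROBOTS" is also caught
robots_meta = soup.find("meta", attrs={"name": re.compile(r"^robots$", re.I)})
if robots_meta and "noindex" in robots_meta.get("content", "").lower():
    print("Page is noindex: excluded from search engine index")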
