# Search Guide
Find relevant web pages when you know what you need but not where it lives. Search is the discovery layer that feeds into scraping and extraction.
Prerequisite: an API key, passed via the `X-API-Key` header in every request below.
## When to Use Search
Use the Search endpoint when you need to discover URLs before scraping them. The table below compares the options; a sketch of the first two call shapes follows it.
| Scenario | Use | Why |
|---|---|---|
| You have the exact URL | /v1/scrape | Direct scrape is cheaper and faster |
| You need to find URLs by keyword | /v1/search | Search discovers, then scrape fetches |
| You need all pages on a domain | /v1/crawl | Crawl follows links systematically |
| You want specific pages on a domain | /v1/search + domain | Domain-scoped search finds relevant pages without crawling everything |
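As a concrete sketch of the first two rows, here are the two call shapes side by side. The `/v1/search` payload matches the examples later in this guide; the single-`url` payload for `/v1/scrape` is an assumption for illustration, so check the Scrape guide for the exact request shape.

```python
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.alterlab.io/api/v1"

# You already know the URL: scrape it directly (cheaper and faster).
# NOTE: the single-"url" payload is assumed here; see the Scrape guide.
scrape = requests.post(
    f"{BASE}/scrape",
    headers={"X-API-Key": API_KEY},
    json={"url": "https://docs.github.com/en/authentication"},
)

# You only know the keywords: discover URLs first, then scrape the hits.
search = requests.post(
    f"{BASE}/search",
    headers={"X-API-Key": API_KEY},
    json={"query": "GitHub authentication docs", "num_results": 5},
)
```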
## Domain-Scoped Search
Use the `domain` parameter to restrict results to a specific website. This is faster and more targeted than crawling an entire site.
```python
import requests

# Find all documentation pages about authentication on GitHub Docs
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "OAuth2 authentication tokens",
        "domain": "docs.github.com",
        "num_results": 20
    }
)

data = response.json()
print(f"Found {data['results_count']} pages about auth on GitHub Docs")
for r in data["results"]:
    print(f"  {r['position']}. {r['title']}")
    print(f"     {r['url']}")
```

### How Domain Scoping Works
The `domain` parameter adds a `site:` prefix to your query, so you can pass just the domain (e.g., `docs.github.com`) without the protocol.
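To see the mechanism, the two requests below should behave identically: the first uses the `domain` parameter, the second writes the `site:` operator into the query by hand. This is a sketch assuming the prefix behavior described above.

```python
import requests

URL = "https://api.alterlab.io/api/v1/search"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

# Scoped via the domain parameter...
scoped = requests.post(URL, headers=HEADERS, json={
    "query": "OAuth2 authentication tokens",
    "domain": "docs.github.com",
})

# ...which should be equivalent to spelling out the site: operator yourself.
manual = requests.post(URL, headers=HEADERS, json={
    "query": "site:docs.github.com OAuth2 authentication tokens",
})
```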
## Search + Scrape Workflow

The most powerful pattern: find pages and scrape them in a single API call. Set `scrape_results: true` to get full page content alongside search results.
```python
import requests
import time

# Search and scrape in one call
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "Python async best practices 2026",
        "num_results": 5,
        "scrape_results": True,
        "formats": ["text", "markdown"]
    }
)
data = response.json()

# For <= 5 results, content may be available immediately.
# For > 5, poll using search_id.
if response.status_code == 202:
    search_id = data["search_id"]
    while True:
        status = requests.get(
            f"https://api.alterlab.io/api/v1/search/{search_id}",
            headers={"X-API-Key": "YOUR_API_KEY"}
        ).json()
        print(f"Progress: {status['completed']}/{status['results_count']}")
        if status["status"] == "completed":
            data = status
            break
        time.sleep(2)

# All results now have full content
for result in data["results"]:
    print(f"\n--- {result['title']} ---")
    if result.get("content"):
        text = result["content"].get("text", "")
        print(f"  {len(text)} characters of content")
```

## Time-Filtered Search
Use `time_range` to find recent content. Useful for news monitoring, trend tracking, and finding up-to-date information.
```python
import requests

# Find articles published in the last week
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "AI regulation Europe",
        "time_range": "week",
        "num_results": 10
    }
)
data = response.json()
print(f"Found {data['results_count']} recent articles")

# Available time ranges:
#   "hour"  — last hour
#   "day"   — last 24 hours
#   "week"  — last 7 days
#   "month" — last 30 days
#   "year"  — last 12 months
```

## Geo-Targeted Search
Combine the `country` and `language` parameters to get localized search results — essential for competitive analysis across markets.
```python
import requests

# Search from a German perspective, in German
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "beste Webhosting Anbieter",
        "country": "DE",
        "language": "de",
        "num_results": 10
    }
)

# Compare with US results
response_us = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "best web hosting providers",
        "country": "US",
        "language": "en",
        "num_results": 10
    }
)

de_urls = {r["url"] for r in response.json()["results"]}
us_urls = {r["url"] for r in response_us.json()["results"]}
print(f"Overlap: {len(de_urls & us_urls)} URLs in common")
```

## Search + Extract Pipeline
Combine search, scraping, and structured extraction in a single call. Pass an `extraction_schema` with `scrape_results: true` to get structured data from every result.
```python
import requests
import time

# Find and extract pricing from competitor pages
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "web scraping API pricing",
        "num_results": 5,
        "scrape_results": True,
        "formats": ["text"],
        "extraction_schema": {
            "type": "object",
            "properties": {
                "company_name": {"type": "string"},
                "plans": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "string"},
                            "requests_per_month": {"type": "string"}
                        }
                    }
                },
                "free_tier": {"type": "boolean"}
            }
        }
    }
)
data = response.json()

# Poll if async
if response.status_code == 202:
    while True:
        status = requests.get(
            f"https://api.alterlab.io/api/v1/search/{data['search_id']}",
            headers={"X-API-Key": "YOUR_API_KEY"}
        ).json()
        if status["status"] == "completed":
            data = status
            break
        time.sleep(2)

# Extracted pricing data from each competitor
for result in data["results"]:
    ext = result.get("content", {})
    if ext and ext.get("extraction"):
        pricing = ext["extraction"]
        print(f"\n{pricing.get('company_name', result['title'])}:")
        print(f"  Free tier: {pricing.get('free_tier', 'Unknown')}")
        for plan in pricing.get("plans", []):
            print(f"  {plan['name']}: {plan['price']}")
```

## AI Agent Patterns
Search is the discovery tool for AI agents. The typical flow is: search → scrape → extract → reason. Here is a minimal agent loop:
```python
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.alterlab.io/api/v1"

def research(topic: str, num_sources: int = 5) -> list[dict]:
    """Search, scrape, and extract key facts about a topic."""
    # Step 1: Discover relevant pages
    search = requests.post(
        f"{BASE}/search",
        headers={"X-API-Key": API_KEY},
        json={
            "query": topic,
            "num_results": num_sources,
            "time_range": "month",   # Recent content only
            "scrape_results": True,  # Fetch full text
            "formats": ["text"],
            "extraction_schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "key_facts": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "3-5 key facts or findings"
                    },
                    "date_published": {"type": "string"}
                }
            }
        }
    ).json()

    # Step 2: Collect extracted data
    sources = []
    for result in search.get("results", []):
        ext = (result.get("content") or {}).get("extraction")
        sources.append({
            "url": result["url"],
            "title": result["title"],
            "facts": ext.get("key_facts", []) if ext else [],
            "date": ext.get("date_published") if ext else None,
        })
    return sources

# Use it
sources = research("quantum computing breakthroughs 2026")
for s in sources:
    print(f"\n{s['title']} ({s['url']})")
    for fact in s["facts"]:
        print(f"  - {fact}")
```
## Best Practices
### 1. Start with Search-Only

Run a search-only call first (2 credits) to verify the results are relevant before spending scrape credits. Then pass the URLs you want to `/v1/scrape` or `/v1/batch`, as sketched below.
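A minimal sketch of that two-step flow, assuming the `/v1/batch` payload is a list of `urls` plus `formats` (consult the Batch guide for the real request shape):

```python
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.alterlab.io/api/v1"

# Step 1: a cheap search-only call (no scrape_results, just URLs).
search = requests.post(
    f"{BASE}/search",
    headers={"X-API-Key": API_KEY},
    json={"query": "web scraping API pricing", "num_results": 10},
).json()

# Step 2: filter the hits, then spend scrape credits only on the keepers.
urls = [r["url"] for r in search["results"] if "pricing" in r["url"].lower()]

# NOTE: the {"urls": [...], "formats": [...]} payload is assumed here.
batch = requests.post(
    f"{BASE}/batch",
    headers={"X-API-Key": API_KEY},
    json={"urls": urls, "formats": ["text"]},
)
```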
### 2. Use Domain Scoping for Site Search

Instead of crawling an entire site, use `domain` to find the specific pages you need. This is faster and cheaper than a full crawl.
### 3. Limit `num_results` When Scraping

Each scraped result costs additional credits. Start with 5 results, verify quality, then scale up. Keep `num_results` at 5 or below to get inline results without polling.
### 4. Add Time Ranges for Freshness

For news, trends, or rapidly changing topics, always set `time_range`. Without it, results may include outdated content.
### 5. Use Extraction Schemas for Structured Output

When building pipelines, pass an `extraction_schema` to get consistent, machine-readable data from every result page. A validation sketch follows below.
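One way to keep such a pipeline honest is to validate every extraction against the same schema you sent, so malformed results fail loudly instead of flowing downstream. Here is a minimal sketch using the third-party `jsonschema` package; the schema and result shapes mirror the pricing pipeline above.

```python
from jsonschema import ValidationError, validate

# Reuse the exact schema you passed as extraction_schema
PRICING_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "free_tier": {"type": "boolean"},
    },
}

def clean_extractions(results: list[dict]) -> list[dict]:
    """Keep only results whose extraction matches the schema."""
    good = []
    for result in results:
        ext = (result.get("content") or {}).get("extraction")
        if ext is None:
            continue
        try:
            validate(instance=ext, schema=PRICING_SCHEMA)
            good.append(ext)
        except ValidationError as err:
            print(f"Skipping {result['url']}: {err.message}")
    return good
```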