# Search Guide
Find relevant web pages when you know what you need but not where it lives. Search is the discovery layer that feeds into scraping and extraction.
Prerequisite: an API key, passed via the `X-API-Key` header in every request below.
## When to Use Search
Use the Search endpoint when you need to discover URLs before scraping them. The table below compares the options; a sketch of the first two call shapes follows it.
| Scenario | Use | Why |
|---|---|---|
| You have the exact URL | /v1/scrape | Direct scrape is cheaper and faster |
| You need to find URLs by keyword | /v1/search | Search discovers, then scrape fetches |
| You need all pages on a domain | /v1/crawl | Crawl follows links systematically |
| You want specific pages on a domain | /v1/search + domain | Domain-scoped search finds relevant pages without crawling everything |
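As a concrete sketch of the first two rows, here are the two call shapes side by side. The `/v1/search` payload matches the examples later in this guide; the single-`url` payload for `/v1/scrape` is an assumption for illustration, so check the Scrape guide for the exact request shape.

```python
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.alterlab.io/api/v1"

# You already know the URL: scrape it directly (cheaper and faster).
# NOTE: the single-"url" payload is assumed here; see the Scrape guide.
scrape = requests.post(
    f"{BASE}/scrape",
    headers={"X-API-Key": API_KEY},
    json={"url": "https://docs.github.com/en/authentication"},
)

# You only know the keywords: discover URLs first, then scrape the hits.
search = requests.post(
    f"{BASE}/search",
    headers={"X-API-Key": API_KEY},
    json={"query": "GitHub authentication docs", "num_results": 5},
)
```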
## Domain-Scoped Search
Use the `domain` parameter to restrict results to a specific website. This is faster and more targeted than crawling an entire site.
```python
import requests

# Find all documentation pages about authentication on GitHub Docs
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "OAuth2 authentication tokens",
        "domain": "docs.github.com",
        "num_results": 20
    }
)

data = response.json()
print(f"Found {data['results_count']} pages about auth on GitHub Docs")
for r in data["results"]:
    print(f"  {r['position']}. {r['title']}")
    print(f"     {r['url']}")
```

### How Domain Scoping Works
The `domain` parameter adds a `site:` prefix to your query, so you can pass just the domain (e.g., `docs.github.com`) without the protocol.
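To see the mechanism, the two requests below should behave identically: the first uses the `domain` parameter, the second writes the `site:` operator into the query by hand. This is a sketch assuming the prefix behavior described above.

```python
import requests

URL = "https://api.alterlab.io/api/v1/search"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

# Scoped via the domain parameter...
scoped = requests.post(URL, headers=HEADERS, json={
    "query": "OAuth2 authentication tokens",
    "domain": "docs.github.com",
})

# ...which should be equivalent to spelling out the site: operator yourself.
manual = requests.post(URL, headers=HEADERS, json={
    "query": "site:docs.github.com OAuth2 authentication tokens",
})
```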
## Search + Scrape Workflow

The most powerful pattern: find pages and scrape them in a single API call. Set `scrape_results: true` to get full page content alongside search results.
```python
import requests
import time

# Search and scrape in one call
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "Python async best practices 2026",
        "num_results": 5,
        "scrape_results": True,
        "formats": ["text", "markdown"]
    }
)
data = response.json()

# For <= 5 results, content may be available immediately.
# For > 5, poll using search_id.
if response.status_code == 202:
    search_id = data["search_id"]
    while True:
        status = requests.get(
            f"https://api.alterlab.io/api/v1/search/{search_id}",
            headers={"X-API-Key": "YOUR_API_KEY"}
        ).json()
        print(f"Progress: {status['completed']}/{status['results_count']}")
        if status["status"] == "completed":
            data = status
            break
        time.sleep(2)

# All results now have full content
for result in data["results"]:
    print(f"\n--- {result['title']} ---")
    if result.get("content"):
        text = result["content"].get("text", "")
        print(f"  {len(text)} characters of content")
```

## Time-Filtered Search
Use `time_range` to find recent content. Useful for news monitoring, trend tracking, and finding up-to-date information.
```python
import requests

# Find articles published in the last week
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "AI regulation Europe",
        "time_range": "week",
        "num_results": 10
    }
)
data = response.json()
print(f"Found {data['results_count']} recent articles")

# Available time ranges:
#   "hour"  — last hour
#   "day"   — last 24 hours
#   "week"  — last 7 days
#   "month" — last 30 days
#   "year"  — last 12 months
```

## Geo-Targeted Search
Combine the `country` and `language` parameters to get localized search results — essential for competitive analysis across markets.
```python
import requests

# Search from a German perspective, in German
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "beste Webhosting Anbieter",
        "country": "DE",
        "language": "de",
        "num_results": 10
    }
)

# Compare with US results
response_us = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "best web hosting providers",
        "country": "US",
        "language": "en",
        "num_results": 10
    }
)

de_urls = {r["url"] for r in response.json()["results"]}
us_urls = {r["url"] for r in response_us.json()["results"]}
print(f"Overlap: {len(de_urls & us_urls)} URLs in common")
```

## Search + Extract Pipeline
Combine search, scraping, and structured extraction in a single call. Pass an `extraction_schema` with `scrape_results: true` to get structured data from every result.
```python
import requests
import time

# Find and extract pricing from competitor pages
response = requests.post(
    "https://api.alterlab.io/api/v1/search",
    headers={"X-API-Key": "YOUR_API_KEY"},
    json={
        "query": "web scraping API pricing",
        "num_results": 5,
        "scrape_results": True,
        "formats": ["text"],
        "extraction_schema": {
            "type": "object",
            "properties": {
                "company_name": {"type": "string"},
                "plans": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "price": {"type": "string"},
                            "requests_per_month": {"type": "string"}
                        }
                    }
                },
                "free_tier": {"type": "boolean"}
            }
        }
    }
)
data = response.json()

# Poll if async
if response.status_code == 202:
    while True:
        status = requests.get(
            f"https://api.alterlab.io/api/v1/search/{data['search_id']}",
            headers={"X-API-Key": "YOUR_API_KEY"}
        ).json()
        if status["status"] == "completed":
            data = status
            break
        time.sleep(2)

# Extracted pricing data from each competitor
for result in data["results"]:
    ext = result.get("content", {})
    if ext and ext.get("extraction"):
        pricing = ext["extraction"]
        print(f"\n{pricing.get('company_name', result['title'])}:")
        print(f"  Free tier: {pricing.get('free_tier', 'Unknown')}")
        for plan in pricing.get("plans", []):
            print(f"  {plan['name']}: {plan['price']}")
```

## AI Agent Patterns
Search is the discovery tool for AI agents. The typical flow is: search → scrape → extract → reason. Here is a minimal agent loop:
```python
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.alterlab.io/api/v1"

def research(topic: str, num_sources: int = 5) -> list[dict]:
    """Search, scrape, and extract key facts about a topic."""
    # Step 1: Discover relevant pages
    search = requests.post(
        f"{BASE}/search",
        headers={"X-API-Key": API_KEY},
        json={
            "query": topic,
            "num_results": num_sources,
            "time_range": "month",   # Recent content only
            "scrape_results": True,  # Fetch full text
            "formats": ["text"],
            "extraction_schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "key_facts": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "3-5 key facts or findings"
                    },
                    "date_published": {"type": "string"}
                }
            }
        }
    ).json()

    # Step 2: Collect extracted data
    sources = []
    for result in search.get("results", []):
        ext = (result.get("content") or {}).get("extraction")
        sources.append({
            "url": result["url"],
            "title": result["title"],
            "facts": ext.get("key_facts", []) if ext else [],
            "date": ext.get("date_published") if ext else None,
        })
    return sources

# Use it
sources = research("quantum computing breakthroughs 2026")
for s in sources:
    print(f"\n{s['title']} ({s['url']})")
    for fact in s["facts"]:
        print(f"  - {fact}")
```
## Best Practices
### 1. Start with Search-Only

Run a search-only call first (2 credits) to verify the results are relevant before spending scrape credits. Then pass the URLs you want to `/v1/scrape` or `/v1/batch`, as sketched below.
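A minimal sketch of that two-step flow, assuming the `/v1/batch` payload is a list of `urls` plus `formats` (consult the Batch guide for the real request shape):

```python
import requests

API_KEY = "YOUR_API_KEY"
BASE = "https://api.alterlab.io/api/v1"

# Step 1: a cheap search-only call (no scrape_results, just URLs).
search = requests.post(
    f"{BASE}/search",
    headers={"X-API-Key": API_KEY},
    json={"query": "web scraping API pricing", "num_results": 10},
).json()

# Step 2: filter the hits, then spend scrape credits only on the keepers.
urls = [r["url"] for r in search["results"] if "pricing" in r["url"].lower()]

# NOTE: the {"urls": [...], "formats": [...]} payload is assumed here.
batch = requests.post(
    f"{BASE}/batch",
    headers={"X-API-Key": API_KEY},
    json={"urls": urls, "formats": ["text"]},
)
```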
### 2. Use Domain Scoping for Site Search

Instead of crawling an entire site, use `domain` to find the specific pages you need. This is faster and cheaper than a full crawl.
### 3. Limit `num_results` When Scraping

Each scraped result costs additional credits. Start with 5 results, verify quality, then scale up. Keep `num_results` at 5 or below to get inline results without polling.
### 4. Add Time Ranges for Freshness

For news, trends, or rapidly changing topics, always set `time_range`. Without it, results may include outdated content.
### 5. Use Extraction Schemas for Structured Output

When building pipelines, pass an `extraction_schema` to get consistent, machine-readable data from every result page. A validation sketch follows below.
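One way to keep such a pipeline honest is to validate every extraction against the same schema you sent, so malformed results fail loudly instead of flowing downstream. Here is a minimal sketch using the third-party `jsonschema` package; the schema and result shapes mirror the pricing pipeline above.

```python
from jsonschema import ValidationError, validate

# Reuse the exact schema you passed as extraction_schema
PRICING_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "free_tier": {"type": "boolean"},
    },
}

def clean_extractions(results: list[dict]) -> list[dict]:
    """Keep only results whose extraction matches the schema."""
    good = []
    for result in results:
        ext = (result.get("content") or {}).get("extraction")
        if ext is None:
            continue
        try:
            validate(instance=ext, schema=PRICING_SCHEMA)
            good.append(ext)
        except ValidationError as err:
            print(f"Skipping {result['url']}: {err.message}")
    return good
```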