
    Planning with Map

    Use the Map endpoint to discover a site's structure before committing credits to scraping. Map costs just 1 credit per call and returns every URL it finds.

    1 Credit, Unlimited Insight

    Map is the cheapest way to understand a website. One call returns up to thousands of URLs with depth, source, and metadata — for a single credit.

    Why Map First?

    Save Credits

    Instead of blindly scraping an entire site, map it first to identify exactly which pages you need. Scrape only what matters.

    Understand Structure

    See how pages are organized, what URL patterns exist, and how deep the site goes — before writing any scraping logic.

    Find Hidden Pages

    Compare link discovery with sitemap parsing. Some pages only appear in one source — map catches what manual exploration misses.

    Plan Batch Jobs

    Filter map results by URL pattern, then pipe the filtered list directly into a batch scrape for efficient processing.

    Discover Site Structure

    Start by mapping a site with default settings. This crawls links up to 3 levels deep and returns up to 100 URLs:

    Bash
    curl -X POST https://api.alterlab.io/api/v1/map \
      -H "X-API-Key: your_api_key" \
      -H "Content-Type: application/json" \
      -d '{"url": "https://store.com"}'

    For larger sites, increase max_pages and max_depth:

    JSON
    {
      "url": "https://store.com",
      "max_pages": 1000,
      "max_depth": 5,
      "include_metadata": true
    }

    The response includes a depth field for each URL, showing how many clicks it is from the starting page. Use this to understand the site hierarchy.
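
    A trimmed response looks like the sketch below. The url, depth, and source fields and the last_modified metadata key are the ones described in this guide; the values (including the source labels) are illustrative, and metadata appears only when include_metadata is true:

    JSON
    {
      "total_urls": 3,
      "urls": [
        {
          "url": "https://store.com/",
          "depth": 0,
          "source": "links",
          "metadata": {"last_modified": "2026-02-14T09:30:00Z"}
        },
        {
          "url": "https://store.com/products/",
          "depth": 1,
          "source": "links",
          "metadata": {"last_modified": "2026-02-20T11:05:00Z"}
        },
        {
          "url": "https://store.com/products/blue-widget",
          "depth": 2,
          "source": "sitemap",
          "metadata": {"last_modified": "2026-03-01T16:45:00Z"}
        }
      ]
    }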

    Filter Map Results

    Use include_patterns and exclude_patterns to narrow results before they are returned:

    JSON
    {
      "url": "https://store.com",
      "max_pages": 500,
      "include_patterns": ["/products/*", "/categories/*"],
      "exclude_patterns": ["/products/discontinued/*", "*.pdf"]
    }

    Filter Server-Side

    Patterns are applied during the crawl, not after. This means the crawler still discovers all pages (to follow links), but only matching URLs are included in the response. The credit cost does not change.

    Search for Specific Pages

    The search parameter lets you find pages by keyword relevance. Results include a relevance_score (0 to 1) and are sorted by relevance:

    Bash
    # Find pricing-related pages on a docs site
    curl -X POST https://api.alterlab.io/api/v1/map \
      -H "X-API-Key: your_api_key" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://docs.example.com",
        "search": "pricing enterprise plan",
        "max_pages": 20,
        "include_metadata": true
      }'

    Search is useful when you need a specific page but do not know the exact URL. It is faster and cheaper than scraping every page and searching the content.
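
    A search response adds a relevance_score to each URL object and sorts by it, highest first. A sketch with illustrative values:

    JSON
    {
      "total_urls": 2,
      "urls": [
        {
          "url": "https://docs.example.com/pricing/enterprise",
          "depth": 2,
          "source": "links",
          "relevance_score": 0.92,
          "metadata": {"last_modified": "2026-01-08T10:00:00Z"}
        },
        {
          "url": "https://docs.example.com/pricing",
          "depth": 1,
          "source": "sitemap",
          "relevance_score": 0.81,
          "metadata": {"last_modified": "2025-11-30T08:15:00Z"}
        }
      ]
    }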

    Sitemap vs Link Discovery

    Aspect | Link Discovery (default) | Sitemap Mode
    Speed | Slower — must fetch and parse each page | Faster — parses a single XML file
    Coverage | Finds pages reachable via navigation | Finds pages the owner lists as canonical
    Orphan Pages | Misses pages with no inbound links | Catches orphans if they are in the sitemap
    Dynamic Sites | Better for SPAs and JS-rendered navigation | May miss pages not in a static sitemap
    Best For | E-commerce, forums, user-generated content | Blogs, documentation, news sites

    For best coverage, run map twice — once with link discovery and once with sitemap mode — then merge the results. The source field in each URL object tells you how it was discovered.
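
    A minimal merge sketch using the Python SDK from the workflow below. The sitemap=True keyword is assumed to mirror the sitemap: true request option mentioned in the tips; check your SDK version for the exact spelling:

    Python
    import alterlab

    client = alterlab.AlterLab(api_key="your_api_key")

    # Pass 1: link discovery (the default). Costs 1 credit.
    links_map = client.map("https://store.com", max_pages=1000)

    # Pass 2: sitemap mode. Costs 1 more credit.
    # sitemap=True is the assumed SDK spelling of the sitemap: true option.
    sitemap_map = client.map("https://store.com", max_pages=1000, sitemap=True)

    # Merge on URL, keeping the first record seen for each.
    # The source field on each kept record shows how it was discovered.
    merged = {}
    for u in links_map["urls"] + sitemap_map["urls"]:
        merged.setdefault(u["url"], u)

    print(f"Link discovery: {len(links_map['urls'])} URLs, "
          f"sitemap: {len(sitemap_map['urls'])}, "
          f"merged: {len(merged)} unique")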

    Map + Batch Scrape Workflow

    The most common pattern is to map a site, filter the URLs, and then send them to batch scrape:

    1. Map the site

    Call POST /api/v1/map to discover all URLs. Cost: 1 credit.

    2. Filter client-side

    Apply your own logic to select which URLs to scrape. Filter by URL pattern, depth, metadata, or relevance score.

    3. Batch scrape

    Send the filtered URL list to POST /api/v1/batch for parallel processing. You pay per URL scraped, not per URL discovered.

    Full Python Workflow

    Python
    import alterlab
    import time
    
    client = alterlab.AlterLab(api_key="your_api_key")
    
    # Step 1: Map the site (1 credit)
    site_map = client.map(
        "https://store.com",
        max_pages=500,
        include_patterns=["/products/*"],
        include_metadata=True,
    )
    print(f"Discovered {site_map['total_urls']} product URLs")
    
    # Step 2: Filter client-side
    # Only scrape products updated in the last 30 days
    from datetime import datetime, timedelta, timezone
    
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)
    recent_urls = []
    for u in site_map["urls"]:
        modified = u.get("metadata", {}).get("last_modified")
        if modified:
            mod_date = datetime.fromisoformat(modified.replace("Z", "+00:00"))
            if mod_date > cutoff:
                recent_urls.append(u["url"])
    
    print(f"Filtered to {len(recent_urls)} recently updated products")
    
    # Step 3: Batch scrape the filtered URLs
    if recent_urls:
        batch = client.batch_scrape(
            urls=[
                {"url": u, "formats": ["json"], "extraction_schema": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "number"},
                        "in_stock": {"type": "boolean"},
                    },
                }}
                for u in recent_urls[:100]  # Max 100 per batch
            ],
        )
        print(f"Batch {batch['batch_id']} submitted")
    
        # Poll for results
        while True:
            status = client.get_batch_status(batch["batch_id"])
            if status["status"] != "processing":
                break
            time.sleep(2)
    
        for item in status["items"]:
            if item["status"] == "succeeded":
                data = item["result"].get("json", {})
                print(f"  {data.get('name')}: ${data.get('price')}")

    Full Node.js Workflow

    TypeScript
    import AlterLab from "@alterlab/sdk";
    
    const client = new AlterLab({ apiKey: "your_api_key" });
    
    // Step 1: Map the site (1 credit)
    const siteMap = await client.map("https://store.com", {
      maxPages: 500,
      includePatterns: ["/products/*"],
      includeMetadata: true,
    });
    console.log(`Discovered ${siteMap.totalUrls} product URLs`);
    
    // Step 2: Filter client-side
    const cutoff = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
    const recentUrls = siteMap.urls
      .filter((u) => {
        const modified = u.metadata?.lastModified;
        return modified && new Date(modified) > cutoff;
      })
      .map((u) => u.url);
    
    console.log(`Filtered to ${recentUrls.length} recently updated products`);
    
    // Step 3: Batch scrape the filtered URLs
    if (recentUrls.length > 0) {
      const batch = await client.batchScrape({
        urls: recentUrls.slice(0, 100).map((url) => ({
          url,
          formats: ["json"],
          extractionSchema: {
            type: "object",
            properties: {
              name: { type: "string" },
              price: { type: "number" },
              inStock: { type: "boolean" },
            },
          },
        })),
      });
      console.log(`Batch ${batch.batchId} submitted`);
    
      // Poll for results
      let status;
      do {
        await new Promise((r) => setTimeout(r, 2000));
        status = await client.getBatchStatus(batch.batchId);
      } while (status.status === "processing");
    
      for (const item of status.items) {
        if (item.status === "succeeded") {
          const data = item.result?.json ?? {};
          console.log(`  ${data.name}: $${data.price}`);
        }
      }
    }

    Tips & Best Practices

    • Start small. Use max_pages: 50 to quickly check URL patterns before running a full map.
    • Use include_patterns server-side. Filtering during the crawl keeps your response small. Client-side filtering still works but transfers more data.
    • Enable metadata selectively. include_metadata increases response time because the crawler must fetch each page's HTML. Skip it for initial exploration.
    • Compare discovery methods. Run map once with links and once with sitemap: true to find pages that only appear in one source.
    • Respect robots.txt. The default is respect_robots: true. Only disable it if you have explicit permission from the site owner.
    • Cache map results. Site structures do not change often. Map once, save the URL list, and re-scrape from that list on a schedule (see the sketch below).
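
    A minimal caching sketch for that last tip, assuming the Python SDK shown above; the cache file name and seven-day freshness window are arbitrary choices:

    Python
    import json
    import os
    import time

    import alterlab

    CACHE_FILE = "store_map.json"
    MAX_AGE = 7 * 24 * 60 * 60  # re-map weekly; tune to how often the site changes

    client = alterlab.AlterLab(api_key="your_api_key")

    # Reuse the cached map while it is fresh; otherwise spend 1 credit to re-map.
    if os.path.exists(CACHE_FILE) and time.time() - os.path.getmtime(CACHE_FILE) < MAX_AGE:
        with open(CACHE_FILE) as f:
            site_map = json.load(f)
    else:
        site_map = client.map("https://store.com", max_pages=1000)
        with open(CACHE_FILE, "w") as f:
            json.dump(site_map, f)

    urls = [u["url"] for u in site_map["urls"]]
    print(f"{len(urls)} URLs ready to scrape")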
    Last updated: March 2026
