Supabase Integration
Scrape any website and store results directly in your Supabase database. Edge Function examples, pg_cron scheduling, and full Python and Node.js SDK walkthroughs included.
Overview
AlterLab turns any URL into structured data. Supabase gives you a managed Postgres database, Edge Functions for serverless compute, and pg_cron for scheduling — with no infrastructure to manage. Together, they cover the full scrape-to-database pipeline in minutes.
Serverless Scraping
Call AlterLab from Supabase Edge Functions — no servers, no queue management. Scale to zero when idle.
Scheduled Pipelines
Use pg_cron to trigger scrape jobs on any schedule — hourly price checks, daily news ingestion, weekly content audits.
Structured Output
Extract typed JSON from any page using a JSON Schema. Store prices, titles, inventory, and more as real columns.
Quickstart
Create a Supabase table and start storing scraped pages in under five minutes.
Step 1 — Create the database table
Open the SQL editor in your Supabase dashboard and run:
-- Create a table for scraped pages
create table scraped_pages (
id uuid primary key default gen_random_uuid(),
url text unique not null,
markdown text,
html text,
scraped_at timestamptz default now(),
tier_used int,
cost numeric(10, 6),
created_at timestamptz default now()
);
-- url already has an index from its unique constraint;
-- add one for recency queries
create index on scraped_pages (scraped_at desc);
Step 2 — Add your AlterLab API key as a secret
In your Supabase project, go to Project Settings → Edge Functions → Secrets and add:
ALTERLAB_API_KEY=sk_live_your_key_here
Edge Functions
Deploy a Supabase Edge Function that calls AlterLab and upserts the result into your table. The upsert pattern means re-scraping a URL updates the existing row rather than creating duplicates.
// supabase/functions/scrape-and-store/index.ts
import { createClient } from "jsr:@supabase/supabase-js@2";
const supabase = createClient(
Deno.env.get("SUPABASE_URL")!,
Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
);
Deno.serve(async (req) => {
const { url } = await req.json();
// Scrape the URL with AlterLab
const response = await fetch("https://api.alterlab.io/v1/scrape", {
method: "POST",
headers: {
"X-API-Key": Deno.env.get("ALTERLAB_API_KEY")!,
"Content-Type": "application/json",
},
body: JSON.stringify({
url,
formats: ["markdown", "html"],
waitFor: 1000,
}),
});
if (!response.ok) {
return new Response(
JSON.stringify({ error: "Scrape failed", status: response.status }),
{ status: 502, headers: { "Content-Type": "application/json" } },
);
}
const result = await response.json();
// Insert into Supabase
const { data, error } = await supabase
.from("scraped_pages")
.upsert({
url,
markdown: result.markdown,
html: result.html,
scraped_at: new Date().toISOString(),
tier_used: result.meta?.tier,
cost: result.meta?.cost,
}, { onConflict: "url" })
.select();
if (error) {
return new Response(
JSON.stringify({ error: error.message }),
{ status: 500, headers: { "Content-Type": "application/json" } },
);
}
return new Response(
JSON.stringify({ success: true, data }),
{ headers: { "Content-Type": "application/json" } },
);
});
Deploy the function
supabase functions deploy scrape-and-store --no-verify-jwt
# Invoke manually to test
curl -i --location --request POST \
'https://your-project.supabase.co/functions/v1/scrape-and-store' \
--header 'Authorization: Bearer YOUR_ANON_KEY' \
--header 'Content-Type: application/json' \
--data '{"url":"https://example.com"}'
For JavaScript-heavy pages, add waitFor: 2000 and "renderJs": true to the scrape request body. AlterLab will use a full headless browser with anti-bot bypass automatically.
Example: Price Monitor Edge Function
A more complete example that uses AlterLab's structured extraction to pull typed price data and store it in a history table:
// supabase/functions/price-monitor/index.ts
// Called by pg_cron every hour to check product prices
import { createClient } from "jsr:@supabase/supabase-js@2";
const supabase = createClient(
Deno.env.get("SUPABASE_URL")!,
Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
);
const PRODUCTS = [
{ name: "Widget Pro", url: "https://shop.example.com/widget-pro" },
{ name: "Gadget X", url: "https://shop.example.com/gadget-x" },
];
// Simple price extraction schema
const EXTRACT_SCHEMA = {
type: "object",
properties: {
price: { type: "number", description: "Current price in USD" },
currency: { type: "string", description: "Currency code, e.g. USD" },
in_stock: { type: "boolean", description: "Is the product in stock?" },
},
};
Deno.serve(async () => {
const results = [];
for (const product of PRODUCTS) {
const res = await fetch("https://api.alterlab.io/v1/scrape", {
method: "POST",
headers: {
"X-API-Key": Deno.env.get("ALTERLAB_API_KEY")!,
"Content-Type": "application/json",
},
body: JSON.stringify({
url: product.url,
extract: { schema: EXTRACT_SCHEMA },
}),
});
if (!res.ok) continue;
const data = await res.json();
const extracted = data.extract ?? {};
await supabase.from("price_history").insert({
product_name: product.name,
url: product.url,
price: extracted.price,
currency: extracted.currency ?? "USD",
in_stock: extracted.in_stock,
checked_at: new Date().toISOString(),
});
results.push({ product: product.name, ...extracted });
}
return new Response(JSON.stringify({ checked: results.length, results }), {
headers: { "Content-Type": "application/json" },
});
});
Scheduling with pg_cron
pg_cron is a PostgreSQL extension built into Supabase. It runs SQL on a cron schedule directly inside the database — no external cron infrastructure required.
Calling Edge Functions on a schedule also requires the pg_net extension for HTTP calls from SQL.
-- Enable pg_cron (once per project, via Supabase Dashboard → Extensions)
-- create extension if not exists pg_cron;
-- Schedule the Edge Function every hour
select cron.schedule(
'scrape-price-monitor', -- job name (unique)
'0 * * * *', -- every hour at :00
$$
select net.http_post(
url := current_setting('app.supabase_url') || '/functions/v1/price-monitor',
headers := jsonb_build_object(
'Authorization', 'Bearer ' || current_setting('app.service_role_key'),
'Content-Type', 'application/json'
),
body := '{}'::jsonb
) as request_id;
$$
);
-- View scheduled jobs
select jobid, schedule, command from cron.job;
-- Remove a job
select cron.unschedule('scrape-price-monitor');
Common schedules
| Schedule | cron | Use case |
|---|---|---|
| Every hour | 0 * * * * | Price monitoring |
| Every 15 min | */15 * * * * | Stock / inventory checks |
| Daily at 6am UTC | 0 6 * * * | News ingestion, content sync |
| Weekly on Monday | 0 8 * * 1 | Lead enrichment, SEO audits |
Python SDK
Use the AlterLab Python SDK with the Supabase Python client for scripts, data pipelines, and backend services.
Installation
pip install alterlab supabase
Scrape and store
import alterlab
from supabase import create_client, Client
from datetime import datetime
# Initialize clients
scraper = alterlab.Client(api_key="sk_live_...")
supabase: Client = create_client(
"https://your-project.supabase.co",
"your-service-role-key",
)
def scrape_and_store(url: str) -> dict:
"""Scrape a URL and store the result in Supabase."""
# Scrape with AlterLab
result = scraper.scrape(
url,
formats=["markdown"],
wait_for=1000,
)
# Upsert into Supabase (update if URL already exists)
response = (
supabase.table("scraped_pages")
.upsert({
"url": url,
"markdown": result.markdown,
"scraped_at": datetime.utcnow().isoformat() + "Z",
"tier_used": result.meta.get("tier"),
"cost": result.meta.get("cost"),
}, on_conflict="url")
.execute()
)
return response.data[0] if response.data else {}
# Batch scrape a list of URLs
urls = [
"https://news.ycombinator.com",
"https://techcrunch.com",
"https://theverge.com",
]
for url in urls:
row = scrape_and_store(url)
print(f"Stored {url} → id={row.get('id')}")
Node.js SDK
The AlterLab Node.js SDK works natively in Deno (Edge Functions) and Node.js runtimes. Use it with the Supabase JS client for TypeScript projects.
Installation
npm install @alterlab/sdk @supabase/supabase-js
Scrape and store
import AlterLab from "@alterlab/sdk";
import { createClient } from "@supabase/supabase-js";
const scraper = new AlterLab({ apiKey: process.env.ALTERLAB_API_KEY! });
const supabase = createClient(
process.env.SUPABASE_URL!,
process.env.SUPABASE_SERVICE_ROLE_KEY!,
);
async function scrapeAndStore(url: string) {
// Scrape with AlterLab
const result = await scraper.scrape(url, {
formats: ["markdown"],
waitFor: 1000,
});
// Upsert into Supabase
const { data, error } = await supabase
.from("scraped_pages")
.upsert(
{
url,
markdown: result.markdown,
scraped_at: new Date().toISOString(),
tier_used: result.meta?.tier,
cost: result.meta?.cost,
},
{ onConflict: "url" },
)
.select()
.single();
if (error) throw error;
return data;
}
// Price monitor: check products on an interval
const urls = [
"https://example.com/product/a",
"https://example.com/product/b",
];
for (const url of urls) {
const row = await scrapeAndStore(url);
console.log(`Stored ${url} → id=${row.id}`);
}
Common Patterns
Price monitoring
Scrape product pages with structured extraction (extract.schema) to pull price, currency, and stock status into a price_history table. Use Supabase Realtime to push alerts when prices drop.
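The Realtime wiring depends on your client setup, but the drop-detection logic itself is small. A minimal sketch — the helper name and the 1% threshold are ours, not part of either API:

```typescript
// Decide whether a new price_history row should trigger an alert.
// A drop only counts if it exceeds a relative threshold, which
// filters out rounding noise between scrapes.
interface PricePoint {
  product_name: string;
  price: number;
}

function isPriceDrop(
  prev: PricePoint,
  next: PricePoint,
  minDropPct = 1,
): boolean {
  if (prev.product_name !== next.product_name) return false;
  const dropPct = ((prev.price - next.price) / prev.price) * 100;
  return dropPct >= minDropPct;
}

console.log(
  isPriceDrop(
    { product_name: "Widget Pro", price: 100 },
    { product_name: "Widget Pro", price: 89 },
  ),
); // prints: true
```

Wire this into a Supabase Realtime subscription on price_history, comparing each inserted row against the previous row for the same product.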
Lead enrichment
Trigger an Edge Function via Supabase Database Webhooks when a new lead is inserted. The function scrapes the lead's website and appends company size, tech stack, and description back to the row.
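Database Webhooks POST a JSON payload with the event type, table, and inserted record. A sketch of the parsing step — the payload shape follows Supabase's documented webhook format, while the leads table name and Lead fields are our assumptions:

```typescript
// Minimal lead row shape for this example (assumed, not a real schema).
interface Lead {
  id: number;
  website: string | null;
}

// Subset of the Supabase Database Webhook payload we care about.
interface WebhookPayload {
  type: "INSERT" | "UPDATE" | "DELETE";
  table: string;
  record: Lead;
}

// Return the URL to scrape for a new lead, or null if there is
// nothing to enrich. Normalizes bare domains to https URLs.
function leadUrlToScrape(payload: WebhookPayload): string | null {
  if (payload.type !== "INSERT" || payload.table !== "leads") return null;
  const site = payload.record.website?.trim();
  if (!site) return null;
  return site.startsWith("http") ? site : `https://${site}`;
}
```

Inside the Edge Function, pass the returned URL to the same AlterLab scrape call shown earlier, then write the extracted fields back with supabase.from("leads").update(...).eq("id", payload.record.id).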
Content aggregation
Schedule a pg_cron job to scrape news sources and blogs daily. Store full Markdown content in Supabase and use pgvector with OpenAI embeddings for semantic search across all ingested articles.
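Long articles usually need to be split into chunks before embedding. A minimal sketch of that step, assuming paragraph-boundary chunking — the function name and size limit are ours; the embedding call and pgvector insert are left out:

```typescript
// Split scraped Markdown into roughly fixed-size chunks on paragraph
// boundaries, so each chunk stays within an embedding model's
// practical input size. maxChars is illustrative.
function chunkMarkdown(markdown: string, maxChars = 1500): string[] {
  const paragraphs = markdown.split(/\n{2,}/);
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    // Start a new chunk when appending this paragraph would overflow.
    if (current && current.length + para.length + 2 > maxChars) {
      chunks.push(current);
      current = "";
    }
    current = current ? `${current}\n\n${para}` : para;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk gets its own embedding row, so a search hit points at the relevant passage rather than a whole article.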
Competitor tracking
Monitor competitor pricing pages, job listings, and feature announcements on a schedule. Use Supabase Edge Functions to send a Slack notification via webhook when changes are detected.
Error Handling
AlterLab returns standard HTTP status codes. For production pipelines, implement exponential backoff for 429 (rate limit) and 503 (temporary failure) responses.
// Exponential backoff for transient AlterLab errors
async function scrapeWithRetry(
url: string,
maxRetries = 3,
): Promise<ScrapeResult> {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await scraper.scrape(url, { formats: ["markdown"] });
} catch (err: unknown) {
const isRetryable =
err instanceof Error &&
(err.message.includes("429") || err.message.includes("503"));
if (!isRetryable || attempt === maxRetries - 1) throw err;
// Exponential backoff: 1s, 2s, 4s
await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
}
}
throw new Error("Max retries exceeded");
}
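The fixed 1s/2s/4s schedule above can cause synchronized retries when many invocations hit a shared rate limit at once; adding jitter spreads them out. A sketch of full-jitter backoff — the helper name and defaults are ours:

```typescript
// Full-jitter backoff: wait a random duration between 0 and the
// exponential cap, so simultaneous failures don't all retry at
// the same instant.
function backoffMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * exp;
}
```

To use it, replace the setTimeout delay in scrapeWithRetry with backoffMs(attempt).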