Cloud Storage Export
Export scraping results directly to Amazon S3, Google Cloud Storage, or Azure Blob Storage. Connect once, then auto-export from batch and scheduled jobs.
Dashboard Available: storage integrations can also be managed from the AlterLab dashboard.
How It Works
1. Connect: Add your cloud storage credentials via the API or dashboard. AlterLab validates access and encrypts credentials at rest.
2. Configure: Set a bucket, an optional key prefix, and mark one integration as your default export destination.
3. Export: Enable storage_export on batch or scheduled scraping jobs. Results are automatically uploaded as JSONL, CSV, or JSON files.
Supported Providers
| Provider | Value | Required Credentials |
|---|---|---|
| Amazon S3 | s3 | access_key_id, secret_access_key |
| Google Cloud Storage | gcs | service_account_json |
| Azure Blob Storage | azure | connection_string |
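The required keys can be checked client-side before calling the API. A minimal sketch in Python (the `REQUIRED_CREDENTIALS` mapping and `missing_credentials` helper are illustrative, not part of any SDK), mirroring the table above:

```python
# Required credential keys per provider, mirroring the table above.
REQUIRED_CREDENTIALS = {
    "s3": {"access_key_id", "secret_access_key"},
    "gcs": {"service_account_json"},
    "azure": {"connection_string"},
}

def missing_credentials(provider: str, credentials: dict) -> set:
    """Return the credential keys still missing for the given provider."""
    if provider not in REQUIRED_CREDENTIALS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return REQUIRED_CREDENTIALS[provider] - credentials.keys()

print(missing_credentials("s3", {"access_key_id": "AKIA..."}))
# → {'secret_access_key'}
```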
Add a Storage Integration
POST /api/v1/integrations/storage

Creates a new storage integration. Credentials are validated against the provider before being encrypted and stored.
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Friendly name (e.g., "My S3 Bucket") |
| provider | string | Yes | s3, gcs, or azure |
| credentials | object | Yes | Provider-specific credentials (see examples below) |
| bucket | string | Yes | Bucket or container name |
| region | string | No | AWS region, GCS location, or Azure region |
| prefix | string | No | Default key prefix for uploaded files (e.g., scraping/exports/) |
| is_default | boolean | No | Set as default export destination (default: false) |
Amazon S3
Create an IAM user with s3:PutObject, s3:GetObject, and s3:DeleteObject permissions on your target bucket.
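A minimal IAM policy granting just those actions on the target bucket might look like this (the bucket name is illustrative; adjust the Resource ARN to your own bucket):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-scraping-exports/*"
    }
  ]
}
```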
```bash
curl -X POST https://api.alterlab.io/api/v1/integrations/storage \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production S3 Bucket",
    "provider": "s3",
    "credentials": {
      "access_key_id": "AKIAIOSFODNN7EXAMPLE",
      "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    },
    "bucket": "my-scraping-exports",
    "region": "us-east-1",
    "prefix": "alterlab/",
    "is_default": true
  }'
```
Google Cloud Storage
Create a service account with the Storage Object Admin role on your target bucket. Pass the full JSON key as a string.
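Because service_account_json is a string rather than a nested object, the key document has to be serialized before it goes into the request body. A quick Python sketch (the key dict is truncated and illustrative):

```python
import json

# A service-account key is itself a JSON document; the API expects it
# serialized into a single string inside the credentials object.
key = {"type": "service_account", "project_id": "my-project"}  # truncated

payload = {
    "provider": "gcs",
    "credentials": {"service_account_json": json.dumps(key)},
    "bucket": "my-gcs-bucket",
}

# The string round-trips back to the original key document.
assert json.loads(payload["credentials"]["service_account_json"]) == key
```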
```bash
curl -X POST https://api.alterlab.io/api/v1/integrations/storage \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GCS Data Lake",
    "provider": "gcs",
    "credentials": {
      "service_account_json": "{\"type\": \"service_account\", \"project_id\": \"my-project\", ...}"
    },
    "bucket": "my-gcs-bucket",
    "region": "us-central1",
    "prefix": "scraping/"
  }'
```
Azure Blob Storage
Use a connection string from your Azure Storage Account. The account needs Storage Blob Data Contributor role on the container.
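A connection string is a semicolon-separated list of key=value pairs, so a quick client-side sanity check is easy to write (parse_connection_string is an illustrative helper, not part of any SDK):

```python
def parse_connection_string(cs: str) -> dict:
    """Split an Azure storage connection string into its key=value parts."""
    # maxsplit=1 keeps base64 padding '=' inside AccountKey intact
    return dict(part.split("=", 1) for part in cs.split(";") if part)

cs = ("DefaultEndpointsProtocol=https;AccountName=myaccount;"
      "AccountKey=abc123==;EndpointSuffix=core.windows.net")

parsed = parse_connection_string(cs)
print(parsed["AccountName"])  # → myaccount
```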
```bash
curl -X POST https://api.alterlab.io/api/v1/integrations/storage \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Azure Exports",
    "provider": "azure",
    "credentials": {
      "connection_string": "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=...;EndpointSuffix=core.windows.net"
    },
    "bucket": "scraping-container",
    "region": "eastus"
  }'
```
Credentials Are Encrypted: all stored credentials are validated against the provider and encrypted at rest.
Test Your Connection
POST /api/v1/integrations/storage/{integration_id}/test

Uploads a small test file to your bucket and deletes it, verifying read and write access.
```bash
curl -X POST https://api.alterlab.io/api/v1/integrations/storage/{integration_id}/test \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN"
```
Response:
```json
{
  "success": true,
  "message": "Connection test passed",
  "error": null
}
```
You can also run a deeper validation that checks both credential validity and write access:
/api/v1/integrations/storage/{integration_id}/validate

```json
{
  "valid": true,
  "provider": "s3",
  "bucket_exists": true,
  "writable": true,
  "error": null
}
```
Manage Integrations
List Integrations
GET /api/v1/integrations/storage

```bash
curl https://api.alterlab.io/api/v1/integrations/storage \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN"
```
Optional query parameters:
| Parameter | Default | Description |
|---|---|---|
| include_inactive | false | Include disabled integrations |
| provider | null | Filter by provider: s3, gcs, azure |
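The filters combine as ordinary query parameters; for example, listing only S3 integrations including disabled ones (a sketch using Python's standard library against the endpoint above):

```python
from urllib.parse import urlencode

base = "https://api.alterlab.io/api/v1/integrations/storage"
params = {"provider": "s3", "include_inactive": "true"}

url = f"{base}?{urlencode(params)}"
print(url)
# → https://api.alterlab.io/api/v1/integrations/storage?provider=s3&include_inactive=true
```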
```json
{
  "integrations": [
    {
      "id": "a1b2c3d4-...",
      "name": "Production S3 Bucket",
      "provider": "s3",
      "bucket": "my-scraping-exports",
      "region": "us-east-1",
      "prefix": "alterlab/",
      "is_active": true,
      "is_default": true,
      "validation_status": "valid",
      "total_exports": 142,
      "successful_exports": 140,
      "failed_exports": 2,
      "success_rate": 98.59,
      "total_bytes_uploaded": 52428800,
      "last_used_at": "2026-03-24T10:00:00Z",
      "created_at": "2026-03-01T08:00:00Z"
    }
  ],
  "total": 1
}
```
Update Settings
PUT /api/v1/integrations/storage/{integration_id}

Updates the name, bucket, region, prefix, or active status. Does not modify credentials.
```bash
curl -X PUT https://api.alterlab.io/api/v1/integrations/storage/{integration_id} \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prefix": "exports/2026/",
    "is_active": true
  }'
```
Rotate Credentials
PUT /api/v1/integrations/storage/{integration_id}/credentials

Replaces the stored credentials. The new credentials are validated before saving.
```bash
curl -X PUT https://api.alterlab.io/api/v1/integrations/storage/{integration_id}/credentials \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "credentials": {
      "access_key_id": "AKIANEWKEY7EXAMPLE",
      "secret_access_key": "newSecretKey/EXAMPLE"
    }
  }'
```
Set Default Destination
PUT /api/v1/integrations/storage/{integration_id}/default

Marks an integration as the default export destination. Only one integration can be default at a time; setting a new default automatically clears the previous one.
```bash
curl -X PUT https://api.alterlab.io/api/v1/integrations/storage/{integration_id}/default \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN"
```
Delete Integration
DELETE /api/v1/integrations/storage/{integration_id}

Permanently removes the integration and its encrypted credentials. Returns 204 No Content.
```bash
curl -X DELETE https://api.alterlab.io/api/v1/integrations/storage/{integration_id} \
  -H "Authorization: Bearer YOUR_SESSION_TOKEN"
```
Auto-Export with Batch & Scheduler
Once you have a default storage integration, you can enable automatic export on batch and scheduled scraping jobs. Results are uploaded to your bucket as they complete.
Default Integration Required: auto-export uploads to your default storage integration, so mark one as default before enabling storage_export.
With Batch Scraping
Add storage_export to your batch request to upload results when the batch completes:
```bash
curl -X POST https://api.alterlab.io/api/v1/batch \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      { "url": "https://example.com/page-1" },
      { "url": "https://example.com/page-2" }
    ],
    "storage_export": {
      "enabled": true,
      "format": "jsonl"
    }
  }'
```
With Scheduled Scraping
Add storage_export to your schedule so every execution uploads results automatically:
```bash
curl -X POST https://api.alterlab.io/api/v1/schedules \
  -H "X-API-Key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "cron": "0 */6 * * *",
    "storage_export": {
      "enabled": true,
      "format": "csv"
    }
  }'
```
Supported export formats:
| Format | Description |
|---|---|
| jsonl | One JSON object per line — ideal for streaming and BigQuery |
| csv | Comma-separated values — opens in Excel and Google Sheets |
| json | Single JSON array — simple for small datasets |
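As a sketch of how the formats differ on the consuming side (the record fields shown are illustrative, not a documented schema):

```python
import io
import json

# JSONL: one record per line, so large exports can be streamed.
jsonl_export = (
    '{"url": "https://example.com/page-1", "status": 200}\n'
    '{"url": "https://example.com/page-2", "status": 200}\n'
)
records = [json.loads(line) for line in io.StringIO(jsonl_export) if line.strip()]
print(len(records))  # → 2

# JSON: the whole file is a single array, parsed in one call.
json_export = '[{"url": "https://example.com/page-1", "status": 200}]'
print(len(json.loads(json_export)))  # → 1
```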
Limits
Integration Limits
- Maximum 5 storage integrations per account
- Maximum 2 integrations per provider
- Only one default integration at a time
- Credentials are validated on create and credential update — invalid credentials are rejected
- All operations require a valid session (JWT via the Authorization header)
Python Example
```python
import alterlab

client = alterlab.AlterLab(api_key="your_api_key")

# Add an S3 integration
integration = client.create_storage_integration(
    name="My S3 Bucket",
    provider="s3",
    credentials={
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    },
    bucket="my-scraping-exports",
    region="us-east-1",
    prefix="alterlab/",
    is_default=True,
)
print(f"Integration ID: {integration['id']}")
print(f"Status: {integration['validation_status']}")

# Test the connection
test = client.test_storage_integration(integration["id"])
print(f"Test passed: {test['success']}")

# List all integrations
integrations = client.list_storage_integrations()
for i in integrations["integrations"]:
    default = " (default)" if i["is_default"] else ""
    print(f"  {i['provider']}: {i['name']}{default} — {i['validation_status']}")

# Use with batch scraping
batch = client.batch_scrape(
    urls=[
        {"url": "https://example.com/page-1"},
        {"url": "https://example.com/page-2"},
    ],
    storage_export={"enabled": True, "format": "jsonl"},
)
print(f"Batch {batch['batch_id']} submitted with storage export")
```
Node.js Example
```javascript
import AlterLab from "@alterlab/sdk";

const client = new AlterLab({ apiKey: "your_api_key" });

// Add an S3 integration
const integration = await client.createStorageIntegration({
  name: "My S3 Bucket",
  provider: "s3",
  credentials: {
    access_key_id: "AKIAIOSFODNN7EXAMPLE",
    secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  },
  bucket: "my-scraping-exports",
  region: "us-east-1",
  prefix: "alterlab/",
  isDefault: true,
});
console.log(`Integration ID: ${integration.id}`);
console.log(`Status: ${integration.validationStatus}`);

// Test the connection
const test = await client.testStorageIntegration(integration.id);
console.log(`Test passed: ${test.success}`);

// List all integrations
const { integrations } = await client.listStorageIntegrations();
for (const i of integrations) {
  const tag = i.isDefault ? " (default)" : "";
  console.log(`  ${i.provider}: ${i.name}${tag} — ${i.validationStatus}`);
}

// Use with batch scraping
const batch = await client.batchScrape({
  urls: [
    { url: "https://example.com/page-1" },
    { url: "https://example.com/page-2" },
  ],
  storageExport: { enabled: true, format: "jsonl" },
});
console.log(`Batch ${batch.batchId} submitted with storage export`);
```