
Crunchbase Data API: Extract Structured JSON in 2026
Learn how to extract structured JSON from Crunchbase using AlterLab's data API: no HTML parsing, just typed finance data ready for pipelines.
This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.
TL;DR
Use AlterLab's Extract API to get structured JSON from Crunchbase pages. Define a JSON schema for the fields you need (ticker, price, change_percent, volume, market_cap), POST the URL and schema, and receive validated, typed data with no HTML parsing or custom parsers required.
Why use Crunchbase data?
Crunchbase hosts a wealth of public company and financial information that fuels several engineering workflows:
- AI training: Feed structured funding rounds, acquisition prices, and valuation trends into models for market prediction.
- Analytics pipelines: Join Crunchbase metrics with internal CRM or product usage data to assess competitive positioning.
- Competitive intelligence: Monitor changes in a rival's funding rounds, leadership, or market cap in near real time.
What data can you extract?
The publicly visible finance section on a Crunchbase entity page includes:
- ticker – stock symbol if the company is public.
- price – latest share price.
- change_percent – price change percentage from previous close.
- volume – trading volume.
- market_cap – total market capitalization.
These fields are presented as plain text in the page HTML, making them ideal candidates for schema‑based extraction.
The extraction approach
Attempting to pull this data with raw HTTP requests and HTML parsers leads to fragile selectors that break whenever Crunchbase updates its UI. You also need to handle JavaScript rendering, anti‑bot measures, and pagination manually. A data API removes those concerns: the service renders the page, applies anti‑bot bypass, and returns the requested fields according to your schema, delivering ready‑to‑use JSON.
Quick start with AlterLab Extract API
See the Extract API docs for full reference. Below are minimal examples in Python and cURL.
Python:

    import alterlab

    client = alterlab.Client("YOUR_API_KEY")

    schema = {
        "type": "object",
        "properties": {
            "ticker": {
                "type": "string",
                "description": "The ticker field"
            },
            "price": {
                "type": "string",
                "description": "The price field"
            },
            "change_percent": {
                "type": "string",
                "description": "The change percent field"
            },
            "volume": {
                "type": "string",
                "description": "The volume field"
            },
            "market_cap": {
                "type": "string",
                "description": "The market cap field"
            }
        }
    }

    result = client.extract(
        url="https://crunchbase.com/organization/example-company",
        schema=schema,
    )
    print(result.data)

cURL:

    curl -X POST https://api.alterlab.io/v1/extract \
      -H "X-API-Key: YOUR_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://crunchbase.com/organization/example-company",
        "schema": {
          "properties": {
            "ticker": {"type": "string"},
            "price": {"type": "string"},
            "change_percent": {"type": "string"},
            "volume": {"type": "string"},
            "market_cap": {"type": "string"}
          }
        }
      }'

Both snippets return a JSON object like:

    {
      "ticker": "ABC",
      "price": "152.34",
      "change_percent": "+2.5%",
      "volume": "4.2M",
      "market_cap": "$12.8B"
    }

Notice the values are already typed strings; AlterLab validates them against the schema before returning.
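Because every value in the sample response is a string, a pipeline usually needs a small normalization step before it can do arithmetic on the data. The following is a minimal sketch of such a step, assuming the formats shown in the sample ("152.34", "+2.5%", "4.2M", "$12.8B"). The helper names parse_amount, parse_percent, and normalize are hypothetical and not part of AlterLab's SDK.

```python
from decimal import Decimal

# Multipliers for the magnitude suffixes seen in the sample response.
SUFFIXES = {
    "K": Decimal("1e3"),
    "M": Decimal("1e6"),
    "B": Decimal("1e9"),
    "T": Decimal("1e12"),
}


def parse_amount(text: str) -> Decimal:
    """Convert strings such as '$12.8B', '4.2M' or '152.34' to a Decimal."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    multiplier = Decimal(1)
    if cleaned and cleaned[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1].upper()]
        cleaned = cleaned[:-1]
    return Decimal(cleaned) * multiplier


def parse_percent(text: str) -> Decimal:
    """Convert '+2.5%' to Decimal('2.5'); a leading minus sign is preserved."""
    return Decimal(text.strip().rstrip("%"))


def normalize(record: dict) -> dict:
    """Return a copy of the record with numeric fields parsed.

    Missing or null fields become None, since a field may be absent
    from the response when it is not found on the page.
    """
    out = {"ticker": record.get("ticker")}
    for key in ("price", "volume", "market_cap"):
        value = record.get(key)
        out[key] = parse_amount(value) if value else None
    change = record.get("change_percent")
    out["change_percent"] = parse_percent(change) if change else None
    return out


# The sample response from the quick start.
sample = {
    "ticker": "ABC",
    "price": "152.34",
    "change_percent": "+2.5%",
    "volume": "4.2M",
    "market_cap": "$12.8B",
}
normalized = normalize(sample)
print(normalized)
```

Decimal is used instead of float so that prices are not subject to binary rounding; swap in float if your downstream tooling expects it.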
Define your schema
The schema parameter drives the extraction. Each property can include a description that helps the model locate the correct element on the page. You can also add constraints such as "pattern": "^\\$?[0-9]+\\.?[0-9]*[KMGT]?B?$" for market‑cap strings, ensuring the output matches expectations. If a field isn't found, AlterLab omits it or returns null depending on your "required" list.
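To make the options above concrete, here is a sketch of a schema that combines descriptions, the market-cap pattern, and a "required" list, together with a small local check written with the standard library only. The description wording, the choice of "ticker" as the only required field, and the check_record helper are illustrative assumptions; the local check is not AlterLab's validator and covers only "required" and "pattern".

```python
import re

# The market-cap pattern from the text, written as a Python raw string.
MARKET_CAP_PATTERN = r"^\$?[0-9]+\.?[0-9]*[KMGT]?B?$"

schema = {
    "type": "object",
    "properties": {
        "ticker": {
            "type": "string",
            # Illustrative description; adjust to what the page shows.
            "description": "Stock symbol shown for public companies",
        },
        "market_cap": {
            "type": "string",
            "description": "Total market capitalization, e.g. $12.8B",
            "pattern": MARKET_CAP_PATTERN,
        },
    },
    # Assumption for this example: only the ticker is mandatory.
    "required": ["ticker"],
}


def check_record(record: dict, schema: dict) -> list:
    """Return a list of problems found in the record (empty if none)."""
    problems = []
    for name in schema.get("required", []):
        if record.get(name) is None:
            problems.append(f"missing required field: {name}")
    for name, spec in schema["properties"].items():
        value = record.get(name)
        pattern = spec.get("pattern")
        # JSON Schema patterns are unanchored, so re.search mirrors that.
        if pattern and isinstance(value, str) and not re.search(pattern, value):
            problems.append(f"{name} does not match pattern")
    return problems


print(check_record({"ticker": "ABC", "market_cap": "$12.8B"}, schema))
print(check_record({"market_cap": "about 12 billion"}, schema))
```

Running a check like this on your side is optional, but it gives a cheap guard against schema drift before records enter a pipeline.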
Handle pagination and scale
Crunchbase often lists multiple entities (e.g., a search results page). To extract many records:
- Batch URLs: Send an array of objects to the /v1/extract/batch endpoint (see docs); each object contains its own URL and can share the same schema.
- Async jobs: For more than 100 pages, use the async endpoint (/v1/extract/async) and poll for completion, avoiding long‑running HTTP connections.
- Rate limits: AlterLab automatically distributes requests across its proxy pool; you still benefit from applying a modest delay (e.g., 200 ms) between batches to stay polite to the source.
Check the pricing page for volume‑based discounts; there are no minimums and unused credits never expire.
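The batching and delay advice can be sketched as a small driver that splits a URL list into batches, builds one payload per batch, and pauses between submissions. This is an illustration under assumptions: the payload shape (a list of objects with "url" and "schema" keys) is inferred from the description above and the single-URL request, the batch size of 25 is arbitrary, and the function names are hypothetical. The actual HTTP call is passed in as send, so check the batch endpoint docs for the real request format before wiring it up.

```python
import time
from typing import Callable, Iterator


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch(urls: list, schema: dict) -> list:
    """Build one batch payload: each object has its own URL, same schema."""
    return [{"url": url, "schema": schema} for url in urls]


def run_batches(
    urls: list,
    schema: dict,
    send: Callable[[list], object],
    batch_size: int = 25,
    delay_s: float = 0.2,
) -> list:
    """Submit batches one at a time, sleeping `delay_s` between them."""
    results = []
    batches = list(chunked(urls, batch_size))
    for index, batch in enumerate(batches):
        results.append(send(build_batch(batch, schema)))
        if index < len(batches) - 1:  # no pause needed after the last batch
            time.sleep(delay_s)
    return results


# Placeholder URLs and a stub `send` that only reports each batch size.
urls = [
    f"https://crunchbase.com/organization/example-company-{i}"
    for i in range(60)
]
counts = run_batches(urls, {"type": "object"}, send=len, delay_s=0.0)
print(counts)
```

Injecting send keeps the batching logic testable without network access; in production it would wrap a POST to the batch endpoint with your API key.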
Key takeaways
- Use a data API, not a scraper, to get reliable, structured JSON from Crunchbase.
- Define a clear JSON schema; AlterLab handles page rendering, anti‑bot, and validation.
- Scale with batch or async calls and monitor usage via the pricing dashboard.
- Always respect robots.txt and Terms of Service when accessing public data.