
How to Scrape Instagram Data: Complete Guide for 2026
Learn how to scrape publicly available Instagram data using Python. Handle dynamic GraphQL endpoints and JavaScript rendering without building complex infrastructure.
April 26, 2026
Disclaimer: This guide covers extracting publicly accessible data. Always review a site's robots.txt and Terms of Service before scraping.
Scraping Instagram requires more than a simple HTTP request. The platform is a complex Single Page Application (SPA), and content loads dynamically via GraphQL. If you send a standard curl or requests.get() to a public profile, you receive a barebones HTML shell; the actual content is assembled client-side by JavaScript.
To get structured data, you must render the JavaScript or intercept the underlying API calls. This guide shows you how to scrape Instagram public profiles and posts reliably using Python.
Why collect social data from Instagram?
Engineers and data scientists extract public Instagram data for three primary reasons:
- Market Research: Brands track competitor follower growth and public engagement metrics over time.
- Sentiment Analysis: Public comments on brand posts provide raw data for NLP pipelines to gauge customer reaction to product launches.
- Trend Monitoring: Discovering velocity changes in specific public hashtags to identify emerging fashion, tech, or cultural trends.
You only need access to publicly available information to build these datasets. Authenticated scraping or extracting private user data introduces significant legal and ethical risks. Stick to public pages.
Technical challenges
When you attempt to scrape Instagram, you immediately hit infrastructure hurdles.
First, the data isn't in the initial HTML payload. The browser executes megabytes of JavaScript, which then fires GraphQL requests to fetch profile details, post metadata, and image URLs. To see what a real user sees, your scraper must execute this JavaScript.
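You can see this for yourself with a rough heuristic: compare how much of a fetched page is visible text versus script payload. The helper and threshold below are illustrative, not part of any official tooling.

```python
from html.parser import HTMLParser

class ShellDetector(HTMLParser):
    """Accumulates visible text and script payload separately."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.visible_chars = 0
        self.script_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_chars += len(data.strip())
        else:
            self.visible_chars += len(data.strip())

def looks_like_js_shell(html: str, ratio: float = 0.1) -> bool:
    """Heuristic: if visible text is under `ratio` of the script payload,
    the page probably needs JavaScript rendering before parsing."""
    parser = ShellDetector()
    parser.feed(html)
    if parser.script_chars == 0:
        return False
    return parser.visible_chars < ratio * parser.script_chars
```

Running this against a raw requests.get() response from an SPA typically reports a shell; against a rendered snapshot, it does not.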
Second, the platform employs strict rate limiting on public endpoints. If a single IP address makes too many requests in a short window, the server returns HTTP 429 (Too Many Requests) or blocks the IP entirely.
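A standard mitigation for 429 responses is exponential backoff with jitter. The sketch below is a generic pattern; the function names are illustrative, not a specific library API.

```python
import random
import time

def fetch_with_backoff(fetch, url, max_retries=5, base_delay=1.0):
    """Retry `fetch(url)` when the server answers 429 or 403,
    doubling the wait (plus random jitter) after each failure."""
    for attempt in range(max_retries):
        response = fetch(url)
        if response.status_code not in (429, 403):
            return response
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        time.sleep(delay)
    raise RuntimeError(f"Gave up on {url} after {max_retries} attempts")
```

Pass in whatever request function your stack uses; the backoff logic stays the same whether you call requests, httpx, or a rendering API.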
Third, Instagram frequently updates its DOM structure. CSS class names are auto-generated and change constantly. Relying on hardcoded XPath or CSS selectors leads to brittle data pipelines that break weekly.
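One defensive pattern is to anchor extraction on stable markup, such as Open Graph meta tags, rather than generated class names. A minimal stdlib sketch (the sample HTML in the test is invented for illustration):

```python
from html.parser import HTMLParser

class OGParser(HTMLParser):
    """Collects Open Graph <meta property="og:..."> tags, which are
    far more stable across redesigns than generated CSS classes."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property", "")
        if prop.startswith("og:"):
            self.og[prop] = attr_map.get("content", "")

def extract_og(html: str) -> dict:
    parser = OGParser()
    parser.feed(html)
    return parser.og
```

If a redesign does break extraction, a selector pinned to og: tags usually survives; a selector pinned to a class like .x1lliihq usually does not.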
Handling headless browser fleets and proxy rotation at scale is tedious. Instead of building this from scratch, you can use the Smart Rendering API to request a URL and receive the fully rendered page state.
Quick start with AlterLab API
You need a reliable way to render the page and extract the data. Before running these examples, ensure you have read the Getting started guide to set up your environment.
Here is how you fetch a fully rendered public profile using Python.
import alterlab

client = alterlab.Client("YOUR_API_KEY")

response = client.scrape(
    "https://instagram.com/nike",
    render_js=True,
    wait_for=".x1lliihq"  # auto-generated class; expect to update this selector
)

print(f"Status Code: {response.status_code}")

If you prefer testing from the command line, you can make the same request with cURL.
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://instagram.com/nike",
    "render_js": true
  }'

Try fetching a public Instagram profile with rendering enabled.
Extracting structured data
Getting the HTML is only the first step. You need structured JSON. Because Instagram's CSS classes change, the most robust extraction method targets the hidden JSON data embedded in the page or uses an LLM to parse the visual structure.
If you are parsing the HTML manually, look for the <script type="application/ld+json"> tag. Many public pages include structured data for SEO purposes.
Here is a Node.js example showing how to extract basic profile metadata from the rendered HTML using Cheerio.
const cheerio = require('cheerio');
const fs = require('fs');

const html = fs.readFileSync('rendered_profile.html', 'utf8');
const $ = cheerio.load(html);

// Take only the first ld+json block; pages can contain more than one,
// and .text() alone would concatenate them into invalid JSON.
const ldJson = $('script[type="application/ld+json"]').first().text();

if (ldJson) {
  const data = JSON.parse(ldJson);
  console.log(`Profile Name: ${data.name}`);
  console.log(`Description: ${data.description}`);
} else {
  console.log('Structured data not found. DOM parsing required.');
}

For posts, the logic is similar. You request the public post URL, wait for the image and comment containers to render, and then parse the resulting DOM.
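If the rest of your pipeline is Python, the same ld+json extraction can be sketched with the standard library. A regex is acceptable for this narrow pattern, though real pages may vary attribute order and quoting, so an HTML parser is more robust in production.

```python
import json
import re

# Captures the body of the first ld+json script tag. Note: attribute
# order and quoting can differ on real pages; this is a minimal sketch.
LD_JSON_RE = re.compile(
    r'<script type="application/ld\+json">(.*?)</script>',
    re.DOTALL,
)

def extract_ld_json(html: str):
    """Return the parsed JSON-LD payload, or None if the tag is absent."""
    match = LD_JSON_RE.search(html)
    return json.loads(match.group(1)) if match else None
```

As in the Node.js version, fall back to DOM parsing when the function returns None.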
Best practices
Building a resilient data pipeline requires discipline. Follow these rules to ensure your scraper remains stable and compliant.
Respect the robots.txt. Always check the site's robots.txt file. Do not scrape endpoints explicitly disallowed. Confine your data collection to public pages intended for search engine indexing.
Implement rate limiting. Do not hammer the servers. Even if you use rotating proxies, excessive requests degrade the target infrastructure. Add delays between your requests.
Monitor your success rates. Silently failing scrapers pollute your database with empty records. Log your HTTP status codes. If you see a spike in 403 or 429 errors, pause your pipeline and investigate.
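A lightweight way to implement that check is a sliding-window error-rate monitor. The window size and threshold below are arbitrary examples, not recommendations.

```python
from collections import deque

class HealthMonitor:
    """Tracks the error rate over a sliding window of recent requests
    and signals when the pipeline should pause for investigation."""
    def __init__(self, window=100, max_error_rate=0.2):
        self.codes = deque(maxlen=window)
        self.max_error_rate = max_error_rate

    def record(self, status_code: int):
        self.codes.append(status_code)

    def should_pause(self) -> bool:
        if not self.codes:
            return False
        errors = sum(1 for c in self.codes if c in (403, 429))
        return errors / len(self.codes) > self.max_error_rate
```

Call record() after every request and check should_pause() between batches; wiring it to an alert keeps silent failures out of your database.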
Scaling up
When you move from a local script to a production pipeline, concurrency becomes your main bottleneck. Scraping 10,000 public profiles one at a time, with polite delays between requests, can take days.
You must implement batching and asynchronous requests. Python's asyncio combined with a robust queue system like Redis or RabbitMQ handles this well. Push your target URLs to a queue, spin up multiple worker processes, and process the results in parallel.
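Here is a minimal in-process version of that pattern using asyncio.Queue; swap in Redis or RabbitMQ when you need durability across processes. The scrape_one coroutine is a placeholder for your real scrape call.

```python
import asyncio

async def scrape_one(url: str) -> str:
    """Placeholder for the real scrape call (e.g. an async HTTP client)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"scraped:{url}"

async def worker(queue: asyncio.Queue, results: list):
    """Pull URLs off the queue until cancelled."""
    while True:
        url = await queue.get()
        try:
            results.append(await scrape_one(url))
        finally:
            queue.task_done()

async def run_pipeline(urls, concurrency=5):
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    for url in urls:
        queue.put_nowait(url)
    workers = [asyncio.create_task(worker(queue, results))
               for _ in range(concurrency)]
    await queue.join()  # wait until every queued URL has been processed
    for w in workers:
        w.cancel()      # workers loop forever; cancel them once the queue drains
    return results
```

With concurrency set to 5, five profiles are in flight at any moment, which is where the sequential-versus-parallel gap closes.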
As your volume increases, infrastructure costs grow. Running thousands of headless browser instances requires significant compute. Review the AlterLab pricing to understand the cost dynamics of rendering heavy JavaScript pages at scale. Optimize your queries. Only render JS when absolutely necessary. If you just need the initial HTML state, disable rendering to speed up the request and lower your costs.
Key takeaways
Scraping Instagram is an exercise in managing dynamic content and connection reliability. Raw HTTP requests fail against modern SPAs. You must execute JavaScript to access the underlying data.
Focus on public endpoints. Build pipelines that handle dynamic DOM structures gracefully, either by finding embedded JSON or using smart extraction. Handle your infrastructure responsibly by implementing rate limits and monitoring your request success rates.