How to Scrape Instagram with Python and Astro Residential Proxies: A Step-by-Step Guide
02 July 2026

Parsing Instagram remains one of the most in-demand tasks in web scraping, competitive analysis, and marketing research. Companies use parsers to monitor public profiles, track brand activity, and automate the collection of open data.
As request volume grows, a key challenge emerges – keeping the parser stable. Sending all requests from a single IP address can eventually lead to rate limits, slower response times, and blocks. Residential proxies solve this problem: requests are distributed across a pool of real IP addresses, keeping the load on any single address minimal.
In the previous article, we covered why proxies are essential for Instagram scraping. In this case study, we'll show how to connect Astro residential proxies to a Python parser and set up collection of public Instagram data. You'll learn how to configure an HTTP client, connect a proxy, correctly handle responses, and prepare the project for scaling.
After reading this article, you'll be able to:
- connect Astro proxies to Python;
- configure an HTTP client using `httpx`;
- request public Instagram pages and verify response correctness;
- handle service-side restrictions (redirects, rate limits);
- scale the parser without changing the application's architecture.
Important to Know Before You Start
Instagram actively restricts automated access to its pages. In practice, this means:
- Without authorization, the service may redirect the request to the login page. The parser must be able to recognize this situation rather than treat it as a successful response.
- A significant portion of the content is loaded via JavaScript. A plain HTTP request reliably retrieves little beyond the data in the page's meta tags (profile name, description, follower and post counts from `og:description`). Deeper data collection requires other tools, such as browser automation.
- Don't send requests too frequently. Pauses between requests and handling of the 429 status code are a mandatory part of any working parser.
- Before launching the project, make sure your data collection complies with the service's terms of use and the laws of your jurisdiction. Work only with publicly available information.
Why Astro Is a Good Fit for Parsing Instagram
When scaling Instagram parsing, it's not just the code that matters – the quality of the proxy infrastructure matters too. If the IP pool is small or rotation is limited, even a well-written parser will eventually run into rate limits and reduced stability.
Astro provides tools that help avoid these problems:
- 50 million IP addresses across 150 countries let you distribute requests across a large number of real residential addresses and scale data collection without changing the application's architecture.
- Multiple IP rotation modes – a new address on every connection, rotation on a timer (starting from 1 minute), or forced rotation via API. This lets you pick the optimal strategy for a specific parsing scenario.
- Flexible geo-targeting: choose a country, city, or ISP, with the option to use up to 10 countries on a single port or to pick a random country while excluding unwanted regions.
- HTTP(S) and SOCKS5 support, plus up to 250 concurrent TCP connections per port, which is convenient for parallel processing of large numbers of profiles.
- 99.9% uptime helps ensure stable operation for long-running parsers and automated tasks.
A free $3 trial is available, giving you enough credit to test the service and verify that the proxy works with your code before purchasing.
That's why Astro works well both for small Python scripts and for large-scale systems that collect public Instagram data on an ongoing basis.
Connection Specifics: Domain Names Only
Astro proxies handle requests by domain name – direct access to resources by IP address is not available. This doesn't limit our parser in any way: all requests are formed using the domain `www.instagram.com`, and DNS resolution happens on the proxy server's side.
The one thing to keep in mind: don't pre-resolve the domain to an IP address in your code (for example, via `socket.gethostbyname`) and then make a request to that IP – the proxy won't allow such a request through. Always pass a URL with the domain name to the HTTP client.
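A small guard can catch IP-literal URLs before they reach the proxy. The sketch below uses only the standard library; `require_domain_url` is a hypothetical helper name, and the addresses in the usage lines are documentation-range examples.

```python
import ipaddress
from urllib.parse import urlsplit


def require_domain_url(url: str) -> str:
    """Return url unchanged, or raise ValueError if its host is an IP literal."""
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"URL has no host: {url!r}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return url  # not an IP address, so it is a domain name
    raise ValueError(f"Proxy requires a domain name, got IP address: {host}")


# A domain-based URL passes through untouched.
checked = require_domain_url("https://www.instagram.com/nasa/")

# An IP-based URL is rejected before any request is sent.
try:
    require_domain_url("https://192.0.2.10/nasa/")
    rejected = False
except ValueError:
    rejected = True
```

Calling such a check just before `client.get(...)` makes the restriction fail loudly in your own code rather than as an opaque proxy error.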
What You'll Need
- Python 3.11 or newer;
- the `httpx` library;
- the `beautifulsoup4` library (for HTML parsing);
- Astro proxy credentials.
Install the dependencies:
```bash
pip install httpx beautifulsoup4
```
Project Structure
```
instagram-parser/
│
├── config.py
├── parser.py
└── requirements.txt
```
Contents of `requirements.txt`:
```
httpx>=0.27
beautifulsoup4>=4.12
```
Step 1. Configure the Proxy
Create a `config.py` file:
```python
PROXY_HOST = "YOUR_PROXY_HOST"
PROXY_PORT = "YOUR_PROXY_PORT"
PROXY_LOGIN = "YOUR_LOGIN"
PROXY_PASSWORD = "YOUR_PASSWORD"
```
In production projects, store credentials in environment variables or a secrets manager, not in the repository code.
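One possible way to do this is to read the same four settings from the environment. This is a sketch under the assumption that the variables are named like the constants in `config.py`; `load_proxy_settings` is a hypothetical helper.

```python
import os
from collections.abc import Mapping

SETTING_NAMES = ("PROXY_HOST", "PROXY_PORT", "PROXY_LOGIN", "PROXY_PASSWORD")


def load_proxy_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read proxy settings from env, failing early if any value is missing."""
    missing = [name for name in SETTING_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in SETTING_NAMES}


# Usage with an explicit mapping (placeholder values, not real credentials):
settings = load_proxy_settings({
    "PROXY_HOST": "proxy.example.com",
    "PROXY_PORT": "8080",
    "PROXY_LOGIN": "user",
    "PROXY_PASSWORD": "secret",
})
```

Failing at startup with a list of the missing names is usually easier to debug than a connection error later.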
Step 2. Connect the Astro Proxy to Python
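```python
from urllib.parse import quote

import httpx

from config import PROXY_HOST, PROXY_PORT, PROXY_LOGIN, PROXY_PASSWORD

# safe="" escapes every reserved character, including "/",
# which quote() leaves untouched by default.
proxy = (
    f"http://{quote(PROXY_LOGIN, safe='')}:{quote(PROXY_PASSWORD, safe='')}"
    f"@{PROXY_HOST}:{PROXY_PORT}"
)
```
Note the `quote()` call with `safe=""`: if the login or password contains special characters (`@`, `:`, `/`), the proxy URL will be malformed without escaping. By default `quote()` leaves `/` unescaped, so the empty `safe` argument is needed to cover it.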
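To see what the escaping does, here is a short illustration with a made-up password. It compares the default behaviour of `urllib.parse.quote` with `safe=""`.

```python
from urllib.parse import quote

password = "p@ss:w/rd"  # hypothetical password with URL-reserved characters

default_quoted = quote(password)         # "/" is left as-is by default
fully_quoted = quote(password, safe="")  # every reserved character is escaped

print(default_quoted)  # p%40ss%3Aw/rd
print(fully_quoted)    # p%40ss%3Aw%2Frd
```

Only the second form is safe to embed in the `login:password@host:port` part of a proxy URL when a slash may be present.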
Create the client using a context manager – this guarantees connections are properly closed:
```python
with httpx.Client(
    proxy=proxy,
    timeout=30,
    follow_redirects=True,
) as client:
    ...
```
Now all HTTP requests will go through the Astro proxy.
Step 3. Build the List of Profiles
```python
profiles = [
    "natgeo",
    "nasa",
    "nike",
    "github",
    "instagram",
]
```
Step 4. Request a Profile Page and Validate the Response
```python
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/137.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_profile(client: httpx.Client, username: str) -> str:
    """Download a public profile page and return its HTML."""
    url = f"https://www.instagram.com/{username}/"
    response = client.get(url, headers=HEADERS)
    response.raise_for_status()

    # Instagram may respond with status 200 but redirect to the login page.
    # Such a response contains no profile data, so we flag this explicitly.
    if "/accounts/login" in str(response.url):
        raise RuntimeError("Instagram redirected to the login page")

    return response.text
```
Checking the final URL is a critical step. Since the client follows redirects, the login page will come back with a 200 status code, and without this check the parser would treat it as a successful result.
Step 5. Extract Data from the HTML
The meta tags are the most reliable source of data on a public profile page – Instagram places a brief summary of the profile there:
```python
from bs4 import BeautifulSoup


def extract_summary(html: str) -> str | None:
    """Return the content of the og:description meta tag, if present."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", property="og:description")
    return tag.get("content") if tag else None
```
`og:description` typically includes follower, following, and post counts – enough for monitoring the dynamics of public profiles.
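If numeric values are needed, the summary string can be split into counts. The sketch below assumes the text looks like "98.5M Followers, 84 Following, 4,321 Posts - ..."; that layout, the sample string, and the `parse_summary` helper are assumptions for illustration, and the real wording may differ by locale or change over time.

```python
import re

_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_PATTERN = re.compile(
    r"(\d[\d.,]*)\s*([KMB]?)\s+(Followers|Following|Posts)", re.IGNORECASE
)


def parse_count(number: str, suffix: str) -> int:
    """Convert a value such as '98.5' + 'M' or '4,321' + '' to an integer."""
    value = float(number.replace(",", ""))
    return round(value * _SUFFIXES.get(suffix.upper(), 1))


def parse_summary(summary: str) -> dict[str, int]:
    """Map 'followers', 'following' and 'posts' to the counts found in summary."""
    return {
        label.lower(): parse_count(number, suffix)
        for number, suffix, label in _PATTERN.findall(summary)
    }


# Made-up sample in the assumed format:
sample = "98.5M Followers, 84 Following, 4,321 Posts - See Instagram photos and videos"
counts = parse_summary(sample)
```

Abbreviated values such as "98.5M" are rounded by the source, so the parsed numbers are approximations suitable for tracking trends rather than exact counts.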
Step 6. Run the Parser
```python
import time


def main() -> None:
    with httpx.Client(
        proxy=proxy,
        timeout=30,
        follow_redirects=True,
    ) as client:
        for username in profiles:
            try:
                html = fetch_profile(client, username)
                summary = extract_summary(html)
                print(f"{username}: {summary or 'no meta data found'}")
            except httpx.HTTPStatusError as e:
                status = e.response.status_code
                if status == 429:
                    print(f"{username}: rate limit exceeded, pausing for 60 sec")
                    time.sleep(60)
                else:
                    print(f"{username}: HTTP {status}")
            except Exception as e:
                print(f"{username}: {e}")

            # A pause between requests reduces the risk of being blocked.
            time.sleep(5)


if __name__ == "__main__":
    main()
```
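The loop above pauses after a 429 response and then moves on to the next profile. If the rate-limited profile should be retried instead, a small wrapper is one option. The sketch below is library-agnostic: `RateLimited` is a hypothetical exception that your fetch function would raise on status 429, and the delay values are illustrative assumptions.

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical marker raised by the fetch function on HTTP 429."""


def fetch_with_retry(
    fetch: Callable[[str], str],
    username: str,
    delays: tuple[float, ...] = (60, 120, 240),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(username), waiting and retrying after each RateLimited error."""
    for delay in delays:
        try:
            return fetch(username)
        except RateLimited:
            sleep(delay)  # wait, then try the same profile again
    return fetch(username)  # final attempt; a RateLimited error propagates


# Demonstration with a stub that is rate limited twice before succeeding.
attempts = {"count": 0}


def flaky_fetch(username: str) -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return f"<html>{username}</html>"


waits: list[float] = []
html = fetch_with_retry(flaky_fetch, "nasa", sleep=waits.append)
```

Passing `sleep` as a parameter keeps the wrapper easy to test: the demonstration records the waits instead of actually sleeping.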
How the Instagram Parser Works Through the Proxy
Python → httpx Client → Astro Proxy → www.instagram.com
All requests pass through the proxy, while the application code barely changes. Requests are made strictly by domain name, so the proxy's restriction against direct IP access has no effect on how the parser works. The same architecture works for both small projects and large-scale web scraping systems – as load grows, it's enough to parallelize the processing of the profile list and expand the proxy pool.
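Parallelizing the profile list can be sketched with a thread pool from the standard library. In the example below, `process_profiles` and the stub worker are hypothetical; in a real parser the worker would fetch and parse one profile and handle its own errors, and the worker count would be kept within the connection limit of the proxy port.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def process_profiles(
    usernames: list[str],
    worker: Callable[[str], str],
    max_workers: int = 8,
) -> dict[str, str]:
    """Run worker(username) in a thread pool and map each username to its result."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises worker exceptions.
        for username, result in zip(usernames, pool.map(worker, usernames)):
            results[username] = result
    return results


# Stub worker standing in for "fetch the page and extract the summary".
def stub_worker(username: str) -> str:
    return f"summary for {username}"


summaries = process_profiles(["natgeo", "nasa", "nike"], stub_worker, max_workers=2)
```

Pauses between requests still apply in a parallel setup: each worker should keep its own delay so the combined request rate stays moderate.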
Where This Approach Can Be Used
This example can be adapted for a variety of tasks:
- monitoring public Instagram profiles and follower dynamics;
- competitor analysis;
- open data collection;
- marketing research;
- building analytics services;
- web scraping automation.
Conclusion
Using Astro proxies lets you quickly plug a networking layer into an existing Python parser without changing the application's business logic. The parser in this case study correctly handles login-page redirects, respects pauses between requests, reacts to rate limits from the service, and works exclusively through domain names – fully in line with Astro's proxy connection model.
As the project grows and data volume increases, it's enough to scale the parser itself while keeping the same proxy connection scheme. This approach works well both for small internal tools and for large-scale projects focused on automated collection and analysis of publicly available information.
Related questions
- You can scrape public Instagram pages using Python together with an HTTP client such as `httpx`. After downloading the HTML page, the required information can be extracted with `BeautifulSoup`. For reliable operation, residential proxies and proper rate limiting are recommended.
- Sending all requests from a single IP address increases the chance of rate limits, redirects to the login page, or temporary blocks. Proxies distribute requests across multiple IP addresses, making large-scale data collection more reliable.
- Residential proxies are commonly used for Instagram scraping because they route traffic through real residential IP addresses, making request distribution more natural for large-scale public data collection.
- Some public profile information can be collected without authentication by downloading the profile page and parsing its HTML metadata. However, certain content requires authentication or is rendered dynamically with JavaScript.
- Instagram may return HTTP 200 while redirecting the request to the login page. This typically happens when automated activity is detected or request limits are exceeded. A scraper should always validate the final URL, not just the status code.
- HTTP 429 indicates that too many requests have been sent in a short period of time. The usual solution is to reduce the request rate, introduce delays between requests, and rotate proxy IP addresses.
- A minimal setup typically includes `httpx` for HTTP requests and `beautifulsoup4` for HTML parsing. These libraries are sufficient for extracting data from public profile pages.
- Astro proxies support both HTTP(S) and SOCKS5 protocols, making them compatible with popular Python libraries such as `httpx`, `requests`, and `aiohttp` without requiring changes to the application's architecture.


