
How to Avoid Getting Your Proxy Blocked

Avoid proxy blocks with TLS fingerprint matching, realistic headers, and crawl rate control. Proxy detection bypass for Python and browser automation.

Jun 12, 2026 - 00:17

Jun 2, 2026 - 12:15

Why Proxies Get Blocked: The Full Picture {#why-blocked}
Rotate Proxies to Avoid Rate Limits {#rotate-proxies}
Fix Your TLS Fingerprint (JA3/JA4) {#tls-fingerprint}
- Session-based usage with curl_cffi
Set Consistent HTTP Headers {#consistent-headers}
Respect Crawl Rates and robots.txt {#crawl-rate}
Handle Cookies and Sessions Correctly {#cookies-sessions}
Avoid Common Bot Detection Signals {#bot-signals}
- Headless browser detection (Selenium / Playwright)
- Other bot signals to eliminate
Check for Proxy Header Leaks {#header-leaks}
Test Whether You Are Being Detected {#detection-test}
Common Blocking Patterns and How to Respond {#blocking-patterns}
About the Author

Why Proxies Get Blocked: The Full Picture {#why-blocked}

Most people trying to avoid proxy blocks focus entirely on the IP — rotating more proxies, switching providers, trying residential instead of datacenter. That fixes maybe 30% of blocks. The other 70% come from fingerprinting signals that have nothing to do with the IP address. Cloudflare, Akamai, DataDome, and PerimeterX all fingerprint your TLS handshake, HTTP headers, and browser behavior before they even look at the IP. This guide covers every layer of proxy detection bypass — from TLS fingerprinting to cookie handling — with code for Python and browser automation.

Sites use multiple independent detection layers. A proxy can fail any one of them regardless of how clean the IP is:

| Detection Layer | What Is Checked | How to Pass |
|---|---|---|
| IP reputation | IP in blocklists, ASN flagged as datacenter | Use clean datacenter IPs; rotate frequently |
| TLS fingerprint (JA3/JA4) | TLS Client Hello parameters: cipher suites, extensions, curves | Match browser TLS fingerprint with curl_cffi or tls-client |
| HTTP headers | User-Agent, Accept, Accept-Language, Accept-Encoding consistency | Send a full, internally consistent browser header set |
| HTTP version | HTTP/1.1 vs HTTP/2 vs HTTP/3 | Use HTTP/2 (httpx) to match modern browsers |
| Request rate | Requests per second per IP | Add delays; rotate proxies per request or per domain |
| Cookie/session consistency | Same session ID, different IP = suspicious | Bind one proxy to one session; reset cookies with proxy |
| JavaScript / browser behavior | navigator.webdriver, canvas fingerprint, mouse events | Use undetected-chromedriver or Playwright with stealth patch |
| Proxy headers (X-Forwarded-For) | Real IP leaked in proxy headers | Use proxies that strip these headers |

Changing the proxy only fixes the first row. The others require changes to your HTTP client or browser configuration.
Rotate Proxies to Avoid Rate Limits {#rotate-proxies}

Rate-based blocks are the most common and the easiest to fix. A target site counts requests per IP per time window — typically 10–100 requests per minute before triggering a CAPTCHA or 429.

```python
import random
import time

import requests

PROXIES = [
    "http://gateway.sparkproxy.io:10000",
    "http://gateway.sparkproxy.io:10001",
    "http://gateway.sparkproxy.io:10002",
]


def get(url: str, min_delay: float = 1.0, max_delay: float = 3.0) -> requests.Response:
    proxy = random.choice(PROXIES)
    resp = requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
    time.sleep(random.uniform(min_delay, max_delay))  # Human-like pacing
    return resp
```
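
When a site does answer with a 429, retrying immediately tends to prolong the block. The following is a minimal sketch of a backoff calculation, assuming the server may send a Retry-After header expressed in seconds. The function name `backoff_delay` and its defaults are illustrative and not part of any library.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Prefer the server's own hint when it is a plain number of seconds.
    # (Retry-After may also be an HTTP date; that case falls through to jitter.)
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap)
    # Otherwise use exponential backoff with full jitter, capped at `cap`.
    upper = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, upper)
```

A caller would pass `resp.headers.get("Retry-After")` as `retry_after` and sleep for the returned value before trying again, ideally through a different proxy.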

Two rotation strategies and when to use each:

| Strategy | When to Use | Implementation |
|---|---|---|
| Per-request rotation | Stateless pages: product listings, search results | random.choice(PROXIES) before each request |
| Per-session rotation | Login flows, multi-step checkouts, account-based scraping | One requests.Session per proxy; do not rotate mid-session |
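
For per-session rotation, the same logical session should always leave through the same proxy. One way to get that without storing any state is to hash a session key onto the proxy list. This is a sketch under assumptions: `proxy_for_session` and the idea of a string session key (an account name, for example) are illustrative.

```python
import hashlib
from typing import Sequence

PROXIES = (
    "http://gateway.sparkproxy.io:10000",
    "http://gateway.sparkproxy.io:10001",
    "http://gateway.sparkproxy.io:10002",
)


def proxy_for_session(session_key: str, proxies: Sequence[str] = PROXIES) -> str:
    """Map a session key (e.g. an account name) to one fixed proxy."""
    digest = hashlib.sha256(session_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a list index.
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]
```

The mapping stays stable only while the proxy list is unchanged; adding or removing a proxy reassigns some keys, so do that between sessions rather than during them.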

Fix Your TLS Fingerprint (JA3/JA4) {#tls-fingerprint}

This is the most overlooked aspect of proxy detection bypass. Every TLS client sends a "Client Hello" message at the start of an HTTPS connection. The combination of cipher suites, TLS extensions, elliptic curves, and signature algorithms in this message creates a fingerprint — JA3 (older) or JA4 (newer). This fingerprint is computed entirely from the network packet, not from any header your code sets.

Python's requests library uses urllib3 and the standard ssl module (OpenSSL) under the hood, and that stack produces a distinctive TLS fingerprint that matches no browser. Even if you set User-Agent: Mozilla/5.0 (Chrome/124), the TLS fingerprint still says "Python/urllib3." Cloudflare and DataDome block on this mismatch.
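
To make the fingerprint idea concrete, here is a small sketch of how a JA3 value is derived: five Client Hello fields are written as decimal numbers, joined with dashes inside each field and commas between fields, and the result is MD5-hashed. The numeric values below are illustrative, not a real browser's handshake, and real implementations also drop GREASE values before hashing, which this sketch omits.

```python
import hashlib
from typing import Sequence


def ja3_string(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Build the JA3 input: five comma-separated fields, values dash-joined."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return ",".join(fields)


def ja3_hash(ja3: str) -> str:
    """JA3 is the MD5 hex digest of the JA3 string (not a security use)."""
    return hashlib.md5(ja3.encode("ascii"), usedforsecurity=False).hexdigest()


# Illustrative values only: same ciphers, offered in a different order.
client_a = ja3_string(771, [4865, 4866, 4867], [0, 10, 11, 43], [29, 23, 24], [0])
client_b = ja3_string(771, [4866, 4865, 4867], [0, 10, 11, 43], [29, 23, 24], [0])
print(ja3_hash(client_a))
print(ja3_hash(client_b))  # a different hash from client_a
```

Because the hash covers the order as well as the content of each list, a client that offers the same ciphers as Chrome in a different order still produces a different fingerprint.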

The fix: curl_cffi — a Python binding to curl that impersonates the actual TLS fingerprint of Chrome, Firefox, or Safari:

```bash
pip install curl_cffi
```

```python
from curl_cffi import requests as cffi_requests

# Impersonates Chrome 124's exact TLS fingerprint, HTTP/2, and headers
resp = cffi_requests.get(
    "https://httpbin.org/ip",
    proxies={
        "http": "http://gateway.sparkproxy.io:10000",
        "https": "http://gateway.sparkproxy.io:10000",
    },
    impersonate="chrome124",
)
print(resp.json())
```

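Available impersonation targets (as of curl_cffi 0.7). Target names differ between releases, so confirm each one against the supported list for the version you have installed: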

| Impersonation Target | TLS + HTTP Version | Use When |
|---|---|---|
| chrome124 | TLS 1.3, HTTP/2 | General-purpose; matches most desktop Chrome users |
| chrome110 | TLS 1.3, HTTP/2 | Sites that specifically detect Chrome 124+ as too new |
| firefox117 | TLS 1.3, HTTP/2 | Firefox user demographic |
| safari17_0 | TLS 1.3, HTTP/2 | iOS/Mac targets with Safari-heavy traffic |
| edge99 | TLS 1.3, HTTP/2 | Corporate or Microsoft-heavy environments |

curl_cffi also negotiates HTTP/2 automatically. requests speaks only HTTP/1.1, which is another detectable mismatch against modern browsers.
- Session-based usage with curl_cffi
  
```python
from curl_cffi import requests as cffi_requests

session = cffi_requests.Session(impersonate="chrome124")
session.proxies = {
    "http": "http://gateway.sparkproxy.io:10000",
    "https": "http://gateway.sparkproxy.io:10000",
}

# All requests in this session use Chrome's TLS fingerprint
resp = session.get("https://example.com/login")
```
Set Consistent HTTP Headers {#consistent-headers}

Even when your TLS fingerprint is correct, inconsistent HTTP headers reveal automation. A real Chrome 124 browser always sends a specific set of headers in a specific order. Sending User-Agent: Chrome/124 while omitting sec-ch-ua, or sending `Accept: */*` (the requests default) instead of Chrome's real Accept header, is a detectable anomaly.

Minimum consistent header set for Chrome 124 on Windows:

```python
import requests

CHROME_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    # requests can only decode "br" responses if the brotli package is installed
    "Accept-Encoding": "gzip, deflate, br",
    "sec-ch-ua": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
}

session = requests.Session()
session.headers.update(CHROME_HEADERS)
session.proxies = {
    "http": "http://gateway.sparkproxy.io:10000",
    "https": "http://gateway.sparkproxy.io:10000",
}
```
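
A header set can be checked for internal consistency before it is ever sent. The sketch below flags a few of the mismatches described above: a Chrome User-Agent without sec-ch-ua, a sec-ch-ua version that disagrees with the User-Agent, and a bare `*/*` Accept. The function name and the specific rules are illustrative assumptions, not a complete model of what any detection vendor checks.

```python
import re


def header_inconsistencies(headers: dict[str, str]) -> list[str]:
    """Return human-readable problems found in a Chrome-style header set."""
    problems = []
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "")
    ua_major = re.search(r"Chrome/(\d+)", ua)
    if ua_major:
        hint = h.get("sec-ch-ua")
        if hint is None:
            problems.append("Chrome User-Agent without sec-ch-ua")
        elif f'v="{ua_major.group(1)}"' not in hint:
            problems.append("sec-ch-ua version differs from User-Agent")
        platform = h.get("sec-ch-ua-platform", "").strip('"')
        if platform == "Windows" and "Windows NT" not in ua:
            problems.append("sec-ch-ua-platform says Windows, User-Agent does not")
    # A missing Accept header is treated the same as the library default.
    if h.get("accept", "*/*") == "*/*":
        problems.append("Accept is missing or */*, unlike a browser navigation")
    return problems
```

An empty list means none of these particular rules fired; it does not mean the header set is indistinguishable from a browser's.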

Header order matters. Browsers send headers in a consistent, fixed order, and HTTP/2 makes deviations easy to spot because the HPACK-encoded header block preserves the order the client sent. curl_cffi handles this automatically. Plain requests gives you no reliable control here: requests.structures.CaseInsensitiveDict is only a storage container, your headers are merged with the library's defaults before sending, and the actual wire order may vary by Python version.

Combining curl_cffi for TLS + correct headers is the most reliable approach:

```python
from curl_cffi import requests as cffi_requests

resp = cffi_requests.get(
    "https://target-site.com",
    headers={
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.google.com/",
    },
    impersonate="chrome124",
    proxies={
        "http": "http://gateway.sparkproxy.io:10000",
        "https": "http://gateway.sparkproxy.io:10000",
    },
)
```

curl_cffi already sets the correct TLS fingerprint, HTTP/2, and base browser headers for the chosen impersonation target. You only need to add request-specific headers like Referer.
Respect Crawl Rates and robots.txt {#crawl-rate}

Sending 10 requests per second from the same proxy will trigger rate limiting before any fingerprint detection kicks in. Respecting the target site's intended crawl rate is both effective and ethical.

```python
import random
import time


def polite_delay(min_s: float = 1.0, max_s: float = 4.0) -> None:
    """Sleep for a random human-like interval between requests."""
    time.sleep(random.uniform(min_s, max_s))
```

Guidelines:

| Crawl Rate | Risk Level | When Appropriate |
|---|---|---|
| > 5 req/s per IP | High (near-certain block) | Never |
| 1–5 req/s per IP | Medium | Only with very large proxy pool |
| 1 req / 1–3 s | Low | General scraping |
| 1 req / 5–10 s | Very low | Sensitive sites, e-commerce, login-required |
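
Because these rates are per IP, a fixed sleep after every request over-throttles a large proxy pool and under-throttles a small one. A sketch of a throttle keyed on the (proxy, host) pair is shown below; the class name `ProxyHostThrottle` and the injectable `clock` and `sleep` parameters are illustrative choices that make the logic testable without waiting.

```python
import time
from urllib.parse import urlsplit


class ProxyHostThrottle:
    """Enforce a minimum interval between requests from one proxy to one host."""

    def __init__(self, min_interval: float = 2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request: dict[tuple[str, str], float] = {}

    def wait(self, proxy: str, url: str) -> float:
        """Block until this proxy may hit the URL's host again; return seconds slept."""
        key = (proxy, urlsplit(url).netloc)
        now = self._clock()
        last = self._last_request.get(key)
        pause = 0.0
        if last is not None:
            pause = max(0.0, self.min_interval - (now - last))
        if pause > 0:
            self._sleep(pause)
        # Record the moment the request is actually allowed to go out.
        self._last_request[key] = now + pause
        return pause
```

Calling `throttle.wait(proxy, url)` immediately before each request keeps every proxy under the chosen rate for a host while letting other proxies proceed without delay.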

To check a site's declared crawl rate:

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

print(rp.can_fetch("*", "/products"))  # True or False
print(rp.crawl_delay("*"))  # Crawl-delay in whole seconds (int), or None if not declared
```

If crawl_delay returns a value, respect it. Ignoring Crawl-delay is one of the most common reasons scraper IPs get added to permanent blocklists.
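
One way to respect a declared delay is to take the larger of the site's Crawl-delay and your own default pace. The sketch below parses robots.txt text directly so it runs offline; `effective_delay` and the sample file are illustrative.

```python
import urllib.robotparser


def effective_delay(robots_txt: str, user_agent: str = "*", default: float = 2.0) -> float:
    """Use the site's Crawl-delay when it is stricter than our default pace."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    declared = rp.crawl_delay(user_agent)
    if declared is None:
        return default
    return max(float(declared), default)


# Hypothetical robots.txt content for illustration.
sample = """User-agent: *
Crawl-delay: 7
Disallow: /private
"""
print(effective_delay(sample))  # 7.0
```

The returned value can be passed as the minimum to whatever delay function you already use between requests.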
Handle Cookies and Sessions Correctly {#cookies-sessions}

Sites use cookies to track session continuity. Two behaviors trigger blocks:

1. Rotating the proxy but keeping the same cookies. The site sees the same session ID arriving from two different IPs — a signal consistent with proxying. Fix: reset cookies when rotating to a new proxy.

2. Making requests with no cookies at all. Real browsers accumulate cookies progressively — homepage sets a session cookie, subsequent pages send it back. Scripts that skip directly to a product page with no cookies look like bots.

```python
import random

import requests

PROXIES = [
    "http://gateway.sparkproxy.io:10000",
    "http://gateway.sparkproxy.io:10001",
]


def new_session(proxy_url: str) -> requests.Session:
    """Fresh session (no cookies) bound to one proxy."""
    session = requests.Session()
    session.proxies = {"http": proxy_url, "https": proxy_url}
    return session


def scrape_product(product_url: str) -> str:
    proxy = random.choice(PROXIES)
    session = new_session(proxy)

    # Step 1: Hit the homepage to receive initial cookies
    session.get("https://example.com/", timeout=10)

    # Step 2: Now request the target page; cookies are sent automatically
    resp = session.get(product_url, timeout=10)
    return resp.text
```

Each call to new_session() creates a session with empty cookies. The homepage visit seeds the session with whatever cookies the site sets on first visit, making subsequent requests look like normal browsing.
Avoid Common Bot Detection Signals {#bot-signals}
- Headless browser detection (Selenium / Playwright)
  
  Selenium sets navigator.webdriver = true in the browser's JavaScript environment. This is readable by any JavaScript on the page and is the primary detection signal used by Cloudflare and DataDome for browser-based scraping.
  
  Selenium fix — undetected-chromedriver:
  
```bash
pip install undetected-chromedriver
```
  
```python
import undetected_chromedriver as uc

options = uc.ChromeOptions()
options.add_argument("--proxy-server=gateway.sparkproxy.io:10000")

driver = uc.Chrome(options=options)
driver.get("https://example.com")
```
  
  undetected-chromedriver patches the chromedriver binary to remove the webdriver flag and other Selenium-specific markers.
  
  Playwright fix — stealth plugin:
  
```bash
pip install playwright playwright-stealth
playwright install chromium
```
  
```python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        "server": "http://gateway.sparkproxy.io:10000",
    })
    page = browser.new_page()
    stealth_sync(page)  # Patches navigator.webdriver and other leaks
    page.goto("https://example.com")
    browser.close()
```
- Other bot signals to eliminate
  
  | Signal | What Bots Do | What Real Browsers Do |
  |---|---|---|
  | Mouse movement | None; direct element click | Random Bezier-curve paths before clicking |
  | Viewport size | 800×600 default headless | 1280×800 or 1920×1080 common sizes |
  | Timezone | UTC (server default) | Match the proxy's country timezone |
  | WebGL renderer | SwiftShader / LLVMpipe (headless default) | Real GPU renderer string |
  | Canvas fingerprint | Empty or identical across sessions | Unique noise per session |
  
  For Playwright, set a realistic viewport and timezone:
  
```python
# `browser` is the object returned by p.chromium.launch() in the Playwright example
context = browser.new_context(
    viewport={"width": 1280, "height": 800},
    locale="en-US",
    timezone_id="America/New_York",  # Match your proxy's datacenter location
    proxy={"server": "http://gateway.sparkproxy.io:10000"},
)
```
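
When proxies exit in several countries, the locale and timezone should follow the proxy rather than being hard-coded. A sketch of a helper that builds the keyword arguments for `browser.new_context()` is below. The `COUNTRY_PROFILES` mapping and the function name are illustrative assumptions, and a country with several timezones (such as the US) needs a finer mapping than this one.

```python
# Illustrative mapping; extend it for the countries your proxies exit from.
COUNTRY_PROFILES = {
    "US": {"locale": "en-US", "timezone_id": "America/New_York"},
    "GB": {"locale": "en-GB", "timezone_id": "Europe/London"},
    "DE": {"locale": "de-DE", "timezone_id": "Europe/Berlin"},
}


def context_options(country: str, proxy_server: str) -> dict:
    """Build keyword arguments for Playwright's browser.new_context()."""
    profile = COUNTRY_PROFILES[country.upper()]
    return {
        "viewport": {"width": 1280, "height": 800},
        "locale": profile["locale"],
        "timezone_id": profile["timezone_id"],
        "proxy": {"server": proxy_server},
    }
```

Usage would look like `browser.new_context(**context_options("DE", "http://gateway.sparkproxy.io:10000"))`. An unknown country code raises KeyError, which is preferable to silently falling back to a mismatched timezone.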
Check for Proxy Header Leaks {#header-leaks}

Some proxies add X-Forwarded-For or X-Real-IP headers to outgoing requests, inadvertently revealing your real IP address to the target server. Always verify your proxy does not leak before using it in production.

```python
import requests


def check_header_leak(proxy_url: str) -> dict:
    proxies = {"http": proxy_url, "https": proxy_url}
    # Use plain HTTP: over HTTPS the proxy only tunnels bytes (CONNECT)
    # and cannot add headers, so a leak would never show up.
    resp = requests.get("http://httpbin.org/headers", proxies=proxies, timeout=10)
    headers = resp.json().get("headers", {})

    leak_headers = {
        k: v for k, v in headers.items()
        if k.lower() in ("x-forwarded-for", "x-real-ip", "via", "forwarded")
    }
    return {
        "proxy": proxy_url,
        "leak_headers": leak_headers,
        "has_leak": bool(leak_headers),
    }


result = check_header_leak("http://gateway.sparkproxy.io:10000")
print(result)
# {"proxy": "...", "leak_headers": {}, "has_leak": False}  <- clean proxy
# {"proxy": "...", "leak_headers": {"X-Forwarded-For": "203.0.113.1"}, "has_leak": True}  <- leaks real IP
```

If has_leak is True, switch to a proxy that strips these headers. SparkProxy datacenter proxies do not forward client IP headers.
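
A name-based check misses proxies that put the client address in a non-standard header. As a complementary sketch, the function below scans every echoed header value for your real IP. It assumes IPv4 only, and `find_ip_leaks` is an illustrative name; the `echoed_headers` argument is the dictionary a header-echo endpoint returns.

```python
import re

IPV4_PATTERN = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_ip_leaks(echoed_headers: dict[str, str], real_ip: str) -> dict[str, str]:
    """Return every echoed header whose value contains the client's real IPv4 address."""
    return {
        name: value
        for name, value in echoed_headers.items()
        # Compare whole addresses so 3.0.113.1 does not match 203.0.113.1.
        if real_ip in IPV4_PATTERN.findall(value)
    }
```

Your real IP can be obtained by calling an IP-echo endpoint once without the proxy, then passed in alongside the headers echoed through the proxy.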
Test Whether You Are Being Detected {#detection-test}

Before deploying, test your setup against detection services:

```python
from curl_cffi import requests as cffi_requests


def detection_test(proxy_url: str) -> None:
    proxies = {"http": proxy_url, "https": proxy_url}
    session = cffi_requests.Session(impersonate="chrome124")
    session.proxies = proxies

    tests = {
        "Exit IP": "https://httpbin.org/ip",
        "Headers": "https://httpbin.org/headers",
        "Cloudflare check": "https://www.cloudflare.com/cdn-cgi/trace",
    }

    for name, url in tests.items():
        try:
            resp = session.get(url, timeout=10)
            if name == "Cloudflare check":
                # Parse key=value text response
                data = dict(
                    line.split("=", 1)
                    for line in resp.text.strip().splitlines()
                    if "=" in line
                )
                print(f"[Cloudflare] ip={data.get('ip')} uag={data.get('uag', '')[:40]}")
            else:
                print(f"[{name}] {resp.status_code}: {resp.text[:120]}")
        except Exception as exc:
            print(f"[{name}] FAILED: {exc}")


detection_test("http://gateway.sparkproxy.io:10000")
```

https://www.cloudflare.com/cdn-cgi/trace returns plain key=value lines, including the IP Cloudflare sees (ip) and the User-Agent string it received (uag). It does not report a bot verdict. If uag matches your User-Agent header and ip matches the proxy's exit IP, the proxy and headers are being applied as intended; whether a protected site challenges you still has to be tested against that site.
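
The trace output can also be checked programmatically rather than by eye. The sketch below parses the key=value body and compares it with what you expected to send. The sample body is hypothetical and shortened, the `http` field is assumed to be present in the response, and both function names are illustrative.

```python
def parse_trace(text: str) -> dict[str, str]:
    """Parse the key=value lines returned by /cdn-cgi/trace."""
    return dict(
        line.split("=", 1)
        for line in text.strip().splitlines()
        if "=" in line
    )


def trace_problems(trace: dict[str, str], expected_ip: str, expected_ua: str) -> list[str]:
    """List mismatches between what was sent and what the endpoint saw."""
    problems = []
    if trace.get("ip") != expected_ip:
        problems.append("exit IP is not the proxy IP")
    if trace.get("uag") != expected_ua:
        problems.append("User-Agent was altered or not applied")
    # Assumption: the body carries an `http` field naming the negotiated version.
    if trace.get("http") == "http/1.1":
        problems.append("connection fell back to HTTP/1.1")
    return problems


# Hypothetical response body, shortened for illustration.
sample = "ip=198.51.100.7\nuag=Mozilla/5.0 Example\nhttp=http/2\ntls=TLSv1.3\n"
print(trace_problems(parse_trace(sample), "198.51.100.7", "Mozilla/5.0 Example"))  # []
```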
Common Blocking Patterns and How to Respond {#blocking-patterns}

| Block Pattern | How to Identify | Fix |
|---|---|---|
| Immediate 403 on every request | IP in blocklist; or TLS fingerprint flagged instantly | Switch proxy; switch to curl_cffi with browser impersonation |
| 403 after N requests | Rate limit hit | Slow down; rotate proxy per request; add delays |
| CAPTCHA (JavaScript challenge) | Cloudflare "Checking your browser" page | Use curl_cffi to avoid triggering it; if the challenge still appears, switch to a headless browser with stealth, since solving it requires running JavaScript |
| Soft block: returns empty results | Anti-scraping at application layer (not HTTP) | Check session/cookie handling; simulate homepage visit first |
| 407 Proxy Authentication Required | Proxy credentials wrong or expired | Verify credentials in SparkProxy dashboard |
| Block only on HTTPS | CONNECT tunneling disabled; or TLS fingerprint mismatch | Switch to proxy with CONNECT support; use curl_cffi |
| Works from laptop, blocked from cloud | Cloud provider ASN flagged (AWS, GCP, Azure ranges known) | Use residential or mobile proxies instead of datacenter |
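
In a scraper loop it helps to label each response with one of these patterns so the right fix is applied automatically. The following is a rough heuristic sketch: the function name, the returned labels, and the challenge-page marker strings are assumptions, and challenge pages change over time.

```python
def classify_block(status: int, body: str) -> str:
    """Map a response to one of the block patterns in the table (rough heuristic)."""
    text = body.lower()
    if status == 407:
        return "proxy-auth"
    if status == 429:
        return "rate-limit"
    # Marker strings are assumptions; verify them against the pages you actually receive.
    if "checking your browser" in text or "just a moment" in text:
        return "js-challenge"
    if status == 403:
        return "forbidden"
    if status == 200 and not text.strip():
        return "soft-block"
    return "ok"
```

A caller might rotate the proxy on "forbidden", back off on "rate-limit", and hand the URL to a stealth browser on "js-challenge". An empty 200 body is only one form of soft block; pages that render normally but contain no results need a site-specific check.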

About the Author

SparkProxy Technical Team — The SparkProxy engineering team builds and maintains global datacenter and residential proxy infrastructure. This guide reflects anti-detection patterns validated with Python 3.11+, curl_cffi 0.7+, playwright-stealth 0.1.3+, and undetected-chromedriver 3.5+ (May 2026).

Citations: curl_cffi — Browser TLS impersonation for Python · Playwright Stealth — playwright-stealth on PyPI