Concurrent Connections in Proxies: What They Are and Why They Matter
Concurrent connections determine how many requests your proxy handles in parallel. Learn how to calculate your needs, avoid limits, and scale reliably.
Table of Contents
- What Are Concurrent Connections in a Proxy? {#what-are-concurrent-connections}
- How Do You Calculate the Concurrent Connections You Need? {#how-to-calculate}
- What Is the Difference Between Connections, Threads, and Processes in a Proxy? {#connections-vs-threads}
- How Does Connection Pooling Reduce Concurrency Pressure? {#connection-pooling}
- What Limits Concurrent Connections at Each Layer? {#what-limits-connections}
- How Do HTTP/1.1 and HTTP/2 Affect Proxy Concurrency? {#http-versions-and-concurrency}
- What Happens When You Exceed Your Concurrency Limit? {#exceeding-concurrency-limits}
- How Do You Configure Concurrent Connections in Your Client? {#how-to-configure}
- How Do You Scale Beyond a Single Proxy Concurrency Limit? {#how-to-scale}
- Conclusion {#conclusion}
-
What Are Concurrent Connections in a Proxy? {#what-are-concurrent-connections}
Bandwidth and uptime get most of the attention when people evaluate proxy performance. Concurrent connections usually don't — until a scraping job stalls at 20% throughput or a data pipeline starts returning connection errors at scale.
Concurrent connections are the number of proxy connections open at the same time. Every open request holds a connection. If your proxy plan allows 100 concurrent connections and your application tries to open 150, the excess requests queue or fail. Understanding this limit — and how to calculate what you actually need — is the difference between a proxy operation that scales and one that hits invisible ceilings.
Key Takeaways
- Concurrent connections = the number of TCP connections open through your proxy at any given moment. Each in-flight request holds one connection until a response is fully received.
- Your required concurrency follows Little's Law:
C = λ × W, whereCis concurrent connections,λis request throughput (requests/second), andWis average response time (seconds). - HTTP/1.1 uses one request per connection. HTTP/2 multiplexes many requests over a single connection — but proxy-side concurrency limits still apply per stream.
- Proxy providers enforce concurrency limits per plan tier, per IP, or per authentication credential. Exceeding them produces
503 Service Unavailableor connection queuing, not silent degradation. - Connection pooling — reusing established TCP connections across multiple requests — reduces handshake overhead and lets you serve higher throughput within the same concurrency limit.
A concurrent connection is a TCP connection that is open and in use at a given point in time. In proxy terms, each in-flight request holds two TCP connections: one from your client to the proxy, and one from the proxy to the destination server. Your proxy provider's concurrency limit typically refers to the number of simultaneous client-side connections your credential or IP is allowed to hold open.
The critical distinction is concurrent vs total. A plan that allows 100 concurrent connections doesn't limit you to 100 total requests per session — it limits how many can be active at the same moment. If each request takes 500 ms and you send 10 requests per second, you hold an average of 5 connections open at any time. That fits well within a 100-connection plan. If response times increase to 5 seconds, your average open connections become 50 at the same throughput.
This is why slow target sites are a concurrency multiplier: the slower the destination, the more connections your application holds open waiting for responses, and the higher the concurrency your plan needs to sustain.
What counts as one connection:
- One HTTP/1.1 GET request in flight = 1 connection
- One HTTPS CONNECT tunnel with an active request inside = 1 connection
- One keep-alive connection idle between requests = 1 connection (if your client holds it open)
- One HTTP/2 session = 1 TCP connection, but potentially many concurrent streams inside it
What does not count:
- Completed requests with closed connections
- DNS lookups (not TCP connections to the proxy)
- Time between requests when the connection is closed
-
How Do You Calculate the Concurrent Connections You Need? {#how-to-calculate}
Little's Law is the right tool for this calculation. Originally formulated by John D.C. Little at MIT (1961) for queuing theory, it applies directly to proxy connection planning:
$$C = \lambda \times W$$
Where:
- C = average number of concurrent connections
- λ (lambda) = average request throughput (requests per second)
- W = average response time (seconds per request)
Example calculation:
Your scraping job sends 50 requests per second. The target site responds in an average of 2.5 seconds.
$$C = 50 \times 2.5 = 125 \text{ concurrent connections}$$
You need a plan that supports at least 125 concurrent connections to sustain this throughput without queuing. Add a 20–30% buffer for variance in response times and connection overhead:
$$C_{\text{plan}} = 125 \times 1.25 = 156 \text{ connections}$$
Round up to the nearest plan tier — in this case, a 200-connection plan.
Calculating from targets per hour instead:
If you know you need to collect 180,000 pages in 1 hour, and average response time is 3 seconds:
$$\lambda = \frac{180{,}000}{3600} = 50 \text{ req/s}$$
$$C = 50 \times 3 = 150 \text{ concurrent connections}$$
Response time is the most sensitive variable. A 3× increase in destination response time requires a 3× increase in concurrent connections to maintain the same throughput. Before purchasing a concurrency tier, test average response time on your actual target — not on a latency benchmark.
INFO-GAIN: Little's Law gives the average concurrent connection count. Your peak will be higher. For workloads with response time variance, use the 95th-percentile response time rather than the mean when sizing your concurrency requirement. If p95 response time is 8 seconds and your throughput target is 50 req/s, your sizing calculation becomes
50 × 8 = 400connections needed at peak — more than 3× the mean-based estimate.
-
What Is the Difference Between Connections, Threads, and Processes in a Proxy? {#connections-vs-threads}
Understanding these three terms prevents misreading proxy provider documentation and capacity specs.
| Term | What it is | Limits what |
|---|---|---|
| TCP connection | A bidirectional network socket between two endpoints | Network file descriptors; set by OS
ulimitand kernel TCP stack || Thread | An OS execution unit that can handle one or more connections | CPU and memory; each thread uses ~1–8 MB stack by default |
| Process | An OS process — one or more threads with isolated memory | Memory; proxy workers are often one process per CPU core |
| Concurrent connections (plan limit) | The number of simultaneous client connections your proxy credential allows | Enforced by the proxy provider's gateway, independent of your OS |
Thread-per-connection model (Apache httpd, older proxy servers):
One thread handles one connection. Concurrency is limited by the number of threads the server can spawn. Each thread consumes memory even when waiting for I/O. This model works at low concurrency but struggles at thousands of simultaneous connections — the C10K problem documented by Dan Kegel (1999) demonstrated that thread-per-connection servers hit OS limits before network limits.
Event-loop / async I/O model (Nginx, HAProxy, Node.js-based proxies):
A small number of threads (typically one per CPU core) handle thousands of connections through non-blocking I/O and an event loop. A single Nginx worker process can hold tens of thousands of concurrent connections with minimal memory overhead per connection. This is why modern high-throughput proxy infrastructure uses event-loop architectures.
What this means for you: When a proxy provider advertises "100 concurrent connections per plan," they're describing a gateway-layer policy limit enforced on your credential or IP — not a reflection of their server's thread capacity. The server can handle far more; the limit is a billing and fair-use control.
-
How Does Connection Pooling Reduce Concurrency Pressure? {#connection-pooling}
A TCP connection requires a three-way handshake before any data flows. For HTTPS (most proxy traffic), the TLS handshake adds additional round trips. Opening a new connection per request wastes time and briefly spikes concurrency during connection establishment.
Connection pooling reuses established TCP connections across multiple sequential requests. Instead of opening a new connection for each request, your client maintains a pool of open connections and checks out an idle one when needed, returning it after the response is received.
Effect on concurrency: A pool of 10 connections can serve 100 sequential requests without ever exceeding 10 simultaneous open connections — as long as each request completes before the next starts on that connection. If requests arrive faster than connections serve them, the pool fills up and requests queue.
HTTP keep-alive vs connection pooling:
HTTP keep-alive (RFC 7230 §6.3, IETF 2014) allows a TCP connection to remain open after a request completes, so the next request can reuse it without a new handshake. Connection pooling extends this: your client actively manages a pool of keep-alive connections, distributing requests across them.
Practical implementation:
```python
import requests
from requests.adapters import HTTPAdapter
session = requests.Session()
Pool 20 connections per host, max 50 total
adapter = HTTPAdapter(pool_connections=20, pool_maxsize=50)
session.mount("https://", adapter)
session.mount("http://", adapter)
proxies = {"https": "http://user:[email protected]:8080"}
All requests reuse pooled connections — no new handshake per request
for url in urls:
response = session.get(url, proxies=proxies, timeout=15)
```
Idle connection timeout: Most proxies close idle keep-alive connections after 30–120 seconds of inactivity. If your pool holds connections open longer than the proxy's idle timeout, the next request on a stale connection will fail and need to reconnect. Set your pool's
socket_optionsorkeepalive_timeoutto a value shorter than the proxy's idle timeout — or let the client detect and retry onConnectionResetError.
-
What Limits Concurrent Connections at Each Layer? {#what-limits-connections}
Concurrent connections in a proxy operation face limits at four independent layers. Hitting any one of them produces connection failures or queuing.
| Layer | What enforces the limit | Typical symptom |
|---|---|---|
| Proxy plan limit | Proxy provider's gateway (per credential or per IP) |
503 Service Unavailable, queuing,429 Too Many Requests|| Client OS file descriptors | OS
ulimit -n(Linux) / handle table (Windows) |OSError: [Errno 24] Too many open files|| Destination server | Target site rate limiting or connection limits |
429,503, or connection reset from destination || Network stack / NAT | Router/firewall connection tracking table | Silent connection drops, SYN packets not answered |
Proxy plan limit: The most common constraint for commercial proxy users. This is enforced by the proxy provider and is the spec you're paying for. It's independent of your hardware.
Client OS file descriptors: Each TCP connection uses one file descriptor on Linux. The default
ulimit -nvalue is 1024 on many systems — enough for development, not enough for high-concurrency production. To raise it:```bash
Check current limit
ulimit -n
Raise for current session
ulimit -n 65536
Raise permanently (Linux) — add to /etc/security/limits.conf
- soft nofile 65536
- hard nofile 65536
```
On Windows, the default handle limit per process is approximately 10,000 (adjustable via registry). Python's
asyncio/aiohttpon Windows also has a default limit of 100 simultaneous connections perTCPConnector— setlimit=0to remove it when needed.Destination rate limiting: Even if your proxy plan supports 500 concurrent connections, many target sites enforce their own per-IP rate limits. Since each proxy IP serves some number of your connections, the effective concurrency against any single destination is bounded by that site's tolerance per IP — typically a driver for using rotating residential proxies with large IP pools rather than a small set of static IPs.
-
How Do HTTP/1.1 and HTTP/2 Affect Proxy Concurrency? {#http-versions-and-concurrency}
The HTTP version your client uses affects how many TCP connections it opens and how proxy concurrency limits apply.
HTTP/1.1: One request per connection at a time. To achieve parallelism, clients open multiple connections. Browsers default to 6 connections per origin (RFC 7230). Scraping clients and HTTP libraries typically open as many as your pool size or OS limits allow.
HTTP/2: Multiplexes multiple requests over a single TCP connection using stream IDs (RFC 9113, IETF 2022). A single HTTP/2 connection can carry hundreds of simultaneous request/response exchanges. This dramatically reduces TCP connection overhead but doesn't eliminate proxy concurrency limits — the proxy counts each HTTP/2 stream as a concurrent unit of work, even though they share a TCP connection.
| | HTTP/1.1 | HTTP/2 |
|---|---|---|
| TCP connections needed for N parallel requests | N connections | 1 connection (N streams) |
| Proxy concurrency consumption | 1 connection per request | 1 stream per request (counted differently by provider) |
| Head-of-line blocking | Per connection | Eliminated at HTTP layer (TCP HoL blocking remains) |
| TLS overhead | 1 handshake per connection | 1 handshake per TCP connection |
| Proxy support | Universal | Not all proxy implementations support HTTP/2 |
Practical note for proxy users: Most residential and datacenter proxy endpoints operate as HTTP/1.1 proxies. Sending HTTP/2 requests through an HTTP/1.1 proxy triggers connection downgrade negotiation. Check your provider's documentation before relying on HTTP/2 multiplexing to reduce your effective connection count.
-
What Happens When You Exceed Your Concurrency Limit? {#exceeding-concurrency-limits}
Provider behavior when the concurrency limit is reached varies. Three common patterns:
Hard rejection (
503 Service Unavailable): The proxy gateway returns a503immediately when a new connection request would exceed the plan limit. Your client receives an error response with no body or a provider-specific error message. This is the cleanest behavior — you know immediately that you've hit the limit and can implement backpressure.Queuing: The gateway holds new connection requests in a queue until an existing connection closes. Requests eventually succeed but with higher latency. This is invisible to callers that don't time out — your application appears slower rather than failing. Queue depth and timeout values vary by provider.
Per-IP throttling: Some providers enforce concurrency per individual proxy IP rather than per credential. As your connections fill up the capacity of one IP, new requests are routed to other IPs in the pool. This is the typical behavior for residential proxy pools and is transparent to your application.
How to detect which behavior your provider uses:
```python
import concurrent.futures
import requests
import time
def probe_connection(url: str, proxy: str) -> tuple[int, float]:
start = time.monotonic()
try:
r = requests.get(url, proxies={"https": proxy}, timeout=30)
return r.status_code, time.monotonic() - start
except requests.exceptions.ProxyError as e:
return 503, time.monotonic() - start
proxy = "http://user:[email protected]:8080"
url = "https://httpbin.org/delay/5" # 5-second response to hold connections open
Open 20 simultaneous connections
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
futures = [executor.submit(probe_connection, url, proxy) for _ in range(20)]
results = [f.result() for f in concurrent.futures.as_completed(futures)]
status_codes = [r[0] for r in results]
latencies = [r[1] for r in results]
print(f"Status codes: {set(status_codes)}")
print(f"Latency range: {min(latencies):.1f}s – {max(latencies):.1f}s")
```
503s at
max_workers= hard rejection. Uniform 5-second latencies across all workers = pool has sufficient headroom. Rising latencies beyond 5 seconds = queuing behavior.
-
How Do You Configure Concurrent Connections in Your Client? {#how-to-configure}
Python requests + HTTPAdapter:
```python
from requests import Session
from requests.adapters import HTTPAdapter
session = Session()
pool_connections: number of distinct proxy hosts to maintain pools for
pool_maxsize: max connections per pool (= max concurrency per proxy host)
adapter = HTTPAdapter(pool_connections=1, pool_maxsize=100)
session.mount("https://", adapter)
session.mount("http://", adapter)
```
Set
pool_maxsizeto your target concurrency — not higher than your plan allows.Python aiohttp (async):
```python
import aiohttp
import asyncio
connector = aiohttp.TCPConnector(
limit=100, # Total max concurrent connections
limit_per_host=100, # Per-host max (set equal to limit for single-proxy configs)
ttl_dns_cache=300,
enable_cleanup_closed=True,
)
async def fetch_all(urls: list[str], proxy: str) -> list[dict]:
async with aiohttp.ClientSession(connector=connector) as session:
tasks = [session.get(url, proxy=proxy) for url in urls]
responses = await asyncio.gather(*tasks, return_exceptions=True)
return responses
```
Python httpx (sync and async):
```python
import httpx
Sync with connection limits
with httpx.Client(
proxies="http://user:[email protected]:8080",
limits=httpx.Limits(max_connections=100, max_keepalive_connections=50),
timeout=30.0,
) as client:
response = client.get("https://httpbin.org/ip")
```
Scrapy (spider-level settings):
```python
settings.py
CONCURRENT_REQUESTS = 100 # Total concurrent Scrapy requests
CONCURRENT_REQUESTS_PER_DOMAIN = 10 # Per destination domain
CONCURRENT_REQUESTS_PER_IP = 0 # 0 = no per-IP limit (use per-domain)
DOWNLOAD_DELAY = 0 # Seconds between requests to same domain
```
-
How Do You Scale Beyond a Single Proxy Concurrency Limit? {#how-to-scale}
When a single proxy credential's concurrency limit isn't enough, three patterns scale further:
Multiple credentials / sub-users: Some proxy providers allow you to create multiple sub-accounts or API keys, each with its own concurrency limit. Distributing your application's connections across multiple credentials multiplies available concurrency without changing provider plans.
Multiple proxy endpoints with a load balancer: Run a local proxy multiplexer (like HAProxy or a custom round-robin dispatcher) that distributes outbound connections across multiple upstream proxy endpoints. Each endpoint serves its plan-limit share of traffic; the local balancer fills them evenly.
```
haproxy.cfg snippet:
backend proxy_pool
balance roundrobin
server proxy1 proxy1.provider.com:8080 check
server proxy2 proxy2.provider.com:8080 check
server proxy3 proxy3.provider.com:8080 check
```
Rotating residential pools: For large-scale web data collection, residential proxy pools with large IP inventories effectively have higher aggregate concurrency because connections distribute across many IPs. Each IP has its own connection budget at the destination level, and the provider's gateway manages IP assignment. This scales both concurrency and destination-side per-IP rate limit headroom simultaneously. For context on how concurrency interacts with bandwidth consumption, a 100-connection plan at 1 MB/response averages 100 MB/s of bandwidth at 1-second response times — verify your bandwidth plan matches your concurrency plan.
Async over sync: Switching from threaded synchronous HTTP (one thread per connection) to async I/O (one event loop thread, many concurrent connections) typically increases the concurrency ceiling per application instance by 10–100×. A single
aiohttpworker withconnector limit=500can hold 500 simultaneous connections with the memory footprint of a few threads. See the proxy uptime and reliability guide for how concurrency interacts with proxy-side connection reuse and failure recovery.
-
Conclusion {#conclusion}
Concurrent connections are the capacity unit that actually governs proxy throughput at scale. Bandwidth and uptime matter — but if your concurrency ceiling is lower than what your workload demands, you'll hit a wall that neither metric predicts.
The calculation is straightforward: apply Little's Law with your actual target site response time, add a 25% buffer for variance, and size your proxy plan accordingly. For response times you can't measure in advance, assume the worst case rather than the best.
Two practical priorities once you have the right plan tier: implement connection pooling to reduce handshake overhead and stay within your concurrency budget, and configure your client's pool size to match your plan limit — not to exceed it. Exceeding it produces
503errors or silent queuing that inflates your effective response times, which in turn increases your concurrency requirement in a self-reinforcing loop.For operations that outgrow a single plan's concurrency limit, distributing connections across multiple credentials or switching to a large residential proxy pool with per-IP rate limit headroom at the destination are the cleanest scaling paths.
Related guides: What Is Proxy Bandwidth? · Proxy Uptime and Reliability · What Is a Forward Proxy? · Proxy Ports Explained · What Is a Residential Proxy?