How to Use Proxies in Python: Requests, Scrapy, and Selenium

Sophie Marchand · 15 min read

Learn how to configure proxies in Python using the requests library, Scrapy middleware, Selenium drivers, and async libraries with rotation and error handling.

Proxy Format Standards Every Python Developer Should Know

Before writing any integration code, you need to understand the proxy URL format that virtually every Python HTTP library accepts. The standard format is protocol://username:password@host:port — for example, http://user123:[email protected]:8080. Each component serves a specific purpose, and getting any of them wrong produces cryptic connection errors.

The protocol prefix determines how your client communicates with the proxy server. Use http:// for HTTP proxies (the most common type, which handles both HTTP and HTTPS target traffic via CONNECT tunneling), https:// if your proxy provider offers TLS-encrypted connections to the proxy itself, and socks5:// for SOCKS proxies that operate at the TCP level. A common mistake is assuming that the proxy protocol must match the target URL's protocol — it does not. An http:// proxy URL handles HTTPS target sites just fine by establishing a CONNECT tunnel.

The username:password segment is optional and omitted entirely when using IP whitelisting authentication. When credentials contain special characters — and proxy passwords frequently include characters like @, :, #, or / — you must URL-encode them before embedding them in the proxy string. In Python, use urllib.parse.quote with the safe parameter set to an empty string to encode the password. Failing to encode special characters is the single most common reason proxy authentication fails in Python applications.
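The encoding step can be sketched as a small helper. The credentials and host below are placeholders, not real values:

```python
from urllib.parse import quote

def build_proxy_url(user: str, password: str, host: str, port: int) -> str:
    """Build a proxy URL, percent-encoding the credentials so characters
    like @ and : are not mistaken for URL delimiters."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"

# A password containing @ and : is encoded to %40 and %3A:
print(build_proxy_url("user123", "p@ss:word", "proxy.example.com", 8080))
# → http://user123:p%40ss%3Aword@proxy.example.com:8080
```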

The host can be either an IP address or a hostname, and the port is required. Most proxy providers assign specific ports for different proxy types or protocols — check your provider's documentation rather than assuming standard ports like 8080 or 3128.

Using Proxies with the Requests Library

Python's requests library is the default HTTP client for most Python projects, and its proxy support is straightforward. You pass a proxies dictionary to any request method, mapping protocol schemes to proxy URLs. The dictionary keys should be 'http' and 'https', each pointing to the proxy URL that handles that protocol's traffic.

For one-off requests, pass the proxies dictionary directly to requests.get or requests.post. This works but creates a new connection for every request, which adds latency and wastes proxy provider resources. For any workflow involving multiple requests — which is nearly every proxy use case — use a requests.Session object instead. Create a Session, assign the proxies dictionary to session.proxies, and all subsequent requests through that session automatically route through the proxy. The Session also maintains cookies, connection pooling, and authentication state, which matters for scraping workflows that depend on session continuity.

Timeout configuration is critical when working with proxies. Proxy connections have more points of failure than direct connections — your client connects to the proxy, the proxy connects to the target, and both connections can time out independently. Set the timeout parameter explicitly on every request. A value of 30 seconds is a reasonable starting point, but adjust based on your proxy's geographic distance from the target and the target site's response characteristics. Without explicit timeouts, a hung proxy connection can block your entire application indefinitely.
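A minimal sketch of the Session-based pattern, with the proxy URL as a placeholder; the actual request is shown commented out because it requires a live proxy:

```python
import requests

PROXY_URL = "http://user123:[email protected]:8080"  # placeholder

def make_proxied_session(proxy_url: str) -> requests.Session:
    """Return a Session whose requests all route through the given proxy,
    reusing connections and cookies across requests."""
    session = requests.Session()
    session.proxies = {"http": proxy_url, "https": proxy_url}
    return session

session = make_proxied_session(PROXY_URL)
# Always pass an explicit timeout; a hung proxy otherwise blocks forever.
# response = session.get("https://example.com", timeout=30)
```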

For HTTPS requests, the requests library automatically uses the CONNECT method to tunnel through HTTP proxies. You do not need to configure this manually — it happens transparently when the target URL uses HTTPS and the proxy URL uses HTTP.

Handling Proxy Authentication Edge Cases in Requests

While embedding credentials in the proxy URL works for most scenarios, certain edge cases require different approaches. Understanding these saves hours of debugging when the standard method fails.

If your proxy provider embeds session parameters in the username — a format like user-session-abc123-country-us — the colon and hyphen characters in the username can conflict with URL parsing. The requests library generally handles this correctly because it splits on the last @ symbol to separate credentials from the host, but some proxy URL formats can still trip up the parser. When you encounter authentication failures with complex usernames, construct the proxy URL programmatically using urllib.parse rather than string concatenation.

Some corporate proxy environments use NTLM or Kerberos authentication instead of basic auth. The standard requests library does not support these protocols natively. The requests-ntlm package adds NTLM support, and you configure it through a custom authentication handler attached to the session rather than through the proxy URL.

Connection retry behavior with authenticated proxies needs explicit handling. When a proxy returns a 407 (Proxy Authentication Required) response, the requests library does not retry with credentials automatically unless they were pre-configured in the session. If you receive unexpected 407 errors despite providing credentials, verify that the Proxy-Authorization header is actually being sent by enabling debug logging. Set the logging level for urllib3 to DEBUG, and the full request headers — including the authentication header — appear in the log output.
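One common debugging recipe combines urllib3's DEBUG logging (which surfaces connection-level events) with the standard library's http.client debug switch, which prints the raw outgoing request lines and headers to stdout:

```python
import http.client
import logging

# Print the raw request, including headers such as Proxy-Authorization,
# so failed proxy authentication can be inspected directly.
http.client.HTTPConnection.debuglevel = 1
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)
```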

Scrapy Proxy Integration Through Middleware

Scrapy's architecture separates concerns through middleware, making proxy integration clean and maintainable. The proxy for each request is controlled by the request's meta dictionary — specifically the 'proxy' key. Setting request.meta['proxy'] to a proxy URL routes that individual request through the specified proxy, giving you per-request control over proxy selection.

The simplest integration is the built-in HttpProxyMiddleware, which Scrapy enables by default. To use a single proxy for all requests, set the http_proxy and https_proxy environment variables before running your spider, and HttpProxyMiddleware picks them up automatically. For more control, assign the proxy URL in your spider's start_requests method or in a custom middleware.

Building a custom downloader middleware for proxy rotation is the production-grade approach. Your middleware class implements the process_request method, which fires before every outgoing request. Inside this method, select a proxy from your pool — either cycling sequentially, selecting randomly, or using a weighted algorithm that favors proxies with higher recent success rates — and assign it to request.meta['proxy']. This architecture lets you swap rotation strategies without modifying spider code.
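A minimal rotation middleware can look like the sketch below. The class name and the PROXY_LIST settings key are our own choices, and the random strategy is the simplest of the options described above:

```python
import random

class RotatingProxyMiddleware:
    """Downloader middleware that assigns a random proxy to each request.
    In a real project, load the pool from Scrapy settings via from_crawler;
    PROXY_LIST is a hypothetical settings key."""

    def __init__(self, proxies):
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Runs before every outgoing request: pick a proxy and attach it.
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # continue normal downloader processing
```

Enable it in settings.py through DOWNLOADER_MIDDLEWARES with a priority that places it after Scrapy's built-in HttpProxyMiddleware would otherwise run.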

Scrapy's retry middleware integrates naturally with proxy rotation. Configure the RETRY_HTTP_CODES setting to include 403, 429, and 503 status codes. When a request fails with one of these codes, Scrapy re-enqueues it through the middleware pipeline, where your proxy middleware assigns a fresh proxy before the retry. This creates an automatic recovery loop: failed requests get retried through different IPs without any manual intervention. Set RETRY_TIMES to 3-5 to prevent infinite retry loops on genuinely blocked URLs.
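In settings.py, that configuration might look like the following sketch (the list extends Scrapy's default server-error codes with the block-related ones):

```python
# settings.py — retry failed responses so they pass back through the
# proxy middleware and receive a fresh IP.
RETRY_ENABLED = True
RETRY_TIMES = 3  # cap retries to avoid infinite loops on blocked URLs
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 403]
```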

Selenium Proxy Setup with Chrome and Firefox Drivers

Selenium proxies the entire browser session, including JavaScript execution, asset loading, and AJAX requests — making it essential for scraping JavaScript-heavy sites. The proxy configuration method differs between ChromeDriver and GeckoDriver (Firefox), and each has distinct limitations around authentication.

For Chrome through Selenium, proxy configuration happens through Chrome options. You add the --proxy-server argument with the format host:port. This routes all browser traffic through the specified proxy. The limitation: Chrome's command-line proxy argument does not support authentication credentials. If your proxy requires username:password authentication, you need an additional mechanism — either a Chrome extension that injects credentials, or a local proxy forwarder that handles authentication between your browser and the remote proxy.

For Firefox through Selenium, proxy configuration uses the Firefox profile's network settings — the same settings available in Firefox's GUI. You create a FirefoxProfile or FirefoxOptions object and set the network.proxy preferences: network.proxy.type to 1 (manual configuration), network.proxy.http and network.proxy.http_port for the proxy address, and similarly for SSL. Firefox profiles also support SOCKS proxy configuration and the DNS-over-proxy option, giving more flexibility than Chrome.

The selenium-wire package is the most practical solution for authenticated proxy use in Selenium. It wraps the standard Selenium WebDriver and adds proxy support with authentication, request interception, and response modification capabilities. You specify the proxy configuration — including credentials — in the seleniumwire_options dictionary when creating the driver. Selenium-wire works with both Chrome and Firefox and handles the CONNECT tunneling and authentication handshake transparently.
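A configuration sketch for selenium-wire, with placeholder credentials; the import and driver launch are commented out because they require the selenium-wire package and a local browser:

```python
# from seleniumwire import webdriver  # pip install selenium-wire

# Credentials go straight in the proxy URLs; selenium-wire handles the
# CONNECT tunnel and the authentication handshake for the browser.
seleniumwire_options = {
    "proxy": {
        "http": "http://user123:[email protected]:8080",   # placeholder
        "https": "http://user123:[email protected]:8080",
        "no_proxy": "localhost,127.0.0.1",
    }
}
# driver = webdriver.Chrome(seleniumwire_options=seleniumwire_options)
```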

Async Libraries: aiohttp and httpx Proxy Configuration

Asynchronous HTTP libraries dramatically improve throughput for proxy-based workflows because they handle thousands of concurrent connections without threading overhead. Both aiohttp and httpx — the two dominant async HTTP libraries in Python — support proxy configuration, but their approaches differ.

In aiohttp, proxy configuration is per-request. You pass the proxy parameter to session.get or session.post with the proxy URL string. For authenticated proxies, aiohttp accepts a proxy_auth parameter with a BasicAuth object containing the username and password. Alternatively, embed credentials directly in the proxy URL. The critical detail: aiohttp creates a connector that manages the underlying connection pool, and the proxy setting applies at the request level, not the connector level. This means you can use different proxies for different requests within the same session — useful for implementing rotation directly in your async request logic.
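A per-request sketch for aiohttp, with placeholder proxy details and the network call left commented out:

```python
import asyncio
import aiohttp

PROXY_URL = "http://proxy.example.com:8080"  # placeholder

async def fetch(session: aiohttp.ClientSession, url: str, proxy: str) -> str:
    # The proxy applies per request, so each call can use a different one.
    auth = aiohttp.BasicAuth("user123", "pass456")  # or embed in the URL
    async with session.get(url, proxy=proxy, proxy_auth=auth,
                           timeout=aiohttp.ClientTimeout(total=30)) as resp:
        return await resp.text()

async def main():
    async with aiohttp.ClientSession() as session:
        return await fetch(session, "https://example.com", PROXY_URL)

# asyncio.run(main())  # requires a live proxy, so not run here
```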

httpx takes a more declarative approach. You configure the proxy when creating the AsyncClient: recent versions accept a single proxy URL via the proxy parameter (with per-scheme routing available through mounts), while older versions accepted a proxies dictionary similar to the requests library's format. All requests made through that client route through the configured proxy. httpx supports SOCKS proxies through its optional socksio dependency (install httpx[socks]), while aiohttp requires the aiohttp-socks package for SOCKS support.

For both libraries, connection pool sizing is a critical tuning parameter when working with proxies. The default pool sizes (100 connections for aiohttp, configurable in httpx) may need adjustment based on your proxy provider's concurrent connection limits. Exceeding the provider's connection limit results in refused connections that look like proxy failures but are actually pool exhaustion. Match your client's connection pool size to your proxy plan's concurrency limit, and add a semaphore to enforce the ceiling if your application logic could otherwise exceed it.
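The semaphore ceiling can be enforced with plain asyncio, independent of which HTTP library does the fetching. Here fetch is any coroutine you supply (an aiohttp or httpx call in practice), and the limit of 10 is an arbitrary example:

```python
import asyncio

MAX_CONCURRENCY = 10  # match this to your proxy plan's connection limit

async def bounded_fetch(semaphore, fetch, url):
    # The semaphore caps in-flight requests so the provider's concurrent
    # connection limit is never exceeded, even with thousands of tasks.
    async with semaphore:
        return await fetch(url)

async def crawl(urls, fetch):
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    return await asyncio.gather(
        *(bounded_fetch(semaphore, fetch, u) for u in urls)
    )
```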

Error Handling Patterns for Proxy-Based Python Applications

Production proxy code must handle a specific set of failure modes that do not exist in direct HTTP requests. Each failure type demands a different response — not all errors should trigger the same retry logic.

HTTP 403 and HTTP 429 responses indicate the target site has detected or rate-limited your proxy IP. The correct response is to rotate to a different proxy and retry. Do not retry the same proxy for the same domain — the block is IP-specific and persists. Track which proxy IPs have been blocked on which domains to prevent repeated failures. A dictionary mapping (domain, proxy_ip) tuples to block timestamps works well for this.
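A minimal sketch of that bookkeeping, with an arbitrary 10-minute cooldown and example IPs from the documentation range:

```python
import time

# Maps (domain, proxy_ip) → time the block was observed.
blocked: dict[tuple[str, str], float] = {}
BLOCK_COOLDOWN = 600.0  # seconds before a blocked pair may be retried

def mark_blocked(domain: str, proxy_ip: str) -> None:
    blocked[(domain, proxy_ip)] = time.monotonic()

def is_blocked(domain: str, proxy_ip: str) -> bool:
    ts = blocked.get((domain, proxy_ip))
    return ts is not None and time.monotonic() - ts < BLOCK_COOLDOWN

mark_blocked("example.com", "203.0.113.7")
print(is_blocked("example.com", "203.0.113.7"))  # → True
print(is_blocked("example.com", "203.0.113.8"))  # → False
```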

HTTP 407 (Proxy Authentication Required) means your credentials were rejected by the proxy provider. This is usually a configuration error rather than a transient failure, so retrying does not help. Log the error and raise an alert — it indicates wrong credentials, an expired account, or a credential format issue.

HTTP 502 and 503 errors can originate from either the proxy or the target server. If the error comes from the proxy gateway, rotating to a different proxy resolves it. If it comes from the target, the proxy is working fine and the target is genuinely unavailable. Differentiate by inspecting response headers — proxy gateways typically include identifying headers in their error responses.

Connection timeouts and ConnectionError exceptions need layered retry logic. First retry with the same proxy — the failure might be transient network issues. After two consecutive failures on the same proxy, switch to a different one. After exhausting three proxies, back off for 30 seconds before trying again. This escalating strategy prevents wasting proxy bandwidth on targets that are genuinely down while recovering quickly from transient proxy issues.
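The escalating strategy can be sketched as a small wrapper. The fetch callable is a placeholder for your actual request function, and sleep is injected so the backoff can be tested:

```python
import time

def fetch_with_failover(fetch, url, proxies, per_proxy_retries=2,
                        max_proxies=3, backoff=30.0, sleep=time.sleep):
    """Retry the same proxy, then switch proxies, then back off.
    `fetch` is any callable(url, proxy) raising ConnectionError or
    TimeoutError on failure. Loops until success; cap the rounds in
    production if the target may be permanently down."""
    while True:
        for proxy in proxies[:max_proxies]:
            for _ in range(per_proxy_retries):
                try:
                    return fetch(url, proxy)
                except (ConnectionError, TimeoutError):
                    continue  # possibly transient: retry this proxy once more
            # consecutive failures on this proxy: move to the next one
        sleep(backoff)  # all proxies exhausted: back off, then start over
```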

Rotating Proxies: List Cycling vs Backconnect Endpoints

Proxy rotation in Python follows one of two architectural patterns, and choosing between them affects your application's complexity, reliability, and cost.

The first pattern is client-side rotation from a proxy list. Your application maintains a list of proxy addresses and selects one for each request — either sequentially (round-robin), randomly, or based on performance metrics. This approach gives you complete control over rotation logic, lets you implement domain-specific proxy assignment (always use certain IPs for certain targets), and makes debugging straightforward because you know exactly which proxy handled each request. The downside is management overhead: you must monitor proxy health, remove dead proxies, and handle list updates. Build a ProxyPool class that tracks each proxy's success rate, last-used timestamp, and cooldown status. When selecting a proxy, filter out those currently in cooldown for the target domain.
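A compact sketch of such a pool, using a deliberately simple success-rate heuristic and per-domain cooldowns; the scoring and cooldown values are illustrative choices, not a prescribed algorithm:

```python
import random
import time

class ProxyPool:
    """Track per-proxy health and pick a usable proxy for a domain."""

    def __init__(self, proxy_urls, cooldown=300.0):
        self.cooldown = cooldown
        self.stats = {u: {"ok": 0, "fail": 0, "last_used": 0.0}
                      for u in proxy_urls}
        self.blocked_until = {}  # (proxy_url, domain) → timestamp

    def report(self, proxy, domain, success):
        self.stats[proxy]["ok" if success else "fail"] += 1
        if not success:
            self.blocked_until[(proxy, domain)] = (
                time.monotonic() + self.cooldown)

    def pick(self, domain):
        now = time.monotonic()
        usable = [u for u in self.stats
                  if self.blocked_until.get((u, domain), 0.0) <= now]
        if not usable:
            raise RuntimeError("all proxies cooling down for " + domain)

        def score(u):  # observed success rate; untried proxies score 1.0
            s = self.stats[u]
            total = s["ok"] + s["fail"]
            return s["ok"] / total if total else 1.0

        best = max(score(u) for u in usable)
        choice = random.choice([u for u in usable if score(u) == best])
        self.stats[choice]["last_used"] = now
        return choice
```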

The second pattern uses backconnect (or gateway) endpoints. You connect to a single proxy address, and the provider's infrastructure handles rotation automatically. Each request — or each session, depending on configuration — exits through a different IP from the provider's pool. This dramatically simplifies your code: no list management, no health monitoring, no rotation logic. You just point every request at the same gateway endpoint.

Backconnect endpoints are the right default for most Python applications. They eliminate an entire category of operational complexity, and providers like Databay optimize their gateway rotation algorithms far beyond what client-side logic typically achieves. Reserve client-side list rotation for scenarios where you need deterministic proxy assignment — for example, when specific IPs must access specific targets, or when you need to guarantee that a failed request is retried through a genuinely different IP rather than trusting the gateway to rotate.

Session Management for Stateful Proxy Workflows

Many proxy workflows require maintaining state across multiple requests — login sessions, pagination sequences, multi-step form submissions, and shopping cart operations. Mismanaging sessions through proxies is one of the most common sources of inexplicable failures in scraping applications.

The fundamental rule: a session must use the same proxy IP for its entire lifecycle. If you authenticate on a website through proxy IP A and then make subsequent requests through proxy IP B, the target site sees a different IP presenting session cookies issued to IP A. At best, the session is invalidated. At worst, the account is flagged for suspicious activity. Use sticky sessions from your proxy provider — these route all requests with the same session identifier through the same IP for a configurable duration, typically 10-30 minutes.

In the requests library, session stickiness means assigning a single proxy (with a sticky session identifier in the credentials) to a requests.Session object and using that session for all related requests. Do not create a new Session per request — that defeats the purpose. The session object maintains cookies, TCP connections, and the proxy binding together.

For Scrapy, pass session context through request.meta and the cb_kwargs parameter. Tag related requests with a session identifier, and have your proxy middleware use that identifier to select the same sticky proxy for all requests in the same session. This is cleaner than storing session state in the spider itself because the middleware encapsulates proxy management away from scraping logic.

Set explicit session TTLs in your application. Even if the proxy provider offers 30-minute sticky sessions, design your workflow to complete within 10-15 minutes. This provides a buffer for slow responses and prevents session expiration mid-workflow.

Testing and Validating Your Python Proxy Integration

Testing proxy integrations requires verifying multiple layers — connectivity, authentication, rotation, error handling, and target-specific behavior. Skipping any layer invites production failures that are difficult to diagnose under load.

Start with a connectivity smoke test. Make a simple GET request through your proxy to an IP echo service that returns the requesting IP address in its response. Compare the returned IP against your real IP. If they match, your proxy configuration is not being applied — check your proxy URL format, verify the protocol scheme, and ensure the proxy is reachable on the specified port.
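The smoke test can be sketched as follows; the proxy URL is a placeholder and the echo service is any endpoint that returns the caller's IP as plain text, so the call itself is left commented out:

```python
import requests

PROXY_URL = "http://user123:[email protected]:8080"  # placeholder
ECHO_URL = "https://api.ipify.org"  # any plain-text IP echo service

def proxy_is_applied(proxy_url: str) -> bool:
    """Compare the IP seen with and without the proxy; a working proxy
    configuration produces two different addresses."""
    direct_ip = requests.get(ECHO_URL, timeout=30).text.strip()
    proxied_ip = requests.get(
        ECHO_URL,
        proxies={"http": proxy_url, "https": proxy_url},
        timeout=30,
    ).text.strip()
    return proxied_ip != direct_ip

# print(proxy_is_applied(PROXY_URL))  # needs a live proxy, so not run here
```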

Test authentication separately from connectivity. Intentionally use wrong credentials and verify you receive a 407 response. Then use correct credentials and verify a 200 response. This confirms your error handling distinguishes authentication failures from other errors. Test with credentials containing special characters (@, :, #) to verify your URL encoding is correct.

Validate rotation by making 10-20 sequential requests through your proxy and collecting the returned IP addresses. With a rotating proxy, you should see multiple distinct IPs. With a sticky session, you should see the same IP for all requests within the session TTL. If rotation is not working as expected, check whether you are reusing a connection — some proxy providers rotate per-connection rather than per-request, so connection pooling can suppress rotation.

Finally, test against your actual target site, not just echo services. Some targets behave differently with proxied traffic — they may challenge proxied connections with CAPTCHAs, serve different content, or enforce stricter rate limits. Run a small-scale test (50-100 requests) against your real target before scaling up. Monitor success rates, response times, and content validity to establish baselines that your production monitoring can compare against.

Frequently Asked Questions

How do I handle proxy authentication with special characters in the password in Python?
URL-encode the password before embedding it in the proxy URL string. Use urllib.parse.quote with safe set to an empty string to encode all special characters. For example, a password containing @ becomes %40 in the encoded URL. This prevents the URL parser from misinterpreting special characters as URL delimiters. Alternatively, some libraries like aiohttp accept credentials as separate parameters rather than embedded in the URL, which avoids the encoding issue entirely.
What is the difference between using a proxy list and a backconnect proxy in Python?
A proxy list requires your application to manage multiple proxy addresses, handle rotation logic, and monitor proxy health. A backconnect proxy provides a single gateway endpoint that automatically routes each request through a different IP from the provider's pool. Backconnect proxies simplify your code significantly — no list management or rotation logic needed. Use backconnect for most applications and reserve client-side list rotation for cases requiring deterministic proxy-to-target assignment.
Can I use proxies with Python's built-in urllib instead of the requests library?
Yes. Python's urllib supports proxies through the ProxyHandler class. Create a ProxyHandler with a dictionary mapping protocols to proxy URLs, build an opener with that handler using build_opener, and use the opener to make requests. However, the requests library and httpx offer significantly cleaner APIs, better error handling, and built-in session management. There is no practical advantage to using urllib for proxy-based workflows unless you cannot install third-party packages.
How do I rotate proxies in Scrapy without an external service?
Build a custom downloader middleware that maintains a list of proxy URLs. In the process_request method, select a proxy from the list using round-robin, random selection, or weighted scoring based on success rates, and assign it to request.meta['proxy']. Pair this with Scrapy's built-in retry middleware configured to retry on 403 and 429 status codes — retried requests pass through your middleware again and receive a fresh proxy. Store proxy health metrics in the middleware to avoid reusing proxies that have been recently blocked.
Why do my Selenium proxy connections work for HTTP but fail for HTTPS sites?
ChromeDriver's --proxy-server argument supports HTTPS tunneling via CONNECT, but misconfigurations can prevent it. First verify the proxy supports CONNECT on port 443. If using Firefox through Selenium, ensure you have set both the HTTP and SSL proxy preferences separately — they are independent settings. For authenticated HTTPS proxies, standard Selenium cannot inject credentials into the CONNECT handshake. Use selenium-wire, which intercepts the connection and handles authentication transparently for both HTTP and HTTPS traffic.

Start Using Rotating Proxies Today

Join 8,000+ users using Databay's rotating proxy infrastructure for web scraping, data collection, and automation. Access 35M+ residential, datacenter, and mobile IPs across 200+ countries with pay-as-you-go pricing from $0.50/GB. No monthly commitment, no connection limits - start collecting data in minutes.