The proxy doesn't choose your TLS fingerprint. The HTTP client does. We captured JA3 and JA4 hashes through Databay's residential, mobile, and datacenter gateways in Germany using three common HTTP clients, and the result was consistent across all runs: the client determines the hash and the proxy is invisible to the handshake. Cloudflare reads that hash alongside the IP, matches it against a known-automation list, and rejects you before the HTTP request layer. Buy a residential proxy, run python-requests through it, you still get blocked.
The 403 You Can't Explain
The proxy is doing its job. Your IP really is in Berlin, the reverse DNS resolves to a Deutsche Telekom consumer pool, the route looks like home broadband. None of that is what got you blocked. The block decision was made before the HTTP request reached the application. It was made during the TLS handshake, in the first 200 milliseconds of the connection, by a Cloudflare worker reading a fingerprint of your client's ClientHello packet.
Here is the claim, and we have the captures to back it up: the proxy doesn't choose your TLS fingerprint. The HTTP client does. python-requests in Berlin still looks like python-requests. The IP changed. The handshake didn't. Cloudflare reads the handshake alongside the IP, matches the hash against a list of known automation tools, and rejects you with 403 while your carefully-crafted headers are still in flight. A clean residential IP does not override what the fingerprint says.
If you bought your residential proxies because vendors said they would make your scraper look like a real user, you bought half a solution.
What a TLS Handshake Actually Says About You
The handshake starts with the ClientHello. Before any HTTP request goes out, before the server even knows what URL you want, your client sends a packet that declares: the TLS version it speaks (1.2 or 1.3), the cipher suites it supports and the order it prefers them in, a list of extensions, a list of supported elliptic curves (called supported groups in TLS 1.3), the signature algorithms it can verify, the EC point formats it understands, and an ALPN field listing which application protocols it can talk over the connection (typically h2 for HTTP/2 and http/1.1 as fallback).
Most of those fields are not secret. They are the first thing the server sees, before encryption is even negotiated, in the clear. And here's the thing: real Chrome 124 emits a very specific, very recognisable combination of those fields. Real Firefox 128 emits a different one. python-requests emits a third. Naked curl 8.18 emits a fourth. The combination is so client-specific that you can identify the library and major version from a single packet, with no help from the User-Agent header at all.
JA3 was the first widely-adopted way to hash that combination into a string you could match on. John Althouse, Jeff Atkinson, and Josh Atkins published it open-source from Salesforce in 2017 (github.com/salesforce/ja3). The format: TLS version, ciphers, extensions, supported groups, EC point formats, joined with commas and dashes, then MD5-hashed. A 32-character hex string per client. Cloudflare, Akamai, and DataDome all consume JA3, and threat-intel feeds publish JA3 hashes for known automation tools and malware families the way they used to publish IPs. (We cover the practical implications of those products in our web scraping with proxies guide; for now, just know that the JA3 read is happening on every commercial anti-bot product you're going to encounter.)
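To make the JA3 format concrete, here is a minimal sketch of the string-then-MD5 step described above. It takes the five field lists as given; parsing them out of a real packet, and deciding how GREASE values are treated before this step, is left out. The function names and the example code points are illustrative assumptions, not values from our captures.

```python
import hashlib


def ja3_string(version, ciphers, extensions, groups, point_formats):
    """Build the JA3 input: commas between fields, dashes within a field."""
    fields = [
        str(version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(g) for g in groups),
        "-".join(str(p) for p in point_formats),
    ]
    return ",".join(fields)


def ja3_hash(ja3):
    """MD5 the JA3 string into the 32-character hex form that gets matched on."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Hypothetical ClientHello values as decimal code points (not from our captures).
example = ja3_string(
    version=771,                      # 0x0303
    ciphers=[4865, 4866, 4867],
    extensions=[0, 10, 11, 13, 16],
    groups=[29, 23, 24],
    point_formats=[0],
)
print(example)
print(ja3_hash(example))
```

Change the order of any list, or add a single value, and the MD5 changes completely. That brittleness is what makes the hash useful as a signature and also what makes it fragile.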
In 2023 Althouse (now at FoxIO) shipped JA4 (github.com/FoxIO-LLC/ja4), a redesign that fixes the parts of JA3 that broke when modern browsers started randomising extension order and shipping GREASE values. JA4 is structured: a short prefix that encodes TCP-vs-QUIC, TLS version, SNI presence, cipher count, extension count, and the first ALPN value, followed by two 12-character truncated SHA-256 hashes (one over sorted ciphers, one over sorted extensions and signature algorithms). It strips GREASE before hashing, and it stays stable across handshakes from the same client. JA4 is the one that matters in 2026.
JA3 has a problem with modern browsers. We'll get to that in section 7. For now: the handshake says who you are, and the next section shows exactly what it said about three different clients we ran through three of our gateways.
The Captures: Same Client, Three Proxy Types
The clients: curl 8.18 with default options (no special flags, no impersonation), python-requests 2.32 on CPython 3.13 (the default install you get with pip install requests), and curl_cffi 0.7 running in chrome124 impersonation mode (a Python wrapper around curl-impersonate that compiles BoringSSL with Chrome's exact TLS settings). One capture per client per network path, three clients across four paths, twelve handshakes total. We also re-ran curl_cffi and curl three times each through the residential gateway as a sanity check on a separate question, GREASE behaviour, which we'll come back to in section 7. All runs on 2026-05-06, all proxies in geo=DE. The full methodology, including the exact commands, the reflector raw responses, the per-handshake byte-level diffs, and the SHA-256 of every JSON we collected, lives in the methodology appendix at the end of this post.
One sentence on why we picked those three clients. curl is the lingua franca of one-line scrapers and stack-overflow-driven debugging, python-requests is what most production Python scrapers actually use under the hood, and curl_cffi is the popular Python library that exists specifically to mimic Chrome's TLS stack and bypass JA3-based gates. Three clients, three different fingerprint stories.
Here's what we got.
The Hash Table
Twelve handshakes are below. Three clients, four network paths, the JA3 hash the server saw on the wire in columns two through five, and the JA4 fingerprint in column six. JA3 strings are truncated to the first eight and last four hex characters so the table fits the page; the unredacted hashes and the per-handshake JSON are in the methodology appendix.
| Client | Direct | Residential DE | Mobile DE | Datacenter DE | JA4 (all four paths) |
|---|---|---|---|---|---|
| curl 8.18 | fae0e5d9…0363 | fae0e5d9…0363 | fae0e5d9…0363 | fae0e5d9…0363 | t13d2013h1_2b729b4bf6f3_e24568c0d440 |
| python-requests 2.32 | a48c0d5f…5da1 | a48c0d5f…5da1 | a48c0d5f…5da1 | a48c0d5f…5da1 | t13d1812h1_85036bcba153_375ca2c5e164 |
| curl_cffi 0.7 chrome124 | 57392bf6…ffa5 | e6e2506a…801e | 18e6fc3e…12ab | a577999f…e1a2 | t13d1516h2_8daaf6152771_02713d6af862 |
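The table can also be read mechanically. The sketch below copies the truncated JA3 values from the rows above and asks one question per client: did every network path produce the same hash? The dictionary layout and the function name are our own illustrative choices.

```python
# Truncated JA3 hashes copied from the table above; path order is
# direct, residential DE, mobile DE, datacenter DE.
JA3_BY_CLIENT = {
    "curl 8.18": ["fae0e5d9…0363"] * 4,
    "python-requests 2.32": ["a48c0d5f…5da1"] * 4,
    "curl_cffi 0.7 chrome124": [
        "57392bf6…ffa5",
        "e6e2506a…801e",
        "18e6fc3e…12ab",
        "a577999f…e1a2",
    ],
}


def stable_across_paths(hashes):
    """True when every network path produced the same hash."""
    return len(set(hashes)) == 1


for client, hashes in JA3_BY_CLIENT.items():
    verdict = "stable" if stable_across_paths(hashes) else "varies"
    print(f"{client}: JA3 {verdict} across {len(hashes)} paths")
```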
Read across the curl row. Read across the python-requests row. JA3 is byte-identical at every hop. The Databay gateways are forwarding the ClientHello packet untouched, which is the only way a residential exit, a mobile exit, and a datacenter exit can all produce the same hash for the same client. There is no MITM, no re-encryption, no SSL bump rewriting cipher orders to match a profile.
The curl_cffi row is the only place JA3 disagrees with itself across columns. That is GREASE doing exactly what GREASE was designed to do: padding the handshake with a different random extension and cipher value on every connection so middleboxes that hard-match on the byte sequence break. JA4 strips GREASE before hashing, which is why the JA4 column is steady where JA3 wobbles. We come back to this distinction in section 7.
Now flip the question.
The Hash Table, the Other Way Around
One residential gateway in Germany. Three clients. The proxy is held constant; the only variable is the binary on our end of the connection. Same five-second window, same target reflector, same SNI. We pulled JA3, JA4, the ALPN list the client offered, and the count of TLS ciphers in the ClientHello directly out of the captured JSON for each run.
| Client | JA3 | JA4 | ALPN offered | Ciphers offered |
|---|---|---|---|---|
| curl 8.18 | fae0e5d9…0363 | t13d2013h1_2b729b4bf6f3_e24568c0d440 | http/1.1 | 20 |
| python-requests 2.32 | a48c0d5f…5da1 | t13d1812h1_85036bcba153_375ca2c5e164 | http/1.1 | 18 |
| curl_cffi 0.7 chrome124 | e6e2506a…801e | t13d1516h2_8daaf6152771_02713d6af862 | h2, http/1.1 | 16 (15 + 1 GREASE) |
python-requests says one thing, curl says a second thing, curl_cffi(chrome124) says a third, and a server-side reflector reads them out loud.
Notice the JA4 prefix. t13d2013h1 decodes directly: TLS 1.3, SNI present, 20 ciphers offered, 13 extensions offered, http/1.1 as the first ALPN value. t13d1812h1 is python-requests (18 ciphers, 12 extensions). t13d1516h2 is a Chrome-class client (15 non-GREASE ciphers, 16 non-GREASE extensions, h2 as the first ALPN). The prefix alone tells you the client class before you ever look at the two truncated SHA-256 hashes that follow. FoxIO publishes a community database of JA4 strings mapped to known clients and malware families at ja4db.com. You do not need to be Cloudflare to do the lookup. You need a regex.
The IP is downstream of the fingerprint. Cloudflare Bot Management publishes a documented score from 1 to 99 that combines TLS fingerprint, HTTP/2 fingerprint, behavioural signals, and IP reputation; a known-automation JA4 alone is enough to push the score over the threshold that most customer rules block on. Once you are over the threshold, IP reputation only decides which flavour of rejection you get back: 403, interactive challenge, soft-throttle. A clean Berlin residential IP buys you a politer error message. It does not buy you a pass. Fix the JA4 first, then worry about the IP.
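The "you need a regex" point about the JA4 prefix can be sketched in a few lines. The pattern below only covers the common shape seen in our captures (TCP or QUIC, a two-digit TLS version, two 12-character hashes); the field names in the returned dictionary are our own labels, not part of the JA4 specification.

```python
import re

# Assumed shape: transport, TLS version, SNI flag, cipher count,
# extension count, two-character ALPN code, then two 12-hex-char hashes.
JA4_RE = re.compile(
    r"^(?P<proto>[tq])(?P<version>\d{2})(?P<sni>[di])"
    r"(?P<ciphers>\d{2})(?P<extensions>\d{2})(?P<alpn>[0-9a-z]{2})"
    r"_(?P<cipher_hash>[0-9a-f]{12})_(?P<ext_hash>[0-9a-f]{12})$"
)


def parse_ja4(ja4):
    """Decode the readable prefix of a JA4 string into named fields."""
    m = JA4_RE.match(ja4)
    if m is None:
        raise ValueError(f"not a JA4 string in the expected shape: {ja4!r}")
    version = m["version"]
    return {
        "transport": "TCP" if m["proto"] == "t" else "QUIC",
        "tls_version": f"{version[0]}.{version[1]}",
        "sni_present": m["sni"] == "d",
        "cipher_count": int(m["ciphers"]),
        "extension_count": int(m["extensions"]),
        "alpn_code": m["alpn"],
        "cipher_hash": m["cipher_hash"],
        "extension_hash": m["ext_hash"],
    }


# The three JA4 values from the table above.
for value in (
    "t13d2013h1_2b729b4bf6f3_e24568c0d440",
    "t13d1812h1_85036bcba153_375ca2c5e164",
    "t13d1516h2_8daaf6152771_02713d6af862",
):
    print(parse_ja4(value))
```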
So how do the bot-management products actually use these hashes? Three vendors publish enough about their internals to answer.
What Cloudflare, Akamai, and DataDome Actually Do With This
Akamai. Akamai's Bot Manager Premier marketing page and the Akamai Security Research blog at akamai.com/blog/security describe a layered approach: device fingerprinting, browser fingerprinting, TLS handshake analysis, and behavioural telemetry. Akamai's published research on automated bot evolution states that adversaries who upgrade past JA3 by impersonating Chrome handshakes are detected at the HTTP/2 and behavioural layers, not at TLS alone. Akamai's framing: TLS is one signal, never the deciding one. Compared to Cloudflare, Akamai talks less publicly about specific fingerprint families and more about the fusion model.
DataDome. DataDome's threat-research and learning-center material at datadome.co/learning-center covers their sub-2-millisecond bot decisioning, citing over 5 trillion signals analysed per day across server-side and client-side detection. DataDome's published bot-detection methodology calls out TLS fingerprinting as a server-side signal evaluated in their first-pass scoring, alongside ASN reputation, header anomalies, and HTTP/2 frame ordering. They publish less raw fingerprint data than Cloudflare and frame the technique as one ingredient in a proprietary ensemble.
Position: Cloudflare publishes the most about JA4 and is the easiest vendor to test against because their thresholds and signal names are public. Akamai and DataDome treat the same techniques as proprietary. If you're benchmarking a client's TLS posture in 2026, point it at a Cloudflare-protected origin first; you'll see the fingerprint reflected in their published bot-score header. The other two will just block you and not say why.
Beyond JA3: HTTP/2 SETTINGS, ALPN Order, GREASE
GREASE (Generate Random Extensions And Sustain Extensibility) was specified in RFC 8701 by David Benjamin at Google, with one job: prevent middlebox ossification. The spec reserves a set of 16 cipher values, 16 extension types, and 16 supported_versions values, and tells browsers to pick one of each at random per connection and stuff it into the ClientHello. Servers that follow the spec ignore the unknown values. Servers and middleboxes that hard-match on byte sequences break the moment a real Chrome shows up, which is exactly the point: if your network appliance breaks because Chrome added a value it didn't know about, your appliance was the problem.
Chrome shipped GREASE by default around 2018, Firefox followed, and by 2020-2021 every major browser was emitting different cipher and extension values on every connection. JA3 hashes the cipher list and extension list directly. GREASE values land in those lists. JA3 of a real Chrome connection therefore changes on every handshake, defeating any signature-based blocklist that hard-matches on the JA3 hex string.
Our own captures show this directly. The 3-run sanity check we ran through the residential gateway with curl_cffi(chrome124) produced these JA3 hashes on three back-to-back connections: run 1 d587229962034b5419d527a08260101a, run 2 5b914a9fecba541d99ae660ef12adc71, run 3 9919449f364b95a19ce0956ef138a847. Three different MD5s. The JA4 stayed identical across all three: t13d1516h2_8daaf6152771_02713d6af862. Same proxy, same client, same Berlin exit, three connections. Three JA3s. One JA4. JA4 strips GREASE before hashing; JA3 doesn't.
HTTP/2 SETTINGS frame and ALPN order. When the connection upgrades to HTTP/2, the client sends a SETTINGS frame declaring values for HEADER_TABLE_SIZE, ENABLE_PUSH, MAX_CONCURRENT_STREAMS, INITIAL_WINDOW_SIZE, MAX_FRAME_SIZE, and MAX_HEADER_LIST_SIZE. The numeric values, the order they appear in, and which ones are omitted are all client-specific. Chrome 124 sends a particular combination, Firefox 128 sends a different one, Go's net/http default sends a third. JA4H captures this layer.
This is why anti-bot vendors who relied on JA3 in 2022 added JA4 and JA4H to their stacks in 2023-2024. GREASE made JA3 signatures useless against modern browsers and against any client (curl-impersonate, curl_cffi) that copies a real Chrome handshake. The vendors caught up. JA4 will hold longer because it was designed with GREASE awareness from the start. But the moment a popular impersonation library starts emitting deliberately-chaotic HTTP/2 SETTINGS to break JA4H, the JA4 generation will be 2018-era JA3 again. Bet on 2027 for that conversation. For the JS-runtime layer of the same detection stack (what a browser leaks above TCP, not on it), see the companion post: Headless Browser Detection in 2026.
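The GREASE effect described in this section is easy to reproduce on paper. The sketch below is not a full JA3 or JA4 implementation: it hashes a cipher list once exactly as sent and once after JA4-style normalisation (drop GREASE, sort, truncate a SHA-256 to 12 hex characters). The two cipher lists are hypothetical connections, not our captures, and the function names are our own.

```python
import hashlib


def is_grease(value):
    """RFC 8701 GREASE code points look like 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    high, low = value >> 8, value & 0xFF
    return high == low and (low & 0x0F) == 0x0A


def raw_hash(ciphers):
    """Hash the list exactly as sent, so order and GREASE both matter."""
    joined = "-".join(str(c) for c in ciphers)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


def normalised_hash(ciphers):
    """JA4-style: drop GREASE, sort the hex codes, truncate a SHA-256."""
    kept = sorted(f"{c:04x}" for c in ciphers if not is_grease(c))
    digest = hashlib.sha256(",".join(kept).encode("ascii")).hexdigest()
    return digest[:12]


# Two hypothetical connections from one client; only the GREASE value differs.
conn_a = [0x1A1A, 0x1301, 0x1302, 0x1303, 0xC02B]
conn_b = [0x8A8A, 0x1301, 0x1302, 0x1303, 0xC02B]

print(raw_hash(conn_a) == raw_hash(conn_b))                # raw hashes disagree
print(normalised_hash(conn_a) == normalised_hash(conn_b))  # normalised hashes agree
```

The same idea, applied to extensions and signature algorithms as well, is why one client can show three raw hashes and a single normalised one across three connections.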
What Works in 2026
1. Match a real browser TLS stack. The single biggest lever. Tools that compile a browser's TLS settings into something you can drive from code: curl_cffi 0.7 (Python wrapper around curl-impersonate, ships Chrome and Safari profiles, fast, ~1ms overhead per connection), tls-client 1.7 (Go library used as an HTTP server you call into, broadest profile coverage), Playwright with patched TLS via playwright-extra, or a real headless browser via Playwright 1.49 / Puppeteer 23. Pick by workflow. Pure-API scraping with no JS execution: curl_cffi or tls-client. Anything that needs the page to render or fire XHR: Playwright. The tradeoff is performance versus fidelity: curl_cffi handles tens of thousands of requests a minute on one box, Playwright handles tens. Use the lowest-fidelity tool that clears the target.
2. Use proxies with clean IP reputation. Necessary, not sufficient. The IP score classes, in ascending order of trust on protected sites: datacenter (lowest, often pre-blocked at the edge), residential consumer ASNs, ISP (static residential), mobile carrier (highest trust on Instagram/TikTok specifically because of CGNAT, see section 9). The deeper comparison is in residential vs datacenter proxies. The IP doesn't override the fingerprint. The fingerprint doesn't override the IP. They multiply.
3. Keep request behaviour plausible. Pacing (jitter the request interval, don't hammer at exact intervals), header consistency (the User-Agent has to claim a browser whose TLS stack you're actually emitting; User-Agent: Chrome/124 with a python-requests JA4 is a contradiction the vendor sees in one rule), referrer chain (real users don't deep-link from nowhere), cookie handling (persist them, send them back, don't strip on retry). Production patterns are written up in our web scraping with proxies guide.
The position I'd defend on Hacker News: most scraper failures in 2026 are TLS-fingerprint failures dressed up as IP problems, and the cottage industry of residential-proxy vendors who tell their customers otherwise is selling a clean IP as a panacea because it doesn't sell HTTP clients. Fix the client first. Buy the IP second. The order is not negotiable on serious targets.
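The header-consistency point in item 3 is the kind of check that fits in one rule. Here is a hedged sketch of what such a rule might look like from the server's side, using the three JA4 values from our captures as the lookup table. The User-Agent classification is deliberately crude and the function names are hypothetical; no vendor has told us their rule looks like this.

```python
# JA4 values taken from the captures in this post, mapped to the client
# family that produced them.
KNOWN_JA4 = {
    "t13d2013h1_2b729b4bf6f3_e24568c0d440": "curl",
    "t13d1812h1_85036bcba153_375ca2c5e164": "python-requests",
    "t13d1516h2_8daaf6152771_02713d6af862": "chrome",
}


def claimed_family(user_agent):
    """Very rough User-Agent classification, for illustration only."""
    ua = user_agent.lower()
    if "python-requests" in ua:
        return "python-requests"
    if ua.startswith("curl/"):
        return "curl"
    if "chrome/" in ua:
        return "chrome"
    return "unknown"


def contradicts(user_agent, ja4):
    """True when the JA4 maps to a known client the User-Agent does not claim."""
    observed = KNOWN_JA4.get(ja4)
    return observed is not None and observed != claimed_family(user_agent)


chrome_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0.0.0 Safari/537.36"
requests_ja4 = "t13d1812h1_85036bcba153_375ca2c5e164"
print(contradicts(chrome_ua, requests_ja4))  # header says Chrome, handshake says otherwise
```

Running the same check against your own outgoing traffic before a crawl is a cheap way to catch a spoofed User-Agent that the handshake gives away.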
When the Proxy Choice Does Still Matter
Datacenter IPs. Pre-filtered out by IP reputation feeds. Spamhaus DROP list, Project Honey Pot, vendor-internal blocklists that ingest known cloud and hosting ASNs (AWS, GCP, Hetzner, OVH, DigitalOcean, all the obvious ones). Most protected sites block on ASN class before the TLS handshake even terminates, which means your beautiful Chrome-impersonating client never gets a chance to show off. Datacenter is fine for unprotected APIs, government open-data portals, public RSS, and internal monitoring. Wrong tool for anything Cloudflare or Akamai is fronting.
Residential IPs. Real consumer ISP ASNs (Comcast, Deutsche Telekom, BT, Verizon FiOS), evaluated alongside the TLS fingerprint by Cloudflare/Akamai/DataDome scoring engines. The IP gets a baseline trust score from the ASN's reputation, then the TLS fingerprint adjusts up or down. Residential is the workhorse for protected e-commerce, classifieds, search engines, anywhere the target is using bot management but not at maximum aggression. The mechanics of how rotation and session stickiness actually work are covered in how residential proxies work.
Mobile IPs. Highest trust because mobile carriers route subscribers through CGNAT (Carrier-Grade NAT). One mobile IP fronts hundreds or thousands of real subscribers behind the carrier's NAT pool. Block one IP and you block real customers. Anti-bot vendors know this and tune their thresholds accordingly. Reserved for the hardest targets: Instagram and TikTok account-authenticated workflows, which is why we wrote up the best mobile proxies for Instagram as a separate playbook.
Synthesised position. Client first, IP second. Either alone fails on serious targets. A clean Chrome JA4 over a datacenter IP fails on ASN. A real residential IP carrying python-requests fails on JA4. The order I gave is the order I'd defend, but you can argue the inverse on Instagram specifically, where mobile-IP CGNAT trust is sometimes enough to carry an imperfect client. Anywhere else, client first.
Methodology
Tools and versions. curl 8.18 (the build that ships with Windows 11, invoked with no flags beyond -s and --max-time 30), python-requests 2.32.5 running on CPython 3.13, and curl_cffi 0.7.4 in chrome124 impersonation mode. Both Python libraries were installed into a fresh virtualenv at tools/tls-capture/.venv with no other packages present.
Reflectors. https://tls.peet.ws/api/all for the full handshake JSON (JA3, JA4, ALPN, cipher list, raw extension order) and https://check.ja3.zone for an independent JA3 hash. We cross-checked the residential curl_cffi capture against a Wireshark dump of the same connection to verify the reflectors' view matched what was on the wire. It did.
Commands. The actual one-liners we shipped were curl.exe -s --max-time 30 --proxy $env:DATABAY_PROXY_RESIDENTIAL https://tls.peet.ws/api/all for the curl row, requests.get("https://tls.peet.ws/api/all", proxies={"https": proxy}, timeout=30) for python-requests, and curl_cffi.requests.get("https://tls.peet.ws/api/all", proxies={"https": proxy}, timeout=30, impersonate="chrome124") for curl_cffi. Direct runs dropped the proxy argument; everything else stayed the same.
Limitations. Single date. A real measurement programme should aggregate over weeks. We claim the twelve captures show a structural fact (the client determines the fingerprint), not a statistical distribution. Single geography too: every proxy was a German exit. Different gateway nodes might emit slightly different L4 behaviour, but JA3 and JA4 are computed from the client's ClientHello, and the gateway forwards that packet untouched, so we expect identical hashes regardless of region. The captures confirm that expectation. Three clients chosen for typicality, not exhaustiveness. We did not test Firefox-impersonating libraries, naked Go's net/http, Node's undici, Java's HttpClient, or others. The qualitative conclusion is well-established outside our captures.
Disclosure. The post is on Databay's blog, the gateways used were Databay's own. Disclosure is the trust signal; concealment is the red flag.
Run the same commands and check our work.
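If you would rather script the check than paste one-liners, a minimal sketch follows. It wraps the same requests.get call from the commands above and records the fingerprint fields plus a SHA-256 of the raw response. The tls, ja3_hash, and ja4 key names are our assumption about the reflector's JSON layout, so verify them against a live response; the function names are hypothetical.

```python
import hashlib
import json


def summarise(raw_body):
    """Pull the fingerprint fields out of one reflector response body (bytes)."""
    doc = json.loads(raw_body)
    tls = doc.get("tls", {})  # assumed key layout; check a live response
    return {
        "ja3_hash": tls.get("ja3_hash"),
        "ja4": tls.get("ja4"),
        "sha256": hashlib.sha256(raw_body).hexdigest(),
    }


def capture(proxy=None, url="https://tls.peet.ws/api/all"):
    """Fetch one handshake report, optionally through a proxy URL."""
    import requests  # third-party; imported here so summarise() works without it

    proxies = {"https": proxy} if proxy else None
    response = requests.get(url, proxies=proxies, timeout=30)
    response.raise_for_status()
    return summarise(response.content)


if __name__ == "__main__":
    # Direct run; pass your own proxy URL to compare paths.
    print(capture())
```

Run it once direct and once per proxy zone, then compare the ja4 values. If the gateway is forwarding the ClientHello untouched, they match.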
Frequently Asked Questions
Does using a residential proxy hide me from TLS fingerprinting?
No. The gateway forwards your ClientHello packet untouched. Our captures show JA3 identical across residential, mobile, and datacenter exits for curl and python-requests when the client was held constant, and JA4 identical for all three clients across all four network paths. Changing the proxy changes your IP and that IP's reputation score. It does not change a single byte of the handshake. The fingerprint is yours to fix at the client. Buy a clean IP for IP reasons; do not buy it expecting a TLS rewrite.
What is the difference between JA3 and JA4?
Can python-requests be configured to look like a real browser?
python-requests uses urllib3, which calls Python's ssl module, which is OpenSSL. Chrome ships BoringSSL with a different cipher order, different extensions, and different supported groups. No amount of header tweaking changes the handshake. Headers are downstream of TLS. The path forward is curl_cffi (Python wrapper around curl-impersonate), tls-client (Go library callable as a sidecar), or httpx with a custom transport adapter. Plain requests with custom headers is not enough on any protected target.
Will Cloudflare block me purely on JA3 in 2026?
Is curl-impersonate legal to use?
curl-impersonate is open source under MIT. Imitating a browser's TLS handshake is not regulated; the handshake is public protocol metadata, not a trademark or a copyrighted work. Whether the requests you send through it violate a target site's terms of service is a separate question with a different answer per site and per jurisdiction. We covered the broader compliance picture in are proxies legal and the operational ethics in ethical web scraping. This is informational, not legal advice.
Does Databay modify the TLS handshake at the gateway?
curl 8.18's JA3 of fae0e5d973c96ae1888b99538efa0363 is byte-identical whether the request goes direct or through any of the three German proxy zones. python-requests 2.32 produces a48c0d5f95b1ef98f560f324fd275da1 on every path, again byte-identical. The handshake on the wire is what the client emitted. We disclose this because the question is fair, the answer is verifiable from your own laptop, and concealment would be the more suspicious choice.