TLS fingerprint mimicry: the cat-and-mouse explained
The first thing every scraper learns is to set a custom User-Agent. The second thing they learn, usually after a frustrating afternoon, is that the User-Agent does not matter. The block they are looking at happened before the HTTP request was even parsed.
This post is about what is happening underneath: how TLS fingerprinting works, why curl is structurally distinguishable from a browser, what curl-impersonate does to close that gap, and why the gap keeps reopening. If you have ever wondered why your TLS handshake leaks “automation tool” before you have even spoken a syllable of HTTP, this is the explainer.
What is in the ClientHello
The TLS handshake starts with a ClientHello message. It is a structured binary record that the client sends to the server to begin negotiating an encrypted connection. The fields, roughly:
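- TLS version. The legacy version field. TLS 1.3 clients pin it to the TLS 1.2 value for compatibility and list the versions they actually support in the supported_versions extension.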
- Cipher suites. A list of cipher suites, in preference order.
- Extensions. A list of TLS extensions the client supports — SNI, ALPN, supported groups, signature algorithms, key share, padding, and a dozen others.
- Supported groups. Which elliptic curves the client can use for key exchange, in preference order.
- Signature algorithms. Which signature schemes the client accepts.
- ALPN. Which application protocols the client supports — h2, http/1.1.
Every one of those lists has an order. Every one of those lists is populated by the client’s TLS library based on what the library supports and how it was configured. Different libraries produce different orderings. Different versions of the same library produce different orderings.
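To make the ordering point concrete, here is a minimal sketch that models only the ordered lists from a ClientHello. The `ClientHello` class and the two clients are hypothetical illustrations, not a parser for the real wire format; the numeric codes are standard IANA values.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ClientHello:
    """Illustrative model: only the ordered lists a fingerprint looks at."""
    tls_version: int
    cipher_suites: Tuple[int, ...]
    extensions: Tuple[int, ...]
    supported_groups: Tuple[int, ...]
    signature_algorithms: Tuple[int, ...]
    alpn: Tuple[str, ...]


# Hypothetical client A.
client_a = ClientHello(
    tls_version=0x0303,
    cipher_suites=(0x1301, 0x1302, 0x1303),
    extensions=(0, 10, 11, 13, 16, 43, 51),
    supported_groups=(29, 23, 24),
    signature_algorithms=(0x0403, 0x0804),
    alpn=("h2", "http/1.1"),
)

# Hypothetical client B: the same cipher suites, in a different preference order.
client_b = replace(client_a, cipher_suites=(0x1302, 0x1303, 0x1301))

# Both clients support exactly the same things...
same_capabilities = set(client_a.cipher_suites) == set(client_b.cipher_suites)
# ...but their ClientHello messages are not the same shape.
same_shape = client_a == client_b

print(same_capabilities, same_shape)
```

The two clients are functionally equivalent, yet anything that looks at order can tell them apart.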
Chrome’s ClientHello, generated by BoringSSL, has a specific cipher list, a specific set of extensions, and ships with GREASE values (reserved placeholder values, defined in RFC 8701, that Chrome inserts into its cipher, extension, and group lists so that servers and middleboxes keep tolerating values they do not recognise). Firefox’s ClientHello, generated by NSS, has a different cipher list and a different extension order. Safari’s, generated by Apple’s TLS stack (historically Secure Transport), looks different again. curl’s ClientHello, generated by OpenSSL (or GnuTLS, mbedTLS, or another backend, depending on the build), looks nothing like any of them.
These differences are largely structural, not configurable. OpenSSL lets you reorder ciphers, groups, and signature algorithms, but it does not expose extension order or GREASE as settings, so no combination of flags produces a BoringSSL-shaped ClientHello. The library does not work that way.
JA3: the first widely-deployed fingerprint
JA3 was published by Salesforce in 2017. It is a simple recipe:
- Take the TLS version, cipher list, extension list, supported groups, and elliptic curve point formats from the ClientHello, as decimal numbers, skipping GREASE values.
- Join the values within each field with hyphens, then join the five fields, in that order, with commas.
- MD5 the result.
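The output is a 32-character hexadecimal string that summarises the shape of the ClientHello. The same client library and version, with the same config, normally produces the same JA3 every time. Clients built on different TLS libraries rarely share a JA3, although different applications built on the same library and defaults often do. Chrome is the notable exception to the stability rule: since 2023 it has shuffled its extension order on every connection, so its raw JA3 varies.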
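The recipe is short enough to sketch in full. This is a minimal illustration that assumes the ClientHello has already been parsed into lists of integers; the function names and the sample values are ours, not part of any JA3 library.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               groups: Iterable[int], point_formats: Iterable[int]) -> str:
    """Build the pre-hash JA3 string: five comma-separated fields,
    values inside a field joined by hyphens, GREASE values dropped."""
    def join(values: Iterable[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([str(version), join(ciphers), join(extensions),
                     join(groups), join(point_formats)])


def ja3_hash(version: int, ciphers: Iterable[int], extensions: Iterable[int],
             groups: Iterable[int], point_formats: Iterable[int]) -> str:
    """MD5 of the JA3 string, as 32 hex characters."""
    text = ja3_string(version, ciphers, extensions, groups, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Made-up sample values, including GREASE entries that should be ignored.
sample = dict(
    version=771,
    ciphers=[0x0A0A, 4865, 4866, 4867],
    extensions=[0x1A1A, 0, 10, 11, 13],
    groups=[0x2A2A, 29, 23, 24],
    point_formats=[0],
)
sample_string = ja3_string(**sample)
sample_hash = ja3_hash(**sample)

# Same ciphers in a different order: a different fingerprint.
reordered = dict(sample, ciphers=[4866, 4865, 4867])
reordered_hash = ja3_hash(**reordered)

print(sample_string)
print(sample_hash, reordered_hash)
```

Because the lists are hashed in the order the client sent them, reordering a single cipher changes the whole hash.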
The bot management industry built JA3 databases of every common automation tool. curl on Linux with OpenSSL 3.x has a known JA3. python-requests has a known JA3. Go’s net/http has a known JA3. The WAFs match incoming JA3 against the database and block on automation hits, often before any application-layer logic runs.
The obvious countermeasure, briefly: pad the cipher list, rotate it, randomise GREASE values. The problem is that real browsers do not pad or rotate their cipher lists, and JA3 discards GREASE values before hashing, so randomising them changes nothing. A randomised JA3 is not “looks like Chrome.” It is “looks like nothing in particular,” which is itself a signal.
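A minimal sketch of why randomisation backfires, assuming a matcher that keeps two lookup tables. The tables, the placeholder hashes, and the `classify` function are hypothetical; real bot management products are far more elaborate and do not publish their logic.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    SUSPICIOUS = "suspicious"


# Placeholder hashes for illustration only, not real fingerprints.
KNOWN_AUTOMATION = {"a" * 32: "hypothetical curl build"}
KNOWN_BROWSERS = {"b" * 32: "hypothetical Chrome build"}


def classify(ja3: str) -> Verdict:
    """Known automation is blocked, known browsers pass,
    and anything never seen before is treated as suspicious."""
    if ja3 in KNOWN_AUTOMATION:
        return Verdict.BLOCK
    if ja3 in KNOWN_BROWSERS:
        return Verdict.ALLOW
    return Verdict.SUSPICIOUS


# A randomised handshake lands in neither table.
randomised_ja3 = "c" * 32
print(classify("a" * 32), classify("b" * 32), classify(randomised_ja3))
```

Escaping the automation table is not the same as landing in the browser table. The only fingerprint that passes is one the matcher already associates with a browser.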
JA4: refinement, not replacement
JA4 was published by FoxIO in 2023 to address some operational issues with JA3, primarily that the hash is a single MD5 string, which makes it hard to do partial matches. JA4 splits the fingerprint into components: a readable prefix (transport protocol, TLS version, SNI presence, cipher count, extension count, first ALPN value) followed by truncated hashes of the sorted cipher list and of the sorted extension list plus signature algorithms. You can match on any subset.
Practically, JA4 catches a lot of clients that managed to evade JA3 through cipher-list padding, because the counts, the cipher hash, and the extension-and-signature-algorithm hash can each be matched independently. Sorting also makes it robust to clients that shuffle their extension order. JA4 has companions as well, such as JA4H for HTTP client fingerprinting and JA4L for latency-based distance estimation, that combine to make evasion considerably harder than it was in 2018.
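The component structure can be sketched as follows. This is a simplified approximation of the published JA4 format, assuming pre-parsed integer lists and taking the negotiated TLS version as a parameter; edge cases are omitted and the details should be checked against the FoxIO specification before relying on them. The name `ja4_sketch` is ours.

```python
import hashlib
import random
from typing import List, Sequence

TLS_VERSIONS = {0x0304: "13", 0x0303: "12", 0x0302: "11", 0x0301: "10"}
SNI, ALPN = 0x0000, 0x0010


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _short_hash(text: str) -> str:
    """First 12 hex characters of SHA-256."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]


def ja4_sketch(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               sig_algs: Sequence[int], alpn: Sequence[str],
               transport: str = "t") -> str:
    """Simplified JA4: readable prefix, cipher hash, extension hash."""
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]
    sig_algs = [s for s in sig_algs if not is_grease(s)]

    first_alpn = alpn[0] if alpn else "00"
    part_a = (
        f"{transport}{TLS_VERSIONS.get(version, '00')}"
        f"{'d' if SNI in extensions else 'i'}"
        f"{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}"
        f"{first_alpn[0]}{first_alpn[-1]}"
    )
    # Ciphers are sorted before hashing, so their order does not matter.
    part_b = _short_hash(",".join(f"{c:04x}" for c in sorted(ciphers)))
    # Extensions are sorted too; SNI and ALPN are left out of the hash.
    hashed_exts = sorted(e for e in extensions if e not in (SNI, ALPN))
    ext_text = ",".join(f"{e:04x}" for e in hashed_exts)
    if sig_algs:
        # Signature algorithms are appended in the order the client sent them.
        ext_text += "_" + ",".join(f"{s:04x}" for s in sig_algs)
    part_c = _short_hash(ext_text)
    return f"{part_a}_{part_b}_{part_c}"


# Made-up sample values.
exts: List[int] = [0x1A1A, 0, 10, 11, 13, 16, 43, 51]
args = dict(version=0x0304, ciphers=[0x0A0A, 0x1301, 0x1302, 0x1303],
            sig_algs=[0x0403, 0x0804], alpn=["h2", "http/1.1"])
fingerprint = ja4_sketch(extensions=exts, **args)

# Shuffling the extension order does not change the result.
shuffled = exts[:]
random.shuffle(shuffled)
shuffled_fingerprint = ja4_sketch(extensions=shuffled, **args)

print(fingerprint)
```

A matcher can key on the prefix alone, on either hash alone, or on all three, which is what makes partial matching practical.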
The arms race continues. The fingerprints are public, the libraries are public, and updates happen on both sides. The bot management vendors are not standing still and neither is anyone else.
What curl-impersonate does
curl-impersonate is a patched curl that ships with a modified TLS stack designed to produce ClientHello messages that match those of specific browser versions. The project is open-source, created by lwthiker, and the engineering is impressive: the Chrome-family build links against patched BoringSSL and the Firefox build against NSS instead of OpenSSL, each replays the cipher list and extension order of the targeted browser, and both match the HTTP/2 SETTINGS frame and header order to boot.
The user-facing version is a wrapper per profile, named for the browser and version it imitates, such as curl_chrome116 or curl_ff117. The exact set depends on the release or fork you install. You invoke them like regular curl. They speak the right TLS dialect.
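Because the wrappers accept ordinary curl arguments, calling one from a script looks like calling curl. The sketch below assumes a wrapper named `curl_chrome116` is on your PATH and falls back to plain curl when it is not; that name is an assumption, so substitute whichever profile your install provides. `build_command` and `fetch` are hypothetical helpers, not part of curl-impersonate or recurl.

```python
import shutil
import subprocess
from typing import List


def build_command(url: str, wrapper: str = "curl_chrome116") -> List[str]:
    """Use the impersonating wrapper if it is installed, else plain curl.
    The default wrapper name is an assumption about your install."""
    binary = shutil.which(wrapper) or "curl"
    # Standard curl flags: quiet output, still report errors, follow redirects.
    return [binary, "--silent", "--show-error", "--location", url]


def fetch(url: str, wrapper: str = "curl_chrome116",
          timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run the command and capture stdout and stderr as text."""
    return subprocess.run(build_command(url, wrapper), capture_output=True,
                          text=True, timeout=timeout, check=False)


# Inspect the command without touching the network.
print(build_command("https://example.com/"))
```

Keeping command construction separate from execution makes the fallback easy to test without making a request.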
Two real limitations are worth knowing:
- It is Linux and macOS only. The patched BoringSSL and NSS builds are not currently packaged for Windows. recurl picks this up and degrades gracefully on Windows by skipping the impersonation layer and going straight to JS preflight.
- It impersonates the TLS layer, not the JS layer. If a site uses a JavaScript challenge in addition to TLS fingerprinting, curl-impersonate gets through the first gate and stops at the second. recurl wraps curl-impersonate with a JS preflight that handles the second gate.
For TLS-only protection, curl-impersonate is the cleanest open-source bypass available. recurl uses it directly when escalation is needed.
Why the cat-and-mouse persists
Two structural facts keep this arms race going indefinitely.
First, browsers and curl have different goals. Browsers prioritise compatibility with the long tail of badly-configured servers and ship with conservative defaults that change slowly. curl prioritises being the easiest possible HTTP client and ships with library defaults from OpenSSL. The libraries are not converging. They are diverging — every BoringSSL update introduces another small structural difference that the JA3/JA4 databases pick up within days.
Second, the people running bot management are competent. They are paid to keep distinguishable signals distinguishable. When curl-impersonate ships a perfect Chrome 120 impersonation, the WAFs do not give up. They add a second check — maybe TCP-level features like initial window size, maybe HTTP/2-specific behaviour, maybe behavioural signals on the second request that an impersonator cannot replicate.
The right way to think about this is not “bypass” versus “no bypass” but a continuous curve. Plain curl is at one end: cheap, fast, blocked by almost any modern WAF. curl-impersonate is in the middle: handles most TLS-only protection, still blocked by JS challenges. A real browser is at the far end: handles almost everything, slow and resource-heavy.
recurl tries to put you on the cheapest point of that curve that still works for the request you are actually making. If plain curl gets through, you pay plain curl cost. If you need impersonation, you pay impersonation cost. If you need a browser, you pay browser cost, but only on the one request that needed it.
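The escalation idea can be sketched in a few lines. This is not recurl's implementation or API: the `Response` type, the tier names, and the rule that a 403 or 429 means "blocked" are all assumptions made for illustration, and the fetchers are fakes standing in for plain curl, curl-impersonate, and a browser.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Response:
    status: int
    body: str


Fetcher = Callable[[str], Response]


def looks_blocked(response: Response) -> bool:
    """Crude assumption: treat 403 and 429 as a block."""
    return response.status in (403, 429)


def fetch_with_escalation(url: str,
                          tiers: Sequence[Tuple[str, Fetcher]]
                          ) -> Tuple[str, Response]:
    """Try each tier from cheapest to most expensive and stop at the
    first one that is not blocked. Returns the tier name and response."""
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, fetcher in tiers:
        response = fetcher(url)
        if not looks_blocked(response):
            return name, response
    # Every tier was blocked: return the last attempt.
    return name, response


# Fake fetchers so the example runs without a network.
calls: List[str] = []


def fake_plain(url: str) -> Response:
    calls.append("plain")
    return Response(403, "blocked")


def fake_impersonate(url: str) -> Response:
    calls.append("impersonate")
    return Response(200, "ok")


def fake_browser(url: str) -> Response:
    calls.append("browser")
    return Response(200, "ok")


tier_used, result = fetch_with_escalation(
    "https://example.com/",
    [("plain", fake_plain), ("impersonate", fake_impersonate),
     ("browser", fake_browser)],
)
print(tier_used, result.status, calls)
```

The browser tier is never invoked here, which is the point: the expensive option is paid for only when the cheaper ones fail.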
The practical takeaway
If you write scrapers or automation in 2026, stop spoofing User-Agents and assuming you have done the work. The User-Agent is the most ignored header in your request. The shape of your TLS handshake gets the most scrutiny. Either use a tool that handles the TLS layer for you or accept that a meaningful percentage of the public internet is going to return 403s.
We built recurl to make the first option a one-line change. The second option is fine too — sometimes a 403 really does mean the request is not welcome, and the right answer is to walk away. Just do not waste an afternoon spoofing headers when the problem was the bytes underneath.