Why your curl scripts fail on Cloudflare-fronted sites in 2026
You wrote a curl line in 2019. It worked. You run the same curl line in 2026. It returns a 403 and no useful body. The site is up (you can see it in your browser, on the same machine, over the same network), but curl gets nothing. The internet did not break. What an HTTPS request looks like on the wire started to matter, and the bot management industry built an enormous, mostly invisible apparatus around the difference.
This post is the explainer we wish someone had handed us in 2022. It covers what actually changed, why it changed, and what the 403 you are looking at really means.
The pre-2020 model: requests were requests
In the old world, an HTTPS request was effectively just a TCP connection, a TLS handshake, and an HTTP request. The TLS handshake said “I support these cipher suites and extensions, here is my SNI, please respond.” The HTTP request said “GET /foo HTTP/1.1, Host: example.com, User-Agent: curl/7.x”. The server could not meaningfully distinguish a curl that lied about its User-Agent from a real browser, because the HTTP headers were all it looked at, and once that header was rewritten they matched.
That was the world curl shipped into and the world most scraper code was written against. Spoof the User-Agent, maybe spoof the Accept-Language and a Referer, and you were indistinguishable from Chrome.
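For concreteness, here is a minimal sketch of what that header spoofing amounted to, written with Python’s standard library rather than curl. The URL and header values are placeholders we picked for illustration, and the request is built but never sent. Note that everything it changes lives in the HTTP layer; nothing here touches the TLS handshake.

```python
import urllib.request

# Illustrative values only: the header strings below are placeholders.
SPOOFED_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://example.com/",
}


def build_spoofed_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying browser-like headers."""
    return urllib.request.Request(url, headers=SPOOFED_HEADERS, method="GET")


req = build_spoofed_request("https://example.com/foo")
# urllib stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```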
Three things changed it.
Change 1: TLS fingerprinting (JA3, JA4, and friends)
The TLS handshake is not just “hi I am a client.” It is a structured ClientHello message that lists which ciphers the client supports, in what order, which extensions it offers, which elliptic curves it understands, and a half-dozen other fields. Real browsers change these lists over time as their TLS libraries evolve. curl is usually built against OpenSSL or one of several other TLS backends, and in its default configuration it produces a stable, distinctive ClientHello that looks nothing like Chrome’s.
JA3, published in 2017, concatenates five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, and elliptic-curve point formats) and takes an MD5 hash of the result. JA4, which has largely replaced it, splits the fingerprint into multiple components so you can match on TLS version, SNI presence, ALPN, and the cipher list independently. Either way, the upshot is the same: a server-side WAF can look at the first message of your TLS handshake and conclude with very high confidence that you are curl, not Chrome, before the HTTP request even reaches the application layer.
There is no header you can spoof that fixes this. The fingerprint is in the structure of the TLS handshake itself, and curl’s handshake is structurally different from Chrome’s. The bot management vendors maintain JA3 and JA4 databases covering every common automation tool (curl, requests, Go’s net/http, Java’s HttpClient, the major scraping libraries), and many deployments block those fingerprints by default.
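To make the JA3 idea concrete, here is a minimal sketch of the computation as we understand it from the public JA3 description: five ClientHello fields, values joined with “-”, fields joined with “,”, GREASE values dropped, then MD5. The ClientHello values in the example are ones we made up for illustration, not a capture from any real client, and the function names are ours.

```python
import hashlib
from typing import Sequence


def is_grease(value: int) -> bool:
    """GREASE placeholders (RFC 8701) look like 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(
    version: int,
    ciphers: Sequence[int],
    extensions: Sequence[int],
    curves: Sequence[int],
    point_formats: Sequence[int],
) -> str:
    """Build the comma-separated JA3 input string from ClientHello fields."""
    parts = [str(version)]
    for field in (ciphers, extensions, curves, point_formats):
        # Decimal values in the order the client sent them, GREASE removed.
        parts.append("-".join(str(v) for v in field if not is_grease(v)))
    return ",".join(parts)


def ja3_hash(ja3: str) -> str:
    """JA3 is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Hypothetical ClientHello values, chosen only to exercise the code.
example = ja3_string(
    version=771,
    ciphers=[0x0A0A, 4865, 4866, 4867],
    extensions=[0, 23, 65281, 10, 11],
    curves=[0x1A1A, 29, 23, 24],
    point_formats=[0],
)
print(example)
print(ja3_hash(example))
```

Because order is part of the string, two clients that offer the same ciphers in a different order get different hashes, which is exactly why rewriting headers does nothing to it.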
Change 2: HTTP/2 fingerprinting
HTTP/2 was supposed to be a faster, multiplexed transport. It also turned into a fingerprinting surface. The HTTP/2 SETTINGS frame, which a client sends at the start of every connection, declares initial window size, max concurrent streams, and a handful of other parameters. Each client library picks specific values and orders them in a specific way. Chrome’s SETTINGS frame is recognisably different from Firefox’s, which is recognisably different from curl’s.
On top of that, HTTP/2 pseudo-headers (:method, :path, :scheme, :authority) have an order. Real browsers emit them in a specific order; client libraries emit them in their own, different orders. That order is part of the fingerprint.
The combined HTTP/2 fingerprint, sometimes called Akamai’s H2 fingerprint or just “H2 hash”, catches a lot of clients that managed to spoof TLS. A request that has a Chrome-looking TLS handshake but a curl-looking HTTP/2 SETTINGS frame is, from the WAF’s point of view, obviously not Chrome.
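As a rough sketch of how such a fingerprint can be assembled, the function below joins the SETTINGS parameters, the connection-level WINDOW_UPDATE increment, a placeholder for PRIORITY frames, and the pseudo-header order into one string, loosely following the layout of Akamai’s published format. The function name is ours, PRIORITY frames are not modelled, and the two example clients use illustrative values rather than captures from real software.

```python
from typing import Optional, Sequence, Tuple


def h2_fingerprint(
    settings: Sequence[Tuple[int, int]],
    window_update: Optional[int],
    pseudo_header_order: Sequence[str],
) -> str:
    """Join HTTP/2 connection parameters into a single comparable string."""
    # SETTINGS as id:value pairs, in the order the client sent them.
    settings_part = ";".join(f"{ident}:{value}" for ident, value in settings)
    # "00" marks a client that sent no connection-level WINDOW_UPDATE.
    window_part = str(window_update) if window_update else "00"
    # This sketch does not model PRIORITY frames; "0" means none were seen.
    priority_part = "0"
    # ":method" -> "m", ":authority" -> "a", and so on, in emitted order.
    order_part = ",".join(name.lstrip(":")[0] for name in pseudo_header_order)
    return "|".join([settings_part, window_part, priority_part, order_part])


# Illustrative values only, not measured from real clients.
browser_like = h2_fingerprint(
    settings=[(1, 65536), (2, 0), (4, 6291456), (6, 262144)],
    window_update=15663105,
    pseudo_header_order=[":method", ":authority", ":scheme", ":path"],
)
library_like = h2_fingerprint(
    settings=[(3, 100), (4, 65535)],
    window_update=None,
    pseudo_header_order=[":method", ":path", ":scheme", ":authority"],
)
print(browser_like)
print(library_like)
```

A WAF that sees a browser-looking TLS fingerprint paired with the second string has a cheap consistency check to fail you on.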
Change 3: behavioural challenges (the JS layer)
The first two changes are passive. They look at bytes you already had to send. The third is active: the server returns a small JavaScript program and waits to see whether your client executes it correctly.
Cloudflare’s Turnstile, Akamai’s JS challenge, DataDome’s bot defence, PerimeterX, Kasada, Imperva, and most other bot-management products use some variant of this. The challenge typically runs in the browser, takes some measurements (canvas fingerprint, timing of events, presence of specific Web APIs), produces a token, and submits the token back. The site then sets a cookie that grants access for some period.
curl does not execute JavaScript. There is no way to make it execute JavaScript. The only way to satisfy a JS challenge is either to run the JS yourself in something that looks browser-shaped, or to obtain a valid cookie through some other means.
What the 403 means
When you get a 403 from a Cloudflare-fronted site, one of three things is happening:
- Passive fingerprint mismatch. Your TLS handshake or HTTP/2 SETTINGS frame matches a known automation tool. The request was rejected at the edge before it ever reached the origin. No challenge was offered because you did not look like a browser that could solve one.
- Active challenge not completed. You look browser-shaped enough that a challenge was offered, but you did not execute it. The response often contains the challenge HTML and a token endpoint. If you replayed in a browser you would see the spinner-and-checkbox UX.
- Real authorization failure. Sometimes a 403 is actually a 403 — you do not have access. This is rarer than the first two in practice but it happens and it is worth checking.
The way to distinguish them is to look at the response body and the response headers. A passive fingerprint rejection is short, often a single line of HTML or JSON. An active challenge response is several kilobytes of obfuscated JavaScript. A real authorization failure usually has a meaningful body explaining what is missing.
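The triage described above can be sketched as a small classifier. This is a crude heuristic under stated assumptions, not a definitive detector: the two size thresholds are numbers we picked for illustration, the names are hypothetical, and the check for a cf-mitigated: challenge response header relies on our reading of Cloudflare’s documentation for challenge pages.

```python
from enum import Enum
from typing import Mapping


class Failure(Enum):
    PASSIVE_FINGERPRINT = "passive_fingerprint"
    ACTIVE_CHALLENGE = "active_challenge"
    AUTHORIZATION = "authorization"


# Assumed thresholds, in characters. Tune them against real responses.
SHORT_BODY_CHARS = 200
CHALLENGE_MIN_CHARS = 2048


def classify_403(headers: Mapping[str, str], body: str) -> Failure:
    """Guess which kind of 403 this is from its headers and body."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    # Explicit challenge marker, if the edge sends one.
    if lowered.get("cf-mitigated") == "challenge":
        return Failure.ACTIVE_CHALLENGE
    # Several kilobytes containing a script tag looks like a JS challenge.
    if len(body) >= CHALLENGE_MIN_CHARS and "<script" in body.lower():
        return Failure.ACTIVE_CHALLENGE
    # A one-line body suggests rejection at the edge.
    if len(body.strip()) <= SHORT_BODY_CHARS:
        return Failure.PASSIVE_FINGERPRINT
    # Anything longer without script is assumed to be a real explanation.
    return Failure.AUTHORIZATION


print(classify_403({}, "<h1>403 Forbidden</h1>"))
```

Length alone cannot separate a terse edge rejection from a terse but genuine authorization error, so treat the result as a hint about where to look, not a verdict.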
What recurl does about it
recurl is opinionated about two things. First, the cheapest layer should run first. Plain curl is fast, has no side effects, and works on the majority of sites. Most of the internet is still not behind aggressive bot management, and paying browser cost on every request is silly.
Second, the escalation should be automatic and the layers should be debuggable. When recurl gets a 403, it inspects the response, decides which kind of failure it is looking at, and picks the next layer accordingly. A passive-fingerprint failure escalates to curl-impersonate, which patches the TLS handshake and HTTP/2 SETTINGS to match a real browser. An active challenge escalates to a headless Chromium preflight that solves the challenge and replays through curl with the captured cookies. A real authorization failure does not escalate — it returns the 403 to your script, because retrying with a different fingerprint is not going to fix it.
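The escalation rule in that paragraph can be sketched as a small decision function. This is an illustration of the idea, not recurl’s actual code: the enum and function names are hypothetical, and the “only ever move up the ladder” rule is an assumption we added to keep the sketch from looping.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    PASSIVE_FINGERPRINT = "passive_fingerprint"
    ACTIVE_CHALLENGE = "active_challenge"
    AUTHORIZATION = "authorization"


class Layer(Enum):
    PLAIN_CURL = 1
    CURL_IMPERSONATE = 2
    HEADLESS_PREFLIGHT = 3


def next_layer(current: Layer, failure: Failure) -> Optional[Layer]:
    """Pick the layer to retry with, or None to hand the 403 back."""
    if failure is Failure.AUTHORIZATION:
        # A different fingerprint will not grant access you do not have.
        return None
    if failure is Failure.PASSIVE_FINGERPRINT:
        target = Layer.CURL_IMPERSONATE
    else:
        target = Layer.HEADLESS_PREFLIGHT
    # Assumption: never retry at the same or a cheaper layer.
    return target if target.value > current.value else None


print(next_layer(Layer.PLAIN_CURL, Failure.PASSIVE_FINGERPRINT))
print(next_layer(Layer.PLAIN_CURL, Failure.AUTHORIZATION))
```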
Run recurl --recurl-debug against a failing URL and you will see which layer ran, why each escalation triggered, and what the response looked like at each step. That diagnostic output is half the reason we built the tool. The other half is so we never have to write another Puppeteer script for what should have been a curl call.
The honest part
This is an arms race and we are on one side of it. The bot management vendors are competent, they update their detectors regularly, and a bypass that works today might not work in three months. We track that and ship updates. The MIT licence and public detector library mean the work compounds across users — every URL you file as a failing test case is one more pattern in the catalogue.
If you are doing automation on the public internet in 2026, the gap between “I can see this site” and “my script can fetch it” is not closing on its own. Tools that close it for you are how you keep your weekends.