
When to use recurl vs a headless browser

recurl team
scraping · automation · architecture

The first instinct of anyone who has hit a bot wall twice is to reach for a headless browser. Playwright, Puppeteer, Selenium — pick one, point it at the URL, and most of the problem goes away. The site loads, the cookies get set, the data shows up, your script keeps running.

It is also wildly overkill for most of the work it ends up doing. A headless browser costs hundreds of megabytes of memory and anywhere from several hundred milliseconds to a few seconds of startup time per instance. It downloads images, executes ad scripts, fires analytics beacons, and renders fonts you do not need. If the request you wanted was a JSON endpoint returning 200 bytes, you just spent two orders of magnitude more resources than necessary.

This post is the decision framework we use internally. It is not a sales pitch — recurl is a good answer for one shape of problem and a bad answer for another, and we want you to know which one you are looking at.

The four real categories of scraping work

In our experience, every scraping job falls into one of four buckets. The right tool depends on which bucket you are in.

Category 1: open APIs and unprotected sites. Even in 2026, most of the public internet is not behind aggressive bot management. Plain curl works, and you are wasting time if you reach for anything fancier.

Category 2: TLS-fingerprint-protected APIs. The site checks your TLS client fingerprint (JA3/JA4) and serves a 403 if you do not look like a browser. Once you are past the gate, the actual data is plain JSON or HTML. There is no JavaScript challenge and no behavioural fingerprinting. This is the bucket where recurl’s impersonation layer is the right answer: you need to look like a browser at the TLS layer, but you do not need to be one above it.

Category 3: sites with JS challenges in front of static content. Cloudflare’s Turnstile, DataDome, Akamai’s JS challenge, et al. The challenge runs once, sets a cookie, and the cookie is valid for some window — minutes to hours, depending on the vendor. After the challenge, the actual page is static HTML or a simple JSON endpoint. This is the bucket where recurl’s JS preflight is the right answer — you need a browser exactly once, to satisfy the challenge, and then you can use the cookie for hundreds of subsequent requests.

Category 4: actually interactive sites. Single-page apps where the data only exists after a user clicks through a flow. Sites where you have to scroll to lazy-load content. Sites with multi-step JavaScript that interrogates the DOM at every step. This is the bucket where a headless browser is genuinely the right tool. There is no shortcut. You need a real browser doing real interactive work.

The mistake most teams make is treating category 2 or 3 as if they were category 4. They reach for Playwright because “the site needs JavaScript,” when in reality the site needs JavaScript once and then serves the same static content recurl would have fetched anyway.
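The economics of category 3 come from reusing the cookie. Below is a minimal Python sketch of the solve-once, replay-many pattern. It assumes a hypothetical `solve` callable that stands in for the one-time browser step and returns a cookie plus its lifetime. It also assumes a hypothetical `fetch` callable for the plain HTTP request. The sketch illustrates the idea only. It is not recurl's implementation or API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ClearanceCookie:
    value: str
    expires_at: float  # clock time after which the cookie is considered stale


class ChallengeCache:
    """Pays the expensive challenge-solving step once per cookie lifetime.

    `solve` is a hypothetical stand-in for the one-time browser step; it
    returns (cookie_value, ttl_seconds). `clock` is injectable for testing.
    """

    def __init__(self, solve: Callable[[], tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic):
        self._solve = solve
        self._clock = clock
        self._cookie: Optional[ClearanceCookie] = None
        self.solves = 0  # how many times the expensive step actually ran

    def cookie(self) -> str:
        now = self._clock()
        # Re-solve only when there is no cookie yet or the old one expired.
        if self._cookie is None or now >= self._cookie.expires_at:
            value, ttl = self._solve()
            self.solves += 1
            self._cookie = ClearanceCookie(value, now + ttl)
        return self._cookie.value


def fetch_many(urls: list[str], cache: ChallengeCache,
               fetch: Callable[[str, str], str]) -> list[str]:
    # Every request reuses the cached cookie; only expiry triggers a re-solve.
    return [fetch(url, cache.cookie()) for url in urls]
```

Suppose the cookie lives for an hour. A thousand requests made inside that hour trigger exactly one solve, so the browser cost is paid per cookie window rather than per request.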

How to tell which category you are in

A short diagnostic. Open the site in a real browser, open dev tools, and look at the Network tab.

  1. What is the actual content you want? If it is the first HTML response, you are in category 1, 2, or 3 depending on what blocks plain curl. If it is a later XHR or fetch response, look at that request’s URL. Often you can call that URL directly and skip the page rendering entirely.

  2. What blocks plain curl? Run curl from your terminal against the same URL and look at the response. If you get a 2xx and the body contains the data you saw in the browser, you are in category 1, and you are done. If you get a 403 with a short body, you are most likely in category 2: a TLS fingerprint mismatch. If you get a 403 whose body contains JavaScript and a token endpoint, you are in category 3: a JS challenge. If you get a 2xx but the content is missing data the browser showed, you are in category 4: the data is rendered client-side.

  3. Is the data per-session or shared? If the cookie you would get from solving the challenge is valid across many subsequent requests, you only need to pay the JS cost once. recurl handles that automatically — solve once via Chromium, replay everything else through curl with the captured cookie.
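Step 2 of the diagnostic can be written down as a small decision function. This sketch assumes you have already captured the status code and body from a plain curl run. It also assumes you can check whether the body holds the values you saw in the browser. Treating a `<script` tag in a 403 body as a JS challenge is a rough stand-in for "contains JavaScript and a token endpoint". Real challenge pages vary by vendor.

```python
from enum import IntEnum
from typing import Optional


class Category(IntEnum):
    OPEN = 1             # plain curl works
    TLS_FINGERPRINT = 2  # 403 with a short body: TLS fingerprint mismatch
    JS_CHALLENGE = 3     # 403 whose body carries a JavaScript challenge
    INTERACTIVE = 4      # 2xx, but the data is rendered client-side


def classify(status: int, body: str, has_expected_data: bool) -> Optional[Category]:
    """Map a plain-curl result onto the four categories.

    `has_expected_data` is your own check: does the body contain the values
    you saw in the browser? The <script heuristic is an assumption, not a
    vendor-accurate challenge detector.
    """
    if 200 <= status < 300:
        return Category.OPEN if has_expected_data else Category.INTERACTIVE
    if status == 403:
        if "<script" in body.lower():
            return Category.JS_CHALLENGE
        return Category.TLS_FINGERPRINT
    return None  # outside the diagnostic; inspect this one by hand
```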

That diagnostic, done once per target site, will save you a week of accidentally building a Playwright pipeline for what recurl could have handled in a one-line command.

The cost differential

Some rough numbers from our own benchmarks. These are not formal — they vary with hardware, target site, and warmup state — but the ballparks are stable.

  • Plain curl request to an unprotected endpoint: 50-200 milliseconds end to end, single-digit megabytes of memory.
  • curl-impersonate request to a TLS-fingerprinted endpoint: 80-250 milliseconds, similar memory footprint.
  • recurl with a warm Chromium daemon, solving a JS challenge and replaying: 1-3 seconds for the first request, then back to curl cost for subsequent requests using the captured cookie.
  • Playwright spinning up Chromium per script run: 2-5 seconds startup, 400-800 megabytes resident memory while the script is running, then teardown cost on exit.

If your workload is one URL fetched once, the differential does not matter much. If your workload is ten thousand URLs fetched in parallel across a worker pool, the differential is the difference between running on a $20-a-month VPS and a $400-a-month one.
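Here is the back-of-the-envelope arithmetic behind that claim as a short Python sketch. It uses the upper ends of the ranges above. The 4 GB box and the 512 MB reserved for the operating system are assumptions for illustration, not measured figures. Real concurrency is also bounded by CPU and network.

```python
def max_workers(ram_mb: int, per_worker_mb: int, reserved_mb: int = 512) -> int:
    """Workers that fit in RAM after reserving some for the OS (assumed 512 MB)."""
    return max(0, (ram_mb - reserved_mb) // per_worker_mb)


def amortized_latency(first_s: float, replay_s: float, n: int) -> float:
    """Mean per-request latency when one challenge solve is followed by n-1 replays."""
    return (first_s + (n - 1) * replay_s) / n


# Upper ends of the ranges quoted above.
PLAYWRIGHT_MB = 800
CURL_MB = 9  # "single-digit megabytes"

box_mb = 4096  # hypothetical 4 GB worker box
print("playwright workers:", max_workers(box_mb, PLAYWRIGHT_MB))  # 4
print("curl workers:", max_workers(box_mb, CURL_MB))              # 398
# recurl: 3 s for the first (challenge) request, then 0.25 s per replay.
print("recurl mean latency over 1000 requests (s):",
      amortized_latency(3.0, 0.25, 1000))
```

Under these assumptions, the same box runs about a hundred times more curl-style workers than browsers. The one-time challenge cost also disappears into the average after a few hundred requests: over 1,000 requests the mean is about 0.253 s, barely above the 0.25 s replay cost.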

When recurl is the wrong answer

We are honest about the cases where recurl is not the right tool.

Multi-step interactive flows. Anything where you need to fill a form, click a button, wait for a transition, and then click another button. recurl can solve a single challenge per request, but it does not maintain a session across user-interaction steps. Use Playwright.

Heavy client-side rendering with dynamic data. Sites where the data is computed in the browser based on user interaction — financial dashboards with charts that respond to clicks, configurators that change pricing based on selected options. The “render once and scrape” model does not apply. Use Playwright.

Visual testing or screenshots. recurl returns response bodies. It does not render pixels. If you need an image or a PDF, you need a browser.

Sites where the bot management uses behavioural signals. Some advanced bot managers (PerimeterX, parts of Kasada) profile mouse movement, scroll velocity, and timing on subsequent requests. A one-shot JS preflight passes the first gate; the second-page request fails because there is no “human” behaviour to fingerprint. For these, you need a browser maintaining a session for the duration of the interaction.

When recurl is the right answer

The other side of the same coin.

APIs that 403 plain curl but return JSON. This is the canonical recurl case. Impersonate the TLS layer, get the JSON, move on.

Static content behind a Cloudflare or DataDome wall. Marketing pages, blog posts, documentation, public datasets. The challenge runs once, you keep the cookie, you scrape happily.

Monitoring and uptime checks. You want to know whether a vendor’s API is up. Plain curl returns a 403 you cannot interpret. recurl returns 200 if the API is actually working or surfaces a real authorization error if it is not. The diagnostic value alone is the win.
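For uptime checks, the status code becomes meaningful once the bot wall is out of the way. The Python sketch below shows how a monitor might bucket results. It assumes you already have the status code and a flag saying whether the response came from bot management rather than the API. How to detect that is vendor-specific, so the flag is a hypothetical input.

```python
from enum import Enum


class Health(Enum):
    UP = "up"
    AUTH_ERROR = "auth_error"  # the API answered and rejected our credentials
    BLOCKED = "blocked"        # bot management answered, not the API: inconclusive
    DOWN = "down"
    UNKNOWN = "unknown"


def interpret(status: int, looks_like_bot_block: bool) -> Health:
    if looks_like_bot_block:
        # This is the uninterpretable 403 plain curl gets: it says nothing
        # about whether the API behind the wall is healthy.
        return Health.BLOCKED
    if 200 <= status < 300:
        return Health.UP
    if status in (401, 403):
        return Health.AUTH_ERROR
    if status >= 500:
        return Health.DOWN
    return Health.UNKNOWN
```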

CI jobs that hit external services. The job needs to fetch a manifest, a release artifact, or an API status. A headless browser in CI is a tax. recurl is a single binary, and against a TLS-gated endpoint a request completes in a few hundred milliseconds.

Replacing a single Puppeteer script that does one HTTP call. Most legacy “we use Puppeteer for this” code paths exist because plain curl failed once and someone reached for the biggest hammer in the room. Half of those scripts could be replaced with a recurl one-liner, reclaiming a few hundred megabytes of CI memory each.

The mental model

We think of the choice as picking the cheapest tool that solves the problem you actually have, not the most powerful tool that solves every problem. Plain curl, then curl-impersonate, then recurl, then a headless browser, in that order of cost. Move up the stack only when the level below stops working — and move back down whenever you can. recurl handles the bottom three levels for you and gets out of the way at the top.
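The ladder translates directly into a fallback loop. In this Python sketch, each tier is a hypothetical callable standing in for one tool: plain curl, curl-impersonate, recurl's preflight, or a headless browser. Each returns the body, or `None` when blocked. Wiring those callables to real tools is left out, and nothing here reflects recurl's internal logic.

```python
from typing import Callable, Optional, Sequence

# A tier takes a URL and returns the body, or None if it was blocked.
Tier = tuple[str, Callable[[str], Optional[str]]]


def fetch_cheapest(url: str, tiers: Sequence[Tier]) -> tuple[str, str]:
    """Try tiers in ascending cost order; return (tier_name, body) from the first that works.

    Starting from the cheapest tier on every call is the "move back down
    whenever you can" half of the rule: if a site relaxes its bot wall,
    the next call automatically drops to the cheaper tool.
    """
    for name, attempt in tiers:
        body = attempt(url)
        if body is not None:
            return name, body
    raise RuntimeError(f"every tier failed for {url}")
```

Retrying the cheap tiers on every call costs one failed request per skipped tier. A production version would probably cache the winning tier per host and retry cheaper tiers only periodically. That trade-off is left out to keep the sketch short.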

If, after running the diagnostic above, you decide recurl is not enough, that is fine. Playwright is a great tool for category 4 work. Just please do not use it for category 1.