How TLS Fingerprinting Works: JA3, JA4, and the ClientHello

Before a single byte of HTTP travels, before any JavaScript runs, before a cookie is set, a web client has already told the server a great deal about itself. The very first message of a TLS connection, the ClientHello, is sent in the clear, and the exact way it is built is specific to the software that built it. TLS fingerprinting is the practice of reading that first message and turning it into a short, stable identifier for the client. A real Chrome browser, a Python script using requests, and a piece of malware calling home to its controller each produce a different shape of ClientHello, and that shape gives them away. This post takes the idea apart from the packet up: why the handshake is a fingerprint at all, how the original JA3 method computed one, why JA3 broke, how JA4 fixed it, and what all of this means for catching bots and malware versus the privacy of ordinary users.

Why the handshake is a fingerprint

A TLS connection opens with a negotiation. The client speaks first with a ClientHello, a plaintext message that lists everything the client is willing and able to do so the server can pick a common option. That list is not a single fixed value. It is an ordered set of choices, and every TLS library makes those choices a little differently.

The ClientHello carries, among other things, the highest TLS version the client supports, the ordered list of cipher suites it offers, a list of extensions, the elliptic curves it will accept for key exchange, and the elliptic curve point formats it understands. None of this is secret. It cannot be, because the server needs to read it to agree on parameters before encryption is set up. The values themselves are mundane. What identifies the client is the combination and the order: which ciphers, in which sequence, which extensions, advertised which way.

This matters because the choices come from the TLS stack, not from the application on top of it. OpenSSL, BoringSSL, the Schannel library on Windows, the network stack inside Chrome, and the Go standard library each assemble a ClientHello in their own house style. So the fingerprint reflects the runtime, not the label the client puts on itself. A script can set its HTTP user agent header to the exact string a real Chrome sends, but the header is added later, inside the encrypted HTTP request. The TLS handshake underneath was already built by Python’s stack, and it does not look like Chrome at all. That gap between what a client claims and what its handshake reveals is the entire reason the technique is useful.

It helps to picture where this sits in the connection. The TCP handshake completes, then the client sends the ClientHello as the very first TLS record. The server reads it, replies with a ServerHello that picks one cipher and one set of parameters, both sides derive keys, and only then does the channel turn encrypted. Under TLS 1.3 the ClientHello is effectively the only handshake message the client sends in the clear. Under TLS 1.2 a few later handshake messages also travel unencrypted, but none of them describes the client’s capabilities the way the ClientHello does. A passive observer between the two parties cannot read the page that is requested or the data that comes back, but it can read that opening message in full. TLS fingerprinting is the discipline of getting the most identity out of that one readable message.
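To make "the very first TLS record" concrete, here is a minimal sketch of how a passive sensor might recognise a ClientHello from the first bytes of a TCP stream. It relies on the standard TLS record layout (content type 22 for handshake, handshake type 1 for ClientHello). The function name and the hand-built sample bytes are illustrative assumptions, not part of any particular tool.

```python
def looks_like_client_hello(data: bytes) -> bool:
    """Return True if the bytes start with a TLS handshake record carrying a ClientHello."""
    if len(data) < 6:
        return False
    content_type = data[0]    # 0x16 (22) marks a handshake record
    record_major = data[1]    # 0x03 for SSL 3.0 and every TLS version
    handshake_type = data[5]  # 0x01 marks a ClientHello
    return content_type == 0x16 and record_major == 0x03 and handshake_type == 0x01


# Hand-built bytes for illustration only: record header (type, version, length)
# followed by the start of a handshake message whose type byte is 0x01.
sample = bytes([0x16, 0x03, 0x01, 0x00, 0x05, 0x01, 0x00, 0x00, 0x01, 0x00])
print(looks_like_client_hello(sample))
print(looks_like_client_hello(b"GET / HTTP/1.1\r\n"))
```

A real sensor would go on to parse the lengths and the body of the message. This check only shows that the message is identifiable without any decryption.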

The client picks a user agent string to tell you what it is. The handshake tells you what it really is, and the handshake was sent before the client had a chance to lie.

How JA3 computes a TLS fingerprinting hash

The first widely used method for this came from Salesforce in 2017 and is called JA3. Its idea is simple enough to follow by hand. JA3 reads five fields out of the ClientHello, always in the same order:

  • TLS version, the version number from the handshake.
  • Cipher suites, the ordered list of ciphers the client offers.
  • Extensions, the list of TLS extensions, in the order they appear.
  • Elliptic curves, the supported curves, sometimes called supported groups.
  • Elliptic curve point formats, the point format list.

JA3 takes the decimal values from each field, joins the values inside a field with a dash, and joins the five fields with a comma. The result is one long string in a fixed layout: TLSVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats. A real example of that intermediate string looks like this:

769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0

Here 769 is the TLS version, the long middle run is the cipher list, 0-10-11 is the extension list, 23-24-25 is the curve list, and the trailing 0 is the single point format. If a field is empty, JA3 keeps the comma and leaves the field blank, so a client with no extensions produces a string like 769,4-5-10-9-100-98-3-6-19-18-99,,, with the empty positions preserved. That last detail is part of the fingerprint too, because the absence of extensions is itself a property of the client.

The final step is a hash. JA3 runs the whole comma joined string through MD5 and keeps the 32 character result. The string above becomes:

769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0
  -> ada70206e40642a3e4461f35503241d5

MD5 is a poor choice for security where collisions matter, but here it is only a compact label for a string, so its weakness is not the point. The point is that the same client software, run again, produces the same five fields in the same order and therefore the same hash. A different client produces a different one.
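The whole computation fits in a few lines. The sketch below assumes the five fields have already been parsed out of the ClientHello as decimal integers with any GREASE values removed. The function names ja3_string and ja3_hash are chosen here for illustration and are not taken from the Salesforce code.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 intermediate string: dashes inside a field, commas between fields."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 the intermediate string and return the 32 character hex digest."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Field values taken from the example string in this post.
ciphers = [47, 53, 5, 10, 49161, 49162, 49171, 49172, 50, 56, 19, 4]
print(ja3_string(769, ciphers, [0, 10, 11], [23, 24, 25], [0]))
print(ja3_hash(769, ciphers, [0, 10, 11], [23, 24, 25], [0]))

# Empty fields keep their commas, as described above.
print(ja3_string(769, [4, 5, 10, 9, 100, 98, 3, 6, 19, 18, 99], [], [], []))
```

Joining an empty list yields an empty string, so the trailing commas in the no-extensions case fall out of the same code path with no special handling.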

It is worth being precise about what JA3 deliberately leaves out. It does not read the server name indication, the actual hostname being requested, even though that field is present and readable in many ClientHellos. It does not read the contents of every extension, only which extensions are present. And it does not touch anything above TLS. The aim is a fingerprint of the client stack, not of the destination or the request, so two connections from the same software to two different sites share a JA3 hash. That is the property that makes it useful for spotting one tool across many targets, and it is also why JA3 alone cannot tell you what the client was doing, only what it was.

GREASE and the server side twin

Two refinements are worth knowing. First, modern clients inject GREASE values, which are deliberately reserved placeholder numbers sprinkled into the cipher and extension lists to keep servers from getting rigid about what they accept. JA3 ignores GREASE values entirely so that a client which uses GREASE still maps to one stable hash rather than a new one each connection. Second, there is a mirror method called JA3S that fingerprints the server’s response from its version, chosen cipher, and extensions. Pairing the client JA3 with the server JA3S describes a whole conversation, which is handy when the same client always talks to the same controller.
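Filtering GREASE is mechanical, because the reserved values follow a fixed pattern: the sixteen code points 0x0A0A, 0x1A1A, and so on up to 0xFAFA, as defined in RFC 8701. A minimal sketch, with helper names chosen here for illustration:

```python
def is_grease(value: int) -> bool:
    """True for the reserved GREASE code points 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    high, low = value >> 8, value & 0xFF
    return high == low and (low & 0x0F) == 0x0A


def strip_grease(values):
    """Drop GREASE placeholders while keeping the remaining values in order."""
    return [v for v in values if not is_grease(v)]


# Illustrative cipher list with two GREASE placeholders mixed in.
offered = [0x2A2A, 0x1301, 0x1302, 0xDADA, 0xC02B]
print([hex(v) for v in strip_grease(offered)])
```

Because a client may pick a different GREASE value on each connection, stripping them before building the string is what keeps the hash stable from one connection to the next.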

Where JA3 is genuinely useful

The reason security teams cared about JA3 is that it identifies software by how it speaks, not by where it connects or what it claims. That property has three concrete uses.

Malware and command and control detection. A piece of malware is usually built against one TLS library and offers one fixed handshake. It does not matter if the malware rotates its server IP every hour, uses domain generation algorithms to invent new hostnames, or even hides its controller behind a public service. The JA3 hash of the malware’s own handshake stays the same. Salesforce documented that the Trickbot sample consistently produced the JA3 hash 6734f37431670b3ab4292b8f60f29984, which means a sensor can flag that traffic by how it connects rather than by chasing an endless list of addresses. Threat intelligence feeds publish lists of JA3 hashes tied to known malware families for exactly this.
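In code, matching against a feed is a dictionary lookup. The sketch below uses the Trickbot hash cited above as its only entry. The feed structure, the function name, the non-Trickbot hash, and the destination addresses (drawn from documentation-only IP ranges) are all hypothetical.

```python
# Hypothetical threat feed: JA3 hash -> malware family label.
KNOWN_BAD_JA3 = {
    "6734f37431670b3ab4292b8f60f29984": "Trickbot",
}


def classify_ja3(ja3_hash: str):
    """Return the malware family for a JA3 hash, or None if the hash is not listed."""
    return KNOWN_BAD_JA3.get(ja3_hash.strip().lower())


# The destination changes on every connection, the client fingerprint does not.
connections = [
    ("192.0.2.10", "6734f37431670b3ab4292b8f60f29984"),
    ("198.51.100.7", "6734f37431670b3ab4292b8f60f29984"),
    ("203.0.113.5", "00000000000000000000000000000000"),  # made-up, unlisted hash
]
for destination, ja3 in connections:
    print(destination, classify_ja3(ja3))
```

The first two connections are flagged even though they go to different addresses, which is the property the paragraph above describes.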

Bot detection by mismatch. The strongest signal is a contradiction. When an HTTP request carries a user agent header that says Chrome 120, but the TLS handshake under it matches the fingerprint of Python’s requests library or a plain curl build, the two stories do not agree. A browser stack and a scripting stack assemble different ClientHellos, so a request that claims to be a browser while handshaking like a script is almost certainly automated. A web application firewall or content delivery network can compare the claimed client to the observed fingerprint and act on the gap.
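A mismatch check can be sketched as a comparison between two lookups: what the user agent claims and what the fingerprint is known to belong to. Everything named below is hypothetical, including the fingerprint keys, which are placeholders rather than real JA3 or JA4 values. A production system would use a maintained fingerprint database and a proper user agent parser.

```python
# Hypothetical table: observed TLS fingerprint -> client family known to produce it.
FINGERPRINT_FAMILY = {
    "fp-browser-example": "chrome",
    "fp-python-example": "python-requests",
    "fp-curl-example": "curl",
}


def claimed_family(user_agent: str) -> str:
    """Very rough guess at the client family a user agent string claims to be."""
    ua = user_agent.lower()
    if "python-requests" in ua:
        return "python-requests"
    if ua.startswith("curl/"):
        return "curl"
    if "chrome/" in ua:
        return "chrome"
    return "unknown"


def is_mismatch(user_agent: str, fingerprint: str) -> bool:
    """True when the claimed client and the fingerprinted client disagree."""
    observed = FINGERPRINT_FAMILY.get(fingerprint)
    claimed = claimed_family(user_agent)
    if observed is None or claimed == "unknown":
        return False  # nothing reliable to compare, so raise no signal
    return observed != claimed


chrome_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0 Safari/537.36"
print(is_mismatch(chrome_ua, "fp-python-example"))
print(is_mismatch(chrome_ua, "fp-browser-example"))
```

Returning False for unknown fingerprints is a deliberate choice in this sketch: an unrecognised handshake is a reason to look closer, not evidence of a contradiction.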

Allow listing in locked down networks. In an environment where only a known set of applications should ever make outbound TLS connections, you can record the fingerprints of the approved software and alert on anything else. A new fingerprint is a new piece of software talking, which is worth a look.

If you want the broader picture of how servers profile clients across many layers, our writeup on how browser fingerprinting works covers the JavaScript and HTTP signals that sit above the handshake. TLS fingerprinting is the layer beneath all of that, the one that fires first.

Why JA3 broke

JA3 had a structural weakness, and two separate forces pushed on it until it gave way. The weakness is that JA3 reads the extension list in the order it appears in the ClientHello. Order is part of the hash. So anything that changes the order changes the hash, even when the client’s actual capabilities are identical.

The first force was an evasion that costs almost nothing. Because order drives the hash, a client that wants to dodge a JA3 blocklist only has to shuffle its extension list. The set of extensions is the same, the handshake still works, but the bytes are reordered and the hash is new. For an attacker this is close to free. A list of sixteen extensions can be arranged in sixteen factorial ways, which is more than twenty trillion orderings, so a single piece of software can wear an effectively unlimited number of JA3 faces. A blocklist built on a fixed hash cannot keep up with a client that changes the hash on a whim.
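Both claims are easy to check. The sketch below computes sixteen factorial and then reorders only the extension field of the example JA3 string from earlier in this post. The helper name md5_hex is introduced here for illustration.

```python
import hashlib
import math

print(math.factorial(16))  # 20922789888000, a little under 21 trillion


def md5_hex(text: str) -> str:
    """MD5 hex digest of an ASCII string, as JA3 uses for its final step."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Same version, ciphers, curves, and point formats; only the extension order differs.
original = "769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0"
shuffled = "769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,11-0-10,23-24-25,0"

print(md5_hex(original))
print(md5_hex(shuffled))
print(md5_hex(original) == md5_hex(shuffled))
```

The two strings describe the same set of extensions, yet they differ as text, so the digests differ and a blocklist entry for one never matches the other.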

The second force was not an attack at all. In early 2023, with the rollout landing in Chrome 110 after the change had been merged a release or two earlier, Chrome began randomizing the order of its TLS extensions on purpose. The stated reason was healthy: by shuffling the order on every connection, Chrome forces servers and middleboxes to stop depending on the exact byte layout of its ClientHello, which keeps the wider TLS ecosystem flexible. The side effect was that the single common JA3 hash for Chrome shattered. Overnight a huge share of legitimate traffic stopped matching its old fingerprint, and the same explosion of orderings that helped attackers now scattered ordinary users too. JA3 went from a useful client label to noise for the most common browser on the internet.

How JA4 fixes the order problem

JA4, from FoxIO, is the answer to that breakage, and the core fix is almost obvious once you see the failure. If order is the problem, remove order from the parts where it is not meaningful. JA4 sorts the cipher list and sorts the extension list before hashing them. A shuffled ClientHello and an unshuffled one, with the same underlying capabilities, sort to the same sequence and therefore produce the same fingerprint. The evasion of reordering, and Chrome’s deliberate randomization, both stop mattering because the sorted output is identical either way.
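The effect of sorting can be shown in isolation. The sketch below is not the full JA4 algorithm. It only demonstrates the idea of canonicalising a list before hashing, using four digit hex values joined by commas and a SHA256 digest truncated to 12 characters, which follows the published JA4 format as I understand it. The function name and the sample extension numbers are illustrative.

```python
import hashlib
import random


def sorted_list_hash(values):
    """Hash a list of 16-bit code points after sorting, so that order no longer matters."""
    canonical = ",".join(sorted(f"{v:04x}" for v in values))
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()[:12]


extensions = [0, 10, 11, 13, 16, 23, 43, 45, 51]  # illustrative extension type numbers
shuffled = extensions[:]
random.shuffle(shuffled)  # stands in for per-connection randomisation

print(sorted_list_hash(extensions) == sorted_list_hash(shuffled))
```

However the shuffle lands, both lists sort to the same canonical string, so the comparison holds on every run.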

JA4 also changes the shape of the output to be readable rather than a single opaque hash. A JA4 fingerprint comes in three parts joined by underscores. A real example:

t13d1516h2_8daaf6152771_b186095e22b6

The first segment is human readable metadata. Reading it left to right: t means TLS over TCP, 13 means TLS version 1.3, d means a server name indication was present so this is a connection to a named domain, 15 is the count of cipher suites with GREASE excluded, 16 is the count of extensions, also with GREASE excluded, and h2 is the first and last characters of the first application layer protocol the client offers in its ALPN extension, here HTTP/2. The second segment, 8daaf6152771, is a truncated SHA256 hash of the sorted cipher list. The third segment, b186095e22b6, is a truncated SHA256 hash of the sorted extensions, leaving out SNI and ALPN because the first segment already accounts for them, plus the signature algorithms in their original order.
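Because the first segment has a fixed width, it can be taken apart with plain string slicing. The sketch below assumes the ten character prefix layout described above. The class and function names are chosen for illustration and do not come from the FoxIO tooling.

```python
from typing import NamedTuple


class Ja4Prefix(NamedTuple):
    transport: str        # "t" for TLS over TCP
    tls_version: str      # "13" for TLS 1.3
    sni: str              # "d" when a server name indication was present
    cipher_count: int     # cipher suites offered, GREASE excluded
    extension_count: int  # extensions offered
    alpn: str             # first and last characters of the ALPN value


def parse_ja4(fingerprint: str):
    """Split a JA4 string into its readable prefix and its two hash segments."""
    prefix, cipher_hash, extension_hash = fingerprint.split("_")
    if len(prefix) != 10:
        raise ValueError(f"unexpected JA4 prefix length: {prefix!r}")
    parsed = Ja4Prefix(
        transport=prefix[0],
        tls_version=prefix[1:3],
        sni=prefix[3],
        cipher_count=int(prefix[4:6]),
        extension_count=int(prefix[6:8]),
        alpn=prefix[8:10],
    )
    return parsed, cipher_hash, extension_hash


parsed, cipher_hash, extension_hash = parse_ja4("t13d1516h2_8daaf6152771_b186095e22b6")
print(parsed)
print(cipher_hash, extension_hash)
```

Keeping the counts as integers makes it straightforward to group or filter traffic on them without touching the hash segments.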

Three design choices stand out. Sorting is what defeats the shuffle, both the malicious kind and Chrome’s well meaning kind. Adding ALPN is new information that JA3 never captured, since the protocol a client offers is another property of the client stack. And because the leading segment is plain text, an analyst can group and hunt on individual pieces, for example every TLS 1.3 client that offers a certain count of extensions, without decoding a hash.

The readable prefix earns its keep in practice. Suppose a feed of traffic is dominated by ordinary browsers and you want to find the odd one out. With a single opaque MD5 you can only test for exact matches against a known list. With JA4 you can ask coarser questions directly off the string: show every client that offered TLS 1.3 with no server name indication, which is unusual for a browser visiting a website and common for automated tooling. The counts and flags in that first segment give you a way to slice traffic before you ever compare a hash, so a new variant that has never been catalogued can still stand out by its shape. JA4 also extends to QUIC and HTTP/3, where the same handshake idea rides on UDP, which is something the older method was never built to cover.
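That kind of coarse question is a one line filter over the prefix. The sketch below assumes the published JA4 convention that the fourth character is d when a server name indication is present and i when it is absent. The i flag is taken from the FoxIO format description rather than from the example in this post. Only the first fingerprint in the sample list is real; the other two are made up for illustration.

```python
def is_tls13_without_sni(ja4: str) -> bool:
    """True when the readable prefix says TLS 1.3 ("13") and no SNI ("i")."""
    prefix = ja4.split("_", 1)[0]
    return len(prefix) == 10 and prefix[1:3] == "13" and prefix[3] == "i"


observed = [
    "t13d1516h2_8daaf6152771_b186095e22b6",  # the example from this post
    "t13i1516h2_aaaaaaaaaaaa_bbbbbbbbbbbb",  # made-up: TLS 1.3, no SNI
    "t12d0909h1_cccccccccccc_dddddddddddd",  # made-up: TLS 1.2, SNI present
]

unusual = [fp for fp in observed if is_tls13_without_sni(fp)]
print(unusual)
```

No hash list is consulted at any point, so a client that has never been catalogued is still caught by the filter if its prefix has the unusual shape.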

JA4 is one of a family

JA4 by itself fingerprints the TLS client. FoxIO published it as the lead member of a suite called JA4+, where each method fingerprints a different part of a connection: JA4S for the server’s TLS response, JA4H for the HTTP client, JA4X for the certificate, JA4SSH for SSH sessions, and several more for TCP, latency, and DHCP. The JA4X variant works over the X.509 certificate the server presents, and if you want to see what fields live inside one of those certificates, our free X.509 certificate decoder breaks a certificate down into its issuer, validity dates, extensions, and public key. The stated uses for the suite read like a defender’s job list: scanning for threat actors, malware detection, session hijacking prevention, grouping related actors, and detecting reverse shells, among others. The JA4 TLS method itself is published under a BSD license, while the rest of the suite carries the FoxIO license that allows internal use but asks for a license to resell.

The privacy and evasion angle, told honestly

Everything that makes TLS fingerprinting good at catching bots also makes it a tracking tool. A fingerprint identifies a client before any cookie is set and survives a private browsing window, since it comes from the TLS stack rather than from stored state. Two people on the same network running the same browser build share a fingerprint, which limits how precisely it pins down one person, but it still sorts traffic into groups by software without anyone’s consent. This is the same tension that shows up across client identification, and it is the reason Chrome’s randomization was framed as ecosystem hygiene rather than as an anti tracking feature, even though it carried both effects.

Evasion is real and worth naming plainly. There exist tools that rebuild a script’s handshake to match a real browser’s, so that a request claiming to be Chrome also handshakes like Chrome and slips past a mismatch check. The existence of these tools is the reason no serious defender treats a fingerprint as proof on its own. A fingerprint is one signal among several, strong because it fires early and is hard to fake casually, weak because a determined party can copy a known good handshake. This post will not walk through how to build such a forgery. The defensive takeaway is the useful one: combine the fingerprint with other evidence, watch for the contradiction between the claimed client and the observed one, and treat a perfect browser fingerprint from an unexpected source as a question rather than an answer. For more terms in this area, see our web security glossary.

The assumption that breaks

Step back from the cipher lists and the hash construction and one assumption is doing all the work. A client connecting over TLS assumes that encryption hides it. The padlock is up, the channel is private, the payload is unreadable to anyone in the middle. All of that is true for the contents of the conversation. It is not true for the handshake that set the conversation up. The ClientHello is sent in the open by necessity, and its construction is a property of the software, so the very act of asking for a private channel announces who is asking.

That is the gap that JA3 and JA4 read. The client believed the encrypted channel covered its identity, and it was wrong, because the metadata of the handshake identifies the software before a single encrypted byte is exchanged. A real browser, a script wearing a browser’s name, and a malware sample each make the same request for privacy in a different accent, and the accent is the fingerprint. Testing that assumption, the quiet belief that the tunnel hides the traveler, is exactly where the signal lives.

Frequently asked questions

What is TLS fingerprinting?

It is the practice of identifying client software from the way it builds its first TLS handshake message, the ClientHello, which is sent in the clear before any HTTP or JavaScript. Methods like JA3 and JA4 read fields such as the TLS version, the offered cipher suites, the extension list, and the supported curves, then turn that combination into a short stable identifier. Because the values come from the TLS library rather than the application, the fingerprint reflects the real runtime even when the client sets a misleading user agent string.

How is a JA3 hash computed?

JA3 reads five fields from the ClientHello in a fixed order: TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats. It joins the values inside each field with dashes and the five fields with commas, producing a string like 769,47-53-5-10,0-10-11,23-24-25,0, then runs that string through MD5 to get a 32 character hash. Empty fields keep their commas, and GREASE placeholder values are ignored so a client still maps to one stable hash. The method comes from Salesforce, documented at github.com/salesforce/ja3.

Why did JA3 stop working and how does JA4 fix it?

JA3 hashes the extension list in the order it appears, so reordering the extensions changes the hash without changing the client. Attackers exploited that to dodge blocklists, and from Chrome 110 in 2023 Chrome began randomizing its extension order on purpose, which shattered the common Chrome JA3 hash. JA4, from FoxIO, sorts the cipher and extension lists before hashing so a shuffled and an unshuffled handshake produce the same fingerprint. The technical format is published at github.com/FoxIO-LLC/ja4.

How does TLS fingerprinting catch bots and malware?

Malware usually offers one fixed handshake from the TLS library it was built with, so its fingerprint stays the same even when it rotates server IP addresses or hostnames, which lets sensors flag it by how it connects. For bots, the strongest signal is a mismatch: a request whose user agent claims to be a browser but whose handshake matches a script like Python requests or curl is almost certainly automated. Fingerprints are one signal among several, since evasion tools exist that copy a real browser handshake.