What Actually Happens In A Kernel Use After Free

A kernel use after free is one of the few bugs that can turn an ordinary local user into root without ever touching a password file. The shape of the bug is simple to state. Some piece of kernel code frees an object, then keeps using a pointer to it. The allocator, meanwhile, hands that same memory to a different object the attacker controls. From the moment of reuse the kernel is reading and writing through a pointer that no longer means what it thinks it means. This post goes to the metal: how the kernel heap is laid out, what a freed object actually looks like in memory, the exact instant a freed slot gets reused by an attacker chosen object, and why that single overlap becomes a privilege escalation primitive rather than just a crash.

The kernel heap is not one big pool

Userspace programmers picture the heap as a single arena that malloc carves up. The kernel works differently, and the difference is the whole reason these bugs are exploitable in the way they are. The kernel allocates small objects through the SLUB allocator, which does not manage one pool. It manages many small pools, each one dedicated to objects of a particular size.

When kernel code calls kmalloc(200, GFP_KERNEL), the request is rounded up to the next size class and served from a cache named for that class. There is a kmalloc-256 cache, a kmalloc-512, a kmalloc-1024, and so on. Each cache owns a set of slabs, where a slab is one or more contiguous pages of memory sliced into equal sized object slots. A kmalloc-256 slab built from a single 4096 byte page holds sixteen slots of 256 bytes each. Every object that the kernel allocates at that size lands in one of those slots.
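The rounding is easy to check by hand. Below is a minimal sketch in Python, not kernel code: the class list is an assumption based on a typical x86-64 configuration with 4096 byte pages, and the function names are made up for illustration.

```python
# Toy model of kmalloc size-class rounding. The class list is an assumption
# (typical x86-64 with 4 KiB pages), not read from a running kernel.
KMALLOC_CLASSES = [8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192]
PAGE_SIZE = 4096


def kmalloc_cache_for(request_size: int) -> str:
    """Return the name of the general cache that would serve the request."""
    for size in KMALLOC_CLASSES:
        if request_size <= size:
            return f"kmalloc-{size}"
    raise ValueError("request too large for this toy class list")


def slots_per_page(object_size: int) -> int:
    """Number of equal sized slots that fit in a single-page slab."""
    return PAGE_SIZE // object_size


print(kmalloc_cache_for(200))   # kmalloc-256
print(slots_per_page(256))      # 16
```

A 200 byte request and a 256 byte request end up in the same cache, which is all an attacker needs from the size classes.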

This matters because objects of the same size share a cache. A network buffer, a filesystem structure, and a credential record can all be 256 bytes, and if so they compete for slots in the same kmalloc-256 slab. That shared residency is the soil every use after free grows in. To reuse a freed object as something dangerous, an attacker needs the kernel to place the dangerous object in the slot that was just vacated. Same size, same cache, same slab. The allocator is doing exactly its job. The attacker is just choosing what fills the hole.

What a freed object actually looks like

Here is the detail most explanations skip. When SLUB frees an object, it does not zero it and it does not hand it back to the page allocator. It threads the slot onto a free list, and the free list lives inside the freed objects themselves. SLUB writes the address of the next free object into the slot being freed: at the first bytes of the slot in the classic layout, which the diagram below follows, and near the middle of the object on kernels since 5.7. The freed memory becomes a node in a singly linked list of holes.

kmem_cache_cpu.freelist  -->  slot A
slot A: [ next = &slot C ][ stale leftover bytes ... ]
slot C: [ next = &slot D ][ stale leftover bytes ... ]
slot D: [ next = NULL    ][ stale leftover bytes ... ]

Two facts fall out of this layout. First, a freed object still contains its old contents past the embedded free pointer, so a dangling pointer can often still read meaningful stale data. Second, allocation is a pop from the head of this list. The per cpu structure kmem_cache_cpu holds a freelist field pointing at the first free slot. To allocate, SLUB reads the next pointer out of that slot, sets the free list head to it, and returns the slot. To free, it writes the current head into the slot and points the head at the slot. Allocation is last in, first out. The most recently freed object of a given size is the very next one handed out.

That ordering is a gift to an attacker. Free the victim, then immediately allocate an object of the same size, and you get the victim’s slot back with high reliability. No guessing, no spray needed in the simplest case. The allocator’s own efficiency hands the freed slot straight back.
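The push and pop can be modeled in a few lines. This is a toy sketch in Python under stated assumptions: slot offsets stand in for addresses, the free pointer sits in the first 8 bytes of the slot, and ToySlab is a hypothetical name, not a kernel structure.

```python
import struct

SLOT_SIZE = 256
NULL = 0xFFFFFFFFFFFFFFFF  # sentinel for "no next slot" in this toy model


class ToySlab:
    """One slab of equal sized slots with the free list stored in the free slots."""

    def __init__(self, slots: int = 16):
        self.mem = bytearray(slots * SLOT_SIZE)
        self.freelist = NULL
        # Thread every slot onto the free list, highest offset first,
        # so the first allocations come out in ascending order.
        for off in reversed(range(0, len(self.mem), SLOT_SIZE)):
            self.free(off)

    def free(self, off: int) -> None:
        # Push: write the current head into the slot, then point the head at it.
        # Only the first 8 bytes change; the rest of the slot stays stale.
        struct.pack_into("<Q", self.mem, off, self.freelist)
        self.freelist = off

    def alloc(self) -> int:
        # Pop: return the head slot and make its embedded pointer the new head.
        if self.freelist == NULL:
            raise MemoryError("slab is full")
        off = self.freelist
        (self.freelist,) = struct.unpack_from("<Q", self.mem, off)
        return off


slab = ToySlab()
victim = slab.alloc()
slab.mem[victim + 8 : victim + 16] = b"OLD-DATA"
slab.free(victim)
reclaimed = slab.alloc()
print(reclaimed == victim)                               # True
print(bytes(slab.mem[reclaimed + 8 : reclaimed + 16]))   # b'OLD-DATA'
```

The second line of output is the stale data surviving the free: only the 8 bytes holding the free pointer were touched.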

The exact moment of reuse in a kernel use after free

Now we can describe a kernel use after free with precision instead of hand waving. Walk the timeline of a single slot.

At time one the kernel allocates object X into slot S and stores a pointer to it somewhere, say a field in a longer lived structure. The pointer is the reference.
At time two some code path frees X. SLUB threads slot S onto the free list. The reference the kernel kept is now dangling. It still points at slot S, but slot S is officially free memory.
At time three the attacker triggers an allocation of an object Y of the same size class. SLUB pops slot S off the free list and returns it. Object Y now lives in slot S, and crucially the attacker controls the bytes written into Y.
At time four the kernel uses the dangling reference, believing it still points at object X. It reads or writes through that pointer. But the bytes there are now object Y, filled by the attacker.

The reuse at time three is the hinge. Before it, the dangling pointer sees only X’s stale bytes and SLUB’s free pointer, and the usual worst case is a crash. After it, the dangling pointer points at a structure whose contents the attacker chose. The kernel is about to interpret attacker data as a trusted object. Everything that makes this a privilege escalation rather than a denial of service happens in the gap between the kernel’s mental model, which says slot S is still object X, and the physical reality, which says slot S is now object Y.
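The four steps can be replayed on a plain byte buffer. The sketch below is a Python toy, not kernel code; the layout of object X (a 4 byte flags field followed by an 8 byte handler value) is invented purely for illustration.

```python
import struct

# Hypothetical layout for object X: a 4-byte flags field, then an 8-byte handler id.
X_LAYOUT = "<IQ"

slot_s = bytearray(16)            # slot S, shared by whoever owns it at the moment

# Time one: the kernel builds object X in slot S and keeps a reference to it.
struct.pack_into(X_LAYOUT, slot_s, 0, 1, 0x1111)
dangling_ref = slot_s             # the reference that will outlive the free

# Time two: X is freed. Nothing is zeroed and the reference is not cleared.
# Time three: the allocator returns slot S for object Y, filled by the attacker.
attacker_bytes = struct.pack(X_LAYOUT, 0xFFFFFFFF, 0x4141414141414141)
slot_s[: len(attacker_bytes)] = attacker_bytes

# Time four: the kernel reads through the old reference, still assuming type X.
flags, handler = struct.unpack_from(X_LAYOUT, dangling_ref, 0)
print(hex(flags), hex(handler))   # 0xffffffff 0x4141414141414141
```

Nothing in the read at time four can tell that the bytes changed owner. The same offsets are read, and they simply hold different values.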

A use after free is not a memory error in the usual sense. It is a disagreement about ownership. Two objects believe they own the same bytes, and the attacker controls which belief the CPU acts on.

Heap grooming: making the right object land in the hole

In a real bug the freed slot and the reuse rarely line up by luck, so attackers shape the heap first. This is heap grooming, sometimes called heap feng shui. The goal is to arrange the free list so the slot you are about to free, and then reclaim, is predictable.

A common move is to allocate a run of filler objects to fill partially used slabs, free a few at chosen positions to open known holes, then trigger the bug so the vulnerable object lands next to or inside a slot you understand. After the free, the attacker sprays many copies of the replacement object so that even with some noise from other kernel activity, one of the sprayed copies almost certainly captures the freed slot. Message queue objects, socket buffers, and extended attribute buffers are popular spray vehicles because their size is attacker controlled and their contents are largely attacker controlled too. You pick a spray object whose size rounds into the same kmalloc cache as the victim, because reuse only works inside one cache.
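Why a spray beats a single allocation can be shown with a toy model. This Python sketch assumes a strictly last in, first out free list and models noise as unrelated frees that bury the victim slot; reclaim_with_spray is a hypothetical helper, not a real API.

```python
def reclaim_with_spray(noise_frees: int, spray_count: int) -> bool:
    """Return True if any sprayed object lands in the victim's freed slot.

    Toy model: the free list is a Python list used as a LIFO stack of slot ids.
    """
    VICTIM = 0
    freelist = [VICTIM]                       # the bug frees the victim slot
    # Unrelated kernel activity frees other slots afterwards, burying the victim.
    freelist.extend(range(1, noise_frees + 1))

    sprayed = []
    for _ in range(spray_count):
        if not freelist:
            break                             # a real allocator would grab a new slab
        sprayed.append(freelist.pop())        # LIFO: most recently freed slot first
    return VICTIM in sprayed


print(reclaim_with_spray(noise_frees=5, spray_count=1))   # False
print(reclaim_with_spray(noise_frees=5, spray_count=6))   # True
```

One allocation loses to five noisy frees. Six sprayed objects drain the noise and take the victim slot with the last one.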

There is a second reason grooming is necessary, and it comes from the per cpu free list. SLUB keeps a hot free list per CPU core. If the free and the reclaiming allocation run on different cores, they touch different free lists and the reclaim can miss. Exploits often pin themselves to one CPU with sched_setaffinity so the free and the spray hit the same per cpu list, restoring the clean last in, first out behavior the attack depends on. They also keep the spray objects in their own size band when they want the freed slot to come from a fresh slab rather than a busy one. These are small operational details, but they are the difference between a use after free that reclaims on the first try and one that reclaims one time in fifty.
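The per CPU miss is easy to reproduce in a model. The Python sketch below is a deliberate simplification: it gives each CPU its own stack of slot ids and assumes a freed slot joins the list of the CPU that frees it, which glosses over how real SLUB handles remote frees. PerCpuCache is a hypothetical name.

```python
class PerCpuCache:
    """Toy cache with one LIFO free list per CPU (slot ids stand in for addresses)."""

    def __init__(self, cpus: int, slots_per_cpu: int):
        self.freelists = {
            cpu: [cpu * slots_per_cpu + i for i in range(slots_per_cpu)]
            for cpu in range(cpus)
        }

    def alloc(self, cpu: int) -> int:
        return self.freelists[cpu].pop()

    def free(self, cpu: int, slot: int) -> None:
        # Simplification: the slot joins the list of the CPU that frees it.
        self.freelists[cpu].append(slot)


cache = PerCpuCache(cpus=2, slots_per_cpu=4)
victim = cache.alloc(cpu=0)
cache.free(cpu=0, slot=victim)

missed = cache.alloc(cpu=1)      # reclaim attempt from the wrong core
hit = cache.alloc(cpu=0)         # reclaim attempt pinned to the freeing core
print(missed == victim, hit == victim)   # False True
```

In a real process the pinning is done with sched_setaffinity. Python exposes the same call on Linux as os.sched_setaffinity(0, {0}), which restricts the calling process to CPU 0.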

Cache merging widens the field

SLUB also merges caches to save memory. Two caches that ask for the same object size and compatible flags can be folded into one shared cache when the second one is created. The practical effect for an attacker is that an object you would expect to be isolated may in fact share a slab with general kmalloc allocations of the same size, because the kernel merged them. That expands the set of objects you can use to reclaim a freed slot. It also explains why a defense as simple as giving a sensitive structure a dedicated, non mergeable cache closes a whole class of reuse. If the victim cannot share a slab with anything you can spray, you cannot reclaim its freed slot with a chosen object, and the use after free loses its teeth.
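Merging amounts to aliasing: a request for a new cache quietly returns an existing one. The Python sketch below is a loose model only. The real merge test looks at more than size and flags, and CacheRegistry, driver_obj and sensitive_obj are hypothetical names.

```python
class CacheRegistry:
    """Toy registry that folds compatible cache requests into one backing cache."""

    def __init__(self):
        self._shared = {}     # (size, flags) -> backing cache open to merging

    def create(self, name, size, flags=(), mergeable=True):
        key = (size, frozenset(flags))
        if mergeable and key in self._shared:
            backing = self._shared[key]            # alias an existing cache
        else:
            backing = {"size": size, "users": []}  # build a new cache
            if mergeable:
                self._shared[key] = backing
        backing["users"].append(name)
        return backing


registry = CacheRegistry()
general = registry.create("kmalloc-256", 256)
driver = registry.create("driver_obj", 256)                    # hypothetical cache
isolated = registry.create("sensitive_obj", 256, mergeable=False)

print(driver is general)       # True
print(isolated is general)     # False
print(general["users"])        # ['kmalloc-256', 'driver_obj']
```

The driver's objects share slabs with everything in the general cache, while the isolated cache can only ever be refilled by its own object type.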

Why reuse becomes power: choosing the victim object

Reuse alone is not escalation. What makes a use after free a root shell is the choice of which object reclaims the freed slot. The attacker wants an object that, once it overlaps the dangling reference, gives control over something the kernel trusts. Three classic targets show the range.

A function pointer you can aim

Some kernel objects hold a pointer to an operations table, a struct full of function pointers the kernel calls to do work. struct pipe_buffer is the textbook example. It carries a field ops that points at a static table such as anon_pipe_buf_ops, and the kernel calls through that table when a pipe is read, released, or confirmed. If an attacker reclaims a freed slot with a pipe_buffer whose ops field they control, the next pipe operation calls a function pointer of the attacker’s choosing. That is control flow hijack, the path toward running a chosen sequence of kernel instructions.
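The indirection is what makes the pointer aimable. Here is a toy Python model of a call through an ops pointer: the table addresses, the handler names, and the 16 byte offset of the ops field are assumptions for illustration, not values taken from a kernel build.

```python
import struct

# Toy "kernel text": addresses of operation tables mapped to their handlers.
# All addresses and handler names here are made up for illustration.
OPS_TABLES = {
    0x1000: {"release": lambda: "anon_pipe_buf_release"},
    0x4000: {"release": lambda: "attacker_chosen_gadget"},
}

OPS_OFFSET = 16   # assumed offset of the ops pointer inside the toy object


def pipe_release(slot: bytearray) -> str:
    """Follow the ops pointer stored in the slot and call its release hook."""
    (ops_addr,) = struct.unpack_from("<Q", slot, OPS_OFFSET)
    return OPS_TABLES[ops_addr]["release"]()


slot = bytearray(40)
struct.pack_into("<Q", slot, OPS_OFFSET, 0x1000)       # legitimate object
print(pipe_release(slot))                              # anon_pipe_buf_release

struct.pack_into("<Q", slot, OPS_OFFSET, 0x4000)       # slot reclaimed, ops forged
print(pipe_release(slot))                              # attacker_chosen_gadget
```

The calling code never changes. Only the 8 bytes it trusts do.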

A length or pointer field you can lie about

Other victims do not need a function pointer at all. If the reclaiming object exposes a length field or a data pointer that the kernel later trusts for a copy, overwriting it turns a bounded operation into an arbitrary read or write. A message object whose size field has been inflated lets the kernel copy far more than the original allocation, reading neighboring kernel memory back to the attacker. This is the data only road, and it does not care about code at all.
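A trusted length is enough to leak a neighbor. The Python sketch below models two adjacent 64 byte slots and a copy routine that believes the length in the header; the header format and the function name are hypothetical.

```python
import struct

SLOT_SIZE = 64
HEADER = "<Q"                      # toy header: one 8-byte length field
HEADER_SIZE = struct.calcsize(HEADER)


def copy_to_user(heap: bytearray, slot_off: int) -> bytes:
    """Copy out as many payload bytes as the header's length field claims."""
    (length,) = struct.unpack_from(HEADER, heap, slot_off)
    start = slot_off + HEADER_SIZE
    return bytes(heap[start : start + length])   # no check against SLOT_SIZE


heap = bytearray(2 * SLOT_SIZE)
struct.pack_into(HEADER, heap, 0, 5)             # message slot: 5 payload bytes
heap[HEADER_SIZE : HEADER_SIZE + 5] = b"hello"
heap[SLOT_SIZE : SLOT_SIZE + 6] = b"SECRET"      # neighboring object's data

print(copy_to_user(heap, 0))                     # b'hello'

struct.pack_into(HEADER, heap, 0, 100)           # length inflated through the bug
leak = copy_to_user(heap, 0)
print(b"SECRET" in leak)                         # True
```

The copy routine is correct for every honest header. It only misbehaves because the field it relies on was rewritten underneath it.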

A credential you can swap

The cleanest escalation forges nothing at all. Every process points at a struct cred that records its uid and gid. A uid of zero is root. The DirtyCred technique, presented at Black Hat USA and ACM CCS in 2022, builds on exactly this. Rather than forging bytes, it frees a credential or file object the process relies on, then races to allocate a privileged object of the same type into the freed slot. The kernel keeps using its dangling reference, except the reference now resolves to a privileged credential. The process is root because it is pointing at root’s credentials, and no kernel address ever needed to leak. The free list did the swap.

The file flavor of the same idea is worth seeing because it shows how little corruption a strong technique needs. An attacker opens a writable file, which the kernel checks and approves, then begins a write. Between the permission check and the actual write the attacker frees the file object through the bug and reallocates the slot with a file object opened against a read only target. The write the kernel already approved now lands on the read only file, because the reference it followed points at the swapped object. There is no forged pointer and no leaked address. The whole exploit is a well timed free and a reclaim, which is why these data only techniques survive across kernel versions and architectures that break pointer based exploits. They depend only on the allocator doing what it always does: hand a freed slot to the next request of the right size.
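The ordering of check, swap, and write is the whole trick, so it is worth seeing as a sequence. This is a toy Python model under loud assumptions: a dict stands in for the slab, another for the disk, and every name and path is hypothetical.

```python
# Toy model of the file-swap race. Object and field names are hypothetical.
slots = {0: {"path": "scratch.txt", "writable": True}}   # slot 0 holds the file object
disk = {"scratch.txt": b"", "protected.conf": b"original\n"}


def write_checked(slot_id: int, data: bytes, race=None) -> None:
    """Check permission, then write through the same slot reference."""
    if not slots[slot_id]["writable"]:
        raise PermissionError("file not opened for writing")
    if race is not None:
        race()                                   # window between check and use
    target = slots[slot_id]["path"]              # reference followed after the check
    disk[target] += data


def swap_file_object() -> None:
    # The bug frees slot 0; a read-only open of the target reclaims it.
    slots[0] = {"path": "protected.conf", "writable": False}


write_checked(0, b"injected\n", race=swap_file_object)
print(disk["protected.conf"])   # b'original\ninjected\n'
print(disk["scratch.txt"])      # b''
```

The permission check passed honestly. It was simply answered about a different object than the one the write finally reached.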

A real kernel use after free walked end to end

Concrete beats abstract, so anchor this in a documented bug. CVE-2021-22555 is a heap out of bounds write in the netfilter subsystem that had been present since Linux 2.6.19 in 2006, reachable by an unprivileged user through a user namespace. It is not itself a use after free, but the public writeup turns it into one, and the steps map onto everything above.

The flaw is a small overflow. When the kernel translates 32 bit iptables rules into 64 bit form, a memset writes a short run of zero bytes just past the end of an allocation. A few zero bytes does not sound like much. The exploit makes it enough.

The groom uses System V message queues, whose struct msg_msg headers begin with list pointers linking the messages of a queue and live in a controllable kmalloc cache. The attacker lays out primary and secondary messages so that two zero bytes from the overflow land on the list next pointer (m_list.next) of a primary message header, clearing its low bytes and bending it to alias a secondary message that belongs to another queue. Now two message references point at one underlying object. Reading the message through one path frees the shared object while the other path keeps a stale reference. That stale reference is the use after free, manufactured out of a tiny overflow.
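The pointer bending is plain arithmetic. A short Python sketch, with addresses that are hypothetical and chosen only to make the effect visible:

```python
def zero_low_bytes(pointer: int, count: int) -> int:
    """Return the pointer with its lowest `count` bytes cleared."""
    mask = (1 << (8 * count)) - 1
    return pointer & ~mask


# Hypothetical addresses chosen only to make the arithmetic visible.
secondary_a = 0xFFFF888012340000    # a message that happens to sit on a 64 KiB boundary
secondary_b = 0xFFFF888012345400    # what the corrupted header originally pointed at

corrupted = zero_low_bytes(secondary_b, 2)
print(hex(corrupted))               # 0xffff888012340000
print(corrupted == secondary_a)     # True
```

Clearing two bytes snaps the pointer down to the nearest 64 KiB boundary, so the groom only has to make sure some other message sits there.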

From there the pattern is the one we built. The attacker sprays struct pipe_buffer objects to reclaim the freed slot, reads back through the dangling reference to leak the address of a static kernel table and defeat KASLR, then reclaims again with a pipe_buffer whose ops pointer is forged. Closing the pipe calls through the forged table, redirecting kernel control flow into a chain that runs commit_creds(prepare_kernel_cred(NULL)), which installs root credentials on the current process. One overflow of two zero bytes, groomed into a use after free, reclaimed by a chosen victim, escalated to root. Every link is a piece described above. The MITRE record for the bug is CVE-2021-22555.

Why the kernel cannot just notice

A fair question is why the kernel does not simply detect that an object was freed and refuse to use it. The answer is that at the machine level there is nothing to detect. A pointer is a number. A freed slot is the same bytes it was a microsecond ago, minus the embedded free pointer SLUB wrote at the front. The CPU dereferencing a dangling pointer sees a valid mapped address with plausible contents. Nothing faults. The type system that would have caught this lived in the source code and was compiled away.

Defenses therefore attack the mechanics rather than the intent. Freelist pointer hardening, enabled by CONFIG_SLAB_FREELIST_HARDENED, stores the embedded next pointer obfuscated rather than raw. Instead of writing the next address plainly, SLUB stores it as the address XORed with a per cache random secret and with the slot’s own location, so a value computed roughly as ptr ^ slab_secret ^ slot_address. An attacker who overwrites a freed slot can no longer forge a valid free pointer without knowing the secret, which blocks the trick of pointing the free list at an arbitrary address. Cache separation moves sensitive objects out of the general kmalloc caches so they cannot share a slab with attacker controlled sprays. Credentials, for example, were given their own dedicated cache with account flags so they no longer merge with general allocations, which is why straightforward credential overwrites stopped working and attackers moved to cross cache techniques. Allocator quarantine and randomization delay and shuffle reuse so that the clean last in, first out reclaim is no longer a sure thing.
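The obfuscation is small enough to model directly. The Python sketch below follows the formula in the text with one added assumption: the slot address is byte swapped before the XOR, which is how recent kernels are understood to mix it in. The secret and all addresses are made up.

```python
MASK64 = (1 << 64) - 1


def swab64(value: int) -> int:
    """Reverse the byte order of a 64-bit value."""
    return int.from_bytes(value.to_bytes(8, "little"), "big")


def encode(ptr: int, secret: int, slot_addr: int) -> int:
    """Obfuscate a free pointer before storing it in the slot."""
    return (ptr ^ secret ^ swab64(slot_addr)) & MASK64


decode = encode   # XOR is its own inverse, so decoding is the same operation


secret = 0x5DEECE66D1234567           # stands in for the per-cache random value
slot = 0xFFFF888012345600             # hypothetical address of the freed slot
next_free = 0xFFFF888012345700        # hypothetical next free slot

stored = encode(next_free, secret, slot)
print(decode(stored, secret, slot) == next_free)     # True

forged = 0xFFFFFFFF81000000           # attacker writes a raw target address
print(decode(forged, secret, slot) == forged)        # False
```

A raw address written into the slot decodes to something else entirely, so the free list no longer follows the attacker's pointer unless the secret has leaked.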

None of these make the underlying bug disappear. They raise the cost of the step between free and reuse. That is the honest framing: the dangling pointer is still wrong, the hardening only makes the wrongness harder to convert into control. Spotting the dangling pointer in the first place is a reasoning problem, the same kind of assumption testing covered in our piece on how vulnerabilities are actually found, and the escalation that follows is the classic privilege escalation story told at the level of slab slots.

The assumption that outlived its reference

Strip away the slabs and the spray and the forged tables and one assumption is left standing. The allocator assumes that when an object is freed, every reference to it is gone. Freeing is a promise the rest of the kernel makes: I am done with this, you may give the bytes to someone else. A use after free is that promise broken. A reference survived the free, and it kept pointing at the slot after the allocator handed those bytes to another owner.

Everything dangerous follows from that single broken promise. The size class sharing, the last in first out reclaim, the choice of a credential or a function pointer as the new tenant, all of it is just leverage applied to a reference that outlived its assumption. The allocator is not buggy and the victim object is not buggy. The bug is a pointer that should have been forgotten and was not. Finding that surviving reference, the one the code assumed could never still be live, is the whole game, and it is exactly the kind of assumption an autonomous researcher built to question what each component trusts is meant to surface before an attacker does. More on that approach is on our about page.

Frequently asked questions

What is a kernel use after free in simple terms?

It is a bug where the kernel frees an object but keeps a pointer to it, then the allocator hands that same memory to a different object. When the kernel uses the old pointer it reads or writes a structure that someone else now owns. If an attacker controls the contents of that new object, the kernel ends up trusting attacker chosen bytes as if they were a legitimate object.

Why does the SLUB allocator make use after free bugs exploitable?

SLUB serves objects from per size caches like kmalloc-256, and objects of the same size share slabs. It threads freed slots onto a free list stored inside the freed objects, and allocation pops from the head, so the most recently freed slot is the next one returned. An attacker frees the victim then immediately allocates a same size object to reclaim that exact slot with reliable timing.

How does a use after free turn into root access?

The freed slot is reclaimed by a victim object that gives control over something trusted. That can be a function pointer table like the ops field of a struct pipe_buffer, a length field that enables an arbitrary read or write, or a struct cred that the attacker swaps for a privileged one. The DirtyCred technique uses the credential swap path. A documented end to end example is CVE-2021-22555 in netfilter.

Can the kernel detect a dangling pointer on its own?

Not at runtime. A pointer is just a number and a freed slot still holds plausible bytes, so dereferencing it does not fault. Mitigations such as CONFIG_SLAB_FREELIST_HARDENED, dedicated caches for sensitive objects, and reuse randomization raise the cost of converting the bug into control, but they do not remove the surviving reference. The kernel self protection documentation on kernel.org describes the hardening option.