
Exploiting the Linux kernel via packet sockets

Guest blog post, posted by Andrey Konovalov

Introduction

Lately I’ve been spending some time fuzzing network-related Linux kernel interfaces with syzkaller. Besides the recently discovered vulnerability in DCCP sockets, I also found another one, this time in packet sockets. This post describes how the bug was discovered and how we can exploit it to escalate privileges.

The bug itself (CVE-2017-7308) is a signedness issue, which leads to an exploitable heap out-of-bounds write. It can be triggered by providing specific parameters to the PACKET_RX_RING option on an AF_PACKET socket with the TPACKET_V3 ring buffer version enabled. As a result, the following sanity check in the packet_set_ring() function in net/packet/af_packet.c can be bypassed, which later leads to an out-of-bounds access.

if (po->tp_version >= TPACKET_V3 &&
    (int)(req->tp_block_size -
          BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)
        goto out;

The bug was introduced on Aug 19, 2011 in the commit f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation") together with the TPACKET_V3 implementation. There was an attempt to fix it on Aug 15, 2014 in commit dc808110 ("packet: handle too big packets for PACKET_V3") by adding additional checks, but this was not sufficient, as shown below. The bug was fixed in 2b6867c2 ("net/packet: fix overflow in check for priv area size") on Mar 29, 2017.

The bug affects a kernel if it has AF_PACKET sockets enabled (CONFIG_PACKET=y), which is the case for many Linux kernel distributions. Exploitation requires the CAP_NET_RAW privilege to be able to create such sockets. However, it’s possible to do that from a user namespace if they are enabled (CONFIG_USER_NS=y) and accessible to unprivileged users.

Since packet sockets are a quite widely used kernel feature, this vulnerability affects a number of popular Linux kernel distributions including Ubuntu and Android. It should be noted that access to AF_PACKET sockets is expressly disallowed to any untrusted code within Android, although it is available to some privileged components. Updated Ubuntu kernels are already out, Android’s update is scheduled for July.

Syzkaller


The bug was found with syzkaller, a coverage-guided syscall fuzzer, and KASAN, a dynamic memory error detector. I’m going to provide some details on how syzkaller works and how to use it for fuzzing some kernel interface in case someone decides to try this.

Let’s start with a quick overview of how the syzkaller fuzzer works. Syzkaller is able to generate random programs (sequences of syscalls) based on manually written template descriptions for each syscall. The fuzzer executes these programs and collects code coverage for each of them. Using the coverage information, syzkaller keeps a corpus of programs, which trigger different code paths in the kernel. Whenever a new program triggers a new code path (i.e. gives new coverage), syzkaller adds it to the corpus. Besides generating completely new programs, syzkaller is able to mutate the existing ones from the corpus.

Syzkaller is meant to be used together with dynamic bug detectors like KASAN (detects memory bugs like out-of-bounds and use-after-frees, available upstream since 4.0), KMSAN (detects uses of uninitialized memory, prototype was just released) or KTSAN (detects data races, prototype is available). The idea is that syzkaller stresses the kernel and executes various interesting code paths and the detectors detect and report bugs.

The usual workflow for finding bugs with syzkaller is as follows:
  1. Set up syzkaller and make sure it works. The README and wiki provide quite extensive information on how to do that.
  2. Write template descriptions for a particular kernel interface you want to test.
  3. Specify the syscalls that are used in this interface in the syzkaller config.
  4. Run syzkaller until it finds bugs. Usually this happens quite fast for interfaces that haven’t been tested with it previously.

Syzkaller uses its own declarative language to describe syscall templates. Check out sys/sys.txt for an example or sys/README.md for information on the syntax. Here’s an excerpt from the syzkaller descriptions for AF_PACKET sockets that I used to discover the bug:

resource sock_packet[sock]

define ETH_P_ALL_BE htons(ETH_P_ALL)

socket$packet(domain const[AF_PACKET], type flags[packet_socket_type], proto const[ETH_P_ALL_BE]) sock_packet

packet_socket_type = SOCK_RAW, SOCK_DGRAM

setsockopt$packet_rx_ring(fd sock_packet, level const[SOL_PACKET], optname const[PACKET_RX_RING], optval ptr[in, tpacket_req_u], optlen len[optval])
setsockopt$packet_tx_ring(fd sock_packet, level const[SOL_PACKET], optname const[PACKET_TX_RING], optval ptr[in, tpacket_req_u], optlen len[optval])

tpacket_req {
tp_block_size int32
tp_block_nr int32
tp_frame_size int32
tp_frame_nr int32
}

tpacket_req3 {
tp_block_size int32
tp_block_nr int32
tp_frame_size int32
tp_frame_nr int32
tp_retire_blk_tov int32
tp_sizeof_priv int32
tp_feature_req_word int32
}

tpacket_req_u [
req tpacket_req
req3 tpacket_req3
] [varlen]

The syntax is mostly self-explanatory. First, we declare a new type sock_packet. This type is inherited from an existing type sock. That way syzkaller will use syscalls which have arguments of type sock on sock_packet sockets as well.

After that, we declare a new syscall socket$packet. The part before the $ sign tells syzkaller what syscall it should use, and the part after the $ sign is used to differentiate between different kinds of the same syscall. This is particularly useful when dealing with syscalls like ioctl. The socket$packet syscall returns a sock_packet socket.

Then setsockopt$packet_rx_ring and setsockopt$packet_tx_ring are declared. These syscalls set the PACKET_RX_RING and PACKET_TX_RING socket options on a sock_packet socket. I’ll talk about these options in detail below. Both of them use the tpacket_req_u union as a socket option value. This union has two struct members, tpacket_req and tpacket_req3.

Once the descriptions are added, syzkaller can be instructed to fuzz packet-related syscalls specifically. This is what I provided in the syzkaller manager config:

"enable_syscalls": [
"socket$packet", "socketpair$packet", "accept$packet", "accept4$packet", "bind$packet", "connect$packet", "sendto$packet", "recvfrom$packet", "getsockname$packet", "getpeername$packet", "listen", "setsockopt", "getsockopt", "syz_emit_ethernet"
],

After a few minutes of running syzkaller with these descriptions I started getting kernel crashes. Here’s one of the syzkaller programs that triggered the mentioned bug:

mmap(&(0x7f0000000000/0xc8f000)=nil, (0xc8f000), 0x3, 0x32, 0xffffffffffffffff, 0x0)
r0 = socket$packet(0x11, 0x3, 0x300)
setsockopt$packet_int(r0, 0x107, 0xa, &(0x7f000061f000)=0x2, 0x4)
setsockopt$packet_rx_ring(r0, 0x107, 0x5, &(0x7f0000c8b000)=@req3={0x10000, 0x3, 0x10000, 0x3, 0x4, 0xfffffffffffffffe, 0x5}, 0x1c)
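For readers less familiar with syzkaller’s program format, the numeric arguments above can be decoded against the userspace headers. The sketch below is not a reproducer: it only rebuilds the request struct passed in the last call, and the comments map the raw numbers to their symbolic names. The function name syz_req3() is made up for illustration, and the 0x300 protocol value corresponds to htons(ETH_P_ALL) only on a little-endian machine such as x86-64.

```c
#include <string.h>
#include <sys/socket.h>       /* AF_PACKET, SOCK_RAW, SOL_PACKET */
#include <linux/if_packet.h>  /* PACKET_*, TPACKET_V3, struct tpacket_req3 */

/*
 * socket$packet(0x11, 0x3, 0x300)
 *     -> socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))
 * setsockopt$packet_int(r0, 0x107, 0xa, &0x2, 0x4)
 *     -> setsockopt(fd, SOL_PACKET, PACKET_VERSION, &TPACKET_V3, 4)
 * setsockopt$packet_rx_ring(r0, 0x107, 0x5, &req3, 0x1c)
 *     -> setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req3, sizeof(req3))
 */
struct tpacket_req3 syz_req3(void)
{
    struct tpacket_req3 req;

    memset(&req, 0, sizeof(req));
    req.tp_block_size = 0x10000;
    req.tp_block_nr = 0x3;
    req.tp_frame_size = 0x10000;
    req.tp_frame_nr = 0x3;
    req.tp_retire_blk_tov = 0x4;
    /* The description declares this field as int32, so only the low
     * 32 bits of 0xfffffffffffffffe are used. Note the top bit is set. */
    req.tp_sizeof_priv = 0xfffffffeu;
    req.tp_feature_req_word = 0x5;
    return req;
}
```

The interesting value is tp_sizeof_priv: its most significant bit is set, which is exactly the condition discussed in the Vulnerability section.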

And here’s one of the KASAN reports. It should be noted that since the access is quite far past the block bounds, the allocation and deallocation stacks don’t correspond to the overflown object.

==================================================================
BUG: KASAN: slab-out-of-bounds in prb_close_block net/packet/af_packet.c:808
Write of size 4 at addr ffff880054b70010 by task syz-executor0/30839

CPU: 0 PID: 30839 Comm: syz-executor0 Not tainted 4.11.0-rc2+ #94
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x292/0x398 lib/dump_stack.c:52
print_address_description+0x73/0x280 mm/kasan/report.c:246
kasan_report_error mm/kasan/report.c:345 [inline]
kasan_report.part.3+0x21f/0x310 mm/kasan/report.c:368
kasan_report mm/kasan/report.c:393 [inline]
__asan_report_store4_noabort+0x2c/0x30 mm/kasan/report.c:393
prb_close_block net/packet/af_packet.c:808 [inline]
prb_retire_current_block+0x6ed/0x820 net/packet/af_packet.c:970
__packet_lookup_frame_in_block net/packet/af_packet.c:1093 [inline]
packet_current_rx_frame net/packet/af_packet.c:1122 [inline]
tpacket_rcv+0x9c1/0x3750 net/packet/af_packet.c:2236
packet_rcv_fanout+0x527/0x810 net/packet/af_packet.c:1493
deliver_skb net/core/dev.c:1834 [inline]
__netif_receive_skb_core+0x1cff/0x3400 net/core/dev.c:4117
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4244
netif_receive_skb_internal+0x1d6/0x430 net/core/dev.c:4272
netif_receive_skb+0xae/0x3b0 net/core/dev.c:4296
tun_rx_batched.isra.39+0x5e5/0x8c0 drivers/net/tun.c:1155
tun_get_user+0x100d/0x2e20 drivers/net/tun.c:1327
tun_chr_write_iter+0xd8/0x190 drivers/net/tun.c:1353
call_write_iter include/linux/fs.h:1733 [inline]
new_sync_write fs/read_write.c:497 [inline]
__vfs_write+0x483/0x760 fs/read_write.c:510
vfs_write+0x187/0x530 fs/read_write.c:558
SYSC_write fs/read_write.c:605 [inline]
SyS_write+0xfb/0x230 fs/read_write.c:597
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x40b031
RSP: 002b:00007faacbc3cb50 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000002a RCX: 000000000040b031
RDX: 000000000000002a RSI: 0000000020002fd6 RDI: 0000000000000015
RBP: 00000000006e2960 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000708000
R13: 000000000000002a R14: 0000000020002fd6 R15: 0000000000000000

Allocated by task 30534:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:617
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:555
slab_post_alloc_hook mm/slab.h:456 [inline]
slab_alloc_node mm/slub.c:2720 [inline]
slab_alloc mm/slub.c:2728 [inline]
kmem_cache_alloc+0x1af/0x250 mm/slub.c:2733
getname_flags+0xcb/0x580 fs/namei.c:137
getname+0x19/0x20 fs/namei.c:208
do_sys_open+0x2ff/0x720 fs/open.c:1045
SYSC_open fs/open.c:1069 [inline]
SyS_open+0x2d/0x40 fs/open.c:1064
entry_SYSCALL_64_fastpath+0x1f/0xc2

Freed by task 30534:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:590
slab_free_hook mm/slub.c:1358 [inline]
slab_free_freelist_hook mm/slub.c:1381 [inline]
slab_free mm/slub.c:2963 [inline]
kmem_cache_free+0xb5/0x2d0 mm/slub.c:2985
putname+0xee/0x130 fs/namei.c:257
do_sys_open+0x336/0x720 fs/open.c:1060
SYSC_open fs/open.c:1069 [inline]
SyS_open+0x2d/0x40 fs/open.c:1064
entry_SYSCALL_64_fastpath+0x1f/0xc2

Object at ffff880054b70040 belongs to cache names_cache of size 4096
The buggy address belongs to the page:
page:ffffea000152dc00 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
flags: 0x500000000008100(slab|head)
raw: 0500000000008100 0000000000000000 0000000000000000 0000000100070007
raw: ffffea0001549a20 ffffea0001b3cc20 ffff88003eb44f40 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff880054b6ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880054b6ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880054b70000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                        ^
ffff880054b70080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880054b70100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

You can find more details about syzkaller in its repository and more details about KASAN in the kernel documentation. If you decide to try syzkaller or KASAN and run into any trouble, drop an email to syzkaller@googlegroups.com or to kasan-dev@googlegroups.com.

Introduction to AF_PACKET sockets


To better understand the bug, the vulnerability it leads to and how to exploit it, we need to understand what AF_PACKET sockets are and how they are implemented in the kernel.

Overview


AF_PACKET sockets allow users to send or receive packets on the device driver level. This for example lets them implement their own protocol on top of the physical layer or sniff packets including Ethernet and higher level protocol headers. To create an AF_PACKET socket a process must have the CAP_NET_RAW capability in the user namespace that governs its network namespace. More details can be found in the packet sockets documentation. It should be noted that if a kernel has unprivileged user namespaces enabled, then an unprivileged user is able to create packet sockets.

To send and receive packets on a packet socket, a process can use the send and recv syscalls. However, packet sockets provide a way to do this faster by using a ring buffer that’s shared between the kernel and userspace. A ring buffer can be created via the PACKET_TX_RING and PACKET_RX_RING socket options. The ring buffer can then be mmapped by the user, and the packet data can then be read from or written to it directly.

There are a few different variants of the way the ring buffer is handled by the kernel. The variant can be chosen by the user via the PACKET_VERSION socket option. The differences between ring buffer versions can be found in the kernel documentation (search for “TPACKET versions”).

One of the widely known users of AF_PACKET sockets is the tcpdump utility. This is roughly what happens when tcpdump is used to sniff all packets on a particular interface:

# strace tcpdump -i eth0
...
socket(PF_PACKET, SOCK_RAW, 768)        = 3
...
bind(3, {sa_family=AF_PACKET, proto=0x03, if2, pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
...
setsockopt(3, SOL_PACKET, PACKET_VERSION, [1], 4) = 0
...
setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=31, frame_size=65616, frame_nr=31}, 16) = 0
...
mmap(NULL, 4063232, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f73a6817000
...

This sequence of syscalls corresponds to the following actions:
  1. A socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)) is created.
  2. The socket is bound to the eth0 interface.
  3. The ring buffer version is set to TPACKET_V2 via the PACKET_VERSION socket option.
  4. A ring buffer is created via the PACKET_RX_RING socket option.
  5. The ring buffer is mmapped in userspace.

After that the kernel will start putting all packets coming through the eth0 interface into the ring buffer, and tcpdump will read them from the mmapped region in userspace.
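The five steps can be sketched in C roughly as follows. This is a minimal illustration, not what tcpdump or libpcap actually does internally: the function names open_rx_ring_v2() and ring_mmap_len() are made up, error handling is reduced to "return NULL", and the caller needs CAP_NET_RAW for the socket() call to succeed.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>        /* htons */
#include <net/if.h>           /* if_nametoindex */
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct sockaddr_ll, struct tpacket_req, PACKET_* */

/* All blocks are mapped back to back, so the mapping covers block_size * block_nr bytes. */
size_t ring_mmap_len(unsigned int block_size, unsigned int block_nr)
{
    return (size_t)block_size * block_nr;
}

/* Returns the mapped ring (and the socket in *fd_out), or NULL on any failure. */
void *open_rx_ring_v2(const char *ifname, const struct tpacket_req *req, int *fd_out)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));            /* step 1 */
    if (fd < 0)
        return NULL;

    struct sockaddr_ll sll;
    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex = (int)if_nametoindex(ifname);                     /* 0 if unknown */

    int version = TPACKET_V2;
    void *ring = MAP_FAILED;

    if (sll.sll_ifindex != 0 &&
        bind(fd, (struct sockaddr *)&sll, sizeof(sll)) == 0 &&         /* step 2 */
        setsockopt(fd, SOL_PACKET, PACKET_VERSION,
                   &version, sizeof(version)) == 0 &&                  /* step 3 */
        setsockopt(fd, SOL_PACKET, PACKET_RX_RING,
                   req, sizeof(*req)) == 0)                            /* step 4 */
        ring = mmap(NULL, ring_mmap_len(req->tp_block_size, req->tp_block_nr),
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);        /* step 5 */

    if (ring == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return ring;
}
```

With the parameters from the strace output (block_size=131072, block_nr=31), ring_mmap_len() gives 4063232, which matches the length passed to mmap there.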



Ring buffers


Let’s see how to use ring buffers for packet sockets. For consistency, all of the kernel code snippets below come from Linux kernel 4.8. This is the version the latest Ubuntu 16.04.2 kernel is based on.

The existing documentation mostly focuses on the TPACKET_V1 and TPACKET_V2 ring buffer versions. Since the mentioned bug only affects the TPACKET_V3 version, I’m going to assume that we deal with that particular version for the rest of the post. Also, I’m going to mostly focus on PACKET_RX_RING, ignoring PACKET_TX_RING.

A ring buffer is a memory region used to store packets. Each packet is stored in a separate frame. Frames are grouped into blocks. In TPACKET_V3 ring buffers the frame size is not fixed and can have an arbitrary value as long as a frame fits into a block.

To create a TPACKET_V3 ring buffer via the PACKET_RX_RING socket option, a user must provide the exact parameters for the ring buffer. These parameters are passed to the setsockopt call via a pointer to a request struct called tpacket_req3, which is defined as:

struct tpacket_req3 {
        unsigned int    tp_block_size;  /* Minimal size of contiguous block */
        unsigned int    tp_block_nr;    /* Number of blocks */
        unsigned int    tp_frame_size;  /* Size of frame */
        unsigned int    tp_frame_nr;    /* Total number of frames */
        unsigned int    tp_retire_blk_tov; /* timeout in msecs */
        unsigned int    tp_sizeof_priv; /* offset to private data area */
        unsigned int    tp_feature_req_word;
};

Here’s what each field means in the tpacket_req3 struct:
  1. tp_block_size - the size of each block.
  2. tp_block_nr - the number of blocks.
  3. tp_frame_size - the size of each frame, ignored for TPACKET_V3.
  4. tp_frame_nr - the number of frames, ignored for TPACKET_V3.
  5. tp_retire_blk_tov - timeout after which a block is retired, even if it’s not fully filled with data (see below).
  6. tp_sizeof_priv - the size of the per-block private area. This area can be used by a user to store arbitrary information associated with each block.
  7. tp_feature_req_word - a set of flags (actually just one at the moment), which allows enabling some additional functionality.
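As a small illustration of how these fields fit together, the helper below fills in a tpacket_req3 for a well-formed request. It is only a sketch: make_req3() is a hypothetical name, and the way tp_frame_nr is derived is chosen to satisfy the frames_per_block * tp_block_nr == tp_frame_nr check that packet_set_ring() performs (shown later in the post), since the frame fields are still validated even for TPACKET_V3.

```c
#include <string.h>
#include <linux/if_packet.h>  /* struct tpacket_req3 */

struct tpacket_req3 make_req3(unsigned int block_size, unsigned int block_nr,
                              unsigned int frame_size, unsigned int retire_ms,
                              unsigned int priv_size)
{
    struct tpacket_req3 req;

    memset(&req, 0, sizeof(req));
    req.tp_block_size = block_size;      /* must be page aligned */
    req.tp_block_nr = block_nr;
    req.tp_frame_size = frame_size;
    /* Derive the frame count so that it is consistent with the block layout. */
    req.tp_frame_nr = frame_size ? (block_size / frame_size) * block_nr : 0;
    req.tp_retire_blk_tov = retire_ms;   /* block retire timeout, in msecs */
    req.tp_sizeof_priv = priv_size;      /* per-block private area size */
    req.tp_feature_req_word = 0;         /* no optional features requested */
    return req;
}
```

The result would then be passed to setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)) on a socket whose PACKET_VERSION has already been set to TPACKET_V3.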

Each block has an associated header, which is stored at the very beginning of the memory area allocated for the block. The block header struct is called tpacket_block_desc and has a block_status field, which indicates whether the block is currently being used by the kernel or is available to the user. The usual workflow is that the kernel stores packets into a block until it’s full and then sets block_status to TP_STATUS_USER. The user then reads the required data from the block and releases it back to the kernel by setting block_status to TP_STATUS_KERNEL.

struct tpacket_hdr_v1 {
        __u32   block_status;
        __u32   num_pkts;
        __u32   offset_to_first_pkt;
...
};

union tpacket_bd_header_u {
        struct tpacket_hdr_v1 bh1;
};

struct tpacket_block_desc {
        __u32 version;
        __u32 offset_to_priv;
        union tpacket_bd_header_u hdr;
};

Each frame also has an associated header described by the struct tpacket3_hdr. The tp_next_offset field points to the next frame within the same block.

struct tpacket3_hdr {
        __u32 tp_next_offset;
...
};
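Putting the block header and the frame header together, a userspace consumer could walk a single block roughly like this. This is a simplified sketch in the spirit of the example in the kernel documentation, not a copy of it: consume_block() is a made-up name, it only sums up the captured lengths (tp_snaplen) instead of parsing packets, and it omits the polling a real consumer would need.

```c
#include <stdint.h>
#include <linux/if_packet.h>  /* tpacket_block_desc, tpacket3_hdr, TP_STATUS_* */

/*
 * Walks one block and returns the total number of captured bytes in it.
 * Returns 0 without touching the block if the kernel still owns it.
 */
uint64_t consume_block(struct tpacket_block_desc *pbd)
{
    if (!(pbd->hdr.bh1.block_status & TP_STATUS_USER))
        return 0;                               /* kernel is still filling it */

    uint64_t bytes = 0;
    uint8_t *p = (uint8_t *)pbd + pbd->hdr.bh1.offset_to_first_pkt;

    for (uint32_t i = 0; i < pbd->hdr.bh1.num_pkts; i++) {
        struct tpacket3_hdr *hdr = (struct tpacket3_hdr *)p;

        bytes += hdr->tp_snaplen;
        p += hdr->tp_next_offset;               /* jump to the next frame */
    }

    /* Hand the block back to the kernel. */
    pbd->hdr.bh1.block_status = TP_STATUS_KERNEL;
    return bytes;
}
```

A caller would invoke this on each block of the mmapped ring in turn, typically after poll() reports the socket as readable.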

When a block is fully filled with data (a new packet doesn’t fit into the remaining space), it’s closed and released to userspace, or “retired”, by the kernel. Since the user usually wants to see packets as soon as possible, the kernel can release a block even if it’s not completely filled with data. This is done by setting up a timer that retires the current block with a timeout controlled by the tp_retire_blk_tov parameter.

There’s also a way to specify a per-block private area, which the kernel won’t touch and the user can use to store any information associated with a block. The size of this area is passed via the tp_sizeof_priv parameter.

If you’d like to better understand how a userspace program can use a TPACKET_V3 ring buffer, you can read the example provided in the documentation (search for “TPACKET_V3 example”).


Implementation of AF_PACKET sockets


Let’s take a quick look at how some of this is implemented in the kernel.

Struct definitions


Whenever a packet socket is created, an associated packet_sock struct is allocated in the kernel:

struct packet_sock {
...
        struct sock             sk;
...
        struct packet_ring_buffer       rx_ring;
        struct packet_ring_buffer       tx_ring;
...
        enum tpacket_versions   tp_version;
...
        int                     (*xmit)(struct sk_buff *skb);
...
};

The tp_version field in this struct holds the ring buffer version, which in our case is set to TPACKET_V3 by a PACKET_VERSION setsockopt call. The rx_ring and tx_ring fields describe the receive and transmit ring buffers in case they are created via PACKET_RX_RING and PACKET_TX_RING setsockopt calls. These two fields have type packet_ring_buffer, which is defined as:

struct packet_ring_buffer {
        struct pgv              *pg_vec;
...
        struct tpacket_kbdq_core        prb_bdqc;
};

The pg_vec field is a pointer to an array of pgv structs, each of which holds a reference to a block. Blocks are actually allocated separately, not as one contiguous memory region.

struct pgv {
        char *buffer;
};



The prb_bdqc field is of type tpacket_kbdq_core and its fields describe the current state of the ring buffer:

struct tpacket_kbdq_core {
...
        unsigned short  blk_sizeof_priv;
...
        char            *nxt_offset;
...
        struct timer_list retire_blk_timer;
};

The blk_sizeof_priv field contains the size of the per-block private area. The nxt_offset field points inside the currently active block and shows where the next packet should be saved. The retire_blk_timer field has type timer_list and describes the timer which retires the current block on timeout.

struct timer_list {
...
        struct hlist_node       entry;
        unsigned long           expires;
        void                    (*function)(unsigned long);
        unsigned long           data;
...
};

Ring buffer setup


The kernel uses the packet_setsockopt() function to handle setting socket options for packet sockets. When the PACKET_VERSION socket option is used, the kernel sets po->tp_version to the provided value.

With the PACKET_RX_RING socket option a receive ring buffer is created. Internally it’s done by the packet_set_ring() function. This function does a lot of things, so I’ll just show the important parts. First, packet_set_ring() performs a bunch of sanity checks on the provided ring buffer parameters:

err = -EINVAL;
if (unlikely((int)req->tp_block_size <= 0))
        goto out;
if (unlikely(!PAGE_ALIGNED(req->tp_block_size)))
        goto out;
if (po->tp_version >= TPACKET_V3 &&
    (int)(req->tp_block_size -
          BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)
        goto out;
if (unlikely(req->tp_frame_size < po->tp_hdrlen +
                                po->tp_reserve))
        goto out;
if (unlikely(req->tp_frame_size & (TPACKET_ALIGNMENT - 1)))
        goto out;

rb->frames_per_block = req->tp_block_size / req->tp_frame_size;
if (unlikely(rb->frames_per_block == 0))
        goto out;
if (unlikely((rb->frames_per_block * req->tp_block_nr) !=
                                req->tp_frame_nr))
        goto out;

Then, it allocates the ring buffer blocks:

err = -ENOMEM;
order = get_order(req->tp_block_size);
pg_vec = alloc_pg_vec(req, order);
if (unlikely(!pg_vec))
        goto out;
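The order computed here is the base-2 logarithm of the number of pages needed to hold one block, rounded up. A rough userspace model is sketched below; toy_get_order() and TOY_PAGE_SIZE are made-up names, this is not the kernel’s implementation, and it assumes 4 KiB pages as on x86-64.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Smallest order such that (TOY_PAGE_SIZE << order) >= size. */
unsigned int toy_get_order(size_t size)
{
    unsigned int order = 0;

    /* The upper bound only keeps the shift well defined for absurd sizes. */
    while (order < 31 && (TOY_PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

Under this model a 0x8000-byte block maps to order 3 (eight pages), and the 131072-byte blocks from the tcpdump example map to order 5.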

It should be noted that alloc_pg_vec() uses the kernel page allocator to allocate blocks (we’ll use this in the exploit):

static char *alloc_one_pg_vec_page(unsigned long order)
{
...
        buffer = (char *) __get_free_pages(gfp_flags, order);
        if (buffer)
                return buffer;
...
}

static struct pgv *alloc_pg_vec(struct tpacket_req *req, int order)
{
...
        for (i = 0; i < block_nr; i++) {
                pg_vec[i].buffer = alloc_one_pg_vec_page(order);
...
        }
...
}

Finally, packet_set_ring() calls init_prb_bdqc(), which performs some additional steps to set up a TPACKET_V3 receive ring buffer specifically:

switch (po->tp_version) {
case TPACKET_V3:
...
        if (!tx_ring)
                init_prb_bdqc(po, rb, pg_vec, req_u);
        break;
default:
        break;
}

The init_prb_bdqc() function copies the provided ring buffer parameters to the prb_bdqc field of the ring buffer struct, calculates some other parameters based on them, sets up the block retire timer and calls prb_open_block() to initialize the first block:

static void init_prb_bdqc(struct packet_sock *po,
                        struct packet_ring_buffer *rb,
                        struct pgv *pg_vec,
                        union tpacket_req_u *req_u)
{
        struct tpacket_kbdq_core *p1 = GET_PBDQC_FROM_RB(rb);
        struct tpacket_block_desc *pbd;
...
        pbd = (struct tpacket_block_desc *)pg_vec[0].buffer;
        p1->pkblk_start = pg_vec[0].buffer;
        p1->kblk_size = req_u->req3.tp_block_size;
...
        p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;

        p1->max_frame_len = p1->kblk_size - BLK_PLUS_PRIV(p1->blk_sizeof_priv);
        prb_init_ft_ops(p1, req_u);
        prb_setup_retire_blk_timer(po);
        prb_open_block(p1, pbd);
}

One of the things that the prb_open_block() function does is set the nxt_offset field of the tpacket_kbdq_core struct to point right after the per-block private area:

static void prb_open_block(struct tpacket_kbdq_core *pkc1,
        struct tpacket_block_desc *pbd1)
{
...
        pkc1->pkblk_start = (char *)pbd1;
        pkc1->nxt_offset = pkc1->pkblk_start + BLK_PLUS_PRIV(pkc1->blk_sizeof_priv);
...
}

Packet reception


Whenever a new packet is received, the kernel is supposed to save it into the ring buffer. The key function here is __packet_lookup_frame_in_block(), which does the following:
  1. Checks whether the currently active block has enough space for the packet.
  2. If yes, saves the packet to the current block and returns.
  3. If no, dispatches the next block and saves the packet there.

static void *__packet_lookup_frame_in_block(struct packet_sock *po,
                                            struct sk_buff *skb,
                                            int status,
                                            unsigned int len
                                            )
{
        struct tpacket_kbdq_core *pkc;
        struct tpacket_block_desc *pbd;
        char *curr, *end;

        pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
        pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
...
        curr = pkc->nxt_offset;
        pkc->skb = skb;
        end = (char *)pbd + pkc->kblk_size;

        /* first try the current block */
        if (curr+TOTAL_PKT_LEN_INCL_ALIGN(len) < end) {
                prb_fill_curr_block(curr, pkc, pbd, len);
                return (void *)curr;
        }

        /* Ok, close the current block */
        prb_retire_current_block(pkc, po, 0);

        /* Now, try to dispatch the next block */
        curr = (char *)prb_dispatch_next_block(pkc, po);
        if (curr) {
                pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
                prb_fill_curr_block(curr, pkc, pbd, len);
                return (void *)curr;
        }
...
}
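The "does it fit" decision in step 1 can be modelled with plain offsets instead of pointers. This is only a sketch: fits_in_current_block() is a made-up name, and it assumes that TOTAL_PKT_LEN_INCL_ALIGN() rounds the length up to an 8-byte boundary, which is not shown in the snippets above.

```c
#include <stddef.h>

#define TOY_V3_ALIGNMENT 8u   /* assumption: 8-byte alignment for TPACKET_V3 */

/* Round len up to the next multiple of TOY_V3_ALIGNMENT. */
static size_t toy_pkt_len_aligned(size_t len)
{
    return (len + TOY_V3_ALIGNMENT - 1) & ~((size_t)TOY_V3_ALIGNMENT - 1);
}

/*
 * nxt_offset: where the next packet would go, relative to the block start.
 * Returns 1 if the packet is stored in the current block, 0 if the block
 * has to be retired and the next one dispatched.
 */
int fits_in_current_block(size_t nxt_offset, size_t kblk_size, size_t len)
{
    return nxt_offset + toy_pkt_len_aligned(len) < kblk_size;
}
```

Note that the comparison is strict, so a packet that would end exactly at the block boundary already causes the current block to be retired.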

Vulnerability


Bug


Let’s look closely at the following check from packet_set_ring():

if (po->tp_version >= TPACKET_V3 &&
    (int)(req->tp_block_size -
          BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)
        goto out;

This is supposed to ensure that the length of the block header together with the per-block private data is not bigger than the size of the block. That makes sense: otherwise we wouldn’t have enough space in the block for them, let alone the packet data.

However, it turns out this check can be bypassed. If req_u->req3.tp_sizeof_priv has its highest bit set, the unsigned subtraction wraps around, and casting the result to int yields a large positive value instead of a negative one. To illustrate this behavior:

A = req->tp_block_size = 4096 = 0x1000
B = req_u->req3.tp_sizeof_priv = (1 << 31) + 4096 = 0x80001000
BLK_PLUS_PRIV(B) = (1 << 31) + 4096 + 48 = 0x80001030
A - BLK_PLUS_PRIV(B) = 0x1000 - 0x80001030 = 0x7fffffd0
(int)0x7fffffd0 = 0x7fffffd0 > 0

Later, when req_u->req3.tp_sizeof_priv is copied to p1->blk_sizeof_priv in init_prb_bdqc() (see the snippet above), it’s truncated to its two lower bytes, since the type of the latter is unsigned short. So this bug basically allows us to set the blk_sizeof_priv of the tpacket_kbdq_core struct to an arbitrary value, bypassing all sanity checks.
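The arithmetic above can be replayed in userspace. The sketch below mirrors the vulnerable expression with fixed-width types; the TOY_* macros and function names are made up, the 48-byte header length is taken from the worked example above rather than from the kernel headers, and the cast to int32_t relies on the usual two’s complement wrapping of GCC and Clang. The 64-bit variant is just one way to make such a comparison wrap-proof and is not presented as the upstream patch.

```c
#include <stdint.h>

#define TOY_BLK_HDR_LEN 48u                      /* from the example above */
#define TOY_ALIGN8(x) (((x) + 7u) & ~7u)
#define TOY_BLK_PLUS_PRIV(priv) (TOY_BLK_HDR_LEN + TOY_ALIGN8(priv))

/* Mirrors the vulnerable check: returns 1 if the request would be rejected. */
int old_check_rejects(uint32_t block_size, uint32_t sizeof_priv)
{
    /* Unsigned subtraction wraps modulo 2^32 before the signed cast. */
    uint32_t diff = block_size - TOY_BLK_PLUS_PRIV(sizeof_priv);

    return (int32_t)diff <= 0;
}

/* Same intent, but compared in 64 bits so nothing can wrap for 32-bit inputs. */
int widened_check_rejects(uint32_t block_size, uint32_t sizeof_priv)
{
    uint64_t need = (uint64_t)TOY_BLK_HDR_LEN +
                    (((uint64_t)sizeof_priv + 7u) & ~(uint64_t)7u);

    return (uint64_t)block_size <= need;
}

/* What ends up in the unsigned short blk_sizeof_priv: the low 16 bits only. */
uint16_t stored_sizeof_priv(uint32_t sizeof_priv)
{
    return (uint16_t)sizeof_priv;
}
```

With block_size = 0x1000 and sizeof_priv = 0x80001000, old_check_rejects() returns 0 (the request slips through), widened_check_rejects() returns 1, and stored_sizeof_priv() yields 0x1000.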

Consequences


If we search through the net/packet/af_packet.c source looking for blk_sizeof_priv usage, we’ll find that it’s being used in the following two places.

The first one is in init_prb_bdqc() right after it gets assigned (see the code snippet above) to set max_frame_len. The value of p1->max_frame_len denotes the maximum size of a frame that can be saved into a block. Since we control p1->blk_sizeof_priv, we can make BLK_PLUS_PRIV(p1->blk_sizeof_priv) bigger than p1->kblk_size. This will result in p1->max_frame_len having a huge value, higher than the size of a block. This allows us to bypass the size check when a frame is being copied into a block, thus causing a kernel heap out-of-bounds write.

That’s not all. Another user of blk_sizeof_priv is prb_open_block(), which initializes a block (the code snippet is above as well). There pkc1->nxt_offset denotes the address where the kernel will write a new packet when it’s received. The kernel doesn’t intend to overwrite the block header and per-block private data, so it makes this address point right after them. Since we control blk_sizeof_priv, we can control the lowest two bytes of nxt_offset. This allows us to control the offset of the out-of-bounds write.

To sum up, this bug leads to a kernel heap out-of-bounds write of controlled maximum size and controlled offset up to about 64k bytes.
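Both consequences follow from the same expression and can be checked numerically. The sketch below again uses made-up names, assumes the 48-byte block header length from the earlier worked example, and assumes max_frame_len is a 32-bit unsigned value, which is consistent with the "huge value" described above but not shown in the snippets.

```c
#include <stdint.h>

/* BLK_PLUS_PRIV() for the 16-bit value that was actually stored. */
uint32_t toy_first_write_offset(uint16_t blk_sizeof_priv)
{
    return 48u + (((uint32_t)blk_sizeof_priv + 7u) & ~7u);
}

/* kblk_size - BLK_PLUS_PRIV(...), computed in unsigned 32-bit arithmetic. */
uint32_t toy_max_frame_len(uint32_t kblk_size, uint16_t blk_sizeof_priv)
{
    /* Wraps to a huge value when the header plus private area exceed the block. */
    return kblk_size - toy_first_write_offset(blk_sizeof_priv);
}
```

For a 0x1000-byte block with blk_sizeof_priv = 0x1000, the first packet would be written at offset 0x1030, already past the end of the block, and the computed maximum frame length wraps to 0xffffffd0. With blk_sizeof_priv = 0xffff the offset reaches 0x10030, which is where the "about 64k bytes" bound comes from.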

Exploitation


Let's see how we can exploit this vulnerability. I'm going to be targeting x86-64 Ubuntu 16.04.2 with the 4.8.0-41-generic kernel version with KASLR, SMEP and SMAP enabled. The Ubuntu kernel has user namespaces available to unprivileged users (CONFIG_USER_NS=y and no restrictions on its usage), so the bug can be exploited to gain root privileges by an unprivileged user. All of the exploitation steps below are performed from within a user namespace.

The Linux kernel has support for a few hardening features that make exploitation more difficult. KASLR (Kernel Address Space Layout Randomization) puts the kernel text at a random offset to make jumping to a particular fixed address useless. SMEP (Supervisor Mode Execution Protection) causes an oops whenever the kernel tries to execute code from the userspace memory and SMAP (Supervisor Mode Access Prevention) does the same whenever the kernel tries to access the userspace memory directly.

Shaping heap


The idea of the exploit is to use the heap out-of-bounds write to overwrite a function pointer in the memory adjacent to the overflown block. For that we need to specifically shape the heap, so that some object with a triggerable function pointer is placed right after a ring buffer block. I chose the already mentioned packet_sock struct to be this object. We need to find a way to make the kernel allocate a ring buffer block and a packet_sock struct one next to the other.

As I mentioned above, ring buffer blocks are allocated with the kernel page allocator (buddy allocator). It allocates blocks of 2^n contiguous memory pages. The allocator keeps a freelist of such blocks for each n and returns the freelist head when a block is requested. If the freelist for some n is empty, it finds the first m > n for which the freelist is not empty and splits that block in halves until the required size is reached. Therefore, if we start repeatedly allocating blocks of size 2^n, at some point they will start coming from one high order memory block being split and they will be adjacent to each other.

A packet_sock is allocated via the kmalloc() function by the slab allocator. The slab allocator is mostly used to allocate objects of a smaller-than-one-page size. It uses the page allocator to allocate a big block of memory and splits this block into smaller objects. The big blocks are called slabs, hence the name of the allocator. A set of slabs together with their current state and a set of operations like "allocate an object" and "free an object" is called a cache. The slab allocator creates a set of general purpose caches for objects of size 2^n. Whenever kmalloc(size) is called, the slab allocator rounds size up to the nearest power of 2 and uses the cache of that size.

Since the kernel uses kmalloc() all the time, if we try to allocate an object it will most likely come from one of the slabs already created during previous usage. However, if we start allocating objects of the same size, at some point the slab allocator will run out of slabs for this size and will have to allocate another one via the page allocator.

The size of a newly allocated slab depends on the size of objects this slab is meant for. The size of the packet_sock struct is 1920 and 1024 < 1920 <= 2048, which means that it'll be rounded to 2048 and the kmalloc-2048 cache will be used. Turns out, for this particular cache the SLUB allocator (which is the kind of slab allocator used in Ubuntu) uses slabs of size 0x8000. So whenever the allocator runs out of slabs for the kmalloc-2048 cache, it allocates 0x8000 bytes with the page allocator.

Keeping all that in mind, this is how we can allocate a kmalloc-2048 slab next to a ring buffer block:
  1. Allocate a lot (512 worked for me) of objects of size 2048 to fill up currently existing slabs in the kmalloc-2048 cache. To do that we can create a bunch of packet sockets to cause allocation of packet_sock structs.
  2. Allocate a lot (1024 worked for me) of page blocks of size 0x8000 to drain the page allocator freelists and cause some high-order page block to be split. To do that we can create another packet socket and attach a ring buffer with 1024 blocks of size 0x8000.
  3. Create a packet socket and attach a ring buffer with blocks of size 0x8000. The last one of these blocks (I'm using 2 blocks, the reason is explained below) is the one we're going to overflow.
  4. Create a bunch of packet sockets to allocate packet_sock structs and cause an allocation of at least one new slab.
This shapes the heap so that a freshly allocated kmalloc-2048 slab, filled with packet_sock structs, ends up right after the ring buffer block we're going to overflow.



The exact number of allocations needed to drain the freelists and shape the heap the way we want might be different for different setups and depends on the memory usage activity. The numbers above are for a mostly idle Ubuntu machine.

Controlling the overwrite


Above I explained that the bug results in a write of a controlled maximum size at a controlled offset out of the bounds of a ring buffer block. Turns out not only can we control the maximum size and offset, we can actually control the exact data (and its size) that's being written. Since the data that's being stored in a ring buffer block is the packet that's passing through a particular network interface, we can manually send packets with arbitrary content on a raw socket through the loopback interface. If we're doing that in an isolated network namespace, no external traffic will interfere.

There are a few caveats though.

First, it seems that the size of a packet must be at least 14 bytes (12 bytes for two MAC addresses and 2 bytes for the EtherType, apparently) for it to be passed to the packet socket layer. That means that we have to overwrite at least 14 bytes. The data in the packet itself can be arbitrary.

Then, the lowest 3 bits of nxt_offset always have the value of 2 due to the alignment. That means that we can't start overwriting at an 8-byte aligned offset.

Besides that, when a packet is being received and saved into a block, the kernel updates some fields in the block and frame headers. If we point nxt_offset to some particular offset we want to overwrite, some data where the block and frame headers end up will probably be corrupted.

Another issue is that if we make nxt_offset point past the block end, the first block will be immediately closed when the first packet is being received, since the kernel will (correctly) decide that there's no space left in the first block (see the __packet_lookup_frame_in_block() snippet). This is not really an issue, since we can create a ring buffer with 2 blocks. The first one will be closed, the second one will be overflown.

Executing code


Now, we need to figure out which function pointers to overwrite. There are a few function pointer fields in the packet_sock struct, but I ended up using the following two:
  1. packet_sock->xmit
  2. packet_sock->rx_ring->prb_bdqc->retire_blk_timer->func

The first one is called whenever a user tries to send a packet via a packet socket. The usual way to elevate privileges to root is to execute the commit_creds(prepare_kernel_cred(0)) payload in a process context. The xmit pointer is called from a process context, which means we can simply point it to the executable memory region that contains the payload.

To do that we need to put our payload into some executable memory region. One of the possible ways to do that is to put the payload in the userspace, either by mmapping an executable memory page or by just defining a global function within our exploit program. However, SMEP & SMAP will prevent the kernel from accessing and executing user memory directly, so we need to deal with them first.

For that I used the retire_blk_timer field (the same field used by Philip Pettersson in his [...]

[...] dmesg_restrict and it restricts the ability of unprivileged users to read the kernel syslog. It should be noted that even with dmesg restricted the first user on Ubuntu can still read the syslog from /var/log/kern.log and /var/log/syslog, since he belongs to the adm group.

Another feature is called kptr_restrict and it doesn't allow unprivileged users to see pointers printed by the kernel with the %pK format specifier. However in 4.8 the free_reserved_area() function uses %p, so kptr_restrict doesn't help in this case. In 4.10 free_reserved_area() was fixed not to print address ranges at all, but the change was not backported to older kernels.

Fix


Let's take a look at the fix. The vulnerable code as it was before the fix is below. Remember that the user fully controls both tp_block_size and tp_sizeof_priv.

        if (po->tp_version >= TPACKET_V3 &&
            (int)(req->tp_block_size -
                  BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv)) <= 0)
                goto out;

When thinking about a way to fix this, the first idea that comes to mind is that we can compare the two values as is, without that weird conversion to int:

        if (po->tp_version >= TPACKET_V3 &&
            req->tp_block_size <=
                  BLK_PLUS_PRIV(req_u->req3.tp_sizeof_priv))
                goto out;

Funny enough, this doesn't really help. The reason is that an overflow can occur while evaluating BLK_PLUS_PRIV in case tp_sizeof_priv is close to the unsigned int maximum value.

#define BLK_PLUS_PRIV(sz_of_priv) \
        (BLK_HDR_LEN + ALIGN((sz_of_priv), V3_ALIGNMENT))

One of the ways to fix this overflow is to cast tp_sizeof_priv to uint64 before passing it to BLK_PLUS_PRIV. That's exactly what I did in the fix that was sent upstream.

        if (po->tp_version >= TPACKET_V3 &&
            req->tp_block_size <=
                  BLK_PLUS_PRIV((u64)req_u->req3.tp_sizeof_priv))
                goto out;

Mitigation


Creating a packet socket requires the CAP_NET_RAW privilege, which can be acquired by an unprivileged user inside a user namespace. Unprivileged user namespaces expose a huge kernel attack surface, which resulted in quite a few exploitable vulnerabilities (CVE-2017-7184, CVE-2016-8655, ...). This kind of kernel vulnerability can be mitigated by completely disabling user namespaces or by disallowing their use by unprivileged users.

To disable user namespaces completely you can rebuild your kernel with CONFIG_USER_NS disabled. Restricting user namespaces usage only to privileged users can be done by writing 0 to /proc/sys/kernel/unprivileged_userns_clone in Debian-based kernels. Since version 4.9 the upstream kernel has a similar /proc/sys/user/max_user_namespaces setting.

Conclusion


Right now the Linux kernel has a huge number of poorly tested (from a security standpoint) interfaces and a lot of them are enabled and exposed to unprivileged users in popular Linux distributions like Ubuntu. This is obviously not good and they need to be tested or restricted.

Syzkaller is an amazing tool that allows testing kernel interfaces via fuzzing. Even adding barebone descriptions for another syscall usually uncovers a number of bugs. We certainly need people writing syscall descriptions and fixing existing ones, since there's a huge surface that's still not covered and probably a ton of security bugs buried in the kernel. If you decide to contribute, we'll be glad to see a pull request.

Links


Just a bunch of related links.


Our Linux kernel bug finding tools:

A collection of Linux kernel exploitation materials: https://github.com/xairy/linux-kernel-exploitation
