Let’s go bug hunting
At ForAllSecure, our mission is to help developers find critical bugs in their software faster and more easily than standard development practices and tools allow. In support of this mission, we have looked to the open source world for exemplar software to analyze with our fuzzer, Mayhem, in order to get a stronger sense of its effectiveness and ease of integration into existing projects.
This process has proven invaluable for ForAllSecure, providing hands-on experience in ingesting additional real-world software targeting a variety of environments and build systems and ensuring that the process is as streamlined as possible for new adopters.
We have also had the opportunity to not only discover and report multiple security-relevant defects to open source projects, but also assist in the vulnerability fix and verification process, thus improving the security of their users.
In this post, we will examine how we analyzed two open source libraries using Mayhem in a specific workflow that we’ve found to be particularly effective for finding bugs. We will cover building fuzz targets, dockerizing them, and running them inside of Mayhem. Following this process, we found eight previously unknown security-relevant defects across these projects, which were assigned:
stb is a suite of single-file C libraries in the public domain, containing utility functions useful to developers working on computer graphics, applications, or games. Their liberal license and ease of integration have made these libraries a popular choice for developers in these domains.
The various components in this project provide an abundance of functionality, including image file parsing and manipulation, font file parsing, a voxel rendering engine, Ogg Vorbis audio file parsing (the functionality explored in this post), and more.
To follow along with this post, check out commit c72a95d766b8cbf5514e68d3ddbf6437ac9425b1 for an unpatched version of the library.
For our analysis of stb_vorbis, we will generate two different Mayhem-compatible targets, experimentally found to be an excellent combination for bug hunting: a LibFuzzer target and a standalone uninstrumented target compatible with Mayhem’s symbolic execution engine. These two targets not only complement each other for greater coverage in less time, but also require minimal setup to function.
The standalone target is itself sufficient for Mayhem to analyze using its support for black-box compiled binaries, but will incur overhead. Users may experience reduced performance in the form of execs per second, compared to when a LibFuzzer target is also provided. Setting up a LibFuzzer target requires a marginal amount of work for improved analysis efficiency.
One of the first questions that arises when attempting to fuzz a library target is “how do I feed fuzzed input to the target code?” In application targets, this input may be delivered via a file or the network. For a library, this is usually determined by the host application.
To simulate the usage of the library by a representative application, we will set up a small bit of code so that the target can take in raw bytes from the fuzzer and convert them into inputs that the library can use. To set up a LibFuzzer target, we require a function with a specific name and signature, LLVMFuzzerTestOneInput(const uint8_t *data, size_t size), that accepts raw bytes and sends them to the target function(s). If you are unfamiliar with LibFuzzer and would like to know more, its documentation goes into further depth.
The target function, in this case, stb_vorbis_decode_memory, is the function that takes the raw content of an Ogg Vorbis file and parses it into meaningful audio data. Parsing functions, in general, have the advantages (for bug hunters) of being notoriously hard to get right, but easy to send fuzzed data into.
A good place to look for examples on how to write this function is in the project’s test suite (if available). In our case, the file tests/test_vorbis.c provides a good example of what we need to do. Setting up the LibFuzzer target is as easy as adapting this test into the following code:
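A minimal sketch of such a target is given here, under a few assumptions. The file is assumed to live one directory below the stb repository root so that ../stb_vorbis.c resolves. The USE_STB_VORBIS macro, the DECODE_MEMORY alias, and the standin_decode function are names introduced only for this illustration. Without the macro, the sketch compiles on its own against a toy stand-in that merely mimics the shape of stb_vorbis_decode_memory.

```cpp
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef USE_STB_VORBIS
// Assumed layout: this file sits one directory below the stb repository root.
#include "../stb_vorbis.c"
#define DECODE_MEMORY stb_vorbis_decode_memory
#else
// Hypothetical stand-in with the same parameter shape as
// stb_vorbis_decode_memory, so the sketch builds without the library.
static int standin_decode(const unsigned char *mem, int len, int *channels,
                          int *sample_rate, short **output) {
  if (len < 4 || mem[0] != 'O' || mem[1] != 'g' || mem[2] != 'g' ||
      mem[3] != 'S') {
    return -1;  // reject the input and leave *output untouched
  }
  *channels = 1;
  *sample_rate = 44100;
  *output = static_cast<short *>(calloc(1, sizeof(short)));
  return *output ? 1 : -2;
}
#define DECODE_MEMORY standin_decode
#endif

// Entry point that libFuzzer calls once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  if (size > INT_MAX) return 0;  // the decoder takes an int length

  int channels = 0;
  int sample_rate = 0;
  short *output = nullptr;
  int samples = DECODE_MEMORY(data, static_cast<int>(size), &channels,
                              &sample_rate, &output);

  // The decoder hands ownership of the sample buffer to the caller on success.
  if (samples >= 0) free(output);
  return 0;
}
```

With Clang, a target of this shape is typically compiled with -fsanitize=fuzzer,address, which links in the libFuzzer runtime and its own main().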
To convert the above into a standalone target as well, we require a main() function that accepts input as a file and passes it to the LLVMFuzzerTestOneInput function. Because the fuzz target function has a standardized ABI, we can define this once and link against any libFuzzer target to also generate a standalone target. This will prove to be useful when we use this same file for analyzing Matio later in this post. Our “driver” code looks like this:
Although this code is a little long, it only needs to be written once. Its purpose is to map a file into memory with mmap() and pass its contents to our LibFuzzer fuzz target function.
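As a rough sketch of that idea, the core of such a driver might look like the following. The names run_input_file and FuzzEntry are hypothetical, and the fuzz entry point is passed in as a function pointer so that the sketch compiles on its own. A real driver would instead declare LLVMFuzzerTestOneInput as extern "C" and let the linker resolve it.

```cpp
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Same signature as LLVMFuzzerTestOneInput (hypothetical alias).
using FuzzEntry = int (*)(const uint8_t *data, size_t size);

// Maps the file at `path` read-only and hands its bytes to `entry`.
// Returns 0 if the input was delivered, 1 on an I/O error.
int run_input_file(const char *path, FuzzEntry entry) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    perror("open");
    return 1;
  }

  struct stat st;
  if (fstat(fd, &st) != 0) {
    perror("fstat");
    close(fd);
    return 1;
  }

  size_t size = static_cast<size_t>(st.st_size);
  if (size == 0) {
    // mmap() rejects zero-length mappings, so pass an empty buffer instead.
    static const uint8_t empty[1] = {0};
    close(fd);
    entry(empty, 0);
    return 0;
  }

  void *map = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (map == MAP_FAILED) {
    perror("mmap");
    return 1;
  }

  entry(static_cast<const uint8_t *>(map), size);
  munmap(map, size);
  return 0;
}
```

A standalone binary then needs only a main() that checks its arguments and forwards argv[1] and the fuzz entry point to run_input_file.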
Now that the code for our targets is written, we’ll build and link against the library to generate our targets.
Build systems for C and C++ code widely vary, which creates complexity when trying to analyze a new project or integrate a new library distributed as source-code into your project. This pain is familiar to most C and C++ developers. Fortunately, stb was designed as a set of “single file” libraries to specifically alleviate this pain and is easy to integrate.
Assuming our fuzz target cpp file and driver cpp file are in a subdirectory from the repository root, as noted in the comments at the top of each file, we can write a simple Makefile to generate our target binaries:
Generating target binaries
One way to prepare a set of target binaries for Mayhem is to build a Docker image containing the necessary binaries and supporting environment. Other workflows are available, but this is the recommended way to ensure all necessary dependencies are encapsulated in a way that can be effortlessly run on other systems. Detailed information about building Docker images can be found in the Docker documentation. In our case, creating the Dockerfile is straightforward:
To run in Mayhem, we must build, tag, and push this image to a Docker registry accessible to the Mayhem installation (such as the Mayhem installation’s built-in Docker registry).
Running With Mayhem
The last step is to configure our target, which requires a Mayhemfile; our documentation describes its contents in more detail. As we use the Docker workflow, we need to tell Mayhem what Docker image to use and what targets exist inside the image. Our Mayhemfile looks like this:
Notice how there are two commands listed: one for the libfuzzer target and one for the standalone target. It is also recommended to provide a starting corpus of valid inputs (Ogg Vorbis files), which can be readily found on the internet or in the test suites of other Ogg Vorbis parsers. These can be placed in a corpus directory next to the Mayhemfile.
Once these steps are done, we can run the package with mayhem run and see the results of our analysis!
Once defects are found, Mayhem automatically classifies them, provides additional analysis, and shares a test case that can be run under a debugger to pinpoint the bug. This is incredibly useful: it allows consistent reproduction of the crash, further examination of the program state leading up to it, and automated analysis, which together make root-causing and patching the discovered defects relatively easy. Fixes for these issues were included in stb_vorbis version 1.17.
On this target, Mayhem found the following defects:
- Heap buffer overflow.
- Stack buffer overflow.
- Division by zero.
- Null pointer dereference.
- Usage of uninitialized stack variables.
- Global out of bounds read.
- Reachable assertion.
The security impact of these vulnerabilities depends on the host application using this library and its deployment scenario. The two buffer overflows can be exploited to execute arbitrary code, the out of bounds read could be used to leak sensitive information from the process, and the remainder are likely limited to causing a crash/denial of service of the application.
As another example, we will examine Matio, an open source C library for parsing MATLAB files and an alternative to MATLAB’s own shared libraries for performing these functions. The steps we will follow are the same as for stb:
- Create a LibFuzzer target.
- Create a standalone target.
- Create a Docker image containing both.
- Create a Mayhemfile for our target.
- Assemble a starting corpus.
To follow along, check out an unpatched version of Matio at tag v1.5.15.
By examining the test suite and searching GitHub for API usages by downstream applications and libraries, we can develop a function that takes in bytes and feeds them to the MATLAB parsing core.
In this case, Matio has no “read data from memory” function like stb does, so we write the bytes to a file on the tmpfs filesystem at /dev/shm, which avoids unnecessarily writing data to disk. This file is then used as the input to the Matio MATLAB file parsing function.
Once the file is parsed, we also iterate over all variables it contains and read their data to include these code paths in our executions. This is critical: when coverage is not driven down these paths, the functions on them may hide relevant bugs.
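The write-to-a-temporary-file pattern can be sketched as follows. This is an illustration, not the target used in the analysis: parse_path_standin and run_on_temp_file are hypothetical names, the stand-in parser only counts bytes, and the fallback to /tmp when /dev/shm is not writable is an assumption added so the sketch runs on more systems. In a real Matio target, the marked spot is where calls such as Mat_Open, a Mat_VarReadNext and Mat_VarFree loop, and Mat_Close would go.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string>

// Hypothetical stand-in for a parser that only accepts a file path.
// Returns the number of bytes it read, or -1 if the file cannot be opened.
static long parse_path_standin(const char *path) {
  FILE *f = fopen(path, "rb");
  if (!f) return -1;
  long total = 0;
  char buf[4096];
  size_t n;
  while ((n = fread(buf, 1, sizeof buf, f)) > 0) total += static_cast<long>(n);
  fclose(f);
  return total;
}

// Writes the fuzz input to a unique temporary file, runs the path-based
// parser on it, and removes the file again.
static long run_on_temp_file(const uint8_t *data, size_t size) {
  // Prefer tmpfs so the bytes never reach a physical disk (assumed fallback).
  std::string path = (access("/dev/shm", W_OK) == 0)
                         ? "/dev/shm/fuzz-input-XXXXXX"
                         : "/tmp/fuzz-input-XXXXXX";
  int fd = mkstemp(&path[0]);
  if (fd < 0) return -1;

  size_t off = 0;
  while (off < size) {
    ssize_t n = write(fd, data + off, size - off);
    if (n <= 0) break;
    off += static_cast<size_t>(n);
  }
  close(fd);

  // A real Matio target would open and walk the file here.
  long result = (off == size) ? parse_path_standin(path.c_str()) : -1;
  unlink(path.c_str());
  return result;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  run_on_temp_file(data, size);
  return 0;
}
```

Using mkstemp() gives each execution its own file name, which keeps parallel fuzzing processes from overwriting each other's input.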
We can use the same fuzz/driver.cpp as we did for stb to generate the standalone target.
Matio uses GNU autotools for its build system, which is fairly easy to work with but requires extra steps to compile due to the flags required for the LibFuzzer and standalone case.
To compile for LibFuzzer, we use:
For the standalone case, we use just:
This generates a static library at src/.libs/libmatio.a, which we can directly link our targets to in our Makefile:
Linking to targets in Makefile
We will also generate a Docker image with the following Dockerfile:
Again we build, tag, and push this image to a Docker registry.
Running With Mayhem
Next, we’ll set up a Mayhemfile and populate our corpus directory with sample MATLAB files:
Generating a Mayhemfile
Once these steps are done, we can run the package with mayhem run.
In this case, Mayhem found one crash caused by an integer overflow. The overflow was detected, but in response, the resulting computation was set to 0, and that error condition was not checked. This resulted in a heap overflow by writing to a buffer allocated with
Although our simple fuzz target only exercised enough code to find one instance of this, a manual review of the code revealed several additional cases where this exact pattern existed. Fixing this bug involved analyzing all cases where the results of these checked multiplications were passed directly to malloc() or were otherwise improperly used.
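To make the pattern concrete, here is a small sketch of a checked multiplication and a caller that handles its error result before allocating. The names checked_mul and alloc_elements are hypothetical and this is not Matio's actual code. The vulnerable pattern described above corresponds to ignoring the return value and passing the zeroed result straight to malloc().

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical checked multiply: on overflow it zeroes *res and returns 1,
// otherwise it stores the product and returns 0.
static int checked_mul(size_t *res, size_t a, size_t b) {
  if (a != 0 && b > SIZE_MAX / a) {
    *res = 0;
    return 1;
  }
  *res = a * b;
  return 0;
}

// Allocates room for `count` elements of `elem_size` bytes each.
// Returns nullptr on overflow, on a zero-sized request, or if malloc fails.
static void *alloc_elements(size_t count, size_t elem_size) {
  size_t nbytes = 0;
  // This is the check the vulnerable pattern skips: without it, an overflow
  // leaves nbytes at 0 and later writes land past a tiny allocation.
  if (checked_mul(&nbytes, count, elem_size) != 0) return nullptr;
  if (nbytes == 0) return nullptr;
  return malloc(nbytes);
}
```

Returning a null pointer for both the overflow and the zero-sized case is one possible design choice; it forces every caller through a single failure path instead of leaving a valid-looking pointer to an undersized buffer.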
The test case produced by Mayhem gave us enough information to manually locate and correct several instances of this bug, including, for example, in the MATLAB 7.3 file parser, which was not directly exercised by our targets.
The impact of this vulnerability again depends on how an application uses the library. An application that parses untrusted MATLAB files with an unpatched version of Matio could conceivably allow an attacker to execute arbitrary code by leveraging this heap overflow.
While libraries require slightly more effort to fuzz than e.g. an application that directly reads input out of a file, fuzzing library functions is relatively straightforward and can lead to greater performance in terms of execs per second than repeatedly forking a full application. Combining the speed of LibFuzzer with other techniques, such as symbolic execution, furthers the efficiency of this approach.
When writing LibFuzzer fuzz target functions, however, one must remain cognizant of the code paths that are being exercised. Creating multiple targets that cover different slices of the program is one approach to maximize the effectiveness of fuzzing.
Thank you to the maintainers of stb (Sean Barrett, GitHub user nothings) and Matio (Thomas Beutlich, GitHub user tbeu) for their excellent handling of these reports and timely patches for the underlying issues!