Helix: a Platform for Efficiently Transforming Binaries

Helix is a platform for safely and efficiently transforming binaries. To date, Helix transformations have enabled code and data diversification, hardening, and fuzzing, all without requiring source code or build artifacts. Helix has been applied to a wide range of software, from web servers and cryptocurrencies to control software running on a flying drone.

Helix has also been used as part of CI/CD pipelines (DevSecOps), either to augment binaries with security protections or to automatically find potential vulnerabilities via fuzzing.

Workflow

The Helix toolchain takes a binary as input and outputs a diversified set of functionally equivalent, hardened binaries. A central element of the Helix architecture is its Intermediate Representation Database (IRDB), which stores a representation of a binary and allows that representation to be modified through a standard SQL interface.

The first stage of the Helix pipeline performs a reverse-engineering pass using various binary analysis tools and inserts the initial representation of the input binary into the IRDB. Helix plugin modules then effect their desired transformations by modifying the representation stored in the IRDB through a high-level API that provides abstractions for instructions, functions, data, and control-flow information. The last stage of the Helix pipeline, e.g., Zipr, emits an executable binary.
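To make the IRDB idea concrete, the Python sketch below models a pipeline in which a binary's instructions live in SQL tables. The schema, its table and column names, and the helpers `load_binary`, `insert_after`, and `emit` are hypothetical simplifications invented for illustration; they are not the actual Helix IRDB schema or API. The sketch shows why recording control flow as links between instructions, rather than as fixed addresses, lets a plugin splice in new code with plain SQL while leaving address assignment to the final emitter.

```python
import sqlite3

# Hypothetical, heavily simplified schema. NOT the real Helix IRDB schema.
SCHEMA = """
CREATE TABLE instructions (
    id          INTEGER PRIMARY KEY,
    orig_addr   INTEGER,          -- NULL for instructions added by a plugin
    asm         TEXT NOT NULL,
    fallthrough INTEGER REFERENCES instructions(id)
);
"""

def load_binary(db, listing):
    """Stand-in for the analysis stage: store [(addr, asm), ...] as linked rows."""
    ids = []
    for addr, asm in listing:
        cur = db.execute(
            "INSERT INTO instructions (orig_addr, asm) VALUES (?, ?)", (addr, asm))
        ids.append(cur.lastrowid)
    # Link each instruction to the one that originally followed it.
    for a, b in zip(ids, ids[1:]):
        db.execute("UPDATE instructions SET fallthrough = ? WHERE id = ?", (b, a))
    return ids[0]

def insert_after(db, insn_id, asm):
    """Stand-in for a plugin: splice a new instruction in using plain SQL."""
    (old_ft,) = db.execute(
        "SELECT fallthrough FROM instructions WHERE id = ?", (insn_id,)).fetchone()
    cur = db.execute(
        "INSERT INTO instructions (asm, fallthrough) VALUES (?, ?)", (asm, old_ft))
    db.execute("UPDATE instructions SET fallthrough = ? WHERE id = ?",
               (cur.lastrowid, insn_id))
    return cur.lastrowid

def emit(db, entry_id):
    """Stand-in for the emitter: follow fallthrough links to get the new order."""
    out, cur = [], entry_id
    while cur is not None:
        asm, nxt = db.execute(
            "SELECT asm, fallthrough FROM instructions WHERE id = ?", (cur,)).fetchone()
        out.append(asm)
        cur = nxt
    return out

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
entry = load_binary(db, [(0x1000, "push rbp"), (0x1001, "mov rbp, rsp"), (0x1004, "ret")])
insert_after(db, entry, "nop  ; added by a plugin")
print(emit(db, entry))
# prints ['push rbp', 'nop  ; added by a plugin', 'mov rbp, rsp', 'ret']
```

Because the plugin only rewires a `fallthrough` link, no existing row has to move; in this model the emitter alone decides where each instruction ends up.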

Highlighted Transformations

  • Block-level Instruction Layout Randomization (BILR): BILR is a high-entropy diversification technique that randomizes the locations of more than 99% of instructions.
  • Selective Control-Flow Integrity (SCFI): SCFI analyzes a binary to recover a control-flow graph specification and rewrites the binary to enforce the inferred specification at run-time.
  • Stack-layout Randomization (SLX): SLX rewrites binaries to modify stack frames and stack variable layout. SLX supports the addition of random padding and canaries, as well as XOR’ing of return values. In its aggressive configuration, SLX also reorders stack variables.
  • Fast Binary Fuzzing (ZAFL): Coverage-guided fuzzers such as AFL have been spectacularly effective at uncovering security-critical bugs. The standard AFL workflow is to augment clang or gcc with a pass to add instrumentation for every block of a program. ZAFL inlines this instrumentation directly into binaries, thus obviating the need for source code availability or for understanding the build system (for real-world software, modifying the build system is not an activity to be undertaken lightly).
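As a rough illustration of block-level layout randomization, the Python sketch below shuffles a function's basic blocks and appends an explicit jump wherever a block's original fall-through successor no longer sits directly after it. It assumes blocks arrive as a hypothetical mapping from a label to (instructions, fall-through label); the function name `randomize_layout` and the example blocks are invented for illustration, and this is not Helix's BILR implementation.

```python
import random

def randomize_layout(blocks, seed=None):
    """Shuffle basic blocks while preserving control flow with explicit jumps.

    blocks: dict mapping a block label to (instructions, fallthrough_label),
    where fallthrough_label is None if the block never falls through.
    Returns a list of (label, instructions) in the new layout order.
    """
    rng = random.Random(seed)
    order = list(blocks)
    rng.shuffle(order)
    layout = []
    for i, label in enumerate(order):
        insns, fallthrough = blocks[label]
        code = list(insns)
        next_label = order[i + 1] if i + 1 < len(order) else None
        # The original successor may no longer be adjacent: jump to it explicitly.
        if fallthrough is not None and fallthrough != next_label:
            code.append(f"jmp {fallthrough}")
        layout.append((label, code))
    return layout

# Hypothetical function: entry -> loop (which branches back to itself) -> done.
function_blocks = {
    "entry": (["mov ecx, 10"], "loop"),
    "loop": (["dec ecx", "jnz loop"], "done"),
    "done": (["ret"], None),
}

for label, code in randomize_layout(function_blocks, seed=42):
    print(f"{label}:")
    for insn in code:
        print(f"    {insn}")
```

Branch targets and callers refer to blocks by label here, so an emitter still has to resolve every label to its new address; in this model the function's entry point is simply wherever `entry` lands.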
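The per-block instrumentation that AFL adds at compile time, and that ZAFL instead inlines into the binary, follows the edge-coverage scheme described in AFL's technical documentation: each block carries an ID, and each transition increments a shared counter at index `cur_location ^ prev_location`. The Python sketch below models only that bookkeeping, using hypothetical fixed block IDs and invented helper names (`EdgeCoverage`, `run`); it is not ZAFL's generated code, which is machine code inserted into the rewritten binary.

```python
MAP_SIZE = 1 << 16  # AFL's default coverage map: 64 KiB of 8-bit counters

class EdgeCoverage:
    """Models the state updated by AFL-style per-block instrumentation."""

    def __init__(self):
        self.shared_mem = bytearray(MAP_SIZE)
        self.prev_location = 0

    def on_block(self, cur_location):
        # Index the map by a hash of the (previous block, current block) edge.
        idx = cur_location ^ self.prev_location
        self.shared_mem[idx] = (self.shared_mem[idx] + 1) & 0xFF  # counters wrap at 8 bits
        # Shifting keeps A->B distinct from B->A, and A->A distinct from B->B.
        self.prev_location = cur_location >> 1

    def edges_hit(self):
        return {i for i, count in enumerate(self.shared_mem) if count}

# Hypothetical IDs a rewriter might assign to three basic blocks (all < MAP_SIZE).
BLOCK_IDS = {"entry": 0x1234, "loop": 0x5678, "exit": 0x9ABC}

def run(trace):
    cov = EdgeCoverage()
    for block in trace:
        cov.on_block(BLOCK_IDS[block])
    return cov.edges_hit()

short_path = run(["entry", "exit"])
long_path = run(["entry", "loop", "loop", "exit"])
# AFL-style fuzzers treat an input as interesting when it reaches new map entries
# (or new hit-count ranges); here the loop path reaches three new entries.
print(sorted(hex(i) for i in long_path - short_path))
# prints ['0x5f62', '0x7d44', '0xb180']
```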

As Helix is a generic binary transformation infrastructure, it has been used as a key technology in several cybersecurity projects.

  • Cyber Grand Challenge (DARPA): Xandra, our cyber reasoning system developed with GrammaTech, used Helix to augment challenge binaries with control-flow integrity and diversity techniques. Helix’s low overhead enabled effective defenses while remaining within the stringent performance envelope imposed by DARPA. Xandra achieved the #1 ranking in defense. In addition, Helix was used in the qualifying event to enable fuzzing of binaries with AFL.
  • Cyber Fault-Tolerant Attack Recovery (DARPA): Helix was used to produce both structured and probabilistic variants of binaries. As a testament to its robustness, Helix was even able to transform binaries generated from Ada source code!
  • Trusted and Resilient Mission Operation (Air Force): Helix was used to harden and diversify uncrewed autonomous vehicle software against a wide range of attack classes that target memory-related vulnerabilities.
  • Anti-Fragility Workstation for Resilience (Air Force): Led by Assured Information Security, this project used Helix to protect and diversify binaries, and to enable the efficient fuzzing of binaries.

Sponsors

We gratefully acknowledge our sponsors: DARPA, AFOSR and AFRL.

Jack W. Davidson
Professor of Computer Science

Jack Davidson is an ACM and IEEE Fellow. His research interests include compilers, programming languages, computer architecture, embedded systems, and computer security. His current work focuses on computer security, run-time management of applications running on multi-core systems, and computer science education.