Format Strings – Behind the Scenes

Format string vulnerabilities belong to a special family: vulnerabilities that were once devastating but nowadays receive less and less attention. Since most vulnerable code samples stem from poor C/C++ programming practices, much like SQL injection stems from careless query building, most researchers believe that this vulnerability class can be completely mitigated with proper coding standards. However, this blog post addresses a hidden aspect of format string vulnerabilities, one that can affect quite a large number of high-level programming languages.

Format String 101

Formatted strings are designed to be used for building fancy strings, mostly while printing human-readable messages, logs or even automatically generating file paths. Most implementations of this useful feature are similar to the sprintf() C function. As one can see in the man pages, there are some known basic primitives that can be used in the format:

  • %c – prints a single character
  • %d – prints a decimal number
  • %x – prints a hexadecimal number
  • %s – prints a null-terminated string

In addition, the formatted print can use several options, or specifiers, to make consecutive prints match each other. For example:

  • %08x – print the hexadecimal number with a width of at least 8 digits, padding with zeros
  • %.8f – print a fraction with exactly 8 digits after the ‘.’

A close look over the man pages will show that there are more formatting features than one would expect, making the task of formatting a string quite complicated.

Classic Format String Vulnerabilities

While format string vulnerabilities are mostly associated with C/C++ code, there were some articles that tried to assess the potential damage of such vulnerabilities in other languages, such as Ruby, Java, Perl, Python and PHP. One such article is “C/C++ isn’t alone”. Format string vulnerabilities occur when a “tainted” format is used for the formatting operation. For example, the following C/C++ code is used for printing a string received from the user:


This vulnerability rests on 3 main conditions:

  1. printf() expects the 1st argument to be a format
  2. A C/C++ function with a variadic number of parameters can’t “know” how many arguments were passed to it
  3. A controlled format controls the entire behavior of the function, including the number of “arguments” it will use

By using the format “%d” the attacker can print a decimal number from the stack, and by chaining more format operators, they can achieve an extensive and controllable Information Disclosure (INF). The “%n” operator, which many modern toolchains restrict or disable, provides a write primitive into an argument, escalating the attack into a code-execution threat.

High level languages

High-level languages such as Java and C#, and interpreted languages such as Python, PHP, Perl and Ruby, are often treated as “immune” to such attacks, mainly because the 2nd condition no longer applies: these languages know how many arguments were passed to the format function, and they can even check their types for correctness. Nevertheless, over the years there were several reports of vulnerabilities in format string implementations, such as CVE-2008-2664 (CRuby) and CVE-2005-3962 (Perl). I believe these CVEs shed only a little light on implementations that are far more prone to security vulnerabilities.

(C)Ruby and (M)Ruby

CRuby is the popular implementation of the Ruby language. Extensive parts of the implementation of this Japanese library were designed and written by Yukihiro “Matz” Matsumoto. For reasons that I don’t know, CRuby is now maintained by other programmers, while “Matz” is the main developer of yet another Ruby implementation: MRuby. MRuby strives to be a more “lightweight” implementation of the Ruby programming language, meaning that some features were completely rewritten, while others are based on a shared code base.

More MRuby Background

MRuby is used by “Shopify” in a VM-like commercial environment: an isolated interpreter process receives a partially hostile (or maybe just badly written) Ruby script, executes it with limited resources, and returns the result to the main process. This is a relatively interesting scenario in which an interpreter-based VM (or “cave”) is being used. From an attacker’s standpoint, this poses the challenge of a VM escape using vulnerabilities in the interpreted language.

(C/M)Ruby sprintf

An example of a feature with a “shared” code base is the sprintf module (or gem). The two implementations started from the same original code but were maintained separately, by teams with different programming “styles”, and once in a while they share bug fixes with each other. The implementation itself goes like this:

  • Two variables blen and bsiz track the size and capacity of the result string
  • An initial capacity of (bsiz=)120 chars is used for the base result string object
  • CHECK(l) macro is responsible for doubling the capacity of the result string until it is big enough for the wanted new content
  • All chars up to a ‘%’ are directly appended to the result string
  • Specifiers are accumulated into “state” variables
  • Each case has its own treatment (%d, %c, %f, …)

I started searching for vulnerabilities in the MRuby library during February, a point at which the implementation was in a fragile state:

  • Almost all internal variables and calculations used signed integers
  • The CHECK(l) macro was implemented like this:
#define CHECK(l) do {\
  while (blen + (l) >= bsiz) {\
    bsiz *= 2;\
  }\
  mrb_str_resize(mrb, result, bsiz);\
  buf = RSTRING_PTR(result);\
} while (0)

This check, combined with signed integers, offers several bypass options, including:

  1. blen + (l) < 0
  2. blen + (l) > 2 ** 32 && blen + (l) < 2 ** 32 + bsiz

Even the logic of doubling bsiz is questionable, since it can become negative, causing an endless loop.

Width and Precision

As I demonstrated at the start of the post, a formatted string can use “qualifiers” that set the width and the precision of the formatted print. There are several ways to parse them from the format string, but the implementation imposes (almost) no limitations on their values!

  • width >= 0 ==> will stay as it is
  • width < 0 ==> converted to positive: width = -1 * width (see previous blog post)
  • prec >= 0 ==> will stay as it is
  • prec  < 0 ==> not supported (error)

And this is where a complicated, multi-case implementation turns out to be an implementation with multiple vulnerabilities. For example, let’s check the simple case of ‘%c’:

if (!(flags & FWIDTH)) {
     memcpy(buf+blen, c, n);
     blen += n;
}
else if ((flags & FMINUS)) {
     memcpy(buf+blen, c, n);
     blen += n;
     // EI: FILL can resize the string according to MAX_INT - 1
     if (width>0) FILL(' ', width-1);
}
else {
     // EI: FILL can resize the string according to MAX_INT - 1
     if (width>0) FILL(' ', width-1);
     memcpy(buf+blen, c, n);
     blen += n;
}

The following code triggers a huge memset() of spaces, causing a DoS:

sprintf("abcdefghijklmnopqrstuvwxyz % 2147483640c", "A")

Fix #1 – the CHECK(l) macro

Since there is an obvious Integer-Overflow (IOF) in the CHECK(l) macro used inside the FILL() macro, the suggested fix was:

#define CHECK(l) do {\
  while ((l) >= bsiz - blen) {\
    bsiz *= 2;\
  }\
  mrb_str_resize(mrb, result, bsiz);\
  buf = RSTRING_PTR(result);\
} while (0)

This follows the invariant that the capacity is always bigger than the length of the result string, thus avoiding an underflow.

However, let’s remember that the implementation uses signed integers. If we can find a place in which an IOF occurs before the CHECK(l) macro, the macro will be handed a negative value, which bypasses the check. Indeed, there are multiple examples of such calls, and they vary between MRuby and CRuby. Several such cases are:

// EI: 1) fractions (%f, %G, ...)
CHECK(need + 1);
// EI: 2) fractions (%f, %G, ...)
if ((flags&FWIDTH) && need < width)
need = width;
need += 20;
// EI: And this is a double vulnerability
n = snprintf(&buf[blen], need, fbuf, fval);
blen += n;

And the 2nd case is the more interesting one. This is a code line that was changed from CRuby to MRuby and did not migrate back. The vulnerabilities here are:

  • A controllable width bypasses the check macro, because need += 20 overflows to a negative value
  • Our format is used to build a new format, which is then passed to the standard library’s snprintf() function (most interpreted languages use the standard library for the fraction case, because it is complicated)
  • Our format causes snprintf() to fail, returning -1

The developers mishandled the error case: instead of holding the length of the formatted fraction, n holds an invalid value! This can be used to move blen backwards, enabling a controllable formatted write before the start of the result buffer.

f = 1234567890.12345678
format = "% 2147483628G" * 10 + "!!!!!!!!!!!"
str1 = "1" * 120
unique = sprintf(format, f, f, f, f, f, f, f, f, f, f, f, f, f, f, f, f, f, f, f, f)
print str1

In the next blog post I will describe a step-by-step exploitation of this vulnerability, an exploitation that achieves a complete VM takeover (VM escape) in the MRuby case.

Fix #2 – a massive change

The reported IOFs were fixed, and the calls to snprintf() are now checked for errors. In addition, there was another change to the CHECK(l) macro:

#define CHECK(l) do {\
  if ((l) < 0) mrb_raise(mrb, E_ARGUMENT_ERROR, "invalid specifier");\
  while ((l) >= bsiz - blen) {\
    bsiz *= 2;\
    if (bsiz < 0) mrb_raise(mrb, E_ARGUMENT_ERROR, "too big specifier");\
  }\
  mrb_str_resize(mrb, result, bsiz);\
  buf = RSTRING_PTR(result);\
} while (0)

The first line of the macro makes it quite robust, in case some IOFs slipped through the fix.

Several weeks after the fix, the first line was removed during a migration from CRuby to MRuby, once again exposing vulnerabilities caused by IOFs that had slipped through the previous fixes. The official reason for removing the check was “Better error message”: the “only” place that needed the check received its own exception string, making the robust check “unneeded”. Any future change to this module in CRuby or MRuby is highly prone to new vulnerabilities, because the code is fragile and the differences between the two implementations cause assumptions made in one library to break in the other.


Formatting strings is a complicated feature, one that most people tend to treat as obvious. Despite its unusual complexity, caused mainly by the lack of limitations on the width and precision specifiers, most programming languages chose to re-implement this feature on their own. This means that PHP, Python, CRuby, MRuby, Perl and more have their own variants of this risky code, putting a huge question mark over the hidden assumption of programmers that interpreted languages are not vulnerable to format string attacks.

In the next blog post we will dive deep into the exploitation of the shown MRuby format string vulnerability, going step-by-step until we achieve a full VM escape.

Author: eyalitkin

White hat security researcher. Recently finished my M.Sc. at TAU, and now focus on security research, mainly in open source.
