The C/C++ programming languages seem simple and quite straightforward to most everyday and embedded developers. Unfortunately, many programmers lack knowledge of the C standard, and the result is security vulnerabilities hiding in the dark shadows of the code. This post introduces a small part of the integer overflow world, and specifically its sometimes undefined behavior.
Integer overflow, or IOF for short, generally refers to integer calculations whose results exceed the boundaries of their type, either from above (overflow) or from below (underflow).
Integer Underflow example:
```c
unsigned short length = packet.len, pos = 0;
if ( length > 1500 ) {
    /* error handling */
}
// skipping the reserved field (unused)
length -= 4;    // Watch: if length < 4, length wraps around to >= 65532
pos += 4;
```
Integer Overflow example:
```c
unsigned int start, length, end;
// some init code
...
// perform a bounds check
if ( start + length > end ) {
    /* error handling */
}
```
More examples can be found on the Common Weakness Enumeration (CWE) pages at MITRE.
So we have had a glimpse of IOF, and it may seem that overflow and underflow simply follow the two's complement representation of the variables. However, this is not always the case. As explained in great detail here, the C/C++ standard defines several classes of behavior, one of which is the “Undefined Behavior” class:
undefined behavior – there are no restrictions on the behavior of the program. … Compilers are not required to diagnose undefined behavior … and the compiled program is not required to do anything meaningful.
And indeed, signed integer overflow is undefined behavior (UB), whereas unsigned arithmetic is defined to wrap around modulo 2^N, which is why the two examples above behave predictably. Back in 2007, the UB status of signed overflow sparked a wide discussion about GCC's optimizations (reading the discussion is highly recommended). While there is the famous example of the compiler optimizing out intended overflow checks, the discussion makes it clear that this doesn't mean the compiler will optimize out your entire program once some UB is found.
Since some popular open-source projects, Ruby for instance, were written without this UB in mind, I recommend that security researchers gain a deep knowledge of how modern compilers actually behave in such cases. This helps in determining how the code will really behave and what vulnerabilities it might contain. For all of the following cases I used Compiler Explorer, a handy tool for a researcher's arsenal.
Clang x86-64 3.9.0, optimization: -O2
As can be seen, clang behaves as follows:
- Since 100 > 0, the outcome of the check is known at compile time, and the entire check is optimized out.
- The check is rewritten as if (b < 0)
- The check remains unchanged
GCC x86-64 6.2, optimization: -O2
As can be seen, gcc behaves differently:
- Since 100 > 0, the outcome of the check is known at compile time, and the entire check is optimized out.
- The check remains unchanged
- The check remains unchanged
Conclusion
We have seen a difference in the way gcc and clang treat the undefined behavior of signed integer overflow. Although neither compiler exploded or optimized out all of the code, clang's behavior can't be regarded as predictable for the average developer. To avoid such risky edge cases, which stem from a lack of awareness on the part of programmers, major open-source projects tend to use countermeasure compilation flags, such as:
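- -fwrapv (signed overflow wraps around using two's complement) – this is the case for Python
- -fno-strict-overflow (the optimizer may not assume that signed overflow never happens) – this is the case for Ruby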
Here is a snippet from Ruby showing why the flag saved them from a bug originating in undefined behavior when compiled with clang.
range_step() at range.c:
```c
while (i < end) {
    rb_yield(LONG2NUM(i));
    if (i + unit < i) break;
    i += unit;
}
```
This code implements the step functionality, where:
- long unit = step (step > 0)
- long i = range start
- long end = range end
Since all of the variables are signed longs, the loop falls into the 2nd case of the IOF checks we saw earlier. This means that when compiled with clang (and without any of the flags above), the loop effectively behaves like this:
```c
while (i < end) {
    rb_yield(LONG2NUM(i));
    if (unit < 0) break;
    i += unit;
}
```
This can lead to an endless loop (denial of service) when called with the following arguments:
- unit = LONG_MAX / 2 + 1 (that is, 2^62 with a 64-bit long)
- i = range start = 0
- end = range end = unit + epsilon (epsilon > 0)
The loop will endlessly cycle through the values 0, 2^62, -2^63 (LONG_MIN), -2^62, 0, …
A last tip: here is a (very basic) regex I've used to search for potential IOF checks, that is, expressions such as a + b < a or a + b >= b:
([a-z_A-Z]+)( )?\+( )?([a-z_A-Z]+)( )?[<>](=)?( )?(\4|\1)
This example shows why knowing about the undefined behavior of integer overflow, and how compilers treat it, can be handy for a security researcher.