At the end of 2016, while checking for updates in Microsoft’s bounty program, I saw a reference to a new defense mechanism called “Return Flow Guard” (RFG). Since I had just finished my work on Liberation Guard, I took the time to check whether I could bypass this new protection. This post describes my attack on Microsoft’s Return Flow Guard, an attack that achieves a full bypass of the protection.
Return Flow Guard 101
The first step in my research was to gather as much information as possible about the new defense mechanism. I quickly came upon this excellent technical review by Tencent Xuanwu Lab. As I found out, RFG is a software implementation of Intel’s hardware stack protection. In short, RFG keeps a copy of each return address on a duplicate stack (sometimes called a “Shadow Stack”). A special hook at the end of each function then compares the return address on the user’s stack with its shadow copy, aiming to prevent attacks on the user’s stack. This figure illustrates Microsoft’s software implementation of the shadow stack:
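To make the check concrete, here is a toy model in C. It is not Microsoft’s code. Both stacks are plain arrays indexed by one shared stack pointer, loosely mirroring the figure, and the names (`toy_thread`, `toy_call`, `toy_ret`) are hypothetical. A call stores the return address in both places, and a return proceeds only if the two copies still agree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 64

typedef struct {
    uintptr_t stack[STACK_SLOTS];  /* regular stack, reachable by overflows */
    uintptr_t shadow[STACK_SLOTS]; /* shadow copies of return addresses */
    size_t sp;                     /* one stack pointer indexes both arrays */
} toy_thread;

/* Empty stack: sp points one past the last slot (stacks grow down). */
static void toy_init(toy_thread *t) {
    t->sp = STACK_SLOTS;
}

/* CALL: push the return address and mirror it into the shadow slot at
 * the same index. Returns false if the toy stack is full. */
static bool toy_call(toy_thread *t, uintptr_t return_address) {
    if (t->sp == 0) {
        return false;
    }
    t->sp--;
    t->stack[t->sp] = return_address;
    t->shadow[t->sp] = return_address;
    return true;
}

/* Epilogue check + RET: refuses to return (a real process would be
 * terminated) when the stack copy no longer matches the shadow copy. */
static bool toy_ret(toy_thread *t, uintptr_t *target) {
    if (t->sp >= STACK_SLOTS) {
        return false; /* nothing to return from */
    }
    if (t->stack[t->sp] != t->shadow[t->sp]) {
        return false; /* mismatch: the RFG-style check fails */
    }
    *target = t->stack[t->sp];
    t->sp++;
    return true;
}
```

A classic stack overflow that overwrites only `t.stack[t.sp]` makes `toy_ret` fail. That is exactly the behavior the attack plans below try to get around.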
Attack Plan #1 – Hijack RSP
What first caught my eye in Microsoft’s design is the fact that RSP, the stack register, actually points to both stacks, as can be seen in the above figure. In 32 bit executables, the function’s prologue commonly stores EBP on the stack and assigns EBP = ESP, while the function’s epilogue performs ESP = EBP and restores EBP from the stack. This means that a stack buffer overflow can change the stored EBP, which in turn will affect the ESP of the caller function.
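The following toy simulation in C sketches why a corrupted saved EBP ends up in the caller’s ESP. Memory is a small word-addressed array, and the “registers” hold indices into it. The names `cpu32`, `prologue`, `epilogue` and `caller_esp_after_overflow` are hypothetical and exist only for this illustration. `epilogue` models the `mov esp, ebp; pop ebp` sequence.

```c
#include <stdint.h>

#define WORDS 32

typedef struct {
    uint32_t mem[WORDS]; /* word-addressed toy stack memory */
    uint32_t esp;        /* index of the current top of stack */
    uint32_t ebp;        /* after a prologue: index of the saved-EBP slot */
} cpu32;

static void push(cpu32 *c, uint32_t v) { c->mem[--c->esp] = v; }
static uint32_t pop(cpu32 *c) { return c->mem[c->esp++]; }

/* Prologue: push ebp; mov ebp, esp */
static void prologue(cpu32 *c) {
    push(c, c->ebp);
    c->ebp = c->esp;
}

/* Epilogue: mov esp, ebp; pop ebp (what LEAVE does) */
static void epilogue(cpu32 *c) {
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* Caller -> callee, where an overflow in the callee overwrites the saved
 * EBP with forged_ebp. Returns the caller's ESP right before its RET,
 * i.e. the slot the caller will take its return address from. */
static uint32_t caller_esp_after_overflow(uint32_t forged_ebp) {
    if (forged_ebp >= WORDS) {
        return UINT32_MAX; /* out of range for this toy memory */
    }
    cpu32 c = { .esp = WORDS, .ebp = 0 };
    prologue(&c);              /* caller's frame */
    push(&c, 0x1234);          /* CALL pushes the callee's return address */
    prologue(&c);              /* callee's frame; saved EBP is at mem[c.ebp] */
    c.mem[c.ebp] = forged_ebp; /* the overflow corrupts the saved EBP */
    epilogue(&c);              /* callee: EBP = forged_ebp */
    (void)pop(&c);             /* callee's RET */
    epilogue(&c);              /* caller: ESP = forged_ebp, then pop ebp */
    return c.esp;
}
```

With the untouched value, `caller_esp_after_overflow(31)` returns the original ESP (32). A forged value such as 10 makes the caller fetch its return address from slot 11, a location the attacker chose.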
Unfortunately, on 64 bit executables this pattern is rare. The compiler prefers to directly use SS:[RSP + X] to access stack variables, while the function’s epilogue simply adds the frame size back to RSP. Without the use of the “base pointer”, it will be much harder to “hijack” RSP. We will need a better plan.
Attack Plan #2 – Controlled Pair
A major flaw in Microsoft’s software implementation is the fact that the shadow stack resides in the user’s address space. The user has RW permissions on the shadow stack’s pages, and this leads us to our new attack plan:
If we could find a “Write-What-Where-Double” primitive (or a “Controlled Pair” for short), we could simultaneously overwrite both return addresses, thus bypassing the validation check in the function’s epilogue.
We will need to achieve 3 goals for this attack plan:
- Locate the address of our stack
- Locate the address of the shadow stack
- Gain a Controlled Pair primitive
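Here is a minimal sketch of why the pair matters. It rests on a hypothetical assumption made for illustration: the RET check on the targeted frame runs as soon as each write primitive returns. `guarded_frame`, `www`, `www_double` and the two `hijack_*` helpers are made-up names, not part of any Windows API.

```c
#include <stdbool.h>
#include <stdint.h>

/* One guarded frame: the return address and its shadow copy. */
typedef struct {
    uintptr_t ret;
    uintptr_t shadow_ret;
} guarded_frame;

/* The epilogue check: both copies must agree. */
static bool ret_check_passes(const guarded_frame *f) {
    return f->ret == f->shadow_ret;
}

/* Write-What-Where: a single write per invocation. */
static void www(uintptr_t *where, uintptr_t what) {
    *where = what;
}

/* Write-What-Where-Double ("Controlled Pair"): two writes, one invocation. */
static void www_double(uintptr_t *where1, uintptr_t *where2, uintptr_t what) {
    *where1 = what;
    *where2 = what;
}

/* Two single writes, with the check running in between (the assumption). */
static bool hijack_with_single_writes(guarded_frame *f, uintptr_t target) {
    www(&f->ret, target);
    if (!ret_check_passes(f)) {
        return false; /* mismatch detected before the second write */
    }
    www(&f->shadow_ret, target);
    return ret_check_passes(f);
}

/* One Controlled Pair write, then the check. */
static bool hijack_with_controlled_pair(guarded_frame *f, uintptr_t target) {
    www_double(&f->ret, &f->shadow_ret, target);
    return ret_check_passes(f);
}
```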
Cartography – Locate the Shadow Stack
Microsoft took special measures to ensure that no pointer to the shadow stack exists in the user’s address space. In my next post I’ll show how to easily overcome this obstacle and locate the hidden shadow stack efficiently and reliably.
Cartography – Locate our thread’s stack
While there are several techniques for locating our thread’s stack, I tried a different approach this time. During my searches through MSDN, I found this extremely useful function: GetCurrentThreadStackLimits:
Retrieves the boundaries of the stack that was allocated by the system for the current thread.
Instead of manually searching the memory of our target for a stack pointer, we could simply call this function and get it for free. An additional bonus is that this new technique will work for every target process, offering a generic solution instead of a dedicated recon step for every target.
Gain a Controlled Pair primitive
While testing the new API function, I found out that it returns two results:
- The stack’s base address
- The stack’s top address
Since it needs to return two results, it has the following signature:
VOID WINAPI GetCurrentThreadStackLimits(
  _Out_ PULONG_PTR LowLimit,
  _Out_ PULONG_PTR HighLimit
);
By sheer luck, we have found a candidate function for our Controlled Pair primitive: it writes two values to two addresses chosen by the caller. We now have to check two key issues:
- How does the function calculate the returned values?
- Can we control these values?
A quick debugging check showed that the values are simply taken from the thread’s context (the Thread Environment Block, or TEB).
More information about the thread’s context can be found here.
This answers our first question. We can now rephrase our second question:
- Is the thread’s context writable?
- Can we locate our thread’s context?
The answer to the first question is yes: the TEB is writable. Now we only need to find it in the address space.
Locate the TEB
My first step was to find the address using a debugger; then I only needed to find a pointer to it somewhere in memory. After several random walks through memory, focusing mainly on the heap, I decided to take a look at the stack. Surprisingly, a pointer to the TEB can be found 3 times in a row on the stack, which also gives us a clear indication of a “hit”. Here is a screenshot from iexplore.exe:
After a short investigation I found the cause of this useful memory pattern. Most user processes, at least the ones that attackers are usually interested in, use the Windows API to create and use a pool of worker threads. When the worker threads are created, the pointer to their TEB is stored 3 times on the stack, probably as part of some struct that is stored on the stack.
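The search for this pattern can be sketched as a scan over a snapshot of pointer-sized stack slots. This is a hedged, portable illustration. It assumes the snapshot has already been copied from the range returned by GetCurrentThreadStackLimits(), with the highest index as the stack base. `find_triple_from_base` is a hypothetical helper. Zero slots are skipped so that zero-filled regions are not reported as hits.

```c
#include <stddef.h>
#include <stdint.h>

/* slots[count - 1] is the stack base (highest address); index 0 is the
 * lowest address. Walks from the base toward lower addresses and returns
 * the index of the lowest slot of the first run of three identical,
 * non-zero values, or -1 if there is none. */
static ptrdiff_t find_triple_from_base(const uintptr_t *slots, size_t count) {
    for (size_t i = count; i >= 3; i--) {
        uintptr_t v = slots[i - 1];
        if (v != 0 && slots[i - 2] == v && slots[i - 3] == v) {
            return (ptrdiff_t)(i - 3);
        }
    }
    return -1;
}
```

A hit is only a candidate TEB pointer. It still has to be verified against the stack limits, as described in the next section.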
Wrapping it all together
Assuming we know the address of the shadow stack (which will be presented in my next post), we can bypass RFG with the following steps:
- Call GetCurrentThreadStackLimits() to find the stack’s base and the stack’s top
- Traverse the stack from its base until the same value is found 3 times in a row (a candidate TEB pointer)
- Verify the candidate by comparing its TEB values to those returned in step #1
- Update both TEB values to the desired return address
- Call GetCurrentThreadStackLimits() again, with two arguments:
  - The address of the current return address on the thread’s stack
  - The address of the matching return address on the shadow stack
The last step gives us the desired “Write-What-Where-Double” primitive, hijacking both return addresses simultaneously.
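Steps #3 to #5 can be modeled as follows. This is a simulation, not working exploit code. `fake_teb` stands in for the two TEB fields the API reads; the real TEB layout is not modeled. `fake_get_stack_limits` mimics GetCurrentThreadStackLimits() but takes the TEB explicitly, whereas the real API uses the calling thread’s TEB. `ret_slot` and `shadow_slot` play the roles of the two return addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two TEB fields read by the API. */
typedef struct {
    uintptr_t low_limit;
    uintptr_t high_limit;
} fake_teb;

/* Hypothetical stand-in for GetCurrentThreadStackLimits(): copies both
 * fields to caller-chosen addresses. (The real API takes no TEB argument.) */
static void fake_get_stack_limits(const fake_teb *teb,
                                  uintptr_t *low, uintptr_t *high) {
    *low = teb->low_limit;
    *high = teb->high_limit;
}

/* Step #3: accept a candidate only if its fields match the limits
 * obtained in step #1. */
static bool teb_matches(const fake_teb *candidate,
                        uintptr_t low, uintptr_t high) {
    return candidate->low_limit == low && candidate->high_limit == high;
}

/* Steps #4 and #5: set both fields to the target, then let the "API"
 * write them to the stack slot and the shadow slot in one call. */
static void controlled_pair(fake_teb *teb, uintptr_t *ret_slot,
                            uintptr_t *shadow_slot, uintptr_t target) {
    teb->low_limit = target;
    teb->high_limit = target;
    fake_get_stack_limits(teb, ret_slot, shadow_slot);
}
```

Because both fields hold the same target, it does not matter which output parameter points at which slot.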
After I finished my research, I waited for an official Windows 10 version (Creators Update), hoping it would include Return Flow Guard. Unfortunately, on January 31, 2017, shortly after it was announced, Microsoft updated the bug bounty page and excluded RFG from the program. They later added that their Red Team had found a flaw in the mechanism, and that Microsoft chose to wait for Intel’s hardware implementation of the Shadow Stack.
In this post I presented several attack scenarios against Microsoft’s experimental Return Flow Guard protection mechanism. While the first attack scenario might work in some cases, the second plan presents a full exploit using a “Controlled Pair” primitive, bypassing RFG’s validation check. Although RFG was discontinued, I believe that the GetCurrentThreadStackLimits() API function will be useful in other scenarios, both in the recon phase and in the actual exploitation phase.
4 thoughts on “Bypassing Return Flow Guard (RFG)”
Great article! Thanks for sharing.
Question – why do we need to use “GetCurrentThreadStackLimits()” in the last step? why can’t we just overwrite the values?
We can overwrite the values directly; however, since our write primitive will probably be the result of a function call, we won’t be able to overwrite BOTH values within the same function. Since RFG checks the values on each RET, we need the “controlled pair” primitive to successfully overwrite both values in a single function call.
Thanks for the answer!
I get the “controlled pair” issue, but I think I don’t get something more basic… we have the memory address of the “current return value on the thread’s stack”, why can’t we just do something like DWORD* addr = current_return_value_on_threads_stack; *addr = new_return_addr; (and the same for the second address..) ? Why do we need to overwrite the address with the help of a function call?