Jul 19, 2019

Daniel Schacknat

Heap-based AMSI bypass for MS Excel VBA and others

This was originally posted on blogger here.

This blog post describes how to bypass Microsoft’s AMSI (Antimalware Scan Interface) in Excel using VBA (Visual Basic for Applications). In contrast to other bypasses this approach does not use hardcoded offsets or opcodes but identifies crucial data on the heap and modifies it. The idea of an heap-based bypass has been mentioned by other researchers before but at the time of writing this article no public PoC was available. This blog post will provide the reader with some insights into the AMSI implementation and a generic way to bypass it.

Introduction

Since Microsoft rolled out their AMSI implementation many writeups about bypassing the implemented mechanism have been released. Code White regularly conducts Red Team scenarios where phishing plays a great role. Phishing is often related to MS Office, in detail to malicious scripts written in VBA. As per Microsoft AMSI also covers VBA code placed into MS Office documents. This fact motivated some research performed earlier this year. It has been evaluated if and how AMSI can be defeated in an MS Office Excel environment.

In the past several approaches have been published to bypass AMSI. The following links contain information which were used as inspiration or reference:

https://modexp.wordpress.com/2019/06/03/disable-amsi-wldp-dotnet also lists a lot of other writeups and implements a nice data-based approach
https://outflank.nl/blog/2019/04/17/bypassing-amsi-for-vba/ AMSI Bypass for VBA

The first article from the list above also mentions a heap-based approach. Independent of that writeup, Code White’s approach used exactly that idea. During the time of writing this article there was no code publicly available which implements this idea. This was another motivation to write this blog post. Porting the bypass to MS Excel/VBA revealed some nice challenges which were to be solved. The following chapters show the evolution of Code White’s implementation in a chronological way:

Implementing our own AMSI Client in C to have a debugging platform
Understanding how the AMSI API works
Bypassing AMSI in our own client
Porting this approach to VBA
Improving the bypass
Improving the bypass - making it production-ready

Implementing our own AMSI Client

In order to ease debugging we will implement our own small AMSI client in C which triggers a scan on the malicious string ‘amsiutils’. This string gets flagged as evil since some AMSI Bypasses of Matt Graeber used it. Scanning this simple string depicts a simple way to check if AMSI works at all and to verify if our bypass is functional. A ready-to-use AMSI client can be found on sinn3r’s github. This code provided us a good starting point and also contained important hints, e.g. the pre-condition in the Local Group Policies.

We will implement our test client using Microsoft Visual Studio Community 2017. In a first step, we end up with two functions, amsiInit() and amsiScan(), not to be confused with functions exported by amsi.dll. Later we will add another function amsiByPass() which does what its name suggests. See this gist for the final code including the bypass.

Running the program generates the following output:

This means our ‘amsiutils’ is considered as evil. Now we can proceed working on our bypass.

Understanding AMSI Structures

As promised we would like to do a heap-based bypass. But why heap-based?

At first, we have to understand that using the AMSI API requires initializing a so called AMSI Context (HAMSICONTEXT). This context must be initialized using the function AmsiInitialize(). Whenever we want to scan something, e.g. by calling AmsiScanBuffer(), we have to pass our context as first parameter. If the data behind this context is invalid the related AMSI functions will fail. This is what we are after, but let’s talk about that later.

Having a look at HAMSICONTEXT we will see that this type gets resolved by the pre-processor to the following:

So what we got here is a pointer to a struct called ‘HAMSICONTEXT__’. Let’s have a look where this pointer points to by printing the memory address of ‘amsiContext’ in our client. This will allow us to inspect its contents using windbg:

The variable itself is located at address 0x16a144 (note we have 32-bit program here) and its content is 0x16c8b10, that’s where it points to. At address 0x16c8b10 we see some memory starting with the ASCII characters ‘AMSI’ identifying a valid AMSI context. The output below the memory field is derived via ‘!address’ which prints the memory layout of the current process.

There we can see that the address 0x16c8b10 is allocated to a region starting from 0x16c0000 to 0x16df0000 which is identified as Heap. Okay, that means AmsiInitialize() delivers us a pointer to a struct residing on the heap. A deeper look into AmsiInitialize() using IDA delivers some evidence for that:

The function allocates 16 bytes (10h) using the COM-specific API CoTaskMemAlloc(). The latter is intended to be an abstraction layer for the heap. See here and here for details. After allocating the buffer the Magic Word 0x49534D41 is written to the beginning of the block, which is nothing more than our ‘AMSI’ in ASCII.

It is noteworthy to say that an application cannot easily change this behavior. The content of the AMSI context will always be stored on the heap unless really sneaky things are done, like copying the context to somewhere else or implementing your own memory provider. This explains why Microsoft states in their API documentation, that the application is responsible to call AmsiUnitialize() when it is done with AMSI. This is because the client cannot (should not) free that memory and the process of cleaning up is performed by the AMSI library.

Now we have understood that

the AMSI Context is an important data structure
it is always placed on the heap
it always starts with the ASCII characters ‘AMSI’

In case our AMSI context is corrupt, functions like AmsiScanBuffer() will fail with a return value different from zero. But what does corrupt mean, how does AmsiScanBuffer() detect if the context is valid? Let’s check that in IDA:

The function does what we already spoilered in the beginning: The first four bytes of the AMSI Context are compared against the value ‘0x49534D41’. If the comparison fails, the function returns with 0x80070057 which does not equal 0 and tells something went wrong.

Bypassing AMSI in our own AMSI Client

Our heap-based approach assumes several things to finally depict a so-called bypass:

we have already code execution in the context of the AMSI client, e.g. by executing a VBA script
The AMSI client (e.g. Excel) initializes the AMSI context only once and reuses this for every AMSI operation
the AMSI client rates the checked payload in case of a failure of AmsiScanBuffer() as ‘not malicious

The first point is not true for our test client but also not required because it depicts only a test vehicle which we can modify as desired.

Especially the last point is important because we will try to mess up the one and only AMSI context available in the target process. If the failure of AmsiScanBuffer() leads to negative side effects, in worst case the program might crash, the bypass will not work.

So our task is to iterate through the heap of the AMSI client process, look for chunks starting with ‘AMSI’ and mess this block up making all further AMSI operations fail.

Microsoft provides a nice code example which walks through the heap using a couple of functions from kernel32.dll.

Due to the fact that all the required information is present in user space one could do this task by parsing data structures in memory. Doing so would make the use of external functions obsolete but probably blow up our code so we decided to use the functions from the example above.

After cutting the example down to the minimum functionality we need, we end up with a function amsiByPass().

So this code retrieves the heap of the current process, iterates through it and looks at every chunk tagged as ‘busy’. Within these busy chunks we check if the first bytes match our magic pattern ‘AMSI’ and if so overwrite it with some garbage.

The expectation is now that our payload is no longer flagged as malicious but the AmsiScanBuffer() function should return with a failure. Let’s check that:

Okay, that’s exactly what we expected. Are we done? No, not yet as we promised to provide an AMSI bypass for EXCEL/VBA so let’s move on.

Bypassing AMSI in Excel using VBA

Now we will dive into the strange world of VBA. We used Microsoft Excel for Office 365 MSO (16.0.11727.20222) 32-bit for our testing.

After having written a basic POC in C we have to port this POC to VBA. VBA supports importing arbitrary external functions from DLLs so using our Heap APIs should be no problem. As far as we understood VBA, it does not allow pointer arithmetic or direct memory access. This problem can be resolved by importing a function which allows copying data from arbitrary memory locations into VBA variables. A very common function to perform this task is RtlMoveMemory().

After some code fiddling we came up with the following code.

As you can see we put some time measurement around the main loop. The number of rounds the loop may take can be several hundred thousand iterations and in addition with the poor VBA performance we expected a significant amount of time the bypass will take. Time is a crucial thing in a real attack scenario. If our phishing victim opens a malicious Excel sheet it is not acceptable that the embedded script blocks the execution for let’s say more than one or two seconds. At least after 5 seconds of perceived unresponsiveness of the application a human will get impatient and do things like trying to close Excel, which is not what we want.

So let’s see how long the bypass will take. To be honest, we did not expect what was happening. The result was difficult to reproduce but the measured runtime varied from 15 minutes to endless. In some rare cases Excel was closed after some minutes without any further notice. This was probably because it was unresponsive for too long. However, this isn’t really something we can use in a real scenario.

Okay so what went wrong here? Is VBA really that slow? Yes, it is some orders of magnitude slower than our C code, but that does not explain what we experienced. Microsoft gives some internal details on how AMSI is implemented in Excel. It turns out that Excel uses another strategy as e.g. PowerShell does. The latter more or less sends the whole script to AmsiScanBuffer(). Excel implements a little smarter approach which is based on so-called triggers. Microsoft considers pure VBA code to be harmless until the point when imports come into play. That’s exactly what we do - importing functions from external DLLs. Some of these imports are treated as potentially dangerous and their call including all parameters are put into a ring buffer which is sent to AMSI. This gives AV solutions like MS defender the opportunity to check data behind addresses which of course makes sense. Let’s see what data is sent to the AMSI API in our explicit case by breaking on AmsiScanBuffer using windbg:

As we can see the ring buffer contains all functions we imported including their parameter values. Our Windows 10 System has MS Defender installed and activated. So every call to AmsiScanBuffer() will bother our friend MS Defender. AMSI is implemented as In-process COM in the first place. But to finally communicate with other AV solutions it has to transport data out of process and perform a context switch. This can be seen on the next architecture overview provided by MS:

The little green block on the bottom of the figure shows that our process (Excel) indirectly communicates via rpc with Defender. Hmm… Okay so this is done several 100k times, which is just too much and explains the long runtime. To provide some more evidence we repeat the bypass with Defender switched off, which should significantly speed up our bypass. In addition to that we monitor the amount of calls to AmsiScanBuffer() so we can get an impression how often it is called.

The same loop with Defender disabled took something between one and two minutes:

In a separate run we check the amount of calls to AmsiScanBuffer() using windbg:

AmsiScanBuffer() is called 124624 times (0x10000000 - 0xffe1930) which is roughly the amount of iterations our loop did. That’s a lot and underlines our assumptions that AMSI is just called very often. So we understood what is going on, but currently there seems no workaround available to solve our runtime problem.

Giving up now? Not yet…

Improving AMSI Bypass in Excel

As described in the chapter above our current approach is much too slow to be used in a real scenario. So what can we do to improve this situation?

One of the functions we imported is RtlMoveMemory() which is as mentioned earlier used by a lot of malware. Monitoring this function would make a lot of sense and it might be considered as trigger. Let’s verify that by just removing the call to CopyMem (the alias for RtlMoveMemory) and see what happens. This prevents our bypass from working but it might give us some insight.

The runtime is now at 0.8 seconds. Wow okay, this really made a change. It shall be noted that in this configuration we even walk through the whole heap. Due to the missing call to RtlMoveMemory() we will not find our pattern.

After we identified our bottleneck, what can we do? We will have to find an alternative method to access raw memory which is not treated as trigger by Excel. Some random Googling revealed the following function: CryptBinaryToStringA() which is part of crypt32.dll. The latter should be present on most Windows systems, and thus it should be okay to import it.

The function is intended to convert strings from one format to another, but it can also be used to just simply copy bytes from an arbitrary memory position we specify. Cool, that’s exactly what we are after! In order to abuse this function for our purpose we call it like that to read the lpData field from the Process Heap Entry structure:

The input parameter from left to right explained:

phe.lpData is the source we want to copy data from,
ByVal 4 is the length of bytes we want to copy (lpData is 32-bit on our 32-bit Excel)
ByVal 2 means we want to copy raw binary (CRYPT_STRING_BINARY)
ByVal VarPtr(magicWord) is the target we want to copy that memory to (our VBA variable magicWord)
the last parameter (ByVal VarPtr(bytesWritten)) tells us how many bytes were really copied

So let’s replace all occurrences of RtlMoveMemory() with CryptBinaryToStringA() and check again how long our bypass takes. You can find an updated version of the source code right here.

Our loop now takes about four seconds to finish, that is still much but it finished and it told us that it found the pattern we are looking after. Let’s see how many times Excel calls AmsiScanBuffer() now with this version:

Oh my… Excel did not call AmsiScanBuffer() at all. So this means as long as there is no trigger in our code, nothing is sent to AMSI. Or the other way around: As soon as we use one single trigger function, Excel will send all calls to AMSI. Good to know…

This is the first time we can really verify if the bypass works. So let’s look for some code which triggers AMSI from VBA. Iliya Dafchev shows some port of an older AMSI Bypass to VBA which gets flagged by AMSI itself in the first place. Perfect, we will put this code into a function called triggerAMSI() and use it as positive test:

After running it, Excel complains as expected with a warning and just closes our current instance of Excel:

Putting our bypass and our positive test together we get the following function:

Hopes are high, that the message box containing “we survived” gets displayed because we killed AMSI before triggering it.

Great, our bypass seems to work. So let’s put this into our real phishing campaign. Uhm.. just wait, how long did the whole thing take? four seconds? Repeated execution of the bypass even showed runtimes greater than ten seconds. Oh no, this is still too much.

Giving up now? Not yet…

Improving AMSI Bypass in Excel - continued

In the last chapter we improved our AMSI bypass from infinite runtime to ten seconds or below. This seems to be still too much for a real campaign (our opinion). So what can we do to speed up the whole thing one more time?

The loop takes some 100k iterations which is done in C in no time. Defender is completely out of the game. So our current runtime seems to be a pure result of the bad VBA performance. To be fair, these kind of things we are currently trying to do are not a typical VBA task so let’s blame us instead of Excel for doing crazy stuff…

Anyway, what can we do now? Programming C in VBA is not an option, but what about invoking some shellcode? As long as we can import arbitrary functions the execution of shellcode should not be a problem. This code snippet shows an example how to do that within VBA. The next step is converting our VBA code (or more our initial C code) into assembly language, that is, into shellcode.

Everyone who ever wrote some pieces of shellcode and wanted to call functions from DLLs knows that the absolute addresses of these functions are not known during the time the shellcode gets assembled. This means we have to implement a mechanism like GetProcAddress() to look up the addresses of required functions during runtime. How to do this without any library support is well understood and extensively documented, so we will not go into details here. Implementing this part of the shellcode is left as an exercise for the reader.

Of course there are many ready to use code snippets which should do the job, but we decided to implement the shellcode on our own. Why? Because it is fun and self written shellcode should be unlikely to get caught by AV solutions.

The main loop of our AMSI bypass in assembly can be found here.

The structure ShellCodeEnvironment holds some important information like the looked up address of our HeapWalk() and GetProcessHeaps() function. The rest of the loop should be straight forward…

So putting everything together we generate our shellcode, put it into our VBA code and start it from there as new thread. Of course, we measure the runtime again:

This time it is only 0.02 seconds!

We think this result is more than acceptable. The runtime may vary depending on the processor load or the total heap size but it should be significantly below one second which was our initial goal.

Summary

We hope you enjoyed reading this blog post. We showed the feasibility of a heap-based AMSI bypass for VBA. The same approach, with slight adaptions, also works for PowerShell and .Net 4.8. The latter also comes with AMSI support integrated in its Common Language Runtime. As per Microsoft AMSI is not a security boundary so we do not expect that much reaction but are still curious if MS will develop some detection mechanisms for this idea.