In a recent investigation, Portcullis had to undertake static analysis of a piece of malware that, acting as a driver, performed dynamic imports of Windows APIs.
Dynamic imports are always a problem from the malware analyst’s perspective, especially when performing static analysis of malicious components. We are going to focus on Windows, where there are two main methods of dynamically importing Windows APIs.
Generally speaking, programmers only use dynamic API imports for two key reasons:
Position-independent code: this requires writing assembly language code that can be located anywhere inside a binary, or injected into the address space of another process at runtime and then executed, much as shellcode is used during exploitation. Because such code has no import table of its own prepared by the loader, it must find the addresses of the APIs it needs by itself.
Anti-analysis: a programmer may resolve API addresses at runtime to increase the complexity of the code and make its analysis more difficult. This technique is often employed by authors of computer viruses.
In modern operating systems, the virtual address space is divided between kernel space and user space. Applications run in the less privileged user mode and rely on components running in kernel mode, such as ‘drivers’, to perform privileged operations and interact with the hardware.
Driver code runs with kernel privileges and can therefore take complete control of the operating system, including the memory that belongs to the user-mode environment where application code resides.
From the malware analyst’s perspective, analysing a driver is not always a trivial task. Since a driver cannot be loaded and run as a process in user mode, most analysis of drivers dropped by rootkits is static, unless live kernel debugging of an infected machine is possible, which is not usually the case. Because driver analysis is usually limited to static analysis, also having to deal with dynamic imports makes the whole process more challenging and time-consuming, and sometimes requires custom solutions to get past the dynamic imports problem.
However, these dynamic import techniques are not used only by drivers; they are also common in the user-mode components of malicious applications. The difference is that in user mode it is much easier to reveal the ‘hidden’ API calls, since we can simply execute these components and observe which APIs they look up.
i) The first method uses two Windows APIs: LoadLibrary and GetProcAddress.
The author uses LoadLibrary to map the dynamic link library (DLL) into the process address space and obtain its base address, and GetProcAddress to obtain the address of the API he needs.
To make this process more difficult to analyse, the author might encrypt the DLL and API names and decrypt them on the fly only when they are needed. He might also wipe the decrypted names at the end of the operation to hide some of his intended actions.
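A minimal Python sketch of the encrypted-names idea, assuming a hypothetical single-byte XOR key (real samples use many different schemes): only the obfuscated names are stored, the plaintext is rebuilt just before use and then overwritten. The `use` callback stands in for the LoadLibrary/GetProcAddress calls, which are not actually made here.

```python
KEY = 0x5A  # hypothetical single-byte XOR key, for illustration only

def xor_bytes(data):
    """XOR every byte with KEY; the same operation encrypts and decrypts."""
    return bytearray(b ^ KEY for b in data)

# Build time: only these obfuscated blobs end up in the binary, so a plain
# strings dump shows neither the DLL name nor the API name.
ENC_DLL = bytes(xor_bytes(b"kernel32.dll"))
ENC_API = bytes(xor_bytes(b"VirtualAlloc"))

def with_plain_name(blob, use):
    """Decrypt a name just before use, then wipe the plaintext buffer."""
    buf = xor_bytes(blob)
    try:
        return use(buf)            # stands in for LoadLibraryA / GetProcAddress
    finally:
        for i in range(len(buf)):  # overwrite the decrypted name in place
            buf[i] = 0

print(with_plain_name(ENC_API, bytes))  # bytes() copies the name before the wipe
```

For the analyst, the same routine recovers the names statically once the key has been located, for example `xor_bytes(ENC_DLL)`.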
However, modern compilers and IDEs (Integrated Development Environments) may also add code that uses this same technique while generating an executable. In addition, when an application uses its own proprietary DLLs, it is common practice to use these two Windows APIs to call specific functions located inside those DLLs.
The following figure shows an example of code added to an executable built with the MS Visual C++ IDE. It is used to display a message box notifying the user about runtime errors that occurred inside the MS Visual C++ Runtime Library.
In other words, this technique is “only as malicious as the code author”, since its presence alone cannot tell us whether an executable is malicious or not.
ii) The second method, which is also quite powerful and stealthy against reverse engineering, involves parsing the export table of the module known to contain the API, calculating custom checksums of the exported API names and comparing them against predefined values hardcoded in the code.
The anti-analysis advantage of this method over the previous one is that the address of a specific API can be retrieved without calling any extra APIs and without hardcoding the names of the required APIs in the code, encrypted or not.
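To illustrate the checksum idea, the sketch below assumes a ROR-13 additive hash, a scheme widely seen in public shellcode; the malware discussed in this series may well use a different function. `find_export` mirrors what malicious code does with a hardcoded constant, while `build_lookup` shows a common analyst counter-measure: hashing known export names to reverse the constants found in a sample.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    value &= MASK32
    return ((value >> count) | (value << (32 - count))) & MASK32

def ror13_hash(name):
    """ROR-13 additive hash of an ASCII export name (terminator not included)."""
    h = 0
    for c in name:                 # iterating over bytes yields ints
        h = (ror32(h, 13) + c) & MASK32
    return h

def find_export(export_names, wanted):
    """Malware side: return the export whose hash equals the hardcoded constant."""
    for name in export_names:
        if ror13_hash(name) == wanted:
            return name
    return None

def build_lookup(candidate_names):
    """Analyst side: map hash -> name to reverse constants found in a sample."""
    return {ror13_hash(n): n for n in candidate_names}

exports = [b"GetProcAddress", b"LoadLibraryA", b"VirtualAlloc"]
WANTED = ror13_hash(b"VirtualAlloc")  # a real sample would store only this number
print(find_export(exports, WANTED))   # prints b'VirtualAlloc'
print(build_lookup(exports)[WANTED])
```

In practice the lookup table would be built from the full export lists of the DLLs the sample is likely to target.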
To achieve this, we first need to know the image base of the module that exports those APIs. There are two main ways of finding it, but in both cases the ultimate target is the module’s export table.
The first is to start from a legitimate-looking API that is already present in the executable’s IAT (Import Address Table) at runtime.
In this case we only need to read its address from the IAT, zero out the lower 16 bits and then keep subtracting the default memory page size of the target OS until we reach the start of the module in memory, which we call the image base. Zeroing the lower 16 bits is a safe starting point because image bases on Windows are aligned to 64 KB (10000h) boundaries, so the result can never lie below the image base. Depending on the VA (Virtual Address) of the API taken from the IAT, zeroing out the lower 16 bits may be enough on its own to obtain the image base. However, it is technically safer to check first (typically for the ‘MZ’ signature at the start of the module) and, if the base has not been found, start subtracting the size of a memory page, which on Windows is 1000h (4 KB). This is also the method we had to deal with in our case.
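The steps above can be modelled in Python as follows. This is a sketch under stated assumptions: memory is simulated by a hypothetical `make_memory` helper holding one image at an assumed base address, and the check at each candidate address is taken to be the ‘MZ’ signature plus the ‘PE’ signature reached through `e_lfanew` (offset 3Ch); real samples may check less or more.

```python
import struct

PAGE_SIZE = 0x1000  # default Windows page size (1000h)

def looks_like_pe(read, va):
    """Check for 'MZ', then follow e_lfanew (offset 3Ch) to the 'PE' signature."""
    if read(va, 2) != b"MZ":
        return False
    e_lfanew = struct.unpack("<I", read(va + 0x3C, 4))[0]
    return read(va + e_lfanew, 4) == b"PE\0\0"

def find_image_base(read, api_va, lowest_va=0x10000):
    """Recover a module's image base from the address of one of its APIs."""
    va = api_va & ~0xFFFF            # zero out the lower 16 bits
    while va >= lowest_va:
        if looks_like_pe(read, va):  # check first: alignment alone may suffice
            return va
        va -= PAGE_SIZE              # otherwise step back one page at a time
    return None

def make_memory(base, size):
    """Hypothetical toy model of process memory with one image mapped at base."""
    image = bytearray(size)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)  # e_lfanew
    image[0x80:0x84] = b"PE\0\0"

    def read(va, n):
        off = va - base
        if off < 0 or off + n > size:
            return bytes(n)          # unmapped in this model; real code would fault
        return bytes(image[off:off + n])
    return read

BASE = 0x7C800000                    # hypothetical image base
read = make_memory(BASE, 0x40000)
print(hex(find_image_base(read, BASE + 0x2A5B0)))  # prints 0x7c800000
```

The `lowest_va` bound is a design choice of this sketch to guarantee termination; malicious code often simply loops until it finds the signature.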
The second way goes through a few internal Windows structures that are easily accessible from user mode (Ring 3). In this case the code retrieves the address of the PEB (Process Environment Block), which contains a pointer to the LDR_DATA structure (PEB_LDR_DATA), which in turn contains pointers to doubly-linked lists of LDR_Module structures (LDR_DATA_TABLE_ENTRY) describing the modules loaded in the process address space. Each of these structures contains information about a specific loaded module, such as its absolute path, its image size in memory, its EP (EntryPoint) and, of course, its image base.
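On 32-bit Windows the PEB address is read from fs:[30h] (gs:[60h] on 64-bit). Since that cannot be done portably, the following Python sketch only models the structures: the class and field names loosely follow PEB_LDR_DATA and LDR_DATA_TABLE_ENTRY, and the modules, paths and addresses are hypothetical. It shows the key detail of walking a circular doubly-linked list: iteration stops when the walk returns to the list head, which is not itself a module entry.

```python
class ListEntry:
    """Models LIST_ENTRY: one link of a circular doubly-linked list."""
    def __init__(self, owner=None):
        self.flink = self
        self.blink = self
        self.owner = owner  # stands in for C's CONTAINING_RECORD

class LdrEntry:
    """Subset of LDR_DATA_TABLE_ENTRY fields mentioned in the text."""
    def __init__(self, base_dll_name, full_dll_name, dll_base,
                 size_of_image=0, entry_point=0):
        self.in_load_order_links = ListEntry(self)
        self.base_dll_name = base_dll_name
        self.full_dll_name = full_dll_name
        self.dll_base = dll_base
        self.size_of_image = size_of_image
        self.entry_point = entry_point

class PebLdrData:
    def __init__(self):
        self.in_load_order_module_list = ListEntry()  # list head, not a module

class Peb:
    def __init__(self):
        self.ldr = PebLdrData()

def insert_tail(head, link):
    """Append a link at the tail of the list, like InsertTailList."""
    link.blink = head.blink
    link.flink = head
    head.blink.flink = link
    head.blink = link

def find_module_base(peb, name):
    """Walk the load-order list and return the matching module's image base."""
    head = peb.ldr.in_load_order_module_list
    cur = head.flink
    while cur is not head:  # circular list: stop once back at the head
        entry = cur.owner
        if entry.base_dll_name.lower() == name.lower():  # case-insensitive match
            return entry.dll_base
        cur = cur.flink
    return None

peb = Peb()
for name, path, base in [
    ("evil.exe", r"C:\Users\victim\evil.exe", 0x00400000),
    ("ntdll.dll", r"C:\Windows\System32\ntdll.dll", 0x7C900000),
    ("kernel32.dll", r"C:\Windows\System32\kernel32.dll", 0x7C800000),
]:
    entry = LdrEntry(name, path, base)
    insert_tail(peb.ldr.in_load_order_module_list, entry.in_load_order_links)

print(hex(find_module_base(peb, "KERNEL32.DLL")))  # prints 0x7c800000
```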
Returning to the common goal of these two techniques, once the image base is known it is trivial to locate the export table through the module’s PE header. More information will be given in the next part, where we will discuss our case study.
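Once the image base is known, the export lookup follows the PE format: `e_lfanew` (offset 3Ch) leads to the PE header, data directory 0 gives the export directory, and its AddressOfNames, AddressOfNameOrdinals and AddressOfFunctions arrays map a name to a function RVA. The sketch below again assumes the ROR-13 hash purely as an example, operates on a fake in-memory image built by a hypothetical `build_test_image` helper rather than a real module, and ignores forwarded exports.

```python
import struct

MASK32 = 0xFFFFFFFF

def ror13_hash(name):
    """Assumed example hash: ROR-13 additive hash of an ASCII name."""
    h = 0
    for c in name:
        h = ((((h >> 13) | (h << 19)) & MASK32) + c) & MASK32
    return h

def u16(img, off):
    return struct.unpack_from("<H", img, off)[0]

def u32(img, off):
    return struct.unpack_from("<I", img, off)[0]

def resolve_by_hash(img, image_base, wanted_hash):
    """Return the VA of the export whose name hashes to wanted_hash, or None.

    `img` holds the module as mapped in memory, so RVAs are plain offsets.
    """
    e_lfanew = u32(img, 0x3C)
    if img[0:2] != b"MZ" or img[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return None
    opt = e_lfanew + 4 + 20                     # skip signature + IMAGE_FILE_HEADER
    dd = 96 if u16(img, opt) == 0x10B else 112  # PE32 vs PE32+ data directories
    exp = u32(img, opt + dd)                    # DataDirectory[0]: export table RVA
    if exp == 0:
        return None
    # NumberOfNames, AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals
    n_names, funcs, names, ords = struct.unpack_from("<4I", img, exp + 0x18)
    for i in range(n_names):
        name_rva = u32(img, names + 4 * i)
        name = bytes(img[name_rva:img.index(b"\0", name_rva)])
        if ror13_hash(name) == wanted_hash:
            ordinal = u16(img, ords + 2 * i)    # index into AddressOfFunctions
            return image_base + u32(img, funcs + 4 * ordinal)
    return None

def build_test_image(exports):
    """Hypothetical helper: a tiny fake PE32 image exporting name -> RVA."""
    img = bytearray(0x1000)
    img[0:2] = b"MZ"
    struct.pack_into("<I", img, 0x3C, 0x80)     # e_lfanew
    img[0x80:0x84] = b"PE\0\0"
    opt = 0x80 + 4 + 20
    struct.pack_into("<H", img, opt, 0x10B)     # PE32 optional header magic
    exp, funcs, names, ords, strings = 0x200, 0x300, 0x340, 0x380, 0x400
    struct.pack_into("<I", img, opt + 96, exp)  # DataDirectory[0]
    n = len(exports)
    struct.pack_into("<5I", img, exp + 0x14, n, n, funcs, names, ords)
    for i, (name, rva) in enumerate(sorted(exports.items())):
        struct.pack_into("<I", img, funcs + 4 * i, rva)
        struct.pack_into("<I", img, names + 4 * i, strings)
        struct.pack_into("<H", img, ords + 2 * i, i)
        img[strings:strings + len(name) + 1] = name + b"\0"
        strings += len(name) + 1
    return img

BASE = 0x7C800000  # hypothetical image base
img = build_test_image({b"GetProcAddress": 0x0A10,
                        b"LoadLibraryA": 0x0B20,
                        b"VirtualAlloc": 0x0C30})
print(hex(resolve_by_hash(img, BASE, ror13_hash(b"LoadLibraryA"))))  # prints 0x7c800b20
```

A real resolver would also need to detect forwarded exports, whose function RVA points back inside the export directory instead of at code.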
However, it is already clear that retrieving the address of a specific API by parsing the export table is not as easy as the first method. This also suggests that someone would rarely use this method unless he had a reason to spend time on something that is otherwise trivial to accomplish.
Furthermore, since this method relies on internal OS structures and on specific characteristics of the PE file format, code written today might fail on a future version of the OS. This again suggests that a legitimate developer would not choose this form of dynamic importing to call a function located outside the caller’s module.
For these reasons, this method of dynamically retrieving the addresses of Windows APIs can be considered suspicious enough to justify digging deeper into an executable module.
Please look out for our next article (part 2), which will walk through the analysis of an actual piece of malware and show how the malware author attempts to find the base address of the Windows Kernel module.