Stealth Import of Windows API

Submitted by alexey on Tue, 10/04/2011 - 11:03

In the good old days, memory was an expensive resource and developers had to watch the size of the programs they created. Imagine how hard they had to work before there were high-level languages (like C), before compilers became smart enough to handle size optimization on their own. Speed was also a concern, as the hardware was not as fast as it is now. Another headache was the need to interact with the underlying operating system or, to be more precise, the need to implement the interfaces yourself (in pre-libc times). Modern operating systems provide a built-in mechanism for that, called the API - Application Programming Interface. This mechanism is a blessing and a curse in one. On one hand, it greatly simplifies interaction with the OS; on the other hand, it makes your software more vulnerable to hackers and/or malware. In some cases the usage of APIs simply gets excessive.

Let us check that with a simple example - the well known "Hello World" application in C created with Microsoft Visual C++ 2010 Express. The size of the executable is 27 kilobytes. The image of the executable has 7 sections while it could well be implemented with only two sections (one for code and one for data). 

The import directory looks even more exciting. Well, MSVCRT.dll is unavoidable, as it is the C language interface to the Windows operating system. But there are 28 APIs imported from KERNEL32.dll, and most of them seem to have been placed here by mistake. Take GetTickCount, for example. Do we care about timing when we only want to output a single string and leave? No, we do not.

Anyway, the compiler's heuristics are outside the scope of this article. Let's concentrate on the API functions. In general, the API is a great thing: it lets us develop applications without implementing every single interaction with the operating system ourselves, and it saves us a lot of time. Good on one hand, but bad on the other. The list of API functions a program imports gives a clear understanding of what that software is intended to do and, even more importantly, how. This may be good when you do malware research, but not as good when you are trying to protect your legitimate software from being hacked.

Unfortunately, there are thousands of software products that use the IsDebuggerPresent API as their only protection mechanism. Isn't this ridiculous? No, it is not. It is rather sad, I'd say. Of course, there are numerous packers/cryptors/protectors out there, but the problem is that the better known your solution is, the more vulnerable it gets. There are some linkers that can obfuscate the import section, but again, the problem is that they are known.

One of the possible solutions for this problem is the Stealth Import of APIs. This is a simple, powerful but underestimated technique. Many developers, most of them, I should say, believe that it is impossible to create and, even more importantly, launch a Windows executable without any imports at all. "You need to import KERNEL32.dll at least!" - they would say. Unfortunately, not all of us are aware of the fact that both NTDLL.dll and KERNEL32.dll are automatically mapped into the process's address space regardless of the executable's import table. Obviously, having them loaded in memory makes it possible to locate any API function and to load any library should there be a need for it. We may not know it, but the operating system itself provides us with all the tools we need for that.


Get Handle of KERNEL32.DLL or NTDLL.DLL

Normally, if we need a handle to a module that we know is loaded in memory, we simply call the GetModuleHandle API function, or LoadLibrary if the module is not loaded yet. But this is not the normal situation. We do not have access to those functions (yet). What should we do? The answer is simple - SEH, the Structured Exception Handling mechanism (remember we said that the OS provides us with everything?).

All we need to do is get the address of the first exception handler in the chain of handlers. On 32-bit Windows this chain is accessible through the first entry in the TIB (Thread Information Block), which is pointed to by [FS:0]. This is as simple as

   ;Get the initial exception handler
   mov eax,[fs:0]


We now have the pointer to the most recently added EXCEPTION_REGISTRATION record and only need to iterate through the rest of the records to get to the first record, whose handler normally points into either KERNEL32.DLL or NTDLL.DLL (on Windows Vista, 7). The following code does exactly this:

.search_default_handler:
   cmp dword[eax],0xFFFFFFFF
   jz .found_default_handler
   ;go to the previous handler
   mov eax,[eax]
   jmp .search_default_handler 
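
For readers who think in C, the same walk can be sketched on a hand-built list. This is only an illustration: the SehRecord type and the find_default_record name are made up for the example, and a real 32-bit process would start from the pointer stored at [FS:0] rather than from a local variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a 32-bit EXCEPTION_REGISTRATION record. */
typedef struct SehRecord {
    struct SehRecord *prev;     /* previously registered record */
    void             *handler;  /* address of the handler routine */
} SehRecord;

/* The chain ends at the record whose prev field is -1 (0xFFFFFFFF). */
#define SEH_CHAIN_END ((SehRecord *)(uintptr_t)-1)

/* Follow the prev links down to the first record that was ever registered. */
static SehRecord *find_default_record(SehRecord *head)
{
    assert(head != NULL && head != SEH_CHAIN_END);
    while (head->prev != SEH_CHAIN_END)
        head = head->prev;      /* mov eax,[eax] */
    return head;
}
```

The handler field of the returned record plays the role of [eax+4] in the assembly.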


The last (or, to say it right, the first) record has its prev field equal to -1, and its handler field ([eax+4] in our case) contains the address of the default exception handler located in one of the DLLs mentioned above. What's next?

Things are really easy if we are on Windows XP, as we have an address inside KERNEL32.DLL and all we have to do is align it down to a 64 KB boundary (modules are mapped at addresses that are multiples of the 64 KB allocation granularity)

   mov eax,[eax+4]
   and eax,0xFFFF0000


then "scroll" towards lower addresses in 64 KB steps and check each boundary for the 'MZ' signature

.look_for_mz:
   cmp word [eax],'MZ'
   jz .got_mz
   sub eax,0x10000
   jmp .look_for_mz
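
Here is a rough C rendering of the align-and-scan idea, written against an ordinary byte buffer so it can be tried anywhere. The function name and the (size_t)-1 "not found" value are my own choices for the sketch; unlike the assembly loop, it also stops when it runs out of buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALLOC_GRANULARITY 0x10000u   /* 64 KB */

/* Returns the offset of the nearest 64 KB boundary at or below `offset`
   that starts with 'M','Z', or (size_t)-1 if there is none. */
static size_t find_mz_below(const uint8_t *mem, size_t size, size_t offset)
{
    assert(mem != NULL && offset < size);
    size_t pos = offset & ~(size_t)(ALLOC_GRANULARITY - 1);   /* and eax,0xFFFF0000 */
    for (;;) {
        if (pos + 1 < size && mem[pos] == 'M' && mem[pos + 1] == 'Z')
            return pos;
        if (pos < ALLOC_GRANULARITY)
            return (size_t)-1;           /* nothing left to scan */
        pos -= ALLOC_GRANULARITY;        /* sub eax,0x10000 */
    }
}
```

In a real process the buffer would be the address space itself and the returned position would be the module base.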


Once we find the 'MZ' signature, we have the handle to the library. My advice - save it somewhere. The problem is - we still do not know which library this is (KERNEL32.DLL or NTDLL.DLL). In a normal situation we would call GetModuleFileName, but, again, we are not in a normal situation. The solution is easy. Having the base address of the module, we already have everything we need. Does offset 0x3C look familiar? If not, then you should probably read up on the PE file format. At this offset from the base address there is a DWORD (the e_lfanew field of the DOS header) holding the offset of the PE signature ('PE\0\0'), which is followed by the COFF header. We should check it anyway

   mov ebx,[eax+0x3C]     ;e_lfanew is a DWORD: offset of the PE signature
   add eax,ebx            ;EAX -> PE signature
   cmp dword [eax],'PE'   ;'PE',0,0 read as a little-endian DWORD
   jz .found_pe


There is not much we can do if the zero flag is not set after the comparison, which means that the PE signature has not been found. We just need to restore the stack to the state it was in when the process started, zero the eax register and execute the ret instruction. This terminates our process and basically means that we have to review our previous code. On the other hand, if the zero flag is set, we have successfully reached the PE signature, which is immediately followed by the COFF header.
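
The signature check can be sketched in C as well. The helper names below are hypothetical, and the function works on a plain buffer; it returns 0 for "not a PE image", which is safe because a valid signature can never sit at offset 0 (the 'MZ' header lives there).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without alignment assumptions. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the offset of the PE signature, or 0 if the image does not look valid. */
static uint32_t find_pe_signature(const uint8_t *image, size_t size)
{
    assert(image != NULL);
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return 0;
    uint32_t e_lfanew = read_le32(image + 0x3C);    /* DWORD at offset 0x3C */
    if (e_lfanew > size - 4)
        return 0;                                   /* would point outside the buffer */
    if (memcmp(image + e_lfanew, "PE\0\0", 4) != 0)
        return 0;
    return e_lfanew;
}
```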

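Now we need to locate the export directory. Its RVA (relative virtual address) and size are stored in the first entry of the data directory array, which closes the Optional Header; the Optional Header, in turn, appears right after the COFF header. For a 32-bit (PE32) image this entry sits at a fixed offset from the PE signature, so we may skip the headers themselves and get straight to the export IMAGE_DATA_DIRECTORY entry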

   add eax,0x78


Yes, as simple as that. Now eax points to the export entry: [eax] holds the RVA of the export directory and [eax+4] its size

typedef struct _IMAGE_DATA_DIRECTORY
{
   DWORD RVA;     //EAX points here
   DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;


The next step is to read the RVA of the export table and add it to the image base address (the handle we obtained earlier)

   mov eax,[eax]
   add eax,[image_base_address]


Congratulations! We are finally at the Export Directory Table!
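
As a cross-check of the 0x78 offset, here is a small C sketch that reads the export entry from a buffer holding a PE32 image. The type and function names are invented for the example. The sketch assumes the PE signature has already been verified and rejects anything that is not PE32, because 64-bit (PE32+) images keep this entry at 0x88 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t rva; uint32_t size; } DataDirectorySketch;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Fills `out` with the export entry of a PE32 image. Returns 1 on success. */
static int read_export_entry(const uint8_t *image, size_t size, DataDirectorySketch *out)
{
    assert(image != NULL && out != NULL);
    if (size < 0x40)
        return 0;
    uint32_t pe = rd32(image + 0x3C);            /* offset of the PE signature */
    if (pe > size || size - pe < 0x80)
        return 0;                                /* headers do not fit in the buffer */
    if (rd16(image + pe + 0x18) != 0x10B)
        return 0;                                /* Optional Header magic: PE32 only */
    if (rd32(image + pe + 0x74) == 0)
        return 0;                                /* NumberOfRvaAndSizes is zero */
    out->rva  = rd32(image + pe + 0x78);         /* add eax,0x78 / mov eax,[eax] */
    out->size = rd32(image + pe + 0x7C);
    return out->rva != 0;
}
```

Adding the returned RVA to the image base gives the address of the Export Directory Table, exactly as in the assembly above.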

Right now we are interested in one particular field of this table, namely the "Name RVA", which is located at [eax+0x0C] and points to a NULL-terminated ASCII string containing the name of this very library. The procedure is almost identical to the previous one

   mov eax,[eax+0x0C]
   add eax,[image_base_address]


We are one step away from knowing which library this is. However, we have to implement a simple strcasecmp function. Why strcasecmp instead of strcmp? Just because strcasecmp is case-insensitive and we do not have to guess whether the library name is upper case, lower case or even mixed case (like "KERNEL32.dll"). By comparing the library name with the strings 'kernel32.dll' and 'ntdll.dll' in turn, we identify the library.
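
A case-insensitive comparison is only a few lines. The sketch below is one possible shape, not the routine used in the original project: the names are mine, it handles ASCII only (enough for DLL names), and it returns 1 on a match, which happens to be the convention the GetProcAddress listing at the end of this article expects from its _strcmp.

```c
#include <assert.h>
#include <stddef.h>

/* Lower-case a single ASCII letter, leave everything else untouched. */
static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Returns 1 when both NUL-terminated ASCII strings match ignoring case, else 0. */
static int names_equal_nocase(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (ascii_lower(*a) != ascii_lower(*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;    /* equal only if both strings ended together */
}
```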

We are in KERNEL32.DLL!

If this is the case, then we are lucky, as we only have to locate the address of the GetProcAddress API by parsing the export table (this deserves a separate article), or we may simply keep using our custom version of GetProcAddress. Since we have the handle of KERNEL32.DLL, we are able to obtain the address of any API it exports. More than that, we are able to load additional libraries by first locating the address of LoadLibraryA or LoadLibraryW. Basically, we are done.

We are in NTDLL.DLL...

This case is less desirable, but it may occur if our software runs on Windows Vista or later. Needless to say, we have no access to GetProcAddress or LoadLibraryA (yet!). Instead we have the LdrLoadDll and LdrGetDllHandle API functions exported by NTDLL.DLL. Let's start with the prototype of LdrLoadDll:

NTSYSAPI NTSTATUS NTAPI LdrLoadDll(
     IN PWCHAR          PathToFile OPTIONAL,
     IN ULONG           Flags OPTIONAL,
     IN PUNICODE_STRING ModuleFileName,
     OUT PHANDLE        ModuleHandle);


Let's skip the optional values as we may safely set them to 0. The first non-optional parameter is ModuleFileName, but what does PUNICODE_STRING mean? It is a pointer to a structure that describes a UNICODE string. This structure may easily be built on the stack. Here is its declaration:

typedef struct _LSA_UNICODE_STRING
{
   USHORT Length;
   USHORT MaximumLength;
   PWSTR  Buffer;
} LSA_UNICODE_STRING, *PLSA_UNICODE_STRING, UNICODE_STRING, *PUNICODE_STRING;


Length - this field specifies the length, in bytes, of the string pointed to by Buffer, not including the terminating NULL;
MaximumLength - total size, in bytes, of the memory allocated for Buffer;
Buffer - pointer to a wide-character string (like 'K', 0, 'E', 0, 'R', 0, 'N', 0, 'E', 0, 'L', 0, '3', 0, '2', 0, '.', 0, 'D', 0, 'L', 0, 'L', 0, 0, 0).
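
To make the byte counts concrete, here is a portable C sketch that fills such a structure for an ASCII name. UnicodeStringSketch is a stand-in I define for the example (on Windows you would use the real UNICODE_STRING), and the simple widening of each character is only valid for plain ASCII names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for UNICODE_STRING. */
typedef struct {
    uint16_t  Length;          /* bytes used, terminator excluded */
    uint16_t  MaximumLength;   /* bytes available in Buffer */
    uint16_t *Buffer;          /* 16-bit characters */
} UnicodeStringSketch;

/* Widens `ascii` into `storage` (capacity counted in 16-bit units) and
   describes it in `us`. Returns 0 if the name does not fit. */
static int init_unicode_string(UnicodeStringSketch *us, const char *ascii,
                               uint16_t *storage, size_t capacity)
{
    assert(us != NULL && ascii != NULL && storage != NULL);
    size_t len = strlen(ascii);
    if (len + 1 > capacity || capacity > 0x7FFF)
        return 0;                                        /* byte counts are 16-bit */
    for (size_t i = 0; i < len; ++i)
        storage[i] = (uint16_t)(unsigned char)ascii[i];  /* 'K' becomes 'K', 0 */
    storage[len] = 0;                                    /* wide terminator */
    us->Length = (uint16_t)(len * 2);
    us->MaximumLength = (uint16_t)(capacity * 2);
    us->Buffer = storage;
    return 1;
}
```

For "KERNEL32.DLL" in a 16-character buffer this gives Length = 24 and MaximumLength = 32.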

The PHANDLE ModuleHandle is the pointer to a location in memory where the function should store the handle to a loaded library.

Now, let's turn to LdrGetDllHandle

NTSYSAPI NTSTATUS NTAPI LdrGetDllHandle(
     IN PWORD           pwPath OPTIONAL,
     IN PVOID           Unused OPTIONAL,
     IN PUNICODE_STRING ModuleFileName,
     OUT PHANDLE        pHModule); 


Let's skip the optional parameters again. Especially the "Unused" one.
ModuleFileName - is a pointer to UNICODE_STRING structure which describes the name of the DLL;
pHModule - a pointer to a location in memory where the function should store the result (the handle of the DLL).

We still have to implement a custom GetProcAddress function in order to retrieve the addresses of these two functions. The sample code is located at the end of this article.

Once we have the addresses of these functions, we should first try to get the module handle of the 'KERNEL32.dll' by calling the LdrGetDllHandle and if it fails, we then try to load it with LdrLoadDll. If both functions fail - restore the stack, execute ret and check your code.
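
The "look it up first, load it second" order might be sketched like this. Everything here is a simplification for illustration: the typedefs only imitate the prototypes above (the calling convention is left out), StatusSketch stands in for NTSTATUS with negative values meaning failure, and the fake_* functions exist solely so the logic can be exercised without Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t StatusSketch;                            /* stands in for NTSTATUS */
typedef struct UnicodeStringSketch UnicodeStringSketch;  /* opaque in this sketch */

typedef StatusSketch (*GetDllHandleFn)(void *path, void *unused,
                                       const UnicodeStringSketch *name, void **handle);
typedef StatusSketch (*LoadDllFn)(void *path, uint32_t flags,
                                  const UnicodeStringSketch *name, void **handle);

/* Ask for an already loaded module first, then fall back to loading it. */
static void *get_or_load_module(GetDllHandleFn get_handle, LoadDllFn load_dll,
                                const UnicodeStringSketch *name)
{
    void *handle = NULL;
    assert(get_handle != NULL && load_dll != NULL);
    if (get_handle(NULL, NULL, name, &handle) >= 0 && handle != NULL)
        return handle;
    handle = NULL;
    if (load_dll(NULL, 0, name, &handle) >= 0 && handle != NULL)
        return handle;
    return NULL;        /* both failed: restore the stack and bail out */
}

/* Stand-ins used only to try the sketch off Windows. */
static int found_marker, loaded_marker;
static StatusSketch fake_get_ok(void *p, void *u, const UnicodeStringSketch *n, void **h)
{ (void)p; (void)u; (void)n; *h = &found_marker; return 0; }
static StatusSketch fake_get_fail(void *p, void *u, const UnicodeStringSketch *n, void **h)
{ (void)p; (void)u; (void)n; *h = NULL; return -1; }
static StatusSketch fake_load_ok(void *p, uint32_t f, const UnicodeStringSketch *n, void **h)
{ (void)p; (void)f; (void)n; *h = &loaded_marker; return 0; }
static StatusSketch fake_load_fail(void *p, uint32_t f, const UnicodeStringSketch *n, void **h)
{ (void)p; (void)f; (void)n; *h = NULL; return -1; }
```

In the real thing the two function pointers would come from the custom GetProcAddress applied to NTDLL.DLL.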

Once we have the module handle of the KERNEL32.DLL, we are free to use the API functions it exports (e.g. GetProcAddress, LoadLibrary, etc.).

As you can see, this technique is simple indeed. More than that, it allows you to implement additional protection mechanisms such as code obfuscation, SEH usage and many more in order to protect one of the most hack-sensitive parts of your software - the import section.

Hope this post was helpful. See you at the next post!

P.S. Custom GetProcAddress function. It is far from being perfect but is enough for what we need.


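;This is our custom GetProcAddress (stdcall, 32-bit PE32 images only)
;get_proc_address(HMODULE hModule, PCSTR procName)
;Returns the address of the export in EAX, or 0 if the name is not found.
;Relies on a custom _strcmp(PCSTR, PCSTR) that sets bit 0 of EAX when the
;strings match and removes its own arguments from the stack.
;Forwarded exports are not resolved.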

if used _get_proc_address
baseAddress=         -4
numberNamePointers=  -8
exportAddressTableVA=-12
namePointerVA=       -16
ordinalTableVA=      -20
ordinalBase=         -24
_get_proc_address:
        push ebp
        mov ebp, esp
        sub esp,24
        push ebx ecx edx esi edi ebx
        mov esi,[ebp+8]                 ;ESI -> base address
        mov ebx,[esi+0x3C]              ;e_lfanew (a DWORD): offset of the PE signature
        add ebx,esi                     ;EBX -> PE signature
        ;Set variables
        mov [ebp+baseAddress],esi
        add ebx,0x78                 ;now EBX points to the export table directory entry
        mov ebx,[ebx]
        add ebx,[ebp+baseAddress]
        mov eax,[ebx+24]                ;number of name pointers
        dec eax                         ;This is done in order to compare 0 based index
        mov [ebp+numberNamePointers],eax
        mov eax,[ebx+16]
        mov [ebp+ordinalBase],eax       ;ordinal base
        mov eax,[ebx+28]
        add eax,[ebp+baseAddress]
        mov [ebp+exportAddressTableVA],eax       ;VA of address table
        mov eax,[ebx+32]
        add eax,[ebp+baseAddress]
        mov [ebp+namePointerVA],eax     ;VA of name pointers table
        mov eax,[ebx+36]
        add eax,[ebp+baseAddress]
        mov [ebp+ordinalTableVA],eax    ;VA of ordinal table
        ;Reset offset counter
        xor ecx,ecx
    .search_loop:
        push ecx
        shl ecx,2                       ;Offset must be multiple of 4, so we multiply counter by 4
        mov ebx,[ebp+namePointerVA]
        add ebx,ecx                      ;EBX now points to one of the exported API functions name pointer
        mov ebx,[ebx]
        add ebx,[ebp+baseAddress]
        push ebx dword[ebp+12]
        call _strcmp
        test eax,1
        jnz .found_api_name
        pop ecx
        cmp ecx,[ebp+numberNamePointers]
        jz .not_found_a_thing
        inc ecx
        jmp .search_loop
    .found_api_name:
        pop ecx
        ;We now have the offset of the api in ECX register
        mov ebx,[ebp+ordinalTableVA]
        shl ecx,1
        add ebx,ecx                     ;EBX now points to the correct ordinal value
        mov bx,[ebx]
        movzx ebx,bx                    ;EBX contains an offset into the export address table
        shl ebx,2                       ;Multiply it by 4
        mov eax,[ebp+exportAddressTableVA]
        add ebx,eax
        mov eax,[ebx]
        add eax,[ebp+baseAddress]       ;now the EAX register contains the address of the exported function
    .out:
        pop ebx edi esi edx ecx ebx
        mov esp,ebp
        pop ebp
        ret 8
    .not_found_a_thing:
        xor eax,eax
        jmp .out
end if
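
For comparison, here is the same lookup as a compact C sketch. It is an approximation under assumptions, not a drop-in replacement: `image` must be a mapped PE32 image, all RVAs are trusted to point inside it (no bounds checks), forwarded exports are ignored, and it returns the RVA of the export (0 when not found) so that it can be tried on a plain buffer. The function and helper names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the RVA of the export called `name`, or 0 if there is no such name. */
static uint32_t find_export_rva(const uint8_t *image, const char *name)
{
    assert(image != NULL && name != NULL);
    uint32_t pe  = rd32(image + 0x3C);           /* offset of the PE signature */
    uint32_t dir = rd32(image + pe + 0x78);      /* export directory RVA (PE32) */
    if (dir == 0)
        return 0;                                /* module exports nothing */
    const uint8_t *exp = image + dir;
    uint32_t count     = rd32(exp + 24);         /* number of name pointers */
    uint32_t functions = rd32(exp + 28);         /* export address table RVA */
    uint32_t names     = rd32(exp + 32);         /* name pointer table RVA */
    uint32_t ordinals  = rd32(exp + 36);         /* ordinal table RVA */
    for (uint32_t i = 0; i < count; ++i) {
        const char *candidate = (const char *)(image + rd32(image + names + 4 * i));
        if (strcmp(candidate, name) == 0) {
            /* The ordinal table entry is an index into the address table. */
            uint32_t index = rd16(image + ordinals + 2 * i);
            return rd32(image + functions + 4 * index);
        }
    }
    return 0;
}
```

Adding the returned RVA to the module base gives the value the assembly version leaves in EAX.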