Introduction

Knowing how to identify the main function yourself saves the time you would otherwise spend on runtime initialization code that rarely needs to be analyzed.

It’s also a skill that comes in handy while unpacking malware, or when your static analysis framework of choice fails to identify the function correctly for any reason.

In this post I will only talk about how to find WinMain, the entry point for graphical Windows applications. For console-based applications, I provided a link to another article in the last section.

WinMain definition

According to MSDN, the WinMain function definition is as follows:

int WinMain(
    HINSTANCE hInstance,
    HINSTANCE hPrevInstance,
    LPSTR     lpCmdLine,
    int       nShowCmd
);

After reading the rest of the documentation, we discover the following characteristics:

  • the function takes 4 parameters
  • the first parameter is a handle to the instance of the current application
  • the second parameter is always NULL

Another important property of WinMain is that its return value will be passed to the exit function of the program.

Sometimes, you will find that the function is called wWinMain instead: the only difference is that the third argument’s type is LPWSTR (pointer to wide string) instead of LPSTR.

Finding WinMain manually

MSVC compiled binaries

When dealing with executables compiled with the Microsoft C/C++ compiler, the code that calls WinMain is usually located close to the entry point in terms of call depth.

Most of the time, WinMain is called either directly by the entry point function or from the second function that the entry point calls, and the call looks similar to this:

In this case, we can clearly see all the typical traits leading to it:

  • four arguments being passed to the function
  • the image base (which corresponds to the application instance handle) as the first argument
  • the second argument being NULL, due to the xor esi, esi at VA 0x406829
  • the return value of the function being passed to _exit

The last indicator may not always be immediately visible, in particular if the exit function can’t be identified automatically.
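The traits listed above can be written down as a small checklist. The following is only a sketch: the CallSite record and its fields are hypothetical, and filling them in (by hand or with a script for your disassembler) is left to the reader.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CallSite:
    """Hypothetical summary of one call instruction and its surroundings."""
    args: List[Optional[int]]     # resolved argument values, None when unknown
    result_reaches_exit: bool     # True if the return value is passed to exit


def classify_call(site: CallSite, image_base: int) -> str:
    """Rate how closely a call matches the WinMain traits described in the text."""
    has_traits = (
        len(site.args) == 4              # four arguments
        and site.args[0] == image_base   # hInstance is the image base
        and site.args[1] == 0            # hPrevInstance is always NULL
    )
    if not has_traits:
        return "unlikely"
    # The exit indicator is not always visible, so it only raises confidence.
    return "very likely" if site.result_reaches_exit else "likely"


# Made-up values for illustration only
candidate = CallSite(args=[0x400000, 0, 0x12345, 10], result_reaches_exit=True)
print(classify_call(candidate, image_base=0x400000))
```

The exit check is deliberately treated as optional, since the exit function may not be identified automatically.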

Sometimes, instead of the image base being passed directly as a constant, you will see this:

In the code above, the image base is retrieved at runtime by calling GetModuleHandleA(NULL): when its parameter is NULL, this API returns a handle to the file used to create the calling process, that is, the executable itself.

The pattern is mostly the same for 64-bit applications, aside from the different calling convention:

If you are using a decompiler, be aware that it does not always get the function prototype right, and you may need to adjust it manually:

MinGW/GCC compiled binaries

For executables that were compiled with GCC on the MinGW environment (like this one), WinMain is not as easy to find, since its caller looks like a generic wrapper function:

You can still see some other indicators, like a reference to the function __initenv and the call to _cexit, but determining the exact location might still be challenging if you’re not used to analyzing code generated by MinGW.

Despite all this mess, there’s a trick to make it clearer. GCC still loads the image base as a constant (at least in the samples I analyzed), so you can look for an instruction that writes the image base somewhere (inside sub_401180 this time), follow the cross reference, and find your target a few instructions below:
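As a rough sketch of the first step, the image base can be searched for as a raw 32-bit little-endian immediate in the code bytes. This is a crude byte search, not a disassembler pass, so it can produce false positives. The function name, the sample bytes, and the address 0x405010 are made up for illustration.

```python
import struct


def find_image_base_refs(code: bytes, image_base: int) -> list:
    """Return every offset in `code` where `image_base` appears as a 32-bit LE value."""
    needle = struct.pack("<I", image_base)
    hits = []
    pos = code.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = code.find(needle, pos + 1)
    return hits


# Hypothetical instruction: mov dword ptr [0x405010], 0x400000
# Encoding: C7 05 <address> <immediate>
code = bytes.fromhex("C705" "10504000" "00004000")
print(find_image_base_refs(code, 0x400000))  # prints [6]
```

Each hit is only a candidate: you still need to check in the disassembly that it belongs to an instruction storing the image base, and then follow the cross reference to the destination.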

How IDA finds WinMain

FLIRT signatures

One of the strongest features of IDA is the ability to detect known function signatures. Near the end of this article, we discover that IDA uses FLIRT signatures not only to identify known library functions, but also to locate main and WinMain:

For the sake of user’s convenience we attempted to recognize the main() function as often as it was possible. The algorithm for identifying this function differs from compiler to compiler and from program to program. (DOS/OS2/Windows/GUI/Console…).

This algorithm is written, as a text string, in a signature file.

You can find the FLIRT signatures that are used for this purpose inside [IDA install directory]/sig/pc. For PE files, the signature files we are interested in are pe.sig and pe64.sig.

Diving deeper

Let’s take this Colibri Loader sample and see how IDA detects WinMain.

Looking at the function ___tmainCRTStartup, we will see the following code:

By parsing the signature files (you can find the script I used here), we can find the pattern that was used in this scenario:

6A5868........E8........33F68975FC8D459850FF15........6AFE5F897D:
0. 09 A2B5 017D
   0000:o=2:a=104:vc32rtf:l=vc32mfc/vcextra/vc8atl:m=+10D^[_wWinMain@16]~msmfc2u/~@vc32mfc@;
   0081: E8
1. 09 A2B5 017E
   0000:o=2:a=104:vc32rtf:l=vc32mfc/vcextra/vc8atl:m=+10E^[_WinMain@16]~msmfc2/~@vc32mfc@;
   0081: FF

FLIRT matches a signature by checking some properties of the function bytes: first, it does pattern matching on the first few bytes of a function.

We can see the first 32 bytes of the function matched this pattern:

6A 58               push 0x58
68 .. .. .. ..      push addr
E8 .. .. .. ..      call addr
33 F6               xor esi, esi
89 75 FC            mov [ebp-4], esi
8D 45 98            lea eax, [ebp-0x68]
50                  push eax
FF 15 .. .. .. ..   call [addr]
6A FE               push 0xFFFFFFFE
5F                  pop edi
89 7D               mov [ebp-X], edi
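The wildcard matching shown above can be sketched in a few lines. This is a minimal illustration of the idea rather than IDA's actual implementation; the function names are mine and the sample bytes are placeholders (all variable positions are filled with zeroes).

```python
PATTERN = "6A5868........E8........33F68975FC8D459850FF15........6AFE5F897D"


def parse_pattern(text: str) -> list:
    """Convert a hex pattern into byte values, using None for each '..' wildcard."""
    pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
    return [None if pair == ".." else int(pair, 16) for pair in pairs]


def matches(pattern: list, data: bytes) -> bool:
    """Check whether the leading bytes of `data` agree with every fixed pattern byte."""
    if len(data) < len(pattern):
        return False
    return all(p is None or p == b for p, b in zip(pattern, data))


# Hypothetical function start: addresses and displacements zeroed out
sample = bytes.fromhex(
    "6A58 68 00000000 E8 00000000 33F6 8975FC 8D4598 50 FF15 00000000 6AFE 5F 897D"
)

pattern = parse_pattern(PATTERN)
print(len(pattern))              # prints 32
print(matches(pattern, sample))  # prints True
```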

If the pattern matches, FLIRT examines some other properties. Since the function present in this sample is wWinMain, we will check against the first of the two leaves:

  • CRC16 of the 9 bytes after the pattern: 0xA2B5
  • Function length: 0x17D bytes
  • Tail bytes (with offset 0x0081): 0xE8
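To make the mapping between the dumped leaf header and the values above explicit, here is a small sketch that splits the header line into named fields. It assumes the dump format shown earlier (which comes from the parsing script, not from IDA itself) and that all fields are hexadecimal. The function and field names are hypothetical.

```python
PATTERN_LEN = 32  # number of leading bytes covered by the pattern


def parse_leaf_header(line: str) -> dict:
    """Split a dumped leaf header such as '0. 09 A2B5 017D' into named fields."""
    index, crc_len, crc16, func_len = line.split()
    return {
        "index": int(index.rstrip(".")),
        "crc_len": int(crc_len, 16),    # how many bytes the CRC16 covers
        "crc16": int(crc16, 16),        # expected CRC16 value
        "func_len": int(func_len, 16),  # function length in bytes
    }


def crc_region(func_bytes: bytes, crc_len: int) -> bytes:
    """Return the bytes right after the pattern, which the CRC16 is computed over."""
    return func_bytes[PATTERN_LEN:PATTERN_LEN + crc_len]


leaf = parse_leaf_header("0. 09 A2B5 017D")
print(leaf["crc_len"], hex(leaf["crc16"]), hex(leaf["func_len"]))
```

The CRC16 computation and the tail byte comparison are left out on purpose, since the text does not specify the CRC variant or what the tail offset is relative to.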

The field o=2 indicates the OS type (OS_WIN), and a=104 holds the app type flags as a hexadecimal value, which corresponds to APP_32_BIT | APP_EXE (source).
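Decoding these two fields is a plain bitmask lookup. The sketch below only includes the constants named in this post. The individual bit values (APP_EXE = 0x004, APP_32_BIT = 0x100) are my assumption about how 0x104 splits, so double check them against the source linked above.

```python
# Only the values mentioned in the text; the real tables contain more entries.
OS_TYPES = {0x02: "OS_WIN"}
APP_FLAGS = {0x004: "APP_EXE", 0x100: "APP_32_BIT"}  # assumed bit assignment


def decode_flags(value: int, table: dict) -> list:
    """Return the names of all known bits set in `value`, plus any unknown remainder."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    known = sum(bit for bit in table if value & bit)
    leftover = value & ~known
    if leftover:
        names.append(hex(leftover))  # bits this sketch does not know about
    return names


print(decode_flags(int("2", 16), OS_TYPES))     # prints ['OS_WIN']
print(decode_flags(int("104", 16), APP_FLAGS))  # prints ['APP_EXE', 'APP_32_BIT']
```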

There are more parameters that can be found in other signatures, but I haven’t been able to figure out the meaning behind them.

The long string starting with m= tells us that the main function is called _wWinMain@16, and that it is located 0x10D bytes below the start of the function:
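Extracting the offset and the name from the m= field is a simple regular expression match. This sketch only handles the part explained above (the +offset and the bracketed name) and ignores the rest of the string. The function name and the start address used at the end are hypothetical.

```python
import re

# m=+<hex offset>^[<symbol name>]
MAIN_RE = re.compile(r"m=\+([0-9A-Fa-f]+)\^\[([^\]]+)\]")


def parse_main_directive(line: str):
    """Return (offset, name) from an 'm=' field, or None if the line has none."""
    found = MAIN_RE.search(line)
    if found is None:
        return None
    return int(found.group(1), 16), found.group(2)


line = "0000:o=2:a=104:vc32rtf:l=vc32mfc/vcextra/vc8atl:m=+10D^[_wWinMain@16]~msmfc2u/~@vc32mfc@;"
offset, name = parse_main_directive(line)
print(hex(offset), name)  # prints 0x10d _wWinMain@16

# With a made-up function start, the referenced location would be:
function_start = 0x401000  # hypothetical address, not taken from the sample
print(hex(function_start + offset))  # prints 0x40110d
```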

In addition to locating the main function, these instructions tell IDA to load the signature library vc32rtf to identify the rest of the library functions, since this pattern is only found in MSVC applications.

Breaking IDA WinMain recognition

Since the function identification relies on the assumption that the first few bytes of the function should match a specific pattern, violating this assumption is pretty simple, and will make IDA unable to identify WinMain:

Changing the register eax into ebx at VA 0x4070cf and 0x4070d2 was enough to break the signature recognition algorithm, without disrupting the function execution at runtime.
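The effect of this patch on the pattern can be reproduced offline. In the sketch below, the two instructions are re-encoded with ebx (lea ebx, [ebp-0x68] is 8D 5D 98 and push ebx is 53), and the result no longer matches. The surrounding bytes are placeholders with all variable positions zeroed, not the bytes of the real sample.

```python
PATTERN = "6A5868........E8........33F68975FC8D459850FF15........6AFE5F897D"


def matches(pattern: str, data: bytes) -> bool:
    """Compare `data` against a hex pattern where '..' matches any byte."""
    pairs = [pattern[i:i + 2] for i in range(0, len(pattern), 2)]
    if len(data) < len(pairs):
        return False
    return all(p == ".." or int(p, 16) == b for p, b in zip(pairs, data))


# Placeholder function start that satisfies the pattern
original = bytes.fromhex(
    "6A58 68 00000000 E8 00000000 33F6 8975FC 8D4598 50 FF15 00000000 6AFE 5F 897D"
)

# lea eax, [ebp-0x68] / push eax  ->  lea ebx, [ebp-0x68] / push ebx
patched = original.replace(bytes.fromhex("8D4598 50"), bytes.fromhex("8D5D98 53"))

print(matches(PATTERN, original))  # prints True
print(matches(PATTERN, patched))   # prints False
```

Only two bytes differ between the two versions, but both are fixed (non-wildcard) positions of the pattern, which is enough to make the match fail.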

There are many other ways to deceive this algorithm, which will be explained in a future article. Some of them require extensive and careful modifications to the runtime initialization code, but are able to trick IDA in unexpected ways.

Furthermore, we can also observe that ___tmainCRTStartup wasn’t recognized, along with all the other library functions, because the vc32rtf library was not loaded automatically.

Other resources