How to find WinMain
Introduction
Knowing how to identify the main function yourself saves you from wasting time on code that, most of the time, you don’t need to analyze.
It’s also a skill that comes in handy while unpacking malware, or when your static analysis framework of choice fails to identify the function correctly.
In this post I will only talk about how to find WinMain, the entry point for graphical Windows applications. For console-based applications, I provided a link to another article in the last section.
WinMain definition
According to MSDN, the WinMain function definition is as follows:
int WinMain(
    HINSTANCE hInstance,
    HINSTANCE hPrevInstance,
    LPSTR lpCmdLine,
    int nShowCmd
);
After reading the rest of the documentation, we discover the following characteristics:
- the function takes 4 parameters
- the first parameter is a handle to the instance of the current application
- the second parameter is always NULL
Another important property of WinMain is that its return value will be passed to the exit function of the program.
Sometimes, you will find that the function is called wWinMain instead: the only difference is that the third argument’s type is LPWSTR (pointer to wide string) instead of LPSTR.
Finding WinMain manually
MSVC compiled binaries
When dealing with executables compiled with the Microsoft C/C++ compiler, the code that calls WinMain is usually located close to the entry point in terms of call depth.
Most of the time, it’s either called directly by the entry point function, or inside its second function call, and looks similar to this:
In this case, we can clearly see all the typical traits leading to it:
- four arguments being passed to the function
- the image base (which corresponds to the application instance handle) as the first argument
- the second argument being NULL, due to the xor esi, esi at VA 0x406829
- the return value of the function being passed to _exit
The last indicator may not always be immediately visible, in particular if the exit function can’t be identified automatically.
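As a rough illustration, the first two traits can be turned into a byte search over 32-bit code. The sketch below is not the method used for the analysis above: it assumes the image base is pushed as an immediate (68 imm32) directly before a relative call (E8 rel32), which will not hold for every binary. The function name, the sample bytes and the addresses are made up for the example.

```python
import struct

def find_winmain_candidates(code: bytes, code_va: int, image_base: int) -> list[tuple[int, int]]:
    """Return (call_va, target_va) pairs for `push image_base; call rel32`.

    Assumption: the push of the image base sits directly before the call.
    """
    needle = b"\x68" + struct.pack("<I", image_base) + b"\xE8"
    hits = []
    pos = code.find(needle)
    while pos != -1:
        call_off = pos + 5  # offset of the E8 opcode
        if call_off + 5 <= len(code):
            # rel32 is signed and relative to the end of the call instruction
            (rel,) = struct.unpack_from("<i", code, call_off + 1)
            call_va = code_va + call_off
            target_va = (call_va + 5 + rel) & 0xFFFFFFFF
            hits.append((call_va, target_va))
        pos = code.find(needle, pos + 1)
    return hits

# Synthetic bytes (hypothetical): xor esi, esi / push eax / push ecx /
# push esi / push 0x400000 / call +0x10
sample = bytes.fromhex("33F6 50 51 56 6800004000 E810000000")
for call_va, target_va in find_winmain_candidates(sample, 0x401000, 0x400000):
    print(f"call at {call_va:#x} -> candidate WinMain at {target_va:#x}")
```

Every hit is only a candidate: you still need to confirm by hand that four arguments are passed and that the return value ends up in the exit function.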
Sometimes, instead of the image base being passed directly as a constant, you will see this:
In the code above, the image base is being calculated at runtime by calling GetModuleHandleA(NULL), since this API returns a handle to the module used to create the calling process (that is, the base address of the executable) when its parameter is NULL.
The pattern is mostly the same for 64-bit applications, aside from the different calling convention:
If you are using a decompiler, be aware that it will not always get the function prototype right, and you may need to adjust it manually:
MinGW/GCC compiled binaries
For executables that were compiled with GCC on the MinGW environment (like this one), WinMain is not as easy to find, since its caller looks like a generic wrapper function:
You can still see some other indicators, like a reference to the function __initenv and the call to _cexit, but determining the exact location might still be challenging if you’re not used to analyzing code generated by MinGW.
Despite all this mess, there’s a trick to make it clearer: since GCC still loads the image base as a constant (at least in the samples I analyzed), you can look for an instruction that writes the image base somewhere (inside sub_401180 this time), follow the cross reference, and you will find your target a few instructions below:
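As a rough illustration of this search, the sketch below scans raw 32-bit code for a store of the image base into a global variable and then lists every place where the address of that global appears. It assumes the store is encoded as mov dword ptr [addr32], imm32 (C7 05 addr32 imm32), which is only one of the encodings a compiler may pick. The helper names and sample bytes are hypothetical, and a disassembler’s cross references are more reliable than a raw byte search.

```python
import struct

def find_image_base_stores(code: bytes, image_base: int) -> list[tuple[int, int]]:
    """Find `mov dword ptr [addr32], image_base` (C7 05 addr32 imm32).

    Returns (offset_in_code, destination_address) pairs.
    """
    imm = struct.pack("<I", image_base)
    hits = []
    pos = code.find(b"\xC7\x05")
    while pos != -1:
        if code[pos + 6:pos + 10] == imm:
            (dest,) = struct.unpack_from("<I", code, pos + 2)
            hits.append((pos, dest))
        pos = code.find(b"\xC7\x05", pos + 1)
    return hits

def find_raw_references(code: bytes, address: int) -> list[int]:
    """Offsets where `address` appears as four little-endian bytes."""
    needle = struct.pack("<I", address)
    hits = []
    pos = code.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = code.find(needle, pos + 1)
    return hits

# Synthetic bytes (hypothetical): mov [0x403010], 0x400000 / ret /
# mov eax, [0x403010] / mov [esp], eax
sample = bytes.fromhex("C705 10304000 00004000 C3 A1 10304000 890424")
for offset, dest in find_image_base_stores(sample, 0x400000):
    refs = find_raw_references(sample, dest)
    print(f"store at +{offset:#x} writes to {dest:#x}, referenced at {refs}")
```

The first reference is the store itself. The remaining ones are the places worth inspecting for a nearby call that takes the stored value as its first argument.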
How IDA finds WinMain
FLIRT signatures
One of the strongest features of IDA is the ability to detect known function signatures. Near the end of this article, we discover that IDA uses FLIRT signatures not only to identify known library functions, but also to locate main and WinMain:
For the sake of user’s convenience we attempted to recognize the main() function as often as it was possible. The algorithm for identifying this function differs from compiler to compiler and from program to program. (DOS/OS2/Windows/GUI/Console…).
This algorithm is written, as a text string, in a signature file.
You can find the FLIRT signatures that are used for this purpose inside [IDA install directory]/sig/pc. For PE files, the signature files we are interested in are pe.sig and pe64.sig.
Diving deeper
Let’s take this Colibri Loader sample and see how IDA detects WinMain.
Looking at the function ___tmainCRTStartup, we will see the following code:
By parsing the signature files (you can find the script I used here), we can find the pattern that was used in this scenario:
6A5868........E8........33F68975FC8D459850FF15........6AFE5F897D:
0. 09 A2B5 017D
0000:o=2:a=104:vc32rtf:l=vc32mfc/vcextra/vc8atl:m=+10D^[_wWinMain@16]~msmfc2u/~@vc32mfc@;
0081: E8
1. 09 A2B5 017E
0000:o=2:a=104:vc32rtf:l=vc32mfc/vcextra/vc8atl:m=+10E^[_WinMain@16]~msmfc2/~@vc32mfc@;
0081: FF
FLIRT matches a signature by checking some properties of the function bytes: first, it does pattern matching on the first few bytes of a function.
We can see the first 32 bytes of the function matched this pattern:
6A 58                push 0x58
68 .. .. .. ..       push addr
E8 .. .. .. ..       call addr
33 F6                xor esi, esi
89 75 FC             mov [ebp-4], esi
8D 45 98             lea eax, [ebp-0x68]
50                   push eax
FF 15 .. .. .. ..    call [addr]
6A FE                push 0xFFFFFFFE
5F                   pop edi
89 7D                mov [ebp-X], edi
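The wildcard comparison itself is simple to reproduce. The following is a minimal sketch, not IDA’s actual implementation: it only shows how a pattern string with .. placeholders can be compared against raw bytes, and it performs none of the other checks a real signature match involves. The function names and the sample bytes are made up for the example.

```python
from typing import List, Optional

PATTERN = "6A5868........E8........33F68975FC8D459850FF15........6AFE5F897D"

def parse_pattern(pattern: str) -> List[Optional[int]]:
    """Turn 'AB..CD' into [0xAB, None, 0xCD]; '..' marks a wildcard byte."""
    pairs = [pattern[i:i + 2] for i in range(0, len(pattern), 2)]
    return [None if pair == ".." else int(pair, 16) for pair in pairs]

def find_pattern(pattern: List[Optional[int]], blob: bytes) -> List[int]:
    """Return every offset in blob where the pattern matches."""
    hits = []
    for start in range(len(blob) - len(pattern) + 1):
        window = blob[start:start + len(pattern)]
        if all(p is None or p == b for p, b in zip(pattern, window)):
            hits.append(start)
    return hits

# Hypothetical function prologue: the wildcard positions hold arbitrary values.
func = bytes.fromhex(
    "6A58 68 78563412 E8 10000000 33F6 8975FC 8D4598 50 "
    "FF15 00104000 6AFE 5F 897D E4"
)
blob = b"\xCC\xCC" + func  # two padding bytes before the function
print(find_pattern(parse_pattern(PATTERN), blob))
```

A linear scan like this is enough to experiment with a single pattern, but it says nothing about how IDA organizes or searches its signature files.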
If the pattern is found somewhere, it examines some other properties. Since the function present in this sample is wWinMain, we will check against the first of the two leaves:
- CRC16 of the 9 bytes after the pattern: 0xA2B5
- Function length: 0x17D bytes
- Tail bytes (with offset 0x0081): 0xE8
The field o=2 indicates the OS type (OS_WIN), and a=104 holds the app type flags, which correspond to APP_32_BIT | APP_EXE (source).
There are more parameters that can be found in other signatures, but I haven’t been able to figure out the meaning behind them.
The long string starting with m= tells us that the main function is called _wWinMain@16, and is located 0x10D bytes below the start of the function:
In addition to locating the main function, the signature tells IDA to load the signature library vc32rtf to identify the rest of the library functions, since this pattern is only found in MSVC applications.
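To make the fields discussed above easier to experiment with, here is a small sketch that pulls them out of the first leaf line. It is based only on what this post describes: the leaf syntax is not officially documented as far as I know, so the regular expressions, the decision to read o=, a= and the m= offset as hexadecimal, and the function start address used at the end are all assumptions.

```python
import re

LEAF = "0000:o=2:a=104:vc32rtf:l=vc32mfc/vcextra/vc8atl:m=+10D^[_wWinMain@16]~msmfc2u/~@vc32mfc@;"

def parse_leaf(line: str) -> dict:
    """Extract the OS type, app flags and main-function hint from a leaf line.

    Assumption: all numeric fields are hexadecimal.
    """
    os_type = re.search(r"(?:^|:)o=([0-9A-Fa-f]+)", line)
    app_type = re.search(r"(?:^|:)a=([0-9A-Fa-f]+)", line)
    main = re.search(r"(?:^|:)m=\+([0-9A-Fa-f]+)\^\[([^\]]+)\]", line)
    return {
        "os": int(os_type.group(1), 16) if os_type else None,
        "app": int(app_type.group(1), 16) if app_type else None,
        "main_offset": int(main.group(1), 16) if main else None,
        "main_name": main.group(2) if main else None,
    }

leaf = parse_leaf(LEAF)
function_start = 0x401000  # hypothetical start address of ___tmainCRTStartup
print(leaf["main_name"], hex(function_start + leaf["main_offset"]))
```

Fields that this post does not explain, such as the leading 0000 and the parts after the closing bracket, are deliberately left unparsed.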
Breaking IDA WinMain recognition
Since the function identification relies on the assumption that the first few bytes of the function should match a specific pattern, violating this assumption is pretty simple, and will make IDA unable to identify WinMain:
Changing the register eax into ebx at VA 0x4070cf and 0x4070d2 was enough to break the signature recognition algorithm, without disrupting the function execution at runtime.
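The effect of that change on the pattern can be checked with a few lines of Python. This is a sketch under assumptions: the two patched instructions are taken to be the lea eax, [ebp-0x68] and push eax from the listing above (they are three bytes apart, like the two VAs), the wildcard positions are filled with zeros, and the comparison is a simplified stand-in for what IDA does.

```python
PATTERN = "6A5868........E8........33F68975FC8D459850FF15........6AFE5F897D"

def matches(pattern: str, data: bytes) -> bool:
    """Compare data against a hex pattern where '..' is a wildcard byte."""
    pairs = [pattern[i:i + 2] for i in range(0, len(pattern), 2)]
    if len(data) < len(pairs):
        return False
    return all(p == ".." or int(p, 16) == b for p, b in zip(pairs, data))

# First 32 bytes of the function, wildcard positions zeroed (placeholder values).
original = bytearray.fromhex(
    "6A58 68 00000000 E8 00000000 33F6 8975FC 8D4598 50 "
    "FF15 00000000 6AFE 5F 897D"
)

patched = bytearray(original)
patched[18] = 0x5D  # ModRM byte: lea eax, [ebp-0x68] -> lea ebx, [ebp-0x68]
patched[20] = 0x53  # push eax -> push ebx

print(matches(PATTERN, original), matches(PATTERN, patched))
```

Both patched bytes fall on fixed (non-wildcard) positions of the pattern, which is why such a small edit is enough to make the comparison fail.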
There are many other ways to deceive this algorithm, which will be explained in a future article. Some of them require extensive and careful modifications to the runtime initialization code, but are able to trick IDA in unexpected ways.
Furthermore, we can also observe that ___tmainCRTStartup wasn’t recognized, along with all the other library functions, because the vc32rtf library was not loaded automatically.
Other resources
- How to find start of actor code (WinMain) in malware: video about how to find WinMain with Ghidra through a more visual approach. The last example shows how to find the start of user code on a malware sample using the MFC library
- Why is the PE entry point not the same as main: article that explains some things about the MSVC console applications runtime initialization and how to find main