LokiBot packer analysis

Summary

File type: Win32 EXE
Size:      276,992 bytes
MD5:       17a062692257bfe27e95a64a9e4e6a65
SHA1:      6580ac9fb2f54376ba1bea40f6339662dd1c76ab
SHA256:    faa18e54f144e8377cc6492f1daabeed658d4c2fbfdbdaf3caec6f5188c95ac9

This sample is a packed version of LokiBot. The packer was widespread during 2020 and 2021, and it runs three stages before executing the main payload.

A sample using the same packer has been documented by Fumik0, but the third stage described in the article performs thread execution hijacking instead of process hollowing.

Stage 1

The first thing we notice after opening the sample in IDA and looking at WinMain is the heavy use of WinAPI functions with nonsensical arguments. It might look like API hammering, but it’s actually all junk code: each of those instruction blocks is preceded by a comparison with the global variable dwBytes and a conditional jump:

It’s possible to make the control flow easier to follow by hiding the junk code, as shown below:

Proceeding with the analysis, we discover that the executable loads kernel32.dll into memory, allocates some memory using GlobalAlloc and changes its memory protection attributes to RWX. It then decrypts the shellcode using TEA (Tiny Encryption Algorithm) and jumps into it.

To identify the encryption algorithm, it’s possible to use the findcrypt-yara plugin. Moreover, by running the sample under a debugger, we can see that other constants commonly used by TEA are calculated dynamically:
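For reference, the textbook TEA block routine can be sketched in Python as follows. This is the standard algorithm, not a dump of the sample's code: the round count, the key layout and the function names are assumptions made for illustration. It shows the two constants that usually give TEA away: the delta 0x9E3779B9 and the decryption starting sum 0xC6EF3720, which is the delta multiplied by 32 and truncated to 32 bits, so it can easily be computed at runtime instead of being stored.

```python
MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9   # TEA key-schedule constant
ROUNDS = 32          # standard round count (assumed, not confirmed for the sample)


def tea_encrypt_block(v0, v1, key):
    """Encrypt one 64-bit block (two 32-bit words) with a 4-word key."""
    k0, k1, k2, k3 = key
    s = 0
    for _ in range(ROUNDS):
        s = (s + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1


def tea_decrypt_block(v0, v1, key):
    """Invert tea_encrypt_block by running the rounds backwards."""
    k0, k1, k2, k3 = key
    s = (DELTA * ROUNDS) & MASK   # 0xC6EF3720, derived rather than hardcoded
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + s) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + s) ^ ((v1 >> 5) + k1))) & MASK
        s = (s - DELTA) & MASK
    return v0, v1
```

A signature scanner that only looks for the hardcoded sum would miss a binary that derives it like this, which is why stepping through the loop in a debugger helps.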

Stage 2

The second stage shellcode creates a memory structure on the stack and fills it with data that will be used by the next stage. It then decrypts, decompresses and executes the next stage shellcode.

Memory structure

The memory structure on the stack is populated at runtime with some function pointers and other information:

Its layout is outlined below:

offset	meaning
+0x00	unused
+0x04	shellcode info pointer
+0x08	encrypted shellcode pointer
+0x0C	RNG state
+0x10	`LoadLibraryA` pointer
+0x14	`GetProcAddress` pointer
+0x18	`GlobalAlloc` pointer
+0x1C	`GetLastError` pointer
+0x20	`Sleep` pointer
+0x24	`VirtualAlloc` pointer
+0x28	`CreateToolhelp32Snapshot` pointer
+0x2C	`Module32First` pointer
+0x30	`CloseHandle` pointer
+0x34	unused
+0x38	unused

The second field points to another hardcoded structure located right after the last function of this stage:

offset	value	meaning
+0x00	57237	compressed shellcode size
+0x04	0x80000002	initial seed
+0x08	1	is compressed flag
+0x0C	110160	original shellcode size
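When working on a dump of this stage, the hardcoded structure can be read with a few lines of Python. The sketch below assumes that every field is a little-endian 32-bit integer (which matches the 4-byte offsets above); the field names and the helper name are made up for illustration.

```python
import struct
from collections import namedtuple

# Field names are illustrative; only the offsets and values come from the table.
ShellcodeInfo = namedtuple(
    "ShellcodeInfo",
    "compressed_size initial_seed is_compressed original_size",
)


def parse_shellcode_info(blob, offset=0):
    """Read the four 32-bit little-endian fields starting at `offset`."""
    return ShellcodeInfo(*struct.unpack_from("<IIII", blob, offset))


# Rebuild the structure from the values listed in the table and parse it back.
sample = struct.pack("<IIII", 57237, 0x80000002, 1, 110160)
info = parse_shellcode_info(sample)
ratio = info.compressed_size / info.original_size  # compressed / original
```

With the values from the table, `ratio` comes out slightly above one half.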

API hashing

To resolve the addresses of LoadLibraryA and GetProcAddress in the main function, API hashing is used. Here’s the string hashing function, reimplemented in Python:

from fixedint import UInt32

def hash_string(s):
    n = UInt32(0)

    for ch in s.lower():
        n = (n + ord(ch)) * 2

    return n
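To map a hash found in the shellcode back to an API name, a lookup table can be precomputed from a list of candidate export names. The sketch below is a dependency-free variant of the function above (plain integers masked to 32 bits instead of fixedint); the helper names and the candidate list are assumptions for illustration, and a real table would be built from the full export list of kernel32.dll.

```python
def hash_string_u32(s):
    """Same hash as above, using plain ints masked to 32 bits."""
    n = 0
    for ch in s.lower():
        n = ((n + ord(ch)) * 2) & 0xFFFFFFFF
    return n


def build_hash_table(names):
    """Map hash value -> export name for every candidate name."""
    return {hash_string_u32(name): name for name in names}


# Illustrative candidates only.
candidates = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc"]
table = build_hash_table(candidates)


def resolve(hash_value):
    """Return the matching name, or None if the hash is not in the table."""
    return table.get(hash_value)
```

Because the input is lowercased first, the hash is case-insensitive, so the casing of the export name does not matter when building the table.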

Shellcode wrapper

After loading the memory structure (named sc in the following snippet), the malware makes sure it can take a module snapshot of its own process before going forward:

#define CURRENT_PROCESS 0

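/* sc_ctx is a hypothetical name for the memory structure described above */
void exec_shellcode_wrapper(struct sc_ctx *sc) {
	HANDLE hSnapshot;
	MODULEENTRY32 me;

	/* retry for as long as the snapshot fails with ERROR_BAD_LENGTH */
	for (SIZE_T i = 0; i < 100; i++) {
		sc->Sleep(100);
		hSnapshot = sc->CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, CURRENT_PROCESS);

		if (hSnapshot != INVALID_HANDLE_VALUE || sc->GetLastError() != ERROR_BAD_LENGTH)
			break;
	}

	me.dwSize = sizeof(me); /* required by Module32First */

	if (sc->Module32First(hSnapshot, &me))
		exec_shellcode(sc);

	sc->CloseHandle(hSnapshot);
}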

Decryption

The decryption algorithm uses the MSVCRT implementation of rand to generate a XOR key for each byte of the encrypted shellcode, as shown in the following reimplementation:

from fixedint import UInt32

def rand(seed):
    seed = UInt32(seed * 0x343fd + 0x269ec3)
    val = (seed >> 16) & 0x7fff
    return val, seed

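seed = 0x80000002  # initial seed from the shellcode info structure

# enc_shellcode: bytearray holding the encrypted shellcode
for i in range(len(enc_shellcode)):
    val, seed = rand(seed)
    enc_shellcode[i] ^= int(val) & 0xff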
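A quick way to gain confidence in a reimplementation like this is to compare the generator with the well-known first outputs of the MSVC rand (41 and 18467 after srand(1)) and to check that applying the XOR stream twice restores the input. The sketch below does that without the fixedint dependency; the function names are made up for illustration.

```python
def msvcrt_rand(seed):
    """One step of the MSVC linear congruential generator."""
    seed = (seed * 0x343FD + 0x269EC3) & 0xFFFFFFFF
    return (seed >> 16) & 0x7FFF, seed


def xor_with_rand_stream(data, seed):
    """XOR every byte with the low byte of successive rand() outputs."""
    out = bytearray(data)
    for i in range(len(out)):
        val, seed = msvcrt_rand(seed)
        out[i] ^= val & 0xFF
    return bytes(out)


# Sanity check against the known sequence produced after srand(1).
first, state = msvcrt_rand(1)
second, state = msvcrt_rand(state)

# XOR is its own inverse, so encrypting twice with the same seed is a no-op.
plain = b"stage 3 shellcode"
roundtrip = xor_with_rand_stream(xor_with_rand_stream(plain, 0x80000002), 0x80000002)
```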

Decompression

The decompression algorithm doesn’t look like any known algorithm, so it is probably custom made.

In this sample it achieves a compression ratio of about 52% (57237 of 110160 bytes), which is slightly worse than what common algorithms such as gzip, LZMA and DEFLATE achieve on the same data, but the decompression function takes up only 613 bytes.

Stage 3

This stage performs the following operations:

Get LoadLibraryA and GetProcAddress function pointers by traversing the PEB
Clear SEH chain
Load addresses of VirtualAlloc, VirtualProtect, VirtualFree, GetVersionExA, TerminateProcess, ExitProcess, SetErrorMode
Check if it’s running under a sandbox
Clear FLS callbacks
Self-inject the payload through process hollowing
Resolve the payload imports
Set up exit callback
Jump to the injected payload entry point

Payload info structure

As in the previous stage, this stage uses a memory structure containing the parameters of the final payload, located right after the last function:

offset	value	meaning
+0x00	4	number of sections
+0x01	0	is compressed flag
+0x02	0x1A000	original size
+0x06	0x1A000	memory page size
+0x0A	0xA2000	size of image
+0x0E	0x139DE	entry point
+0x12	0x18ED0	import directory RVA
+0x16	0x64	import directory size
+0x1A	0	resource directory RVA
+0x1E	0	resource directory size
+0x22	0	unknown
+0x26	0	unknown
+0x2A	0	unknown
+0x2E	0	unknown
+0x32	0	unknown
+0x36	0	unknown
+0x3A		payload image
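The offsets in the table suggest a packed layout: two single-byte fields followed by fourteen 32-bit fields, for a total of 0x3A bytes before the payload image. A minimal parsing sketch under that assumption (little-endian, no padding; all field and helper names are made up for illustration):

```python
import struct

# 2 bytes + 14 * 4 bytes = 0x3A, matching the offset of the payload image.
PAYLOAD_INFO = struct.Struct("<BB14I")

FIELDS = (
    "section_count", "is_compressed", "original_size", "mem_page_size",
    "image_size", "entry_point", "import_dir_rva", "import_dir_size",
    "resource_dir_rva", "resource_dir_size",
)


def parse_payload_info(blob, offset=0):
    """Return the named fields, the unknown fields and the payload offset."""
    values = PAYLOAD_INFO.unpack_from(blob, offset)
    info = dict(zip(FIELDS, values))
    info["unknown"] = values[len(FIELDS):]
    info["payload_offset"] = offset + PAYLOAD_INFO.size
    return info


# Rebuild the header from the values listed in the table and parse it back.
header = PAYLOAD_INFO.pack(
    4, 0, 0x1A000, 0x1A000, 0xA2000, 0x139DE, 0x18ED0, 0x64, 0, 0,
    0, 0, 0, 0, 0, 0,
)
info = parse_payload_info(header)
```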

SEH and FLS clearing

After getting the addresses of LoadLibraryA and GetProcAddress, the malware clears the SEH chain, leaving only the default exception handler:

There is a function that clears all the FLS callbacks, if the current Windows version supports them:

Anti-sandbox: SetErrorMode

The malware calls SetErrorMode twice in a row: first with the value 1024, and later with the value 0:

The correct behavior of this API is to return the previously set error mode, but some sandboxes will behave incorrectly and return 0 instead of 1024 on the second call, making the malware terminate itself.

A more in-depth explanation can be found here.
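The logic of the check can be modelled in a few lines of Python. This is a simulation only, not a call into the real API: the two classes are hypothetical stand-ins for a correct implementation of SetErrorMode and for a sandbox that always returns 0.

```python
class RealErrorMode:
    """Models the documented behaviour: return the previously set mode."""

    def __init__(self):
        self.mode = 0

    def set_error_mode(self, new_mode):
        previous, self.mode = self.mode, new_mode
        return previous


class BrokenSandboxErrorMode:
    """Models a faulty emulation that ignores the stored mode."""

    def set_error_mode(self, new_mode):
        return 0


def looks_like_sandbox(api):
    """Replay the two calls made by the malware and compare the result."""
    api.set_error_mode(1024)
    return api.set_error_mode(0) != 1024
```

On the correct model the second call returns 1024 and execution would continue; on the faulty one it returns 0, which is the condition that makes the malware terminate itself.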

Self-injection

The malware changes the permissions of every page of the original process image (stage 1, mapped at VA 0x400000) to RWX, zeroes it out and copies the final payload into it:

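BYTE *baseAddress = (BYTE *)0x400000;
// PAYLOAD_INFO is a hypothetical type name for the payload info structure
PAYLOAD_INFO *pi = (PAYLOAD_INFO *)(baseAddress + 0xe16);
BYTE *buf = VirtualAlloc(NULL, pi->mem_page_size, MEM_COMMIT, PAGE_READWRITE);
DWORD flOldProtect;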

// copy/decompress the payload into a temporary buffer
if (pi->is_compressed)
    decompress(pi->payload_image, pi->original_size, buf);

else
    memcpy(buf, pi->payload_image, pi->original_size);

// change the permissions of the original process pages to RWX
VirtualProtect(baseAddress, pi->image_size, PAGE_EXECUTE_READWRITE, &flOldProtect);

// hollow the process and copy the final payload header inside it
memset(baseAddress, 0, pi->image_size);
memcpy(baseAddress, buf, OptionalHeader.SizeOfHeaders);

// copy the payload sections too
for (SIZE_T i = 0; i < pi->section_count; i++) {
    LPVOID virtualAddress = baseAddress+sectionHeaders[i].virtualAddress;
    LPVOID rawAddress = buf+sectionHeaders[i].rawAddress;
    SIZE_T rawSize = sectionHeaders[i].rawSize;

    memcpy(virtualAddress, rawAddress, rawSize);
}

// free the temporary buffer
VirtualFree(buf, 0, MEM_RELEASE);
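The same mapping step is handy offline, for example to rebuild the in-memory layout of the payload from a dump of the temporary buffer. The following Python sketch mirrors the pseudocode above on plain byte buffers; the function name and the tuple layout used for the sections are assumptions for illustration, and in practice the values would come from the payload's PE headers.

```python
def map_image(raw, size_of_headers, sections, image_size):
    """Lay out a raw PE buffer the way the loader does in memory.

    sections: iterable of (virtual_address, raw_address, raw_size) tuples.
    """
    image = bytearray(image_size)  # zero-filled, like the memset above
    image[:size_of_headers] = raw[:size_of_headers]

    for virtual_address, raw_address, raw_size in sections:
        chunk = raw[raw_address:raw_address + raw_size]
        if len(chunk) != raw_size or virtual_address + raw_size > image_size:
            raise ValueError("section does not fit in the raw buffer or image")
        image[virtual_address:virtual_address + raw_size] = chunk

    return bytes(image)


# Toy example with made-up data: a 4-byte "header" and two small sections.
raw = b"HDR!" + b"AAAA" + b"BB"
image = map_image(raw, 4, [(0x10, 4, 4), (0x20, 8, 2)], 0x30)
```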

Exit callback setup

Before jumping to the payload entry point, the malware dynamically loads atexit from msvcr100.dll. If the import succeeds, it replaces the placeholder value 0x44444444 in the atexit callback with the address of TerminateProcess:

HMODULE hModule = LoadLibraryA("msvcr100.dll");
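int (*p_atexit)(void (*)(void)) = (void *)GetProcAddress(hModule, "atexit");

if (p_atexit != NULL) {
    // overwrite the 0x44444444 placeholder, assumed to sit 5 bytes into the callback code
    *(DWORD *)((BYTE *)callback + 5) = (DWORD)TerminateProcess;
    p_atexit(callback);
}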

// jump to payload entry point
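The placeholder patching can be reproduced on a dumped copy of the callback with a small helper. The sketch below is illustrative only: the stub bytes and the address are made up (they are not taken from the sample), and the helper simply searches for the little-endian encoding of 0x44444444 and overwrites the first occurrence.

```python
import struct

PLACEHOLDER = 0x44444444


def patch_placeholder(code, address):
    """Replace the first 32-bit placeholder in `code` with `address`.

    Returns the patched bytes and the offset where the placeholder was found.
    """
    marker = struct.pack("<I", PLACEHOLDER)
    offset = code.find(marker)
    if offset < 0:
        raise ValueError("placeholder not found")
    patched = code[:offset] + struct.pack("<I", address) + code[offset + 4:]
    return patched, offset


# Made-up stub: five filler bytes, the placeholder, one trailing byte.
stub = b"\x90" * 5 + struct.pack("<I", PLACEHOLDER) + b"\xc3"
patched, offset = patch_placeholder(stub, 0x12345678)  # fictitious address
```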