Summary

File type: Win32 EXE
Size:      276,992 bytes
MD5:       17a062692257bfe27e95a64a9e4e6a65
SHA1:      6580ac9fb2f54376ba1bea40f6339662dd1c76ab
SHA256:    faa18e54f144e8377cc6492f1daabeed658d4c2fbfdbdaf3caec6f5188c95ac9

This sample is a packed version of LokiBot. The packer was widespread throughout 2020 and 2021, and it runs three stages before executing the main payload.

A sample using the same packer has been documented by Fumik0, but the third stage described in the article performs thread execution hijacking instead of process hollowing.

Stage 1

The first thing we notice after opening the sample in IDA and looking at WinMain is the heavy use of WinAPI functions with nonsensical arguments. It might look like API hammering, but it’s actually all junk code: each of those instruction blocks is preceded by a comparison with the global variable dwBytes and a conditional jump:

It’s possible to make the control flow easier to follow by hiding the junk code, as shown below:

Proceeding with the analysis, we discover that the executable loads kernel32.dll, allocates some memory using GlobalAlloc and changes its protection attributes to RWX. It then decrypts the shellcode using TEA (Tiny Encryption Algorithm) and jumps into it.
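For reference, here is a minimal sketch of the textbook TEA block cipher (32 rounds, 64-bit blocks, 128-bit key). It is the standard algorithm, not code recovered from the sample: the key, the number of rounds and the way the packer chains blocks are not covered here, and the function names are chosen for this example. Note how the decryption starts from delta * 32 = 0xC6EF3720, a value that a packer can compute at runtime instead of storing it as an immediate.

```python
MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9   # standard TEA constant
ROUNDS = 32          # textbook round count (assumed, not confirmed for the sample)


def tea_encrypt_block(v0, v1, key):
    """Encrypt one 64-bit block given as two 32-bit words."""
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & MASK
    return v0, v1


def tea_decrypt_block(v0, v1, key):
    """Decrypt one 64-bit block given as two 32-bit words."""
    total = (DELTA * ROUNDS) & MASK  # 0xC6EF3720
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & MASK
        v0 = (v0 - (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & MASK
        total = (total - DELTA) & MASK
    return v0, v1


# placeholder key and block, only used to show that both functions are inverses
key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
block = (0xDEADBEEF, 0x0BADF00D)
roundtrip = tea_decrypt_block(*tea_encrypt_block(*block, key), key)
```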

To identify the encryption algorithm, it’s possible to use the findcrypt-yara plugin. Moreover, by running the sample under a debugger, we can see that some other constants commonly used by TEA are calculated dynamically:

Stage 2

The second stage shellcode creates a memory structure on the stack and fills it with data that will be used by the next stage. It then decrypts, decompresses and executes the next stage shellcode.

Memory structure

The memory structure on the stack is populated at runtime with some function pointers and other information:

Its layout is outlined below:

offset  meaning
+0x00   unused
+0x04   shellcode info pointer
+0x08   encrypted shellcode pointer
+0x0C   RNG state
+0x10   LoadLibraryA pointer
+0x14   GetProcAddress pointer
+0x18   GlobalAlloc pointer
+0x1C   GetLastError pointer
+0x20   Sleep pointer
+0x24   VirtualAlloc pointer
+0x28   CreateToolhelp32Snapshot pointer
+0x2C   Module32First pointer
+0x30   CloseHandle pointer
+0x34   unused
+0x38   unused
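As a sketch, the same layout can be written down with ctypes, which makes it easy to overlay on a memory dump. The class and field names are chosen for this example, and every slot is assumed to be a 32-bit value since the sample is a Win32 executable.

```python
import ctypes


class Stage2Context(ctypes.Structure):
    """Hypothetical naming of the stack structure; each slot is 4 bytes wide."""
    _pack_ = 1
    _fields_ = [
        ("unused_00", ctypes.c_uint32),
        ("shellcode_info", ctypes.c_uint32),
        ("encrypted_shellcode", ctypes.c_uint32),
        ("rng_state", ctypes.c_uint32),
        ("LoadLibraryA", ctypes.c_uint32),
        ("GetProcAddress", ctypes.c_uint32),
        ("GlobalAlloc", ctypes.c_uint32),
        ("GetLastError", ctypes.c_uint32),
        ("Sleep", ctypes.c_uint32),
        ("VirtualAlloc", ctypes.c_uint32),
        ("CreateToolhelp32Snapshot", ctypes.c_uint32),
        ("Module32First", ctypes.c_uint32),
        ("CloseHandle", ctypes.c_uint32),
        ("unused_34", ctypes.c_uint32),
        ("unused_38", ctypes.c_uint32),
    ]


# overlay the structure on a (here all-zero) dump of the stack area
ctx = Stage2Context.from_buffer_copy(bytes(ctypes.sizeof(Stage2Context)))
```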

The second field points to another hardcoded structure located right after the last function of this stage:

offset  value       meaning
+0x00   57237       compressed shellcode size
+0x04   0x80000002  initial seed
+0x08   1           is compressed flag
+0x0C   110160      original shellcode size
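A short sketch of how this structure could be parsed from a dump, assuming four little-endian 32-bit fields (which is what the 4-byte spacing of the offsets suggests). The tuple and function names are made up for the example, and the input bytes are rebuilt from the values in the table rather than read from the sample.

```python
import struct
from collections import namedtuple

ShellcodeInfo = namedtuple(
    "ShellcodeInfo", "compressed_size seed is_compressed original_size"
)


def parse_shellcode_info(blob, offset=0):
    """Decode the 16-byte structure found at `offset` inside `blob`."""
    return ShellcodeInfo._make(struct.unpack_from("<4I", blob, offset))


# bytes rebuilt from the table values
raw = struct.pack("<4I", 57237, 0x80000002, 1, 110160)
info = parse_shellcode_info(raw)

# fraction of the original size that remains after compression
ratio = info.compressed_size / info.original_size
```

With these values, `ratio` comes out at roughly 0.52.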

API hashing

To resolve the addresses of LoadLibraryA and GetProcAddress in the main function, API hashing is used. Here’s the string hashing function, reimplemented in Python:

from fixedint import UInt32

def hash_string(s):
    n = UInt32(0)

    for ch in s.lower():
        n = (n + ord(ch)) * 2

    return n
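To illustrate how such a hash is typically used, here is a sketch that looks up an export name by its hash. The hash function is the same as above, rewritten with plain integers so it needs no third-party module. The `resolve_by_hash` helper and the list of names are hypothetical, and the actual hash values compared by the sample are not reproduced here.

```python
def hash_string(s):
    """Same algorithm as above, using a 32-bit mask instead of fixedint."""
    n = 0
    for ch in s.lower():
        n = ((n + ord(ch)) * 2) & 0xFFFFFFFF
    return n


def resolve_by_hash(target, export_names):
    """Return the first export name whose hash equals `target`, or None."""
    for name in export_names:
        if hash_string(name) == target:
            return name
    return None


# stand-in for the names found in the kernel32.dll export table
exports = ["CloseHandle", "Sleep", "LoadLibraryA", "GetProcAddress"]
found = resolve_by_hash(hash_string("loadlibrarya"), exports)
```

Because the input is lowercased first, the lookup is case-insensitive.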

Shellcode wrapper

After loading the memory structure (named sc in the following snippet), the malware checks that it can take a snapshot of its own process modules before going forward:

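#define CURRENT_PROCESS 0

// stage2_ctx is the memory structure described above (type name chosen here)
void exec_shellcode_wrapper(stage2_ctx *sc) {
	HANDLE hSnapshot = INVALID_HANDLE_VALUE;
	MODULEENTRY32 me;

	// retry for as long as the snapshot fails with ERROR_BAD_LENGTH
	for (SIZE_T i = 0; i < 100; i++) {
		sc->Sleep(100);
		hSnapshot = sc->CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, CURRENT_PROCESS);

		if (hSnapshot != INVALID_HANDLE_VALUE || sc->GetLastError() != ERROR_BAD_LENGTH)
			break;
	}

	// Module32First requires dwSize to be initialized
	me.dwSize = sizeof(me);

	if (sc->Module32First(hSnapshot, &me))
		exec_shellcode(sc);

	sc->CloseHandle(hSnapshot);
}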

Decryption

The decryption algorithm uses the MSVCRT implementation of rand to generate a XOR key for each byte of the encrypted shellcode, as shown in the following reimplementation:

from fixedint import UInt32

def rand(seed):
    seed = UInt32(seed * 0x343fd + 0x269ec3)
    val = (seed >> 16) & 0x7fff
    return val, seed

seed = 0x80000002

# enc_shellcode is a bytearray holding the encrypted shellcode
for i in range(len(enc_shellcode)):
    val, seed = rand(seed)
    enc_shellcode[i] ^= int(val & 0xff)
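The same routine can be sketched as a self-contained function using plain integers. The function names are chosen for this example. A quick sanity check for the generator is that, seeded with 1, it yields 41 and then 18467, the first two values produced by the MSVCRT rand. Since XOR is its own inverse, the same function both encrypts and decrypts.

```python
def msvcrt_rand(state):
    """One step of the MSVCRT linear congruential generator."""
    state = (state * 0x343FD + 0x269EC3) & 0xFFFFFFFF
    return (state >> 16) & 0x7FFF, state


def xor_crypt(data, seed):
    """XOR every byte with the low byte of successive rand() outputs."""
    out = bytearray(data)
    state = seed
    for i in range(len(out)):
        val, state = msvcrt_rand(state)
        out[i] ^= val & 0xFF
    return bytes(out)


# round trip on placeholder data with the seed from the shellcode info structure
plain = b"stage 3 shellcode"
encrypted = xor_crypt(plain, 0x80000002)
decrypted = xor_crypt(encrypted, 0x80000002)
```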

Decompression

The decompression algorithm doesn’t look like any known algorithm, so it is probably custom made.

In this sample, it shrinks the shellcode from 110160 to 57237 bytes (about 52% of the original size), which is slightly worse than common algorithms such as gzip/deflate and LZMA on the same data, but the decompression function requires only 613 bytes.

Stage 3

This stage performs the following operations:

  1. Get LoadLibraryA and GetProcAddress function pointers by traversing the PEB
  2. Clear SEH chain
  3. Load addresses of VirtualAlloc, VirtualProtect, VirtualFree, GetVersionExA, TerminateProcess, ExitProcess, SetErrorMode
  4. Check if it’s running under a sandbox
  5. Clear FLS callbacks
  6. Self-inject the payload through process hollowing
  7. Resolve the payload imports
  8. Set up exit callback
  9. Jump to the injected payload entry point

Payload info structure

As in the previous stage, this stage uses a memory structure containing the parameters of the final payload, located right after the last function:

offset  value    meaning
+0x00   4        number of sections
+0x01   0        is compressed flag
+0x02   0x1A000  original size
+0x06   0x1A000  memory page size
+0x0A   0xA2000  size of image
+0x0E   0x139DE  entry point
+0x12   0x18ED0  import directory RVA
+0x16   0x64     import directory size
+0x1A   0        resource directory RVA
+0x1E   0        resource directory size
+0x22   0        unknown
+0x26   0        unknown
+0x2A   0        unknown
+0x2E   0        unknown
+0x32   0        unknown
+0x36   0        unknown
+0x3A            payload image
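The offsets suggest a packed layout: two single-byte fields followed by fourteen 32-bit fields, for a header of 0x3A bytes before the payload image. Below is a sketch of a parser under that assumption (little-endian, no padding). The field names are chosen for this example and the input is rebuilt from the table values.

```python
import struct

# two bytes, then fourteen little-endian dwords, no padding
HEADER = struct.Struct("<BB14I")

FIELDS = [
    "section_count", "is_compressed", "original_size", "mem_page_size",
    "image_size", "entry_point", "import_dir_rva", "import_dir_size",
    "resource_dir_rva", "resource_dir_size",
    "unknown_22", "unknown_26", "unknown_2a",
    "unknown_2e", "unknown_32", "unknown_36",
]


def parse_payload_info(blob):
    """Return the header fields as a dict plus the payload image that follows."""
    info = dict(zip(FIELDS, HEADER.unpack_from(blob, 0)))
    return info, blob[HEADER.size:]


# header rebuilt from the table, followed by placeholder payload bytes
raw = HEADER.pack(4, 0, 0x1A000, 0x1A000, 0xA2000, 0x139DE,
                  0x18ED0, 0x64, 0, 0, 0, 0, 0, 0, 0, 0) + b"MZ"
info, payload = parse_payload_info(raw)
```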

SEH and FLS clearing

After getting the addresses of LoadLibraryA and GetProcAddress, the SEH chain is cleared, leaving only the default exception handler:

There is a function that clears all the FLS callbacks, if the current Windows version supports them:

Anti-sandbox: SetErrorMode

The malware calls SetErrorMode twice in a row: first with the value 1024, and later with the value 0:

The correct behavior of this API is to return the previously set error mode, but some sandboxes will behave incorrectly and return 0 instead of 1024 on the second call, making the malware terminate itself.

A more in-depth explanation can be found here.
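The logic of the check can be sketched with a toy model of the API. Nothing here calls Windows: `ErrorModeModel` is a hypothetical stand-in for the process error mode, and its `faithful` flag mimics a sandbox that always returns 0.

```python
class ErrorModeModel:
    """Toy stand-in for SetErrorMode; not the real API."""

    def __init__(self, faithful=True):
        self.mode = 0
        self.faithful = faithful

    def set_error_mode(self, new_mode):
        previous = self.mode
        self.mode = new_mode
        # a faithful implementation returns the previously set mode
        return previous if self.faithful else 0


def looks_like_sandbox(set_error_mode):
    """Replay the two calls and compare the second return value with 1024."""
    set_error_mode(1024)
    return set_error_mode(0) != 1024


real_system = looks_like_sandbox(ErrorModeModel(faithful=True).set_error_mode)
bad_sandbox = looks_like_sandbox(ErrorModeModel(faithful=False).set_error_mode)
```

In this model, `real_system` is False and `bad_sandbox` is True, which is the condition under which the malware would terminate itself.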

Self-injection

The malware changes the permissions of every page of the original process image (stage 1, mapped at VA 0x400000) to RWX, zeroes the image out and copies the final payload into it:

BYTE *baseAddress = (BYTE *)0x400000;
// payload_info is the structure described in the previous section
payload_info *pi = (payload_info *)(baseAddress + 0xe16);
DWORD flOldProtect;
BYTE *buf = VirtualAlloc(NULL, pi->mem_page_size, MEM_COMMIT, PAGE_READWRITE);

// copy/decompress the payload into a temporary buffer
if (pi->is_compressed)
    decompress(pi->payload_image, pi->original_size, buf);
else
    memcpy(buf, pi->payload_image, pi->original_size);

// change the permissions of the original process pages to RWX
VirtualProtect(baseAddress, pi->image_size, PAGE_EXECUTE_READWRITE, &flOldProtect);

// hollow the process and copy the final payload header inside it
// (OptionalHeader and sectionHeaders are the PE headers of the payload in buf)
memset(baseAddress, 0, pi->image_size);
memcpy(baseAddress, buf, OptionalHeader.SizeOfHeaders);

// copy the payload sections too
for (SIZE_T i = 0; i < pi->section_count; i++) {
    BYTE *virtualAddress = baseAddress + sectionHeaders[i].virtualAddress;
    BYTE *rawAddress = buf + sectionHeaders[i].rawAddress;
    SIZE_T rawSize = sectionHeaders[i].rawSize;

    memcpy(virtualAddress, rawAddress, rawSize);
}

// free the temporary buffer (dwSize must be 0 with MEM_RELEASE)
VirtualFree(buf, 0, MEM_RELEASE);
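The header and section copy can be reproduced offline to rebuild the in-memory image from a dumped payload. The sketch below uses a hypothetical `Section` tuple and synthetic data instead of parsing real PE headers, so it only illustrates the raw-offset to virtual-address mapping.

```python
from collections import namedtuple

# hypothetical stand-in for the relevant section header fields
Section = namedtuple("Section", "virtual_address raw_address raw_size")


def map_image(raw, size_of_headers, sections, image_size):
    """Lay out a raw PE file the way the loader code above does."""
    image = bytearray(image_size)  # zero-filled, like the memset
    image[:size_of_headers] = raw[:size_of_headers]
    for s in sections:
        data = raw[s.raw_address:s.raw_address + s.raw_size]
        image[s.virtual_address:s.virtual_address + len(data)] = data
    return bytes(image)


# synthetic "file": a 4-byte header followed by two tiny sections
raw = b"HDR!" + b"AAAA" + b"BB"
sections = [Section(0x10, 4, 4), Section(0x20, 8, 2)]
image = map_image(raw, 4, sections, 0x30)
```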

Exit callback setup

Before jumping to the payload entry point, the malware dynamically loads atexit from msvcr100.dll. If the import succeeds, it replaces the placeholder value 0x44444444 in the atexit callback with the address of TerminateProcess:

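typedef int (__cdecl *atexit_t)(void (__cdecl *)(void));

HMODULE hModule = LoadLibraryA("msvcr100.dll");
atexit_t pAtexit = (atexit_t)GetProcAddress(hModule, "atexit");

if (pAtexit != NULL) {
    // overwrite the 0x44444444 placeholder, 5 bytes into the callback code
    *(LPVOID *)((BYTE *)callback + 5) = (LPVOID)TerminateProcess;
    pAtexit(callback);
}

// jump to payload entry point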
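The placeholder replacement itself is a plain 4-byte patch. As a sketch, it can be modelled on a byte string: the stub bytes and the address below are invented for the example (they are not taken from the sample), and the helper name is hypothetical.

```python
import struct

PLACEHOLDER = struct.pack("<I", 0x44444444)


def patch_placeholder(stub, address):
    """Replace the first 0x44444444 dword in `stub` with `address`."""
    offset = stub.find(PLACEHOLDER)
    if offset < 0:
        raise ValueError("placeholder not found")
    return stub[:offset] + struct.pack("<I", address) + stub[offset + 4:]


# made-up callback stub: 5 filler bytes, the placeholder, 1 trailing byte
stub = b"\x90" * 5 + PLACEHOLDER + b"\xc3"
patched = patch_placeholder(stub, 0x76543210)  # arbitrary example address
```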