LokiBot packer analysis
Summary
File type: Win32 EXE
Size: 276,992 bytes
MD5: 17a062692257bfe27e95a64a9e4e6a65
SHA1: 6580ac9fb2f54376ba1bea40f6339662dd1c76ab
SHA256: faa18e54f144e8377cc6492f1daabeed658d4c2fbfdbdaf3caec6f5188c95ac9
This sample is a packed version of LokiBot. The packer was widespread in 2020 and 2021 and runs three stages before executing the main payload.
A sample using the same packer has been documented by Fumik0, but the third stage described in the article performs thread execution hijacking instead of process hollowing.
Stage 1
The first thing we notice after opening the sample in IDA and looking at WinMain is the heavy use of WinAPI functions with nonsensical arguments. It might look like API hammering is being used here, but it’s actually all junk code: each of those instruction blocks is preceded by a comparison with the global variable dwBytes and a conditional jump:
It’s possible to make the control flow easier to follow by hiding the junk code, as shown below:
Proceeding with the analysis, we discover that the executable loads kernel32.dll in memory, allocates some memory using GlobalAlloc and changes its memory protection attributes to RWX. It then decrypts the shellcode using TEA (Tiny Encryption Algorithm) and jumps inside it.
The encryption algorithm can be identified with the findcrypt-yara plugin. Moreover, by running the sample under a debugger, we can see that some other constants commonly used by TEA are calculated dynamically:
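For reference, the textbook TEA block routines can be sketched in Python as follows. This is the published algorithm, not a dump of the sample's routine: the key, the round count, the byte order and the function names are assumptions made for illustration. Note that the decryption start value is derived from the delta constant rather than stored, which is one way such a constant can end up being computed at runtime.

```python
import struct

MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9  # textbook TEA constant
ROUNDS = 32         # textbook round count (assumed; the packer may differ)


def tea_encrypt_block(v0, v1, key):
    """Encrypt one 64-bit block (two 32-bit words) with a four-word key."""
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(ROUNDS):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1


def tea_decrypt_block(v0, v1, key):
    """Invert tea_encrypt_block by running the rounds backwards."""
    k0, k1, k2, k3 = key
    total = (DELTA * ROUNDS) & MASK  # 0xC6EF3720 for 32 rounds
    for _ in range(ROUNDS):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - DELTA) & MASK
    return v0, v1


def tea_decrypt(data, key):
    """Decrypt a buffer block by block (little-endian words assumed).

    Trailing bytes that do not fill a whole 8-byte block are ignored.
    """
    out = bytearray()
    for off in range(0, len(data) - len(data) % 8, 8):
        v0, v1 = struct.unpack_from("<II", data, off)
        out += struct.pack("<II", *tea_decrypt_block(v0, v1, key))
    return bytes(out)
```

With the real key recovered from the debugger, `tea_decrypt(encrypted_blob, key)` would be the call to try first; if the output is garbage, the round count or word order are the first assumptions to revisit.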
Stage 2
The second stage shellcode creates a memory structure on the stack and fills it with data that will be used by the next stage. It then decrypts, decompresses and executes the next stage shellcode.
Memory structure
The memory structure on the stack is populated at runtime with some function pointers and other information:
Its layout is outlined below:
offset | meaning |
---|---|
+0x00 | unused |
+0x04 | shellcode info pointer |
+0x08 | encrypted shellcode pointer |
+0x0C | RNG state |
+0x10 | LoadLibraryA pointer |
+0x14 | GetProcAddress pointer |
+0x18 | GlobalAlloc pointer |
+0x1C | GetLastError pointer |
+0x20 | Sleep pointer |
+0x24 | VirtualAlloc pointer |
+0x28 | CreateToolhelp32Snapshot pointer |
+0x2C | Module32First pointer |
+0x30 | CloseHandle pointer |
+0x34 | unused |
+0x38 | unused |
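When annotating a memory dump of this stage, the layout above can be mirrored with a `ctypes` structure. The sketch below assumes a 32-bit process, so every slot is a 4-byte value; the class and field names are made up for illustration and do not come from the sample.

```python
import ctypes


class Stage2Context(ctypes.Structure):
    """Hypothetical mirror of the stage 2 stack structure (32-bit slots)."""
    _pack_ = 1
    _fields_ = [
        ("unused_00", ctypes.c_uint32),
        ("shellcode_info", ctypes.c_uint32),       # pointer to the info structure
        ("encrypted_shellcode", ctypes.c_uint32),  # pointer to the encrypted stage 3
        ("rng_state", ctypes.c_uint32),
        ("LoadLibraryA", ctypes.c_uint32),
        ("GetProcAddress", ctypes.c_uint32),
        ("GlobalAlloc", ctypes.c_uint32),
        ("GetLastError", ctypes.c_uint32),
        ("Sleep", ctypes.c_uint32),
        ("VirtualAlloc", ctypes.c_uint32),
        ("CreateToolhelp32Snapshot", ctypes.c_uint32),
        ("Module32First", ctypes.c_uint32),
        ("CloseHandle", ctypes.c_uint32),
        ("unused_34", ctypes.c_uint32),
        ("unused_38", ctypes.c_uint32),
    ]


def layout():
    """Return (offset, field name) pairs, handy for checking against the table."""
    return [(getattr(Stage2Context, name).offset, name)
            for name, _ in Stage2Context._fields_]
```

A dump of 0x3C bytes taken from the stack can then be loaded with `Stage2Context.from_buffer_copy(dump)` and read field by field.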
The second field points to another hardcoded structure located right after the last function of this stage:
offset | value | meaning |
---|---|---|
+0x00 | 57237 | compressed shellcode size |
+0x04 | 0x80000002 | initial seed |
+0x08 | 1 | is compressed flag |
+0x0C | 110160 | original shellcode size |
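Once this structure has been located in a dump, it can be parsed with the standard `struct` module. The sketch below assumes four little-endian 32-bit fields, as the offsets suggest; the tuple and field names are illustrative.

```python
import struct
from collections import namedtuple

# Field names are illustrative, chosen to match the table above.
ShellcodeInfo = namedtuple(
    "ShellcodeInfo", "compressed_size seed is_compressed original_size")


def parse_shellcode_info(blob, offset=0):
    """Read the four 32-bit fields starting at `offset`."""
    return ShellcodeInfo._make(struct.unpack_from("<IIII", blob, offset))


# Rebuild the values from the table to exercise the parser.
sample = struct.pack("<IIII", 57237, 0x80000002, 1, 110160)
info = parse_shellcode_info(sample)
ratio = info.compressed_size / info.original_size
```

The `ratio` value gives the compressed size as a fraction of the original size for this sample.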
API hashing
To resolve the addresses of LoadLibraryA and GetProcAddress in the main function, API hashing is used. Here’s the string hashing function, reimplemented in Python:
from fixedint import UInt32

def hash_string(s):
    n = UInt32(0)
    for ch in s.lower():
        n = (n + ord(ch)) * 2
    return n
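To show how such a hash is typically consumed, here is a minimal, standard-library-only sketch that matches a precomputed hash against a list of export names. The export list and the `resolve_by_hash` helper are hypothetical; the sample's actual hash constants are not reproduced here, so the target hash is computed on the fly.

```python
MASK = 0xFFFFFFFF


def hash_string(name):
    """Same arithmetic as the reimplementation above, with explicit 32-bit wrapping."""
    n = 0
    for ch in name.lower():
        n = ((n + ord(ch)) * 2) & MASK
    return n


def resolve_by_hash(export_names, wanted_hash):
    """Return the first export whose hash matches, or None (hypothetical helper)."""
    for name in export_names:
        if hash_string(name) == wanted_hash:
            return name
    return None


# Illustrative export list; a real resolver would walk a module's export table.
exports = ["GetLastError", "GetProcAddress", "LoadLibraryA", "Sleep"]
wanted = hash_string("LoadLibraryA")
print(resolve_by_hash(exports, wanted))
```

Because the input is lowercased before hashing, the lookup is case-insensitive, which is worth remembering when building a hash table for a whole DLL.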
Shellcode wrapper
After loading the memory structure (named sc in the following snippet), the malware ensures it’s able to open its own process image correctly before going forward:
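#define CURRENT_PROCESS 0

// STAGE2_CTX is a placeholder name for the memory structure described above
void exec_shellcode_wrapper(STAGE2_CTX *sc) {
    HANDLE hSnapshot = INVALID_HANDLE_VALUE;
    MODULEENTRY32 me;
    me.dwSize = sizeof(me); // required by the Module32First API

    // retry while the snapshot fails with ERROR_BAD_LENGTH
    for (SIZE_T i = 0; i < 100; i++) {
        sc->Sleep(100);
        hSnapshot = sc->CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, CURRENT_PROCESS);
        if (hSnapshot != INVALID_HANDLE_VALUE || sc->GetLastError() != ERROR_BAD_LENGTH)
            break;
    }
    if (sc->Module32First(hSnapshot, &me))
        exec_shellcode(sc);
    sc->CloseHandle(hSnapshot);
}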
Decryption
The decryption algorithm uses the MSVCRT implementation of rand to generate a XOR key for each byte of the encrypted shellcode, as shown in the following reimplementation:
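from fixedint import UInt32

def rand(seed):
    # one step of the MSVCRT linear congruential generator
    seed = UInt32(seed * 0x343fd + 0x269ec3)
    val = (seed >> 16) & 0x7fff
    return val, seed

# enc_shellcode: bytearray holding the encrypted shellcode
seed = 0x80000002
for i in range(len(enc_shellcode)):
    val, seed = rand(seed)
    enc_shellcode[i] ^= int(val) & 0xff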
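A dependency-free variant is convenient for checking the generator against the well-known MSVCRT sequence before pointing it at real data. The sketch below uses only the standard library; the function names are illustrative, and the seed in the usage note is the one from the info structure.

```python
from itertools import islice

MASK = 0xFFFFFFFF


def msvcrt_rand_stream(seed):
    """Yield successive rand() outputs of the MSVCRT LCG for the given seed."""
    state = seed & MASK
    while True:
        state = (state * 0x343FD + 0x269EC3) & MASK
        yield (state >> 16) & 0x7FFF


def xor_with_rand(data, seed):
    """XOR each byte with the low byte of the next rand() output.

    XOR is its own inverse, so the same call encrypts and decrypts.
    """
    return bytes(b ^ (r & 0xFF) for b, r in zip(data, msvcrt_rand_stream(seed)))


# Sanity check: srand(1) in MSVCRT yields 41, 18467, 6334 as its first outputs.
first_three = list(islice(msvcrt_rand_stream(1), 3))
```

Calling `xor_with_rand(encrypted_bytes, 0x80000002)` should then reproduce the decrypted (still compressed) stage 3 buffer, assuming the key stream starts at the first byte.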
Decompression
The decompression algorithm doesn’t look like any known algorithm, so it is probably custom made.
In this sample it compresses the shellcode to about 52% of its original size (57237 of 110160 bytes). That is slightly worse than common compression algorithms like Gzip, LZMA and deflate, but the decompression function requires only 613 bytes.
Stage 3
This stage performs the following operations:
- Get LoadLibraryA and GetProcAddress function pointers by traversing the PEB
- Clear SEH chain
- Load addresses of VirtualAlloc, VirtualProtect, VirtualFree, GetVersionExA, TerminateProcess, ExitProcess, SetErrorMode
- Check if it’s running under a sandbox
- Clear FLS callbacks
- Self-inject the payload through process hollowing
- Resolve the payload imports
- Set up exit callback
- Jump to the injected payload entry point
Payload info structure
As in the previous stage, this stage uses a memory structure containing the parameters of the final payload, located right after the last function:
offset | value | meaning |
---|---|---|
+0x00 | 4 | number of sections |
+0x01 | 0 | is compressed flag |
+0x02 | 0x1A000 | original size |
+0x06 | 0x1A000 | memory page size |
+0x0A | 0xA2000 | size of image |
+0x0E | 0x139DE | entry point |
+0x12 | 0x18ED0 | import directory RVA |
+0x16 | 0x64 | import directory size |
+0x1A | 0 | resource directory RVA |
+0x1E | 0 | resource directory size |
+0x22 | 0 | unknown |
+0x26 | 0 | unknown |
+0x2A | 0 | unknown |
+0x2E | 0 | unknown |
+0x32 | 0 | unknown |
+0x36 | 0 | unknown |
+0x3A | | payload image |
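The offsets in this table imply a packed layout: two single-byte fields followed by 32-bit values, with the payload starting at +0x3A. A possible parser is sketched below, assuming little-endian fields with no padding; the tuple and field names are illustrative.

```python
import struct
from collections import namedtuple

# 2 bytes + 8 dwords + 6 unknown dwords = 0x3A bytes, no padding ("<").
HEADER_FMT = "<BB8I6I"

PayloadInfo = namedtuple(
    "PayloadInfo",
    "section_count is_compressed original_size mem_page_size image_size "
    "entry_point import_rva import_size resource_rva resource_size "
    "unknown payload_offset")


def parse_payload_info(blob, offset=0):
    """Parse the header and report where the payload image begins."""
    values = struct.unpack_from(HEADER_FMT, blob, offset)
    known, unknown = values[:10], values[10:]
    payload_offset = offset + struct.calcsize(HEADER_FMT)
    return PayloadInfo(*known, unknown=unknown, payload_offset=payload_offset)


# Rebuild the header from the table values to exercise the parser.
header = struct.pack(HEADER_FMT, 4, 0, 0x1A000, 0x1A000, 0xA2000,
                     0x139DE, 0x18ED0, 0x64, 0, 0, 0, 0, 0, 0, 0, 0)
info = parse_payload_info(header + b"MZ")
```

With the structure parsed, `blob[info.payload_offset:info.payload_offset + info.original_size]` would carve out the embedded PE when the compression flag is 0, as it is in this sample.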
SEH and FLS clearing
After getting the addresses of LoadLibraryA and GetProcAddress, the SEH chain gets cleared, leaving only the default exception handler:
There is a function that clears all the FLS callbacks, if the current Windows version supports them:
Anti-sandbox: SetErrorMode
The malware calls SetErrorMode twice in a row: first with the value 1024, and later with the value 0:
The correct behavior of this API is to return the previously set error mode, but some sandboxes will behave incorrectly and return 0 instead of 1024 on the second call, making the malware terminate itself.
A more in-depth explanation can be found here.
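The logic of the check can be modelled without touching the real API. The sketch below simulates a correct implementation and a faulty emulation; both classes and the `looks_like_sandbox` helper are hypothetical and only illustrate the comparison the malware relies on.

```python
class CorrectErrorMode:
    """Models the documented behaviour: each call returns the previous mode."""

    def __init__(self):
        self.mode = 0

    def set_error_mode(self, new_mode):
        previous, self.mode = self.mode, new_mode
        return previous


class FaultyErrorMode:
    """Hypothetical emulation that does not track state and always returns 0."""

    def set_error_mode(self, new_mode):
        return 0


def looks_like_sandbox(set_error_mode):
    """Set the mode to 1024, then check that the next call reports 1024 back."""
    set_error_mode(1024)
    return set_error_mode(0) != 1024
```

Here `looks_like_sandbox(CorrectErrorMode().set_error_mode)` evaluates to `False`, while the faulty emulation trips the check, which corresponds to the branch where the malware terminates itself.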
Self-injection
The malware changes the permissions of every page of the original process image (stage 1, with VA 0x400000) to RWX, zeroes it out and copies the final payload inside it:
// PAYLOAD_INFO is a placeholder name for the payload info structure described above
BYTE *baseAddress = (BYTE *)0x400000;
PAYLOAD_INFO *pi = (PAYLOAD_INFO *)(baseAddress + 0xe16);
DWORD flOldProtect;
BYTE *buf = VirtualAlloc(NULL, pi->mem_page_size, MEM_COMMIT, PAGE_READWRITE);
// copy/decompress the payload into a temporary buffer
if (pi->is_compressed)
    decompress(pi->payload_image, pi->original_size, buf);
else
    memcpy(buf, pi->payload_image, pi->original_size);
// change the permissions of the original process pages to RWX
VirtualProtect(baseAddress, pi->image_size, PAGE_EXECUTE_READWRITE, &flOldProtect);
// hollow the process and copy the final payload header inside it
// (OptionalHeader and sectionHeaders refer to the PE headers of the payload in buf)
memset(baseAddress, 0, pi->image_size);
memcpy(baseAddress, buf, OptionalHeader.SizeOfHeaders);
// copy the payload sections too
for (SIZE_T i = 0; i < pi->section_count; i++) {
    BYTE *virtualAddress = baseAddress + sectionHeaders[i].virtualAddress;
    BYTE *rawAddress = buf + sectionHeaders[i].rawAddress;
    SIZE_T rawSize = sectionHeaders[i].rawSize;
    memcpy(virtualAddress, rawAddress, rawSize);
}
// free the temporary buffer
VirtualFree(buf, 0, MEM_RELEASE);
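The same header-plus-sections copy is useful offline, for instance to rebuild the in-memory layout of the payload from the raw file before loading it into a disassembler. The sketch below works on plain byte buffers; the `Section` tuple and `map_image` function are illustrative names, and the section values would normally come from the payload's PE section table.

```python
from collections import namedtuple

# Minimal stand-in for a PE section header (illustrative field names).
Section = namedtuple("Section", "virtual_address raw_address raw_size")


def map_image(raw, size_of_headers, sections, image_size):
    """Lay out a raw PE file the way it would sit in memory.

    Starts from a zeroed buffer, copies the headers, then copies each
    section from its file offset to its virtual address.
    """
    image = bytearray(image_size)
    image[:size_of_headers] = raw[:size_of_headers]
    for sec in sections:
        chunk = raw[sec.raw_address:sec.raw_address + sec.raw_size]
        if len(chunk) != sec.raw_size:
            raise ValueError("section data extends past the end of the file")
        if sec.virtual_address + sec.raw_size > image_size:
            raise ValueError("section does not fit in the image")
        image[sec.virtual_address:sec.virtual_address + sec.raw_size] = chunk
    return image
```

The bounds checks are an addition for offline use; they keep a malformed section table from silently resizing or truncating the output buffer.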
Exit callback setup
Before jumping to the payload entry point, the malware dynamically loads atexit from msvcr100.dll. If the import succeeds, it replaces the placeholder value 0x44444444 in the atexit callback with the address of TerminateProcess:
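HMODULE hModule = LoadLibraryA("msvcr100.dll");
int (*atexit)(void (*)(void)) = (void *)GetProcAddress(hModule, "atexit");
if (atexit != NULL) {
    // overwrite the 0x44444444 placeholder at offset 5 of the callback stub
    *(LPVOID *)((BYTE *)callback + 5) = (LPVOID)TerminateProcess;
    atexit(callback);
}
// jump to payload entry point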