Code Obfuscation
Obfuscation, in simple words, is a process to make something difficult to understand. Said "something" can be anything. Code Obfuscation is the practice of intentionally making source code more difficult to understand and reverse engineer, while still maintaining its functionality. The goal of obfuscation is to make it harder for attackers to identify and exploit vulnerabilities in the code, or to steal intellectual property.
Obfuscation is commonly used in legitimate software to protect against reverse engineering and intellectual property theft, particularly in security-sensitive areas such as financial or military applications. It is also widely used in malware and other malicious software.
Despite its benefits, obfuscation is NOT a foolproof solution and can sometimes introduce its own set of problems. It can make debugging and maintenance more difficult, increase code size and complexity, and even introduce new vulnerabilities if not done carefully. Generally, you should keep two versions of your code: the readable, unobfuscated source that you maintain, and the obfuscated version that you ship.
For a malware developer, code obfuscation is incredibly useful because it makes it much more difficult for security researchers and anti-malware software to analyze and detect malicious code. Let's go over some techniques for obfuscating code:
Obfuscation by Renaming
Consider the following piece of C++ code:
We can obfuscate this by renaming the variables and such to random garbage, like so:
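As a minimal sketch of the idea, take a small averaging function (average_score and every other name below are hypothetical, chosen only for illustration). The first definition uses descriptive names, and the second is the same logic after renaming:

```cpp
#include <cassert>
#include <vector>

// Readable version: the names explain what the code does.
double average_score(const std::vector<int>& scores) {
    assert(!scores.empty());  // an average needs at least one value
    int total = 0;
    for (int score : scores) {
        total += score;
    }
    return static_cast<double>(total) / scores.size();
}

// Renamed version: identical logic, meaningless identifiers.
double a1(const std::vector<int>& b7) {
    assert(!b7.empty());
    int q = 0;
    for (int z : b7) {
        q += z;
    }
    return static_cast<double>(q) / b7.size();
}
```

Both functions return the same value for the same input; only the identifiers differ.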
See the difference? If you had not looked at the former program, you'd be completely clueless as to what this code is supposed to be doing. But this is not enough, since people can still guess and figure out what the code might aim to do. This is where further obfuscation techniques come in.
Data Obfuscation
This technique targets the data structures used in the code so that an attacker cannot easily work out the actual intent of the program. This may involve altering the way data is stored in memory while the program runs and how the stored data is interpreted to produce the final output. This can be done using one (or a combination) of the following:
Aggregation Obfuscation: This alters the way data is stored in the program. For example, arrays could be broken down into many sub-arrays, which could then be referenced at different places in the program.
Storage Obfuscation: This changes the very manner in which data is stored in memory. For example, developers can shuffle between local to global storage of variables, so that the real nature of variable behaviour is obfuscated.
Ordering Obfuscation: This method changes the order in which data is stored without altering the behaviour of the program/code snippet. This may be done by developing a separate module which is called for every reference to the variable.
String Encryption: This method encrypts all readable strings and hence results in unreadable code. These need to be decrypted at runtime when the program is executed.
Control Flow Obfuscation: This method alters the order or structure of a program's code to make it harder for someone to analyze or understand how it works. It's like a maze where someone has to navigate through the code to find out what it does. A simple way to obfuscate control flow is to add conditional statements whose outcome is fixed but not obvious to a reader, so that some branches can never run. Essentially, this means adding a ton of dead code.
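Three of the techniques above (aggregation, string encryption, and control flow) can be sketched in a few lines. The array contents, the one-byte XOR key 0x5A, and all function names are assumptions made for illustration. XOR with a fixed key is a simple encoding rather than real encryption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Aggregation: the logical array {10, 20, 30, 40, 50, 60} is stored as two
// sub-arrays holding its even-indexed and odd-indexed elements.
static const int kEvenPart[] = {10, 30, 50};
static const int kOddPart[]  = {20, 40, 60};

// Returns element i of the logical array.
int element_at(std::size_t i) {
    assert(i < 6);
    return (i % 2 == 0) ? kEvenPart[i / 2] : kOddPart[i / 2];
}

// String encryption: "hello" is stored XOR-ed with a one-byte key, so the
// plain text does not appear in the binary and is rebuilt at run time.
static const unsigned char kKey = 0x5A;
static const unsigned char kEncoded[] = {0x32, 0x3F, 0x36, 0x36, 0x35};

std::string decode_message() {
    std::string out;
    for (unsigned char c : kEncoded) {
        out.push_back(static_cast<char>(c ^ kKey));
    }
    return out;
}

// Control flow: n * (n + 1) is always even, so the condition always holds
// and the final return statement is dead code.
int add_obscured(int a, int b, unsigned n) {
    if ((n * (n + 1u)) % 2u == 0u) {
        return a + b;
    }
    return a - b;  // never reached
}
```

In each case the observable behaviour is unchanged: element_at still indexes the logical array, decode_message yields the original string, and add_obscured always adds.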
Debug Obfuscation
Debug information reveals a great deal about a program's flow and its flaws to anyone who decompiles the binary or runs it under a debugger. It is important to mask such identifiable information by changing identifiers and line numbers, or by removing access to debug information altogether. Here are some methods to achieve this:
Stripping debug symbols: When a program is compiled, the compiler generates symbols that can be used to debug the code. Debug obfuscation can involve stripping these symbols from the binary, making it difficult for someone to use a debugger to analyze the program. For example, you may use the -s flag with gcc or g++ to strip debug symbols when compiling code.
Code optimization: Code optimization can be used to change the structure of the code in a way that makes it harder to understand. For example, an optimizer might change the order of instructions or use a more complex control flow structure.
Anti-debugging techniques: These are techniques that are specifically designed to make it difficult to debug a program. Examples include checking for debuggers, obfuscating debug information, or using encryption to hide sensitive data.
Address Obfuscation
Address obfuscation is a technique to hide the real addresses of data or functions within a program. One way to achieve address obfuscation is by using pointers. For example, instead of using a function's real address directly, we can create a pointer to the function and manipulate its address. Here's an example:
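One way this could be written is sketched below, following the description that comes after it. The function foo, the HiddenPtr struct, and the helpers hide, reveal, and call_hidden are all hypothetical names. The stored bytes must have the added 1 removed again before the pointer can be called.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

int foo() { return 42; }  // hypothetical target function

using FooPtr = int (*)();

// Holds the bytes of a function pointer with 1 added to each byte.
struct HiddenPtr {
    unsigned char bytes[sizeof(FooPtr)];
};

// Copies the pointer's bytes and adds 1 to each of them.
HiddenPtr hide(FooPtr fn) {
    HiddenPtr h;
    std::memcpy(h.bytes, &fn, sizeof fn);
    for (unsigned char& b : h.bytes) {
        b = static_cast<unsigned char>(b + 1);  // wraps from 255 to 0
    }
    return h;
}

// Subtracts 1 from each byte to recover the real pointer.
FooPtr reveal(const HiddenPtr& h) {
    unsigned char raw[sizeof(FooPtr)];
    for (std::size_t i = 0; i < sizeof raw; ++i) {
        raw[i] = static_cast<unsigned char>(h.bytes[i] - 1);
    }
    FooPtr fn;
    std::memcpy(&fn, raw, sizeof fn);
    return fn;
}

// Recovers the pointer only at the moment of the call.
int call_hidden(const HiddenPtr& h) {
    FooPtr fn = reveal(h);
    assert(fn != nullptr);
    return fn();
}
```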
In this example, we create a pointer to the foo function and then obfuscate its address by adding 1 to each byte of the pointer's memory. This makes it difficult to find the real address of the foo function. The change has to be reversed before the pointer is actually called.
Another way to achieve address obfuscation is by using function pointers within an array. Here's an example:
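A minimal sketch, assuming three hypothetical arithmetic functions (add, sub, mul) and a made-up slot formula. Callers reach the functions only through the table, and the slot is computed at run time instead of being written as a literal index:

```cpp
#include <cassert>
#include <cstddef>

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }
int mul(int a, int b) { return a * b; }

using BinaryOp = int (*)(int, int);

// Callers never name add, sub, or mul directly; they go through this table.
static const BinaryOp kOps[] = {mul, add, sub};
static const std::size_t kOpCount = sizeof(kOps) / sizeof(kOps[0]);

// Maps a selector to a table slot and calls whatever function lives there.
// With three entries, selectors 0, 1, 2 map to slots 1, 2, 0.
int dispatch(std::size_t selector, int a, int b) {
    std::size_t slot = (selector * 7 + 1) % kOpCount;
    assert(slot < kOpCount);
    return kOps[slot](a, b);
}
```

With this table, dispatch(0, 6, 3) calls add, dispatch(1, 6, 3) calls sub, and dispatch(2, 6, 3) calls mul.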
The base concept remains the same though.
Custom Encoding
In custom encoding, specific characters or sequences of characters are replaced with alternative characters or sequences. This can be done in many different ways, such as using a lookup table or applying mathematical operations to the original values. Here's an example that obfuscates the classic hello, world program:
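A minimal sketch of such a program, assuming an ASCII-compatible character set. The function greeting is a hypothetical wrapper that builds the text so it can be printed from main, for example with std::cout << greeting().

```cpp
#include <string>

static_assert('o' == 111, "this sketch assumes an ASCII-compatible character set");

// Builds "hello, world" without ever writing the letter 'o' in a literal.
std::string greeting() {
    const char o = 111;  // decimal ASCII code for 'o'
    std::string text;
    text += "hell";
    text += o;
    text += ", w";
    text += o;
    text += "rld";
    return text;
}
```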
We've assigned the decimal value 111 (the ASCII code for "o") to the variable o, and used it in place of all occurrences of the letter "o". This makes the code harder to read and understand, especially if the encoding is more complex and harder to discern at a glance.
Passing Arguments at Runtime
The program can be changed to expect arguments at runtime. This requires the user to have both the code as well as the decryption key to decrypt the variables.
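As a rough sketch of this idea, the stored bytes below are "hello" XOR-ed ahead of time with the key "k3y". The key, the message, and the function names are all assumptions made for illustration. The key is not present in the code and would be supplied by the caller, for example from argv[1] in main.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// XORs data with a repeating key. Applying it twice with the same key
// returns the original bytes.
std::string xor_with_key(const std::string& data, const std::string& key) {
    assert(!key.empty());
    std::string out = data;
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    }
    return out;
}

// Decodes the stored bytes with a key provided at run time.
// Only the correct key ("k3y" in this sketch) yields "hello".
std::string reveal_message(const std::string& key) {
    static const char kStored[] = {0x03, 0x56, 0x15, 0x07, 0x5C};
    return xor_with_key(std::string(kStored, sizeof kStored), key);
}
```

A wrong key still produces output, just not the intended text, so the program itself does not reveal whether the key was right.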