Code Obfuscation

Obfuscation, put simply, is the process of making something difficult to understand. Code obfuscation is the practice of intentionally making source code harder to understand and reverse engineer while preserving its functionality. The goal is to make it harder for attackers to find and exploit vulnerabilities in the code, or to steal intellectual property.

Obfuscation is commonly used in legitimate software that requires strong security, such as financial or military applications, to protect against reverse engineering and intellectual property theft. It is also widely used in malware and other malicious software.

Despite its benefits, obfuscation is NOT a foolproof solution and can introduce problems of its own. It can make debugging and maintenance more difficult, increase code size and complexity, and even introduce new vulnerabilities if done carelessly. Generally, you should keep the unobfuscated source as the version you maintain and produce the obfuscated version from it.

For a malware developer, code obfuscation is incredibly useful because it makes it much more difficult for security researchers and anti-malware software to analyze and detect malicious code. Let's go over some techniques for obfuscating code:

Obfuscation by Renaming

Consider the following piece of C++ code:

#include <iostream>

class human {
public:
	int height;
	int weight;
	int age;
	int get_height() const {
		return height;
	}
	int get_weight() const {
		return weight;
	}
	int get_age() const {
		return age;
	}
	bool is_adult() const {
		return age >= 18;
	}
};

int main() {
	human john;
	john.height = 180;
	john.weight = 220;
	std::cout << john.get_height() << std::endl;
	std::cout << john.get_weight() << std::endl;
	return 0;
}

We can obfuscate this by renaming the class, its members, and the variables to random garbage, like so:

#include <iostream>

class siiinodinOIJOA {
public:
        int aiunUNu298hssa;
        int duu32ifniuNNAA;
        int x2hbiYSijnqokn() const {
                return aiunUNu298hssa;
        }
        int dn2989aAINOS28() const {
                return duu32ifniuNNAA;
        }
        ...
        ...
};

int main() {
        siiinodinOIJOA iaiuqLLLskaji1d8h98;
        iaiuqLLLskaji1d8h98.aiunUNu298hssa = 180;
        iaiuqLLLskaji1d8h98.duu32ifniuNNAA = 220;
        std::cout << iaiuqLLLskaji1d8h98.x2hbiYSijnqokn() << std::endl;
        std::cout << iaiuqLLLskaji1d8h98.dn2989aAINOS28() << std::endl;
        return 0;
}

See the difference? Without the original program to compare against, it is much harder to tell what this code is supposed to do. Renaming alone is not enough, though: the structure and the literal values are untouched, so people can still guess what the code aims to do. That is where more obfuscation techniques come in.

Data Obfuscation

This technique targets the data structures used in the code so that an analyst cannot easily work out the actual intent of the program. It may involve altering how data is stored in memory and how the stored data is interpreted to produce the final output. This can be done using one (or a combination) of the following:

  • Aggregation Obfuscation: This alters the way data is grouped in the program. For example, arrays could be broken down into many sub-arrays, which are then referenced at different places in the program. The listing below applies the same idea to a class by splitting its fields into separate nested classes.

// Aggregation Obfuscation:
#include <iostream>

class height {
public:
    int h;
    int get_height() const { return h; }
};

class weight {
public:
    int w;
    int get_weight() const { return w; }
};

class human {
public:
    height h;
    weight w;
    int age;
    int get_age() const { return age; }
    bool is_adult() { return age >= 18; }
};

int main() {
    human john;
    john.h.h = 180;
    john.w.w = 220;
    std::cout << john.h.get_height() << std::endl;
    std::cout << john.w.get_weight() << std::endl;
    return 0;
}
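
The array case mentioned in the bullet can be sketched in the same spirit. This is only a minimal illustration, assuming a hypothetical split_array class that stores a six-element logical array as two sub-arrays (even and odd indices); the class name, the sizes, and the even/odd split are all assumptions made for the example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical example: a logical array of six ints stored as two sub-arrays.
// Even logical indices live in even_part, odd logical indices in odd_part.
// Valid logical indices are 0..5; no bounds checking is done in this sketch.
class split_array {
public:
    int get(std::size_t index) const {
        if (index % 2 == 0) {
            return even_part[index / 2];
        }
        return odd_part[index / 2];
    }

    void set(std::size_t index, int value) {
        if (index % 2 == 0) {
            even_part[index / 2] = value;
        } else {
            odd_part[index / 2] = value;
        }
    }

private:
    std::array<int, 3> even_part{};
    std::array<int, 3> odd_part{};
};
```

Callers still use get and set with ordinary indices, so the program behaves the same, but the values no longer sit next to each other in one obvious array.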

  • Storage Obfuscation: This changes where and how data is stored. For example, developers can move variables between local and global storage so that the real nature of a variable's behaviour is obscured. The example below takes a related approach: the values are not stored in the source at all but are read from an external file at runtime.

// Storage obfuscation
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

class human { ... };

int main() {
    human john;
    // The values live in data.txt as "height,weight,age" instead of in the source.
    std::ifstream file("data.txt");
    if (!file.is_open()) {
        std::cout << "Error: could not open file" << std::endl;
        return 1;
    }
    std::string line, height_str, weight_str, age_str;
    std::getline(file, line);
    std::istringstream fields(line);
    // Stop if any of the three comma-separated fields is missing.
    if (!std::getline(fields, height_str, ',') ||
        !std::getline(fields, weight_str, ',') ||
        !std::getline(fields, age_str, ',')) {
        std::cout << "Error: expected height,weight,age" << std::endl;
        return 1;
    }
    // std::stoi throws if a field is not a number.
    john.height = std::stoi(height_str);
    john.weight = std::stoi(weight_str);
    john.age = std::stoi(age_str);
    std::cout << john.get_height() << std::endl;
    std::cout << john.get_weight() << std::endl;
    return 0;
}

  • Ordering Obfuscation: This method changes the order in which data is declared or laid out without altering the behaviour of the program/code snippet. This may be done by developing a separate module which is called for every reference to the variable.

// Original:
#include <stdio.h>

int main() {
  int x = 10, y = 5, z = 3;
  int result = x + y * z;
  printf("The result is: %d\n", result);
  return 0;
}

// Obfuscated:
#include <stdio.h>

int main() {
  int y = 5, x = 10, z = 3;
  int result = y * z + x;
  printf("The result is: %d\n", result);
  return 0;
}
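
The "separate module" idea from the bullet can be sketched as an accessor that maps every logical index to a shuffled storage position. This is a minimal sketch under stated assumptions: the reordered_array class, the kOrder permutation, and the array size of five are all hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical permutation: logical index i is stored at physical index kOrder[i].
constexpr std::array<std::size_t, 5> kOrder = {3, 0, 4, 1, 2};

// Every read and write goes through get/set, so callers keep using logical
// indices while the values sit in a different order in memory.
// Valid logical indices are 0..4; no bounds checking is done in this sketch.
class reordered_array {
public:
    int get(std::size_t logical) const { return storage_[kOrder[logical]]; }
    void set(std::size_t logical, int value) { storage_[kOrder[logical]] = value; }

    // Exposes the raw layout, only to show that it differs from the logical order.
    int raw(std::size_t physical) const { return storage_[physical]; }

private:
    std::array<int, 5> storage_{};
};
```

Because all accesses go through one mapping, the layout can be changed by editing kOrder alone, and the rest of the program is unaffected.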

  • String Encryption: This method encrypts the readable strings in a program so that they no longer appear in plain text in the code or the compiled binary. They need to be decrypted at runtime, when the program is executed.

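// String Obfuscation
// Note: only the member names are disguised here; the string literals
// assigned in main() are still stored in plain text.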
#include <iostream>
#include <string>

class human {
public:
    std::string _0x68;
    std::string _0x77;
    std::string _0x61;
    int height;
    int weight;
    int age;
    int get_height() const { return height; }
    int get_weight() const { return weight; }
    int get_age() const { return age; }
    bool is_adult() { return age >= 18; }
};

int main() {
    human john;
    john._0x68 = "height";
    john._0x77 = "weight";
    john._0x61 = "age";
    john.height = 180;
    john.weight = 220;
    std::cout << john._0x68 << ": " << john.get_height() << std::endl;
    std::cout << john._0x77 << ": " << john.get_weight() << std::endl;
    return 0;
}
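
The listing above hides the names of the members that hold the strings, but the literals themselves stay readable. A minimal sketch of actually decoding a string at runtime could look like the following. It assumes a single-byte XOR with a hypothetical key (kKey) and a pre-encoded byte array (kEncodedHeight); XOR is used here only because it is easy to follow, and it is an encoding rather than real encryption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical single-byte key, chosen for this example only.
constexpr unsigned char kKey = 0x5A;

// The bytes of "height", each XORed with kKey, so the plain text
// does not appear in the source.
constexpr unsigned char kEncodedHeight[] = {0x32, 0x3F, 0x33, 0x3D, 0x32, 0x2E};

// Rebuilds the readable string at runtime by XORing every byte with the key.
std::string decode(const unsigned char *data, std::size_t length, unsigned char key) {
    std::string out;
    out.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        out.push_back(static_cast<char>(data[i] ^ key));
    }
    return out;
}
```

Calling decode(kEncodedHeight, sizeof(kEncodedHeight), kKey) yields the label only when it is needed. Anyone who finds the key and the loop can reverse this, so it raises the effort required rather than preventing analysis.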

  • Control Flow Obfuscation: This method alters the order or structure of a program's code to make it harder for someone to analyze or understand how it works. It's like a maze that someone has to navigate to find out what the code does. A simple way to obfuscate control flow is to add conditional statements whose outcome is hard to see at a glance, together with branches that never run. Essentially, this means adding a lot of dead code.

#include <chrono>
#include <iostream>
#include <thread>

class human { ... };

void foo(int n) {
    human john;
    john.height = n;
    std::this_thread::sleep_for(std::chrono::seconds(5)); // random sleeping
    john.weight = n;
    john.age = n;
    if (john.is_adult()) {
        std::this_thread::sleep_for(std::chrono::seconds(5));
        std::cout << "John is an adult!" << std::endl;
    }
}

int main() {
    foo(20); // the real logic sits in a helper padded with delays, so execution jumps around a bit
             // you can also use the infamous goto statement to tangle the flow further :D
    std::this_thread::sleep_for(std::chrono::seconds(5));
    return 0;
}
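
The listing above mostly pads the program with delays. The "conditional statements plus dead code" idea can be sketched with an opaque predicate, meaning a condition that always evaluates the same way but does not look constant at first glance. The function names below are hypothetical, and the predicate is deliberately simple: an optimizing compiler may well fold it away, so treat this as an illustration of the idea rather than a robust transformation.

```cpp
// Plain version, for comparison.
bool is_adult_plain(int age) {
    return age >= 18;
}

// Same result, routed through an opaque predicate. n * (n + 1) is the product
// of two consecutive integers, so it is always even. Unsigned arithmetic is
// used so that overflow wraps around, which does not change the parity.
bool is_adult_obfuscated(int age) {
    unsigned int n = static_cast<unsigned int>(age);
    if ((n * (n + 1u)) % 2u == 0u) {  // always true
        return age >= 18;
    }
    // Dead code: never reached, present only to mislead a reader.
    return age < 0;
}
```

Both functions return the same value for every input; the second one simply forces a reader to prove that the extra branch can never run.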

Debug Obfuscation

Debug information reveals a lot about a program: symbol names and line numbers make it much easier to follow program flow and to find flaws when decompiling or reverse engineering a binary. It is important to mask such identifiable information by changing identifiers and line numbers, or by removing access to debug information altogether. Here are some methods to achieve this:

  1. Stripping debug symbols: When a program is compiled, the compiler can generate symbols that are used to debug the code. Debug obfuscation can involve stripping these symbols from the binary, which leaves a debugger with raw addresses instead of names and makes analysis harder. For example, you may pass the -s flag to gcc or g++ to remove the symbol table and relocation information from the executable, or run the strip utility on an existing binary.

  2. Code optimization: Code optimization can change the structure of the code in a way that makes it harder to understand. For example, an optimizer might reorder instructions, inline functions, or produce a more complex control flow structure, so the compiled code no longer mirrors the source.

  3. Anti-debugging techniques: These are techniques that are specifically designed to make it difficult to debug a program. Examples include checking for debuggers, obfuscating debug information, or using encryption to hide sensitive data.

Address Obfuscation

Address obfuscation is a technique to hide the real addresses of data or functions within a program. One way to achieve address obfuscation is by using pointers. For example, instead of using a function's real address directly, we can create a pointer to the function and manipulate its address. Here's an example:

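#include <cstddef>
#include <iostream>

void foo() {
    std::cout << "Hello, World!" << std::endl;
}

int main() {
    // create a pointer to foo
    void (*ptr)() = &foo;

    // obfuscate the stored address by adding 1 to every byte (wraps modulo 256)
    unsigned char *cptr = reinterpret_cast<unsigned char *>(&ptr);
    for (std::size_t i = 0; i < sizeof(ptr); ++i) {
        cptr[i] = static_cast<unsigned char>(cptr[i] + 1);
    }

    // ptr no longer holds a valid address; calling it now would be undefined behaviour

    // restore the original bytes right before the call
    for (std::size_t i = 0; i < sizeof(ptr); ++i) {
        cptr[i] = static_cast<unsigned char>(cptr[i] - 1);
    }

    // call foo through the restored pointer
    (*ptr)();

    return 0;
}

In this example, we create a pointer to the foo function and then obfuscate its address by adding 1 to each byte of the pointer's memory. While the pointer is in this state, the real address of foo is not stored as-is, which makes it harder to find. The change has to be undone before the call, because calling through the modified pointer would jump to an invalid address. Keep in mind that an optimizing compiler may remove both loops, since together they have no net effect.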

Another way to achieve address obfuscation is by using function pointers within an array. Here's an example:

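#include <cstddef>
#include <iostream>

void foo() {
    std::cout << "Hello, World!" << std::endl;
}

int main() {
    // create an array of function pointers
    void (*funcs[1])() = {&foo};

    // obfuscate the stored addresses by adding 1 to every byte (wraps modulo 256)
    unsigned char *cptr = reinterpret_cast<unsigned char *>(funcs);
    for (std::size_t i = 0; i < sizeof(funcs); ++i) {
        cptr[i] = static_cast<unsigned char>(cptr[i] + 1);
    }

    // restore the original bytes right before the call;
    // calling through a modified pointer would be undefined behaviour
    for (std::size_t i = 0; i < sizeof(funcs); ++i) {
        cptr[i] = static_cast<unsigned char>(cptr[i] - 1);
    }

    // call the function through the restored pointer
    (*funcs[0])();

    return 0;
}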

The base concept remains the same though.

Custom Encoding

In custom encoding, specific characters or sequences of characters are replaced with alternative characters or sequences. This can be done in many different ways, such as using a lookup table or applying mathematical operations to the original values. Here's an example that obfuscates the classic hello, world program:

#include <iostream>
using namespace std;

int main() {
    char o = 111;
    cout << "Hell" << o << ", w" << o << "rld!" << endl;
    return 0;
}

We've assigned the decimal value 111 (the ASCII code for "o") to the variable o, and used it in place of all occurrences of the letter "o". This makes the code harder to read and understand, especially if the encoding is more complex and harder to discern at a glance.
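
The lookup-table variant mentioned above can be sketched as a simple substitution. This is only an illustration: the two alphabets (kPlain and kCoded) and the helper names are hypothetical, and only lowercase letters are translated while every other character passes through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical substitution table: the letter at index i of kPlain is
// replaced by the letter at index i of kCoded.
const std::string kPlain = "abcdefghijklmnopqrstuvwxyz";
const std::string kCoded = "qwertyuiopasdfghjklzxcvbnm";

// Replaces every character found in `from` with the character at the same
// position in `to`; characters not in the table are left as they are.
std::string translate(const std::string &text, const std::string &from,
                      const std::string &to) {
    std::string out = text;
    for (char &c : out) {
        std::size_t pos = from.find(c);
        if (pos != std::string::npos) {
            c = to[pos];
        }
    }
    return out;
}

std::string encode(const std::string &text) { return translate(text, kPlain, kCoded); }
std::string decode(const std::string &text) { return translate(text, kCoded, kPlain); }
```

With this table, encode("hello, world!") gives "itssg, vgksr!", and decode reverses it. The program would store only the encoded form and call decode just before printing.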

Passing Arguments at Runtime

The program can be changed to expect arguments, such as a decryption key, at runtime. Anyone who wants to recover the protected values then needs both the code and the key.
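
A minimal sketch of this idea, assuming a repeating-key XOR and a key passed as the first command-line argument. The encoded bytes (kEncoded), the demo key "key", and the function names are all hypothetical; message_from_args is meant to be called from main with its argc and argv.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// "hello" XORed with the repeating key "key". Both are demo values only.
const std::vector<unsigned char> kEncoded = {0x03, 0x00, 0x15, 0x07, 0x0A};

// Decodes `encoded` with a repeating-key XOR. An empty key yields an empty string.
std::string decode_with_key(const std::vector<unsigned char> &encoded,
                            const std::string &key) {
    std::string out;
    if (key.empty()) {
        return out;
    }
    for (std::size_t i = 0; i < encoded.size(); ++i) {
        unsigned char k = static_cast<unsigned char>(key[i % key.size()]);
        out.push_back(static_cast<char>(encoded[i] ^ k));
    }
    return out;
}

// Takes the key from the first command-line argument, if there is one.
std::string message_from_args(int argc, char **argv) {
    if (argc < 2) {
        return "";  // no key supplied, nothing can be decoded
    }
    return decode_with_key(kEncoded, argv[1]);
}
```

Run with the right key, the program recovers "hello"; with any other key it produces different bytes, and the key itself never appears in the binary.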
