Thursday, December 21, 2006

How obfuscation works (how to protect your code)

I have come across quite a few people who believe that it is actually possible to create an application that cannot be disassembled.
To disprove that once and for all, here is the short logical explanation:

  • The code must run on the target computer, otherwise it is useless.

  • Any code that runs on the target computer must somehow translate into x86 machine code (or whatever chip you happen to have).

  • Any runnable x86 code can be read.


If you start coming up with some wild scheme involving self-modifying code, it would still be possible to hack an x86 emulator or virtual machine (such as VMware) so that it dumps every instruction that actually executes.

However, reading x86 machine code is non-trivial for any moderately complex program. Usually this practice is only applied to small portions of code, such as the code in a copy protection program. Restoring an x86 code dump to something remotely useful is a very difficult process, but it CAN be done, and tools for this procedure exist.

Considerations
What you have to consider when choosing a protection scheme is: How badly does someone else want your code? Will someone benefit from spending 4 years on reverse engineering to extract your program?
For most programs, it is sufficient that the reconstruction is moderately difficult.

In .NET it is highly desirable that the code remains managed code, so compiling to machine code is not an option.
The trick usually employed is obfuscation. The basic principle is to make the code as difficult to understand as possible.

Obfuscation example
Here is an example of readable code:

private void UpdateCustomer(CustomerClass customer, Country newCountry)
{
   customer.CheckCountry();
   customer.CheckBalance();
   customer.UpdateCountry(newCountry);
}


The example is easy to read, and the code is almost self-explanatory. Here is the same code after obfuscation:


private void a(b c, d e)
{
   c.f();
   c.g();
   c.h(e);
}


The code does exactly the same, but the names no longer give any hint of what it does. It can still be stolen and it can still run, but you will end up spending an enormous amount of time trying to figure out what goes on where. I would say that this level of protection is sufficient for most code. It does not prevent anyone from stealing a particular piece of code, but it will probably be faster to rewrite the program than to try to make sense of it, and no tool can bring back the original names.

This kind of obfuscation is what the DotFuscator Community Edition does.
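
The renaming step can be pictured with a small sketch. This is only an illustration of the idea, not how DotFuscator is implemented: the class `RenameSketch` and its methods are hypothetical names, and a real obfuscator rewrites the names inside the compiled assembly rather than working on a list of strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of any real obfuscator.
public static class RenameSketch
{
    // Converts 0, 1, 2, ... into a, b, ..., z, aa, ab, ...
    public static string ShortName(int index)
    {
        string name = "";
        do
        {
            name = (char)('a' + index % 26) + name;
            index = index / 26 - 1;
        } while (index >= 0);
        return name;
    }

    // Gives each distinct identifier the next free short name,
    // in order of first appearance.
    public static Dictionary<string, string> BuildMap(IEnumerable<string> identifiers)
    {
        var map = new Dictionary<string, string>();
        foreach (string id in identifiers)
        {
            if (!map.ContainsKey(id))
                map[id] = ShortName(map.Count);
        }
        return map;
    }
}
```

Feeding it the identifiers UpdateCustomer, CustomerClass and customer, in that order, yields a, b and c, which matches the obfuscated listing above. Once the map is thrown away, nothing in the output says what the names used to be.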

If you want to take it one step further, you can purchase a tool, such as the DotFuscator Professional edition, and you will get some protection from people trying to read your code.
Reflector and similar tools look at the bytecode of the program and recognize patterns. A sample piece of code may look like:


for(int i=0;i<10;i++)
   Console.WriteLine(i.ToString());


In IL (Intermediate Language, an assembler-like language), it becomes:

.locals init ([1] int32 i)
L_0000: nop
L_0001: ldc.i4.0
L_0002: stloc.1
L_0003: ldloca.s i
L_0005: call instance string int32::ToString()
L_000a: call void [mscorlib]System.Console::WriteLine(string)
L_000f: nop
L_0010: nop
L_0011: ldloc.1
L_0012: ldc.i4.1
L_0013: add.ovf
L_0014: stloc.1
L_0015: ldloc.1
L_0016: ldc.i4.s 10
L_0018: blt.s L_0003


In Reflector, it is shown like this:

int i = 0;
do
{
   Console.WriteLine(i.ToString());
   i++;
}
while (i < 10);


The part to note is that the IL lines labeled L_0001 to L_0002 form the start of the loop (setting i to zero), and the lines L_0011 to L_0018 form the end of the loop (counting up, and checking the exit condition).
Any loop constructed by the .NET compiler will look similar to this, and thus it is easy to recognize and build back into a loop. If one were to swap some of the instructions inside the loop with, e.g., the exit check, the decompiler would have a hard time guessing that this was in fact a loop, thus making the decompiled code look more like the actual assembly code.
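
To get a feel for what such a reshuffle does to readability, here is a rough source-level sketch. It is an assumption-laden illustration only: a real obfuscator rearranges the IL instructions rather than the C# source, and the class and method names here are made up. Both methods collect the same values, but the second one has the exit check sitting between the body and the increment, connected by jumps.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration; real tools do this on IL, not on C# source.
public static class LoopSketch
{
    // The loop in its familiar, structured form.
    public static List<int> Structured()
    {
        var seen = new List<int>();
        for (int i = 0; i < 10; i++)
            seen.Add(i);
        return seen;
    }

    // Same behaviour, but the init / check / body / step pieces
    // are reordered and stitched together with jumps.
    public static List<int> Scrambled()
    {
        var seen = new List<int>();
        int i = 0;
        goto Check;
    Body:
        seen.Add(i);
        goto Step;
    Check:
        if (i < 10) goto Body;
        goto Done;
    Step:
        i++;
        goto Check;
    Done:
        return seen;
    }
}
```

Both methods return the numbers 0 to 9, but only the first one is recognizable as a loop at a glance. A decompiler that relies on the usual layout faces the same problem.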

On top of this, the DotFuscator Professional edition inserts invalid instructions into the stream, which confuses most decompilers. Any reconstruction will have to be manually assisted, which means it will take a very long time to recover anything useful.

Other tricks
Other tricks include string encryption and code removal.
String encryption ensures that no strings can be read from the assembly (at least until it is running).
Code removal removes uncalled code, and ensures that it will not be easy to extend the application, as it contains only the code needed for this particular application.
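
String encryption can be sketched along these lines. This is a deliberately simple stand-in, assuming a single-byte XOR key and Base64 encoding; the class `StringHidingSketch`, its methods and the key are hypothetical, and commercial obfuscators use their own schemes.

```csharp
using System;
using System.Text;

// Hypothetical illustration of the idea; not any product's actual scheme.
public static class StringHidingSketch
{
    // Assumed single-byte key, chosen arbitrarily for this sketch.
    private const byte Key = 0x5A;

    // Done at build time: the literal is stored as Base64 text
    // of its XOR-ed UTF-8 bytes.
    public static string Hide(string plain)
    {
        byte[] data = Encoding.UTF8.GetBytes(plain);
        for (int i = 0; i < data.Length; i++)
            data[i] ^= Key;
        return Convert.ToBase64String(data);
    }

    // Done at run time: reverses Hide just before the string is needed.
    public static string Reveal(string hidden)
    {
        byte[] data = Convert.FromBase64String(hidden);
        for (int i = 0; i < data.Length; i++)
            data[i] ^= Key;
        return Encoding.UTF8.GetString(data);
    }
}
```

The assembly would then contain only the output of Hide, and the program would call Reveal at the point of use. Since the key and the decoding routine ship with the program, this only stops casual browsing of the strings, which fits the "at least until it is running" remark above.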

Caveats
All these tricks can be performed without impacting the functionality of the program, as long as no reflection is used within the program. That goes for both direct and indirect reflection. Direct reflection is when you call a method by its name, as in Invoke("methodname"), and indirect reflection is when you access metadata, such as through the ToString() function of an enum variable. The first will break after code removal or method renaming, the second will break after renaming.
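
A minimal sketch of both cases, using made-up types (`DemoCustomer`, `ShippingStatus` and `ReflectionCaveatSketch` are hypothetical and exist only for this illustration):

```csharp
using System;
using System.Reflection;

public enum ShippingStatus { Pending, Shipped }

public class DemoCustomer
{
    public string UpdateCountry(string newCountry)
    {
        return "moved to " + newCountry;
    }
}

public static class ReflectionCaveatSketch
{
    // Direct reflection: the method is found through a name held in a string.
    // If the obfuscator renames or removes the method, the lookup returns null.
    public static string CallByName(DemoCustomer customer, string methodName, string argument)
    {
        MethodInfo method = typeof(DemoCustomer).GetMethod(methodName);
        if (method == null)
            throw new MissingMethodException("DemoCustomer", methodName);
        return (string)method.Invoke(customer, new object[] { argument });
    }

    // Indirect reflection: ToString() on an enum value reads the member
    // name from metadata, so a renamed member shows its obfuscated name.
    public static string Label(ShippingStatus status)
    {
        return status.ToString();
    }
}
```

Before obfuscation, CallByName(customer, "UpdateCountry", "Denmark") finds the method and Label(ShippingStatus.Shipped) gives "Shipped". After renaming, the string "UpdateCountry" no longer matches anything, and any code that compares or displays the enum label gets a meaningless name instead.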
