Friday, December 29, 2006

Un-peeling the Onion
I have worked on a project that used the Tor onion routing concept.
The Tor network enables anonymous communication over the Internet. Please see the official site for details on how it works.

While I was working on this project, I tried to find out whether anyone had been able to break the anonymity of the Tor concept, and I found a report with the title "Peeling the Onion: Unmasking TOR users". At first I was intrigued by the title, and after reading it I was a bit disappointed with Tor, because the authors were able to extract a wide variety of information from the Tor network. Then I read it again, and realised that there is actually nothing wrong with the Tor network. They have simply gotten their title wrong; it should have been something like "Accessing browser information through anonymizers". If you read the report (which is a prerequisite for understanding this blog entry), you can see that they try to uncover some vandals who exploited a weakness on some sites and used Tor to hide their IP addresses.

Blocking Entry
The first point made in the report is that it is easy to find a complete list of exit nodes in the Tor network. That list can be used to block access from Tor machines. This does not pose any threat to Tor users, because an exit node cannot be mapped to any particular user. It can, however, be useful for site operators who want to limit anonymous access (e.g. disallow comments from Tor users).
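
As a rough illustration of the blocking idea, here is a minimal sketch in Python. It assumes the site operator already has the exit node list as plain text with one IP address per line; the function names and the sample addresses (taken from documentation ranges, not real exit nodes) are made up for this example.

```python
import ipaddress


def load_exit_nodes(lines):
    """Parse one IP address per line, skipping blank lines and '#' comments."""
    nodes = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            nodes.add(ipaddress.ip_address(line))
    return nodes


def is_tor_exit(client_ip, exit_nodes):
    """Return True if the client address is on the exit node list."""
    return ipaddress.ip_address(client_ip) in exit_nodes


# Hypothetical list contents; these are documentation addresses, not real exit nodes.
EXIT_NODES = load_exit_nodes(["# sample list", "192.0.2.10", "198.51.100.7", ""])
```

A comment form could call is_tor_exit with the visitor's address and refuse the post when it returns True. Note that this only tells the operator that the visitor came through Tor, not who the visitor is.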

Timing attacks
The next point in the report is that it is possible to figure out who was attacking a site by using timing statistics (they use a simplified approach at first). This very point is described on the Tor site itself, so there is nothing new here. The possibility is not considered a risk, because it can only provide an indication that two machines are communicating.
The basic idea is this:
  • Set up a listener at machine A and B.

  • Every time A sends a packet, record the size and time.

  • Every time B receives a packet, record the size and time.

  • Do the last two steps with A and B switched.

Now plot the data in a graph. If A and B were communicating, it should be visible that most packets sent from A were received by B, delayed by some (constant) time factor.

This will ONLY work if you suspect A and B BEFORE they start communicating, AND you have access to set up the listening devices.
If the machines communicate heavily with other machines all the time, the pattern will be very hard to see.
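
The comparison described above can also be done without a graph. The following Python sketch is a simplified assumption of how one might do it: both logs are lists of (timestamp in seconds, size in bytes), packets are matched purely on size, and the function name and the resolution parameter are invented for this example.

```python
from collections import Counter


def estimate_delay(sent, received, resolution=0.01):
    """Find the most common delay between same-sized packets.

    sent and received are lists of (timestamp_seconds, size_bytes).
    Returns (delay_seconds, matches), or (None, 0) if nothing lines up.
    """
    delays = Counter()
    for t_sent, size_sent in sent:
        for t_recv, size_recv in received:
            if size_recv == size_sent and t_recv > t_sent:
                # Bucket the delays so that small jitter lands in the same bin.
                delays[round((t_recv - t_sent) / resolution)] += 1
    if not delays:
        return None, 0
    bucket, matches = delays.most_common(1)[0]
    return bucket * resolution, matches


# Made-up logs: every packet from A shows up at B about 0.25 seconds later.
sent_by_a = [(0.00, 512), (0.40, 1024), (1.10, 512), (1.90, 256)]
seen_at_b = [(0.25, 512), (0.65, 1024), (0.90, 300), (1.35, 512), (2.15, 256)]
delay, matches = estimate_delay(sent_by_a, seen_at_b)
```

If one delay bucket collects most of the packets, that hints that the two machines are talking to each other. With heavy unrelated traffic, many buckets fill up at random and the peak disappears, which is exactly the limitation mentioned above.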

The report claims that since there are only a few Tor servers in Denmark (they write Tor user where they mean Tor server), it would be possible to see whether a server was online at the time of the attack. They do mention that this will only work if the offender is running a server. Well, yes, it would require the offender to run a server AND two other conditions:
  • The server is only online during attacks.

  • The offender forces the use of his/her own onion router (a highly weird choice).

It's pretty obvious that these two conditions are highly unlikely, and thus the entire technique is completely useless.

They all use browsers
The next way they try to unmask the user is by using browser bugs/features (a detail they forget to mention). This method will work against almost any unsuspecting user. The simple solution for the user is to ensure that the machine does not know its own IP, through the use of proxies and outbound firewall rules.

Protocol Specifics
Following the browser exploits are a number of weird takes that show dumps of the protocol-specific data a program sends, such as locale and OS version. I have a hard time seeing how a flag saying "Locale Danish, OS WinXP" can in any way help to identify anyone.

Linux Encryption FS
Right after that weird entry, they point to a very old problem with the Linux encryption engine, which used sector numbers as initialization vectors. They claim that they have not investigated this further. Well, if they had, they would know the following:
  • It was fixed a long time ago.

  • It cannot be exploited by sending anything to the machine. It is a bug that will enable people with low-level access to the disk, to guess the passphrase faster.

  • Since sector numbers do not exist in Tor, it is completely unrelated to the Tor network.

My guess is that they threw that link in to lend their report some weight by pointing to a real investigation.

Timing attacks, again
As a last point, they try to visualize the timing attacks, and introduce the idea of tagging Tor data. If they had read the protocol, they would know that the packets are encrypted. Encryption makes it impossible to change the headers inside the real packet, and the layered approach makes sure that no TCP headers remain intact. It is possible to inject such headers from the attacked machine, but the output would only be readable by the client and thus very hard to intercept.

Traffic analysis, sort of...
This section is one of the funniest. At first, it seems exciting that someone who runs such a server can extract this info. Then again, you don't know who is requesting the data.
The reason I find this section funny is that when you read their findings on what people use the server for, you see some HTTP traffic and some telnet traffic. Then, in the next section, you see that they configured the server to accept only HTTP and telnet traffic. Well, it would be very odd if they had found anything else!

The state of affairs in Denmark
If you live in Denmark, you should start using Tor NOW. The government of Denmark has recently passed a law that requires all usage of phone and internet to be logged. The Internet Service Providers are required by law to register all your phone calls and all your internet connections, and keep the records for a whole year. If you think "I am not a criminal, why should that affect me?", consider the following scenario:
You visit a number of websites and get a virus. The virus turns out to be a trojan, and your computer is used to attack various sites. The police check the data and find your IP. Every website you have visited, or have pulled a commercial from, will now be on the list, and you may have to explain why you visited each and every one of them.

Thursday, December 21, 2006

How obfuscation works (how to protect your code)
I have come across quite a few people who believe that it is actually possible to create an application that cannot be disassembled.
To disprove that once and for all, here is the short logical explanation:

  • The code must run on the target computer, otherwise it is useless.

  • Any code that runs on the target computer must somehow translate into x86 machine code (or whatever chip you happen to have).

  • Any runnable x86 code can be read.

If you start coming up with some wild scheme involving self-modifying code, it would still be possible to hack an x86 emulator or virtual machine (such as VMware) and have it dump all the instructions that actually execute.

However, reading x86 machine code is non-trivial for any moderately complex program. Usually this practice is only applied to small portions of code, such as the code in a copy protection program. Restoring an x86 code dump to something remotely useful is a very difficult process, but it CAN be done, and tools for this procedure exist.

What you have to consider when choosing a protection scheme is: How badly does someone else want your code? Will someone benefit from 4 years of reverse engineering to extract your program?
For most programs, it is sufficient that the reconstruction is moderately difficult.

In .Net it is highly desirable that the code remains managed code, so going to machine code is not an option.
The trick usually employed is obfuscation. The basic principle is to make the code as difficult to understand as possible.

Obfuscation example
Here is an example of readable code:

private void UpdateCustomer(CustomerClass customer, Country newCountry)

The example is easy to read, and the code is almost self-explanatory. Here is an example of obfuscation:

private void a(b c, d e)

The code does exactly the same thing, but there is no way to guess what it does from the names. It can still be stolen and it can still run, but you will end up using an enormous amount of time trying to figure out what goes on where. I would say that this level of protection is sufficient for most code. It does not prevent anyone from stealing a particular piece of code, but it will probably be faster to rewrite the program than to try to make sense of it. And there are no tools to help make sense of it.

This kind of obfuscation is what the DotFuscator Community Edition does.
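
The renaming step itself is simple. The sketch below is in Python rather than .Net and only shows the principle; it is not how DotFuscator is implemented, and the function names are invented for the example. It hands out the shortest unused name to each identifier in the order they are first seen.

```python
import itertools
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... in order."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def build_rename_map(identifiers):
    """Assign each distinct identifier the next short name, in first-seen order."""
    names = short_names()
    mapping = {}
    for identifier in identifiers:
        if identifier not in mapping:
            mapping[identifier] = next(names)
    return mapping


# The identifiers from the readable example above.
mapping = build_rename_map(
    ["UpdateCustomer", "CustomerClass", "customer", "Country", "newCountry"]
)
```

Applied to the readable signature, this mapping gives the obfuscated one shown earlier (a, b, c, d, e). A real tool also has to skip language keywords and leave alone any names that outside code depends on.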

If you want to take it one step further, you can purchase a tool such as the DotFuscator Professional edition, which gives you some protection against people trying to read your code with a decompiler.
Reflector and similar tools look at the bytecode of the program and recognize patterns. A sample piece of code may look like:

for (int i = 0; i < 10; i++)
    Console.WriteLine(i.ToString());

In IL (Intermediate Language, an assembler-like language), it becomes:

.locals init ([1] int32 i)
L_0000: nop
L_0001: ldc.i4.0
L_0002: stloc.1
L_0003: ldloca.s i
L_0005: call instance string int32::ToString()
L_000a: call void [mscorlib]System.Console::WriteLine(string)
L_000f: nop
L_0010: nop
L_0011: ldloc.1
L_0012: ldc.i4.1
L_0013: add.ovf
L_0014: stloc.1
L_0015: ldloc.1
L_0016: ldc.i4.s 10
L_0018: ble.s L_0003

In Reflector, it is shown like this:

int i = 0;
while (i < 10);

The part to note is that the IL lines labeled L_0001 and L_0002 are the start of the loop (setting i to zero), and the lines L_0011 to L_0018 are the end of the loop (counting up, and checking the exit condition).
Any loop constructed by the .Net compiler will look similar to this, and thus it is easy to recognize and build back into a loop. If one were to swap some of the instructions performed inside the loop with e.g. the exit check, the decompiler would have a hard time guessing that this was in fact a loop, and the decompiled output would look more like the raw assembly code.

On top of this, the DotFuscator Professional edition inserts invalid instructions into the stream, which confuses most decompilers. Any reconstruction will have to be manually assisted, which means it will take a LONG time to recover anything useful.

Other tricks
Other tricks include string encryption and code removal.
String encryption ensures that no strings can be read from the assembly (at least until it is running).
Code removal removes uncalled code, which ensures that it will not be easy to extend the application, as it contains only the code needed for this particular application.
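
To show what string encryption amounts to, here is a minimal Python sketch. The repeating XOR key is an assumption picked for brevity, not the scheme any particular obfuscator uses, and the names are invented for this example.

```python
from itertools import cycle

# Hypothetical key; a real tool would pick its own scheme and key.
KEY = b"\x5a\x13\xc7"


def xor_bytes(data, key=KEY):
    """XOR every byte with a repeating key; applying it twice restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# What the obfuscator would store in the binary instead of the plain string.
STORED = xor_bytes("Invalid license key".encode("utf-8"))


def reveal(stored):
    """What the program runs just before the string is used."""
    return xor_bytes(stored).decode("utf-8")
```

Searching the stored bytes for "license" finds nothing, but calling reveal gives the text back. That is why the protection only holds until the program is running: the key and the decoding routine have to ship with the program.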

All these tricks can be performed without impacting the functionality of the program, as long as no reflection is used within the program. That goes for both direct and indirect reflection. Direct reflection is when you call a method by name, such as Invoke("methodname"), and indirect reflection is when you access metadata, such as calling the ToString() function of an enum variable. The first will break after code removal or method renaming; the second will break after renaming.
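
The breakage from direct reflection is easy to reproduce in any language with name-based lookup. In the Python sketch below, getattr stands in for .Net reflection, and the two classes are invented stand-ins for a class before and after a renaming pass.

```python
class Customer:
    def update(self):
        return "updated"


class ObfuscatedCustomer:
    # The same method after a hypothetical renaming pass: update -> a
    def a(self):
        return "updated"


def call_by_name(obj, method_name):
    """Look the method up from a string at run time, as reflection does."""
    return getattr(obj, method_name)()
```

Calling call_by_name(Customer(), "update") works, while the same call on ObfuscatedCustomer raises AttributeError, because the string "update" was not renamed along with the method.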

Wednesday, December 20, 2006

Extracting a .Net assembly from a Netz compressed application
I recently stumbled upon the Netz tool. The Netz tool can compress a complete .Net application and package it into a single executable. That makes it great for deployment without an installer.
What struck me was that someone claimed that this tool protects your code from disassembly through reflection (using a tool such as Reflector).
Creating such a program is not possible; see my post on obfuscation for an argument and explanation.

As the assembly is merely inserted into the Netz loader as an embedded resource, it is very easy to write a small program that reads all such resources from an executable and dumps them to disk. In fact, the code for this is just these lines:

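// Requires the System.IO, System.Reflection and System.Resources namespaces.
// Load the Netz-packed executable and open its "app" resource set.
Assembly asm = Assembly.LoadFile(Path.GetFullPath("NetzLoader.exe"));
ResourceManager rm = new ResourceManager("app", asm);

// The resource name is the GUID found by inspecting the executable's resources.
byte[] t = UnZip((byte[])rm.GetObject("A6C24BF5-3690-4982-887E-11E1B159B24"));

// Write the unpacked assembly to disk; the using block closes the file.
using (Stream st = File.Create("objdump.exe"))
    st.Write(t, 0, t.Length);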

The GUID in the code can be obtained by opening a Netz executable and viewing its resources. It seems that the name of the resource is always app.resource.
The function named UnZip is not included in the code. It is a standard unzip function that takes a compressed byte array and returns the uncompressed bytes.

If you try to encrypt this stream, remember that the decryption will have to run on the client machine, so anyone may load the assembly and execute the decryption function (in short: don't bother).

You can actually dump the resource file from a tool such as Reflector, but there are no functions that create a ResourceManager from a stream or byte array.
This procedure can be repeated for any number of files. I imagine that it is possible to obtain all the different resource names programmatically, but I didn't bother trying.

I've also seen claims that the load time will decrease, as the disk image is smaller. I doubt that the saved disk IO outweighs the time spent decompressing the stream. Also, there is a significant memory overhead, as the image will have to be in memory a number of times during the read/unzip/load procedure.

The only proper use for this tool is to allow distribution without an installer (MSI installers carry compressed files by default).
The compressed format makes it perfect for easy distribution. However, there are a few caveats that have to be addressed.

  • .Net executables downloaded from the internet will carry a special attribute that impairs their functionality. Such an executable may not read registry keys, use P/Invoke, etc.

  • There is no way to uninstall the program.

  • It may be possible to bypass virus scanners, seeing that no virus scanners open .Net executables and try to decompress their resources.

The first two problems may not matter if your application is a lightweight program that does not interact with the user's system (such a program could be a calculator).
The third problem may be addressed later by virus scanner vendors, which could suddenly disable your program.

Generally you should choose a proper installer. If you want the easy way, use the MSI tools in Visual Studio; if you like control, use the Nullsoft Install System.
If you truly have a lightweight application, consider the ClickOnce deployment system.

If you still think that having a single executable is neat, try ILMerge.

Friday, December 15, 2006

DRM - The last breath from the era of Record Labels
In recent years, the major record labels have tried to maintain their stronghold on the world of music.

The record labels have had a major role in the launch of new artists. For an artist, getting signed by a record label has usually been a sure path to success. Unfortunately the world is changing, and so is the way music is transported and used.

The evolution of music usage

Back in the day, music was on LP's, then MC's, then CD's. All these containers needed a physical transport, and a physical (retail) outlet. So, as a new artist, it is close to impossible to walk up to a retail-chain and ask them to sell your CD. That makes sense, as it is usually hard to sell CD's from unknown artists. So, the way in, is using a record label, which has the distribution and marketing deals set up.


When Napster came about, the record labels saw digital copying as a competitor. The RIAA (and its local counterparts) started calling file sharers "pirates" and telling people that they were stealing music. They continue to do this, even though "pirating" is technically "copyright infringement", and NOT stealing. If you wonder about the difference between the two: stealing is an act where an item is physically taken from another person, while copyright infringement is copying an item without the copyright holder's permission. Stealing is a matter investigated by the police, whereas copyright infringement is typically a matter between two legal persons in civil court.

The American Way - See you in court

Napster gave MP3s and digital music a bad image, at least from the perspective of the record labels. So ever since Napster, the RIAA has used all means to prosecute individuals who make digital copies, right down to suing soccer moms for gigantic amounts. The reason that the law allows these huge amounts is that the laws target CD shops, not home file sharers.

The CD shop is the kind of person or enterprise that purchases a CD, copies the labels, covers etc., and reproduces it in great numbers. The copies are then sold to gas stations and the like, often passed off as original/legal material.
Most laws passed are targeting the CD shop, yet the RIAA uses these same laws to prosecute file sharers.

Protect your assets

To reduce the amount of copying, the record labels have tried to limit the ways that consumers can use their legally purchased music. All these attempts have failed, and mostly backfired. The reasons for these failures are very easy to see once you have tried using music in a real environment.

First of all, music is available in CD format, so whatever new thing comes up will have to compare with CDs.
Next, the music is available for free downloading, so that has to be a factor also.
Finally, people use music in a myriad of ways: MP3 players, computers, stereos, cars, etc.

Method 1 - Protect the CD's

The first thing the record labels have done is to try to protect CDs. That will always fail, simply because the CD format is an international standard and has NO way to protect the content. Every now and then a new CD protection method pops up, and there are always problems with it not playing in certain CD players. Secondly, it is not possible to prevent the music from being copied while still allowing it to be played on the same equipment. In all cases, a person will be able to put a microphone near the speakers and record the output. Almost all copy protection schemes can be broken in a simpler way, because the decryption/authentication has to happen on the playing device, and that device is beyond the control of the authors.

So, that method won't work, but why do they try it anyway? Well, if you can't get the music off the CD, you can't copy it. But if you can't get it off the CD, you must use a CD player. That might be fine for some people, but I personally like to hear music on my cellphone, and it doesn't come with a CD drive. So such a CD is 100% worthless to me. Some of these schemes are a little more forgiving and allow me to copy the music a few times, but that's just plain stupid. After a few copies, the CD is worthless, and I have to re-buy my use of the CD.

One very funny (and outrageous) case is that of the Sony rootkit.
In this particular case, Sony used the XCP software, which installs itself as a rootkit when the CD is inserted into a PC running Windows, regardless of the user's choice on the EULA.
The rootkit, which was discovered by Mark Russinovich, inserts noise when the CD is played from anything but the Sony player. This case is interesting because Sony believed they had a right to do this, since they were protecting their property. A Sony spokesperson even said "Most people don't even know what a rootkit is, why should they care?". The case is also funny because the protection only works on a PC with Windows, so any user with access to a Linux or Mac computer may copy the CD freely.

I have read that some record label people claim that DRM is to be considered a guideline for legal use. That's just plain lying. A guideline would be a message box saying "You are making copy 3 of this CD. You may be violating copyright laws, do you want to continue?". A DRM system is NOT a guideline!

Method 2 - Protect the files

The second approach, is to supply digital files with DRM protection. The idea is that the DRM ensures that the file is only playable on an approved device. Also there can be limitations on number of CD's burned and number of times played, etc. DRM protected files suffer two major problems:
  • They limit the ways the files can be used

  • They require authentication with a DRM server

The first problem is usually the worst. If I buy an iPod, purchase some music from the iTunes store, and the iPod later breaks, I have to either buy a new iPod or re-buy all my music. That's just plain stupid, and I can't imagine why anyone would EVER do that. Oh, and you can forget all about playing your favorite music under Linux.
The second problem is even worse: what if Apple goes bankrupt, or decides to close the iTunes store? Well, then you can't reactivate your music, which is required when you move it to another device. Again you have to re-buy your music, but this time it's all out of your hands. Again, I cannot imagine why anyone would pay for that.

CD Problems

So, what is wrong with CD's?

Well, in my household, we have the following music capable devices:
  • PC's with Windows,

  • PC's with Linux

  • An MP3 capable phone

  • A Sansa Music Player

  • An iPod

If I want to play my purchased music on all these devices, there is only one format: MP3. Not even CD's can play on all these. In fact I never play music directly from CD's, so I would benefit greatly from buying music digitally.
Why do I have to go to a store, search through stacks of CD's, and pay for transport and manufacturing, when I go home and rip it before I play it?
It doesn't make any sense.

What to do

If Record Labels would start paying attention, they would realise that MP3 files are at the same protection level as CD's, so there is no need to add extra security to MP3 files. However, it is very easy to record a watermark into each sold song, uniquely identifying the original customer. If that file ever ends up in a filesharing network, it would be easy to locate the origin.
If this practice is adopted, the Record Labels could end up saving a lot of cash, because they can simply stop all the CD presses, avoid all the costly distribution, stop paying DRM fees, and concentrate on marketing and recording studios.
And that makes a lot of sense, since it is something they are good at!
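
For the curious, the per-customer watermark mentioned above could look something like this Python sketch. It is a deliberately naive assumption: the customer number is hidden in the lowest bit of the first 32 raw audio samples, and all names are invented for the example. A mark this simple would not survive MP3 encoding, so a real store would need a sturdier scheme, but the principle is the same.

```python
def embed_id(samples, customer_id, bits=32):
    """Hide customer_id in the lowest bit of the first `bits` samples."""
    if len(samples) < bits:
        raise ValueError("not enough samples to hold the ID")
    if not 0 <= customer_id < 2 ** bits:
        raise ValueError("customer_id does not fit in the given number of bits")
    marked = list(samples)
    for i in range(bits):
        bit = (customer_id >> i) & 1
        # Clear the lowest bit, then set it to the next bit of the ID.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_id(samples, bits=32):
    """Read the ID back from the lowest bits of the first `bits` samples."""
    return sum((samples[i] & 1) << i for i in range(bits))


# Made-up audio samples and a made-up customer number.
original = [((i * 37) % 2001) - 1000 for i in range(64)]
marked = embed_id(original, 123456)
```

Each sample changes by at most one step, so the listener hears no difference, while extract_id(marked) returns the customer number.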