Friday, October 29, 2010

The CIL optimizer at work

I'm currently working on some low-level optimizations with CIL, and I bumped into a surprising instruction sequence.

The C# code that is compiled looks like this:
public static class Fac
{
    public static long Factorial(long number)
    {
        if (number == 0)
            return 1;
        else
            return number * Factorial(number - 1);
    }
}

When compiled to CIL, it looks like this:
public static class Fac
{
.method public hidebysig static int64  
    Factorial(int64 number) cil managed
{
  // Code size       34 (0x22)
  .maxstack  3
  .locals init ([0] int64 CS$1$0000,
           [1] bool CS$4$0001)
  IL_0000:  nop
  IL_0001:  ldarg.0
  IL_0002:  ldc.i4.0
  IL_0003:  conv.i8
  IL_0004:  ceq
  IL_0006:  ldc.i4.0
  IL_0007:  ceq
  IL_0009:  stloc.1
  IL_000a:  ldloc.1
  IL_000b:  brtrue.s   IL_0012
  IL_000d:  ldc.i4.1
  IL_000e:  conv.i8
  IL_000f:  stloc.0
  IL_0010:  br.s       IL_0020
  IL_0012:  ldarg.0
  IL_0013:  ldarg.0
  IL_0014:  ldc.i4.1
  IL_0015:  conv.i8
  IL_0016:  sub
  IL_0017:  call       int64 CILFac.Fac::Factorial(int64)
  IL_001c:  mul
  IL_001d:  stloc.0
  IL_001e:  br.s       IL_0020
  IL_0020:  ldloc.0
  IL_0021:  ret
} // end of method Fac::Factorial
}

What struck me was this code sequence:
  IL_001e:  br.s       IL_0020
  IL_0020:  ldloc.0

Rather weird as it just jumps directly to the next instruction. While writing an optimizer that detects and erases such inefficient constructs, it dawned on me that the code was probably built in debug mode, so I tried it in release mode, which gives this output:
.method public hidebysig static int64  
    Factorial(int64 number) cil managed
{
  // Code size       20 (0x14)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  ldc.i4.0
  IL_0002:  conv.i8
  IL_0003:  bne.un.s   IL_0008
  IL_0005:  ldc.i4.1
  IL_0006:  conv.i8
  IL_0007:  ret
  IL_0008:  ldarg.0
  IL_0009:  ldarg.0
  IL_000a:  ldc.i4.1
  IL_000b:  conv.i8
  IL_000c:  sub
  IL_000d:  call       int64 CILFac.Fac::Factorial(int64)
  IL_0012:  mul
  IL_0013:  ret
} // end of method Fac::Factorial

And with that the inefficient instruction was gone, saving me a lot of work :).

Another interesting difference is that the release version has removed all use of the local variables at the expense of a larger stack depth. This is also rather nice as a JIT engine can assign stack values to registers, much more easily than with the local variables.

Lesson learned: when optimizing CIL (or any byte code) only look at the optimized version.

No comments: