Deep Dive into Ref Returns

In the previous post I used ref returns to return some data. I noticed that slight changes to the source make the JIT generate noticeably different code, which can have a good or a bad effect on performance.

In this post I dig into the JIT-generated code with WinDbg. Up front: I am using a 64-bit machine, .NET Core 2.1 and RyuJIT.

I created a sample benchmark to showcase this. I have a Point struct with 2 integer properties. The benchmark sets the values on the struct in 3 different ways, and I show the related IL and machine code that affect performance.
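The Point struct itself is not listed in this post. A minimal sketch, assuming two auto-implemented int properties named X and Y (the names the benchmark code uses), could look like this:

```csharp
// Assumed shape of Point: the original definition is not shown in the post.
// Two 4-byte ints, which matches the [rax] and [rax+4] accesses seen later.
public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}
```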

The benchmark

The benchmark code looks as follows:

using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

[CoreJob]
public class RefReturnBenchmark
{
  private int _sum = 0;
  private Point _p;

  [Benchmark(Baseline = true)]
  public int RefMethodArg()
  {
    var p = new Point();
    RefMethodArg2(ref p);
    _sum += p.X + p.Y;
    return _sum;
  }

  [MethodImpl(MethodImplOptions.NoInlining)]
  private void RefMethodArg2(ref Point p)
  {
    p.X = 10;
    p.Y = 11;
  }

  [Benchmark]
  public int RefReturn()
  {
    ref var p = ref RefReturn2();
    _sum += p.X + p.Y;
    return _sum;
  }

  [MethodImpl(MethodImplOptions.NoInlining)]
  private ref Point RefReturn2()
  {
    ref Point p = ref _p;
    p.X = 10;
    p.Y = 11;
    return ref p;
  }

  [Benchmark]
  public int RefReturnSlow()
  {
    ref var p = ref RefReturnSlow2();
    _sum += p.X + p.Y;
    return _sum;
  }

  [MethodImpl(MethodImplOptions.NoInlining)]
  private ref Point RefReturnSlow2()
  {
    _p.X = 10;
    _p.Y = 11;
    return ref _p;
  }
}

There are 3 use cases:

  • RefMethodArg passes a local struct as a ref parameter to RefMethodArg2

  • RefReturnSlow uses ref returns and calls RefReturnSlow2

  • RefReturn calls RefReturn2, which differs from RefReturnSlow2 only by first taking a ref local to the field

The results of the Benchmark:

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.648 (1803/April2018Update/Redstone4), VM=Hyper-V
Intel Xeon CPU E5-2673 v3 2.40GHz, 1 CPU, 2 logical and 2 physical cores
.NET Core SDK=2.2.101
  [Host] : .NET Core 2.1.9 (CoreCLR 4.6.27414.06, CoreFX 4.6.27415.01), 64bit RyuJIT
  Core   : .NET Core 2.1.9 (CoreCLR 4.6.27414.06, CoreFX 4.6.27415.01), 64bit RyuJIT

Job=Core  Runtime=Core

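|        Method |     Mean |     Error |    StdDev | Ratio | RatioSD |
|-------------- |---------:|----------:|----------:|------:|--------:|
|  RefMethodArg | 2.611 ns | 0.0982 ns | 0.1796 ns |  1.00 |    0.00 |
|     RefReturn | 2.178 ns | 0.0928 ns | 0.1998 ns |  0.83 |    0.09 |
| RefReturnSlow | 2.483 ns | 0.0981 ns | 0.2090 ns |  0.95 |    0.12 |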

Admittedly there is some noise: across several runs the size of the differences varies, but the overall ordering stays the same.

IL

What are the differences? To point them out between the first and second use case, let's look at the IL code of each.

RefMethodArg - IL

.locals init (
	[0] valuetype StructDeserializingFix.Point
)

// (no C# code)
IL_0000: ldloca.s 0
// Point p = default(Point);
IL_0002: initobj StructDeserializingFix.Point
// this.RefMethodArg2(ref p);
IL_0008: ldarg.0
IL_0009: ldloca.s 0
IL_000b: call instance void StructDeserializingFix.RefReturnBenchmark::RefMethodArg2(valuetype StructDeserializingFix.Point&)
// (no C# code)
IL_0010: ldarg.0
// this._sum += p.X + p.Y; (same for both methods)
...

RefReturnSlow - IL

.locals init (
	[0] valuetype StructDeserializingFix.Point&
)

// ref Point p = this.RefReturnSlow2();
IL_0000: ldarg.0
IL_0001: call instance valuetype StructDeserializingFix.Point& StructDeserializingFix.RefReturnBenchmark::RefReturnSlow2()
IL_0006: stloc.0
// (no C# code)
IL_0007: ldarg.0
// this._sum += p.X + p.Y; (same for both methods)
...

The big difference is that RefMethodArg has a local Point, which needs to be initialized, while RefReturnSlow works with a reference to the Point field of the class (_p). Nothing has to be passed to RefReturnSlow2; it simply returns a reference to the field.

This explains why RefReturnSlow is slightly faster than RefMethodArg. RefReturn's IL looks exactly like RefReturnSlow's and differs only in the method it calls to populate the struct, so it is omitted here.
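The difference in what each caller owns can also be seen from plain C#. The following is a minimal sketch and not part of the benchmark: the Holder and RefAliasDemo classes and their members are hypothetical, and Point is repeated (with its assumed shape) only so the snippet compiles on its own. With a ref argument the caller creates and initializes its own Point; with a ref return the caller receives an alias to the field, so writes through the alias land in the object.

```csharp
// Assumed shape of Point (the post does not list it).
public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

// Hypothetical helper class, for illustration only.
public class Holder
{
    private Point _p;

    // Ref argument: fills a Point that lives in the caller.
    public void Fill(ref Point p)
    {
        p.X = 10;
        p.Y = 11;
    }

    // Ref return: hands out an alias to the field itself.
    public ref Point FillField()
    {
        _p.X = 10;
        _p.Y = 11;
        return ref _p;
    }

    // Reads the field directly, to observe writes made through an alias.
    public int FieldSum() => _p.X + _p.Y;
}

public static class RefAliasDemo
{
    public static (int FieldAfterRefArg, int FieldAfterAlias) Run()
    {
        var holder = new Holder();

        var local = new Point();       // caller-owned Point, initialized here
        holder.Fill(ref local);        // only the caller's local is written
        int afterRefArg = holder.FieldSum();       // field is still 0 + 0

        ref Point alias = ref holder.FillField();  // field is now 10, 11
        alias.X = 100;                 // writes straight into the field
        int afterAlias = holder.FieldSum();        // 100 + 11

        return (afterRefArg, afterAlias);
    }
}
```

Calling RefAliasDemo.Run() returns (0, 111): the ref argument call never touches the field, while the write through the returned reference does.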

Machine Code

In this section let's compare the JIT-ed code of RefReturn and RefReturnSlow.

RefReturnSlow - Machine code

I start WinDbg, attach to the process and load the SOS extension for CoreCLR:

.loadby sos coreclr

Then examine the methods:

!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturnSlow
...
!U /d [address]
00007ffc`59c118f0 56              push    rsi
00007ffc`59c118f1 4883ec20        sub     rsp,20h
00007ffc`59c118f5 488bf1          mov     rsi,rcx
00007ffc`59c118f8 488bce          mov     rcx,rsi
00007ffc`59c118fb e820f8ffff      call    00007ffc`59c11120 (StructDeserializingFix.RefReturnBenchmark.RefReturnSlow2(), mdToken: 0000000006000020)
00007ffc`59c11900 8b5608          mov     edx,dword ptr [rsi+8]
00007ffc`59c11903 8b08            mov     ecx,dword ptr [rax]
00007ffc`59c11905 03d1            add     edx,ecx
00007ffc`59c11907 035004          add     edx,dword ptr [rax+4]
00007ffc`59c1190a 8bc2            mov     eax,edx
00007ffc`59c1190c 894608          mov     dword ptr [rsi+8],eax
00007ffc`59c1190f 4883c420        add     rsp,20h
00007ffc`59c11913 5e              pop     rsi
00007ffc`59c11914 c3              ret

And the JIT-ed code of RefReturnSlow2, the method it calls:

!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturnSlow2
...
!U /d [address]
00007ffc`59c11930 488d4110        lea     rax,[rcx+10h]
00007ffc`59c11934 488bd0          mov     rdx,rax
00007ffc`59c11937 c7020a000000    mov     dword ptr [rdx],0Ah
00007ffc`59c1193d 488bd0          mov     rdx,rax
00007ffc`59c11940 c742040b000000  mov     dword ptr [rdx+4],0Bh
00007ffc`59c11947 c3              ret

RefReturn - Machine code

!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturn
...
!U /d [address]
00007ffc`59c11960 56              push    rsi
00007ffc`59c11961 4883ec20        sub     rsp,20h
00007ffc`59c11965 488bf1          mov     rsi,rcx
00007ffc`59c11968 488bce          mov     rcx,rsi
00007ffc`59c1196b e8a0f7ffff      call    00007ffc`59c11110 (StructDeserializingFix.RefReturnBenchmark.RefReturn2(), mdToken: 000000000600001e)
00007ffc`59c11970 8b5608          mov     edx,dword ptr [rsi+8]
00007ffc`59c11973 8b08            mov     ecx,dword ptr [rax]
00007ffc`59c11975 03d1            add     edx,ecx
00007ffc`59c11977 035004          add     edx,dword ptr [rax+4]
00007ffc`59c1197a 8bc2            mov     eax,edx
00007ffc`59c1197c 894608          mov     dword ptr [rsi+8],eax
00007ffc`59c1197f 4883c420        add     rsp,20h
00007ffc`59c11983 5e              pop     rsi
00007ffc`59c11984 c3              ret

And the JIT-ed code of RefReturn2, the method it calls:

!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturn2
...
!U /d [address]
00007ffc`59c119a0 488d4110        lea     rax,[rcx+10h]
00007ffc`59c119a4 c7000a000000    mov     dword ptr [rax],0Ah
00007ffc`59c119aa c740040b000000  mov     dword ptr [rax+4],0Bh
00007ffc`59c119b1 c3              ret

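Comparing them, we can see that the two callers are identical instruction for instruction. The only difference is between RefReturn2 and RefReturnSlow2: the latter copies rax into rdx before each store, which adds two mov instructions. This seems to be one of those cases where more C# code results in more optimized, faster machine code.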