Deep Dive in Ref Returns
05/04/2019
5 minutes
In the previous post I used ref returns to return some data. I noticed that slight changes to the code make the JIT generate quite different machine code, which can have a good or bad effect on performance.
In this post, I will dig deep into the JIT-generated code with WinDbg. Up front: I am using a 64-bit machine, .NET Core 2.1 and RyuJIT.
I created a sample benchmark to showcase this. I have a Point struct with 2 integer properties. I benchmark setting the values on the struct in 3 different ways, and show the related IL and machine code that impact performance.
The benchmark
The benchmark code looks as follows:
[CoreJob]
public class RefReturnBenchmark
{
    private int _sum = 0;
    private Point _p;

    [Benchmark(Baseline = true)]
    public int RefMethodArg()
    {
        var p = new Point();
        RefMethodArg2(ref p);
        _sum += p.X + p.Y;
        return _sum;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void RefMethodArg2(ref Point p)
    {
        p.X = 10;
        p.Y = 11;
    }

    [Benchmark]
    public int RefReturn()
    {
        ref var p = ref RefReturn2();
        _sum += p.X + p.Y;
        return _sum;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private ref Point RefReturn2()
    {
        ref Point p = ref _p;
        p.X = 10;
        p.Y = 11;
        return ref p;
    }

    [Benchmark]
    public int RefReturnSlow()
    {
        ref var p = ref RefReturnSlow2();
        _sum += p.X + p.Y;
        return _sum;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private ref Point RefReturnSlow2()
    {
        _p.X = 10;
        _p.Y = 11;
        return ref _p;
    }
}
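The Point struct itself is not listed in the post. A minimal sketch of what it might look like, assuming two auto-implemented int properties named X and Y (the names come from the benchmark code; the exact declaration is an assumption):

```csharp
// Hypothetical declaration: the post only says Point is a struct
// with 2 integer properties, used as p.X and p.Y in the benchmark.
public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}
```

With a layout like this the two ints sit next to each other, which would be consistent with the [rax] and [rax+4] accesses in the machine code later in the post.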
There are 3 use cases:
RefMethodArg is passing a struct as a ref input parameter to RefMethodArg2
RefReturnSlow is using ref returns and calling RefReturnSlow2
RefReturn is calling RefReturn2 (which differs from RefReturnSlow2 only in that it introduces a ref local for the field)
The results of the Benchmark:
BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.648 (1803/April2018Update/Redstone4), VM=Hyper-V
Intel Xeon CPU E5-2673 v3 2.40GHz, 1 CPU, 2 logical and 2 physical cores
.NET Core SDK=2.2.101
  [Host] : .NET Core 2.1.9 (CoreCLR 4.6.27414.06, CoreFX 4.6.27415.01), 64bit RyuJIT
  Core   : .NET Core 2.1.9 (CoreCLR 4.6.27414.06, CoreFX 4.6.27415.01), 64bit RyuJIT
Job=Core Runtime=Core
Method | Mean | Error | StdDev | Ratio | RatioSD |
---|---|---|---|---|---|
RefMethodArg | 2.611 ns | 0.0982 ns | 0.1796 ns | 1.00 | 0.00 |
RefReturn | 2.178 ns | 0.0928 ns | 0.1998 ns | 0.83 | 0.09 |
RefReturnSlow | 2.483 ns | 0.0981 ns | 0.2090 ns | 0.95 | 0.12 |
Admittedly, there is some noise: over a couple of runs the size of the differences varies, but the overall order remains the same.
IL
What are the differences? To point them out between the first and second use cases, let's investigate the IL code of each solution.
RefMethodArg - IL
.locals init (
    [0] valuetype StructDeserializingFix.Point
)
// (no C# code)
IL_0000: ldloca.s 0
// Point p = default(Point);
IL_0002: initobj StructDeserializingFix.Point
// this.RefMethodArg2(ref p);
IL_0008: ldarg.0
IL_0009: ldloca.s 0
IL_000b: call instance void StructDeserializingFix.RefReturnBenchmark::RefMethodArg2(valuetype StructDeserializingFix.Point&)
// (no C# code)
IL_0010: ldarg.0
// this._sum += p.X + p.Y; (same for both methods)
...
RefReturnSlow - IL
.locals init (
    [0] valuetype StructDeserializingFix.Point&
)
// ref Point p = this.RefReturnSlow2();
IL_0000: ldarg.0
IL_0001: call instance valuetype StructDeserializingFix.Point& StructDeserializingFix.RefReturnBenchmark::RefReturnSlow2()
IL_0006: stloc.0
// (no C# code)
IL_0007: ldarg.0
// this._sum += p.X + p.Y; (same for both methods)
...
The big difference is that RefMethodArg has a local Point, which needs to be initialized, while RefReturnSlow uses a reference to a Point stored in a field of the class, so it does not need to pass anything to RefReturnSlow2 (the field is returned by reference instead).
This explains why RefReturnSlow is faster than RefMethodArg. The IL of RefReturn looks exactly like that of RefReturnSlow; it differs only in the method called to populate the values on the struct, hence it is omitted here.
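To make the "returned as a reference" part concrete, here is a small sketch that is separate from the benchmark. Holder, GetRef, GetCopy and RefReturnDemo are hypothetical names introduced only for illustration, and Point is assumed to be a struct with two int auto-properties. It shows that a ref return aliases the field itself, while a normal return hands back a copy:

```csharp
// Assumed shape of Point: a struct with two int properties.
public struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

// Hypothetical holder class, standing in for the benchmark class and its _p field.
public class Holder
{
    private Point _p;

    // Returns a reference to the field itself, not a copy of it.
    public ref Point GetRef() => ref _p;

    // Returns a copy of the field, for comparison.
    public Point GetCopy() => _p;
}

public static class RefReturnDemo
{
    public static int Run()
    {
        var holder = new Holder();

        // The ref local aliases holder's field: writes go straight to it.
        ref Point alias = ref holder.GetRef();
        alias.X = 10;
        alias.Y = 11;

        // Changing a copy leaves the field untouched.
        Point copy = holder.GetCopy();
        copy.X = 99;

        Point current = holder.GetCopy();
        return current.X + current.Y;
    }
}
```

Calling RefReturnDemo.Run() returns 21: the writes through the ref local are visible in the field, and the change to the copy is not.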
Machine Code
In this section let's compare the JIT-ed code of RefReturn and RefReturnSlow.
RefReturnSlow - Machine code
I start WinDbg, attach to the process, and load the SOS extension for CoreCLR:
.loadby sos coreclr
Then examine the methods:
!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturnSlow
...
!U /d [address]
00007ffc`59c118f0 56 push rsi
00007ffc`59c118f1 4883ec20 sub rsp,20h
00007ffc`59c118f5 488bf1 mov rsi,rcx
00007ffc`59c118f8 488bce mov rcx,rsi
00007ffc`59c118fb e820f8ffff call 00007ffc`59c11120 (StructDeserializingFix.RefReturnBenchmark.RefReturnSlow2(), mdToken: 0000000006000020)
00007ffc`59c11900 8b5608 mov edx,dword ptr [rsi+8]
00007ffc`59c11903 8b08 mov ecx,dword ptr [rax]
00007ffc`59c11905 03d1 add edx,ecx
00007ffc`59c11907 035004 add edx,dword ptr [rax+4]
00007ffc`59c1190a 8bc2 mov eax,edx
00007ffc`59c1190c 894608 mov dword ptr [rsi+8],eax
00007ffc`59c1190f 4883c420 add rsp,20h
00007ffc`59c11913 5e pop rsi
00007ffc`59c11914 c3 ret
RefReturnSlow2, the method it calls, is JIT-ed to the following code:
!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturnSlow2
...
!U /d [address]
00007ffc`59c11930 488d4110 lea rax,[rcx+10h]
00007ffc`59c11934 488bd0 mov rdx,rax
00007ffc`59c11937 c7020a000000 mov dword ptr [rdx],0Ah
00007ffc`59c1193d 488bd0 mov rdx,rax
00007ffc`59c11940 c742040b000000 mov dword ptr [rdx+4],0Bh
00007ffc`59c11947 c3 ret
RefReturn - Machine code
!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturn
...
!U /d [address]
00007ffc`59c11960 56 push rsi
00007ffc`59c11961 4883ec20 sub rsp,20h
00007ffc`59c11965 488bf1 mov rsi,rcx
00007ffc`59c11968 488bce mov rcx,rsi
00007ffc`59c1196b e8a0f7ffff call 00007ffc`59c11110 (StructDeserializingFix.RefReturnBenchmark.RefReturn2(), mdToken: 000000000600001e)
00007ffc`59c11970 8b5608 mov edx,dword ptr [rsi+8]
00007ffc`59c11973 8b08 mov ecx,dword ptr [rax]
00007ffc`59c11975 03d1 add edx,ecx
00007ffc`59c11977 035004 add edx,dword ptr [rax+4]
00007ffc`59c1197a 8bc2 mov eax,edx
00007ffc`59c1197c 894608 mov dword ptr [rsi+8],eax
00007ffc`59c1197f 4883c420 add rsp,20h
00007ffc`59c11983 5e pop rsi
00007ffc`59c11984 c3 ret
!name2ee StructDeserializingFix!StructDeserializingFix.RefReturnBenchmark.RefReturn2
...
!U /d [address]
00007ffc`59c119a0 488d4110 lea rax,[rcx+10h]
00007ffc`59c119a4 c7000a000000 mov dword ptr [rax],0Ah
00007ffc`59c119aa c740040b000000 mov dword ptr [rax+4],0Bh
00007ffc`59c119b1 c3 ret
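Comparing them, we can see that RefReturn and RefReturnSlow compile to identical code apart from the method they call. The only real difference is between RefReturn2 and RefReturnSlow2: the latter contains two extra mov rdx,rax instructions. This seems to be one of the places where more C# code results in more optimized and faster machine code.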