Struct Serialization with Spans

The book Pro .NET Performance presents some examples of struct serialization in its chapter on 'Code Generation'.

Since the book was released, C# has evolved, and some constraints that existed earlier have been removed. Also, from C# 7.2 onwards, developers have Span<T> at hand, which makes further performance improvements possible.

In this post, I take the two deserialization examples from the book and compare them with the latest constructs available: Span<T> and generics with the unmanaged constraint.

Deserialization task

The task: given 10 million TcpHeader structs stored in a byte[], deserialize them all back into TcpHeader values. The solution should be generic, so that it can be reused regardless of which struct is being read.

public readonly struct TcpHeader
{
  public TcpHeader(int sIp, int dIp, int sPort, int dPort)
  {
    SourceIp = sIp;
    DestIp = dIp;
    SourcePort = sPort;
    DestPort = dPort;
  }
  
  public readonly int SourceIp;
  public readonly int DestIp;
  public readonly int SourcePort;
  public readonly int DestPort;
}
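The post takes the byte[] of headers as given. A minimal sketch of how such a buffer could be produced, assuming the default sequential layout (four ints, 16 bytes per TcpHeader) and the machine's native byte order. The TcpHeaderBuffer helper and the sample field values are hypothetical and not part of the original benchmark; the struct is repeated in a compact form so the program is self-contained.

```csharp
using System;
using System.Runtime.InteropServices;

// Build a small buffer: 1,000 records of 16 bytes each.
byte[] data = TcpHeaderBuffer.Create(1_000);
Console.WriteLine(data.Length); // 16000

public readonly struct TcpHeader
{
    public TcpHeader(int sIp, int dIp, int sPort, int dPort)
        => (SourceIp, DestIp, SourcePort, DestPort) = (sIp, dIp, sPort, dPort);

    public readonly int SourceIp, DestIp, SourcePort, DestPort;
}

// Hypothetical helper: lays out `count` TcpHeader values back to back in a byte[],
// using the same in-memory (native) layout the readers in this post expect.
public static class TcpHeaderBuffer
{
    public static readonly int Size = Marshal.SizeOf<TcpHeader>(); // 4 ints = 16 bytes

    public static byte[] Create(int count)
    {
        var data = new byte[count * Size];
        // View the byte[] as a Span<TcpHeader> and assign each element in place.
        Span<TcpHeader> headers = MemoryMarshal.Cast<byte, TcpHeader>(data.AsSpan());
        for (int i = 0; i < count; i++)
        {
            headers[i] = new TcpHeader(i, i + 1, i % 65536, (i + 1) % 65536);
        }
        return data;
    }
}
```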

PtrToStructure

The first solution pins the array and uses the Marshal.PtrToStructure method to get the results. On my machine, deserializing the 10,000,000 structs takes an average of 3.7 s. The disadvantage of this solution is that PtrToStructure causes a heap allocation and an unboxing, which is undesirable when working with structs. On top of that, every call allocates and frees a GCHandle to pin the array.

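using System.Runtime.InteropServices; // GCHandle, Marshal

private static void ReadMarshalPtrToStructure<T>(byte[] data, int offset, out T result) where T : struct
{
  // Pin the array so the GC cannot move it while the marshaller reads from it
  GCHandle gch = GCHandle.Alloc(data, GCHandleType.Pinned);
  try
  {
    // Marshal the bytes at (array start + offset) into a new T
    result = Marshal.PtrToStructure<T>(gch.AddrOfPinnedObject() + offset);
  }
  finally
  {
    gch.Free();
  }
}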
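A self-contained version of the method above with its generic parameter spelled out, wrapped in a hypothetical ReadAll<T> driver that walks the buffer at offsets i * size. This is only a sketch for checking correctness; it is not the harness behind the 3.7 s measurement.

```csharp
using System;
using System.Runtime.InteropServices;

// Write three records, then read them all back through PtrToStructure.
byte[] data = new byte[3 * 16];
var view = MemoryMarshal.Cast<byte, TcpHeader>(data.AsSpan());
for (int i = 0; i < 3; i++) view[i] = new TcpHeader(i, 10 * i, 80, 8080);

TcpHeader[] all = MarshalReader.ReadAll<TcpHeader>(data);
Console.WriteLine(all[2].DestIp); // 20

public readonly struct TcpHeader
{
    public TcpHeader(int sIp, int dIp, int sPort, int dPort)
        => (SourceIp, DestIp, SourcePort, DestPort) = (sIp, dIp, sPort, dPort);

    public readonly int SourceIp, DestIp, SourcePort, DestPort;
}

public static class MarshalReader
{
    // Hypothetical driver: reads every record of type T out of the buffer.
    public static T[] ReadAll<T>(byte[] data) where T : struct
    {
        int size = Marshal.SizeOf<T>();
        var result = new T[data.Length / size];
        for (int i = 0; i < result.Length; i++)
        {
            ReadMarshalPtrToStructure(data, i * size, out result[i]);
        }
        return result;
    }

    private static void ReadMarshalPtrToStructure<T>(byte[] data, int offset, out T result) where T : struct
    {
        // Pin the array for the duration of this single read.
        GCHandle gch = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            result = Marshal.PtrToStructure<T>(gch.AddrOfPinnedObject() + offset);
        }
        finally
        {
            gch.Free();
        }
    }
}
```

Because the driver calls the method once per record, the pin/unpin and the per-call marshalling overhead are paid 10 million times in the full task.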

TypedReference

The second solution uses TypedReference and the undocumented __makeref and __refvalue keywords to reinterpret the bytes as a struct. This solution is considerably faster than the previous one: it takes an average of 262 ms to run. Note that the method is marked with the unsafe keyword.

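private static unsafe void ReadPointerTypedRef<T>(byte[] data, int offset, ref T result) where T : struct
{
  // Take a typed reference to the destination variable
  TypedReference tr = __makeref(result);
  fixed (byte* ptr = &data[offset])
  {
    // Overwrite the reference's data pointer (its first field) so it points into the byte[]
    *(IntPtr*)&tr = (IntPtr)ptr;
    // Read a T through the redirected reference and copy it into result
    result = __refvalue(tr, T);
  }
}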
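__makeref, __reftype and __refvalue are undocumented keywords accepted by the Microsoft C# compiler. A small sketch of what they do on their own, without the pointer trick: __makeref captures a typed reference to a variable, __reftype recovers its type, and __refvalue reads or writes through it. TypedRefDemo is a hypothetical name. Note that ReadPointerTypedRef additionally relies on the first field of a TypedReference being the data pointer, which is a runtime implementation detail rather than a documented guarantee.

```csharp
using System;

Console.WriteLine(TypedRefDemo.TypeOf(42));       // System.Int32
Console.WriteLine(TypedRefDemo.WriteThrough(7));  // 7

public static class TypedRefDemo
{
    // __reftype recovers the type captured by __makeref.
    public static Type TypeOf(int value)
    {
        TypedReference tr = __makeref(value);
        return __reftype(tr);
    }

    // __refvalue is assignable: writing to it updates the original variable.
    public static int WriteThrough(int newValue)
    {
        int target = 0;
        TypedReference tr = __makeref(target);
        __refvalue(tr, int) = newValue;
        return target;
    }
}
```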

Generic Unsafe

One disadvantage of the previous solutions is that the JIT compiler does not inline them. The next solution is still unsafe, but it does get inlined. By constraining T to unmanaged (a constraint introduced in C# 7.3), we can write generic unsafe methods that work with T* pointers. This is the fastest solution: it takes an average of 29 ms to run. It is fast because the JIT generates machine code that copies the memory straight into registers. The JIT is also smart enough to figure out whether only certain fields of the TcpHeader are used; if so, it can omit the mov instructions for the remaining fields.

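private static unsafe void ReadPointer<T>(byte[] data, int offset, out T result) where T : unmanaged
{
  // Pin the array; only offset itself is bounds-checked, so offset + sizeof(T) must fit in data
  fixed (byte* pData = &data[offset])
  {
    // Reinterpret the bytes at pData as a T and copy them into result
    result = *(T*)pData;
  }
}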
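A runnable sketch of the generic unsafe reader, assuming the project enables unsafe code (<AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the .csproj). PointerReader is a hypothetical class name; the method body matches ReadPointer. The unmanaged constraint is what permits the T* cast: it restricts T to types without managed references, so TcpHeader (four ints) qualifies, while a struct holding a string would be rejected at compile time.

```csharp
using System;

// SourcePort is the third int field, 8 bytes into a record; write 443 into record #2.
byte[] data = new byte[2 * 16];
BitConverter.GetBytes(443).CopyTo(data, 16 + 8);
PointerReader.ReadPointer(data, 16, out TcpHeader h);
Console.WriteLine(h.SourcePort); // 443

public readonly struct TcpHeader
{
    public TcpHeader(int sIp, int dIp, int sPort, int dPort)
        => (SourceIp, DestIp, SourcePort, DestPort) = (sIp, dIp, sPort, dPort);

    public readonly int SourceIp, DestIp, SourcePort, DestPort;
}

public static class PointerReader
{
    // `unmanaged` lets the compiler accept T* for any T without managed references.
    public static unsafe void ReadPointer<T>(byte[] data, int offset, out T result) where T : unmanaged
    {
        // Only data[offset] is bounds-checked; the caller must keep offset + sizeof(T) in range.
        fixed (byte* pData = &data[offset])
        {
            // Reinterpret the bytes as a T and copy sizeof(T) bytes into result.
            result = *(T*)pData;
        }
    }
}
```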

Looking at the disassembly, we can see that the call was inlined, and the memory is simply moved from one location to another with mov instructions.

C:\Users\...\documents\visual studio 2017\Projects\ConsoleApp\StructDeserializing\Program.cs @ 27:
00007ffe`16530970 33d2 xor edx,edx
00007ffe`16530972 4889542420 mov qword ptr [rsp+20h],rdx
00007ffe`16530977 8bd1 mov edx,ecx
00007ffe`16530979 0fafd6 imul edx,esi
00007ffe`1653097c 3bd0 cmp edx,eax
00007ffe`1653097e 0f835c010000 jae 00007ffe`16530ae0
00007ffe`16530984 4863d2 movsxd rdx,edx
00007ffe`16530987 488d541710 lea rdx,[rdi+rdx+10h]
00007ffe`1653098c 4889542420 mov qword ptr [rsp+20h],rdx
00007ffe`16530991 488b542420 mov rdx,qword ptr [rsp+20h]
00007ffe`16530996 448b32 mov r14d,dword ptr [rdx]
00007ffe`16530999 448b7a04 mov r15d,dword ptr [rdx+4]
00007ffe`1653099d 448b6208 mov r12d,dword ptr [rdx+8]
00007ffe`165309a1 448b6a0c mov r13d,dword ptr [rdx+0Ch]
00007ffe`165309a5 33d2 xor edx,edx
00007ffe`165309a7 4889542420 mov qword ptr [rsp+20h],rdx
00007ffe`165309ac 4889542420 mov qword ptr [rsp+20h],rdx
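The listing appears consistent with a loop that computes each record's offset as i * size (the imul), bounds-checks data[offset] (the cmp/jae pair), skips the 16-byte array header (the lea with +10h) and then loads the four int fields into r14d, r15d, r12d and r13d. A hypothetical harness of that shape is sketched here; Harness and SumAllFields are illustrative names, not the code used for the numbers in this post, and it needs unsafe code enabled in the project.

```csharp
using System;
using System.Diagnostics;

const int Count = 10_000_000;
byte[] data = new byte[Count * 16];   // 16 bytes per TcpHeader, zero-filled for simplicity
var sw = Stopwatch.StartNew();
long checksum = Harness.SumAllFields(data, 16);
sw.Stop();
Console.WriteLine($"{sw.ElapsedMilliseconds} ms, checksum {checksum}");

public readonly struct TcpHeader
{
    public TcpHeader(int sIp, int dIp, int sPort, int dPort)
        => (SourceIp, DestIp, SourcePort, DestPort) = (sIp, dIp, sPort, dPort);

    public readonly int SourceIp, DestIp, SourcePort, DestPort;
}

public static class Harness
{
    // Hypothetical benchmark loop: one ReadPointer call per record at offset i * size.
    public static long SumAllFields(byte[] data, int size)
    {
        long sum = 0;
        int count = data.Length / size;
        for (int i = 0; i < count; i++)
        {
            ReadPointer(data, i * size, out TcpHeader h);
            // Using every field keeps the JIT from dropping any of the four loads.
            sum += (long)h.SourceIp + h.DestIp + h.SourcePort + h.DestPort;
        }
        return sum;
    }

    private static unsafe void ReadPointer<T>(byte[] data, int offset, out T result) where T : unmanaged
    {
        fixed (byte* pData = &data[offset])
        {
            result = *(T*)pData;
        }
    }
}
```

A single Stopwatch run is noisy; a benchmarking library such as BenchmarkDotNet, which handles warm-up and repeated runs, gives more trustworthy averages.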

Using Span

Finally, we can write a managed (not unsafe) version of this copy, which is reasonably fast. It is not as fast as the unsafe ReadPointer method, but it is fully managed and still gets inlined. It takes an average of 203 ms on the sample data, which makes it the fastest of the solutions that do not require unsafe code. Another advantage of this solution is that we do not need to pin the array, because the Span holds a managed reference to it.

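using System.Runtime.InteropServices; // MemoryMarshal

private static void ReadMemoryMarshal<T>(byte[] array, int offset, int size, out T result) where T : struct
{
  // Reinterpret the byte slice as a Span<T> (no copy, no pinning) and copy out element 0
  result = MemoryMarshal.Cast<byte, T>(array.AsSpan(offset, size))[0];
}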
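A self-contained sketch of the Span-based reader, plus an alternative that is not from the original post: MemoryMarshal.Read<T>, which copies sizeof(T) bytes from the start of a ReadOnlySpan<byte> and therefore needs no size argument. SpanReader and ReadWithRead are hypothetical names. Both MemoryMarshal.Cast and MemoryMarshal.Read throw at runtime if T contains managed references, which plays the role the unmanaged constraint plays in the unsafe version.

```csharp
using System;
using System.Runtime.InteropServices;

// Put a known record at index 1, then read it back both ways.
byte[] data = new byte[3 * 16];
MemoryMarshal.Cast<byte, TcpHeader>(data.AsSpan())[1] = new TcpHeader(1, 2, 3, 4);

SpanReader.ReadMemoryMarshal(data, 16, 16, out TcpHeader a);
TcpHeader b = SpanReader.ReadWithRead<TcpHeader>(data, 16);
Console.WriteLine($"{a.DestPort} {b.DestPort}"); // 4 4

public readonly struct TcpHeader
{
    public TcpHeader(int sIp, int dIp, int sPort, int dPort)
        => (SourceIp, DestIp, SourcePort, DestPort) = (sIp, dIp, sPort, dPort);

    public readonly int SourceIp, DestIp, SourcePort, DestPort;
}

public static class SpanReader
{
    // The approach from the post: view a byte slice as Span<T> and copy element 0.
    public static void ReadMemoryMarshal<T>(byte[] array, int offset, int size, out T result) where T : struct
    {
        result = MemoryMarshal.Cast<byte, T>(array.AsSpan(offset, size))[0];
    }

    // Alternative: MemoryMarshal.Read<T> copies sizeof(T) bytes from the start of the span.
    public static T ReadWithRead<T>(byte[] array, int offset) where T : struct
    {
        return MemoryMarshal.Read<T>(array.AsSpan(offset));
    }
}
```

ReadWithRead was not part of the measurements above, so no timing is claimed for it.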