GC Allocate Array

In this post I will examine the GC Allocate Array methods along with other array creation alternatives.

Previously, I looked into array allocation when investigating the [SkipLocalsInit] attribute. In that post it turned out that the larger the array, the larger the gain from using stack allocation with [SkipLocalsInit].

However, during this exercise I noticed that the current implementation of ArrayPool uses GC.AllocateUninitializedArray&lt;T&gt; to create the arrays for its internal pool.

Creating Arrays

Over the years a few extensions have been added to C# and the BCL which enabled more and more ways of creating arrays. Examples include, but are not limited to:

  • new operator

  • Array.CreateInstance

  • GC.AllocateUninitializedArray

  • GC.AllocateArray

  • stackalloc

  • Activator.CreateInstance

In this post I will focus solely on the methods offered by the GC class for creating single-dimensional, zero-based arrays.

AllocateUninitializedArray

GC is a type mostly used in high-performance code, benchmarking, or diagnosing application issues. It contains two methods interesting for the current discussion: GC.AllocateArray&lt;T&gt; and GC.AllocateUninitializedArray&lt;T&gt;. Both methods allow specifying the element type of the array, the size of the array, and whether it is pinned on the heap.

The difference between the two methods is that AllocateUninitializedArray may return an array with uninitialized memory. There are some constraints though: one is that the element type of the array may not contain references. It is also worth pointing out that the 'contains references' rule is in reality a bit more nuanced. To tell whether a given type in .NET 6 is eligible for uninitialized array allocation, the RuntimeHelpers.IsReferenceOrContainsReferences&lt;T&gt;() method can be used. There is also a minimum array size, currently 2048 bytes (I assume this is an internal limit, which is subject to change in each .NET release).
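To make the eligibility rule concrete, here is a small sketch (my own example, not from the original post) probing a few element types with RuntimeHelpers.IsReferenceOrContainsReferences&lt;T&gt;():

```csharp
using System;
using System.Runtime.CompilerServices;

// int contains no references: eligible for uninitialized allocation.
Console.WriteLine(RuntimeHelpers.IsReferenceOrContainsReferences<int>());    // False
// string is a reference type: arrays of it are zero-initialized anyway.
Console.WriteLine(RuntimeHelpers.IsReferenceOrContainsReferences<string>()); // True
// A struct with only value-type fields is also eligible.
Console.WriteLine(RuntimeHelpers.IsReferenceOrContainsReferences<Point>());  // False

readonly record struct Point(int X, int Y);
```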

Note that GC.AllocateUninitializedArray&lt;T&gt; returns a memory segment which might not be initialized with zeros. This method should therefore only be used when we know that each element of the array will be overwritten with a valid value before further use.
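A minimal sketch of that safe pattern (my own illustration): every element is written before any element is read.

```csharp
using System;

int length = 90_000;
int[] buffer = GC.AllocateUninitializedArray<int>(length);

// Write pass: overwrite every element, so no uninitialized value survives.
for (int i = 0; i < buffer.Length; i++)
    buffer[i] = i;

// Read pass: safe, because everything above was just initialized.
long sum = 0;
foreach (int value in buffer)
    sum += value;

Console.WriteLine(sum); // 4049955000
```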

Is it any faster?

My first question was: is this really faster than regular, zero-initialized allocation? If so, how much faster is it? To answer these questions, I ran two benchmarks: one vanilla benchmark focusing on net results, and one that iterates over the array twice to emulate reading and writing values.

Vanilla Benchmark

In the vanilla benchmarks, I allocated an array of 90,000 ints on the Large Object Heap. I also invoked a GC collection with LargeObjectHeapCompactionMode set to CompactOnce. This suggests a garbage collection of the large object heap, which releases the previously allocated array and makes the space reusable for the next allocation.
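The vanilla benchmark looked roughly like this; the class and method names are my own, and it assumes the BenchmarkDotNet package, which the result tables above come from:

```csharp
using System;
using System.Runtime;
using BenchmarkDotNet.Attributes;

// Sketch of the vanilla allocation benchmark (not the author's exact code).
public class AllocationBenchmarks
{
    private const int Size = 90_000; // 90,000 ints land on the LOH

    [Benchmark]
    public int[] Uninitialized()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(); // release the previous array so its space can be reused
        return GC.AllocateUninitializedArray<int>(Size);
    }

    [Benchmark]
    public int[] Initialized()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
        return GC.AllocateArray<int>(Size); // zero-initialized counterpart
    }
}
```

The runner would be invoked with BenchmarkRunner.Run&lt;AllocationBenchmarks&gt;() in a Release build.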

|        Method |     Mean |   Error |   StdDev |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|-------------- |---------:|--------:|---------:|----------:|----------:|----------:|----------:|
| Uninitialized | 200.8 us | 6.48 us | 19.09 us | 1000.0000 | 1000.0000 | 1000.0000 |    352 KB |
|   Initialized | 214.8 us | 5.34 us | 15.14 us | 1000.0000 | 1000.0000 | 1000.0000 |    352 KB |

Read-Write Benchmark

In a more realistic use case, the benchmark also sets a valid value for each element of the array. In this case the benchmarked method body has been 'unrolled' five times.
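The read-write variant adds a write pass and a read pass over the array. This is a sketch with my own names, assuming BenchmarkDotNet; the real body was repeated ('unrolled') five times, shown once here for brevity:

```csharp
using BenchmarkDotNet.Attributes;

// Sketch of the read-write benchmark (not the author's exact code).
public class ReadWriteBenchmarks
{
    private const int Size = 90_000;

    [Benchmark]
    public long Uninitialized() => FillAndSum(GC.AllocateUninitializedArray<int>(Size));

    [Benchmark]
    public long Initialized() => FillAndSum(GC.AllocateArray<int>(Size));

    private static long FillAndSum(int[] data)
    {
        for (int i = 0; i < data.Length; i++)
            data[i] = i;            // first pass: write a valid value
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];         // second pass: read it back
        return sum;
    }
}
```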


|        Method |     Mean |     Error |    StdDev |     Gen 0 |     Gen 1 |     Gen 2 | Allocated |
|-------------- |---------:|----------:|----------:|----------:|----------:|----------:|----------:|
| Uninitialized | 1.364 ms | 0.0290 ms | 0.0856 ms | 5000.0000 | 5000.0000 | 5000.0000 |      2 MB |
|   Initialized | 1.427 ms | 0.0295 ms | 0.0862 ms | 5000.0000 | 5000.0000 | 5000.0000 |      2 MB |

Both benchmarks indicate a net performance gain when using the Uninitialized counterpart.

When can I benefit using the Uninitialized method?

As previously stated, GC.AllocateUninitializedArray&lt;T&gt; returns a memory fragment which might not be initialized with zeros. This method should therefore only be used when we know that each element of the array will be overwritten with a valid value before further use.

To answer the above question, one would first have to answer another question: how can we tell whether the memory fragment is left uninitialized? One idea is to iterate over the array after allocation and print each element: when non-zero elements are printed on the console, the array is uninitialized. However, during my testing I noticed that it was very likely to receive a memory fragment that is initialized. So even though I asked for an uninitialized array, all elements of the returned array were set to zero.

To understand this better, I altered the experiment:

  • allocate an array

  • set a known value for each element

  • remove all references to the array

  • invoke garbage collection

  • allocate a new uninitialized array

  • print all non-zero elements on the console
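The steps above can be sketched as follows (my own illustration, not the author's exact code; as with the later example, a Release build matters here):

```csharp
using System;
using System.Runtime;

// 1-2: allocate an LOH-sized array and set a known value for each element
var first = GC.AllocateUninitializedArray<int>(90_000);
for (int i = 0; i < first.Length; i++)
    first[i] = 42;

// 3-4: remove the reference and collect with LOH compaction enabled
first = null;
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();

// 5: allocate a new uninitialized array, hopefully overlapping the old one
var second = GC.AllocateUninitializedArray<int>(90_000);

// 6: print all non-zero elements; long consecutive runs of 42 suggest
//    the new array overlaps the old, uninitialized memory
for (int i = 0; i < second.Length; i++)
    if (second[i] != 0)
        Console.WriteLine($"second[{i}] = {second[i]}");
```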

When the known values are printed on the console in a consecutive manner, it is very likely that the new array was allocated overlapping the old array. Because the values in the initial array were explicitly changed from zero, they should be visible through the second array's elements when the memory is not initialized.

After a few more rounds of testing, I found these scenarios where uninitialized arrays had a clear advantage:

  • consecutive rapid array allocation on SOH and Garbage Collection

  • consecutive rapid array allocation on LOH and Garbage Collection with large object heap compaction enabled

  • exhausting the initial segment

In the first two cases, allocating and releasing memory rapidly makes the runtime allocator reuse the previously used memory fragment. Without the GC being triggered and the memory compacted, the allocator tends to allocate forward, on previously unused memory; with compaction the allocation pointer is 'reset'.

In the third case, I noticed that when the initially allocated segment (for the SOH) is filled and a new segment needs to be allocated, the newly allocated memory (on the new segment) is returned uninitialized. On my machine (.NET 6 in workstation mode) the initial segment is 256 MB. However, this case was somewhat more difficult to benchmark.

Can I read existing memory?

A reasonable question to ask would be: can I read previously written data from an uninitialized array? In some cases this should be possible. In this post I present one of these cases.

Reading data that was previously written by the same process should be easier to achieve with the garbage collection cases. When a new segment is allocated, it is more likely that a memory segment previously unused by the current process is returned by the runtime.

Allocating on the Small Object Heap has its own challenges. There are typically more short-lived objects there, divided into generations. Allocating and garbage collecting an array may leave that memory segment in the Gen0 or Gen1 area, depending on compaction and pinning. The large object heap is less densely packed with objects and is not divided into generations, so it seems to provide a good space for the experiment.

The code below consists of two methods. PreAllocate() allocates a Person record on the SOH and serializes it into a byte array that is large enough (in .NET 6) to be allocated on the LOH. The code snippet writes the following JSON string into the byte array: {"Name":"SomeName","Id":123,"Birthday":"2022-01-01T00:00:00"}

The PostAllocate() method is invoked right after. First, it sets LOH compaction and invokes GC.Collect() for garbage collection. This should reclaim the memory used by the initial byte array, as no further objects reference it. The next step allocates an uninitialized byte array on the Large Object Heap. In the lucky scenario, this object is allocated overlapping the same memory segment that was used by the array allocated in PreAllocate().

For simplicity I use the IndexOf() method to find the opening and closing curly braces of the JSON string serialized previously. This gives us the memory span of the serialized data. The Deserialize() method tries to deserialize this span; if it is not valid JSON data, it will throw an exception. When the memory is allocated overlapping and containing the complete JSON data, the deserialized variable will hold a Person record, which can then be printed to the console.

using System;
using System.IO;
using System.Runtime;
using System.Text.Json;

PreAllocate();
PostAllocate();

void PreAllocate()
{
    var person = new Person("SomeName", 123, new DateTime(2022, 01, 01));
    JsonSerializer.Serialize(new MemoryStream(new byte[90000]), person);
}

void PostAllocate()
{
    GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
    GC.Collect();
    var span = GC.AllocateUninitializedArray<byte>(90000).AsSpan();
    var startIndex = span.IndexOf((byte)'{');
    var endIndex = span.IndexOf((byte)'}') + 1;
    var deserialized = JsonSerializer.Deserialize<Person>(span[startIndex..endIndex]);
    Console.WriteLine(deserialized);
}

public record Person(string Name, int Id, DateTime Birthday);

Note that for this to work, the code needs to be built in Release mode. Another reason for having the code structured into two methods is how the runtime compiles a given method to be fully or partially interruptible from the GC's point of view. To further examine how the code is compiled, use windbg's !U [address of the JITTED method] -gcinfo command.

Another way to achieve the same result in Release mode without the need for two separate methods is to use the &lt;TieredCompilationQuickJit&gt;false&lt;/TieredCompilationQuickJit&gt; property. Set this flag in the csproj to disable quick JIT, so that the code does not need to be structured into separate methods.
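For reference, the property goes into a PropertyGroup of the csproj; a minimal excerpt might look like this:

```xml
<PropertyGroup>
  <TieredCompilationQuickJit>false</TieredCompilationQuickJit>
</PropertyGroup>
```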

Conclusion

GC.AllocateUninitializedArray&lt;T&gt; provides a performance benefit when allocating arrays. However, this does not mean that we should replace every new MyType[] array allocation. First, we should check whether the type and size of the array are eligible for uninitialized allocation. It should also be ensured that reading uninitialized data will not corrupt the application's state. Finally, when doing such micro-optimizations, always benchmark the code and measure the net performance gain achieved.