Collection Expressions

C# 12 introduced a new language feature: collection expressions, also known as collection literals. They provide a concise way to initialize collections: enumerables, lists, spans, etc.

An example of a collection expression:

int[] a = [6, 7];

In this case the compiler generates code that assigns to variable a an array with two elements: 6 and 7.
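The same syntax works for the other target types listed above. The following is a minimal sketch, assuming C# 12 and .NET 8 (TargetTypesDemo and SumAll are hypothetical names used only for illustration), that initializes an array, a List<int>, a Span<int> and an IEnumerable<int> from the same expression:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo: one collection expression syntax, four target types.
// The compiler chooses how to build each one.
public static class TargetTypesDemo
{
    public static int SumAll()
    {
        int[] array = [6, 7];               // array
        List<int> list = [6, 7];            // List<T>
        Span<int> span = [6, 7];            // span; the compiler may use a stack buffer
        IEnumerable<int> sequence = [6, 7]; // interface target: the compiler picks a concrete type

        int total = 0;
        foreach (int x in array) total += x;
        foreach (int x in list) total += x;
        foreach (int x in span) total += x;
        foreach (int x in sequence) total += x;
        return total;
    }
}
```

SumAll() returns 52, that is 4 * (6 + 7).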

In this post I will investigate what the C# compiler generates behind the scenes and how a custom collection can opt in to this compiler feature.

Behind the scenes

For the int[] a = [6, 7]; example above, the compiler emits the following IL code:

IL_0000: ldc.i4.2
IL_0001: newarr [System.Runtime]System.Int32
IL_0006: dup
IL_0007: ldc.i4.0
IL_0008: ldc.i4.6
IL_0009: stelem.i4
IL_000a: dup
IL_000b: ldc.i4.1
IL_000c: ldc.i4.7
IL_000d: stelem.i4

It creates an integer array of length 2 and stores the value 6 at index 0 and the value 7 at index 1.

However, if a were declared as a List<int>, the compiler would emit different instructions:

// List<int> list = new List<int>();
IL_0000: newobj instance void class [System.Collections]System.Collections.Generic.List`1<int32>::.ctor()
// CollectionsMarshal.SetCount(list, 2);
// Creates a span over the list
// span[num] = 6;
IL_0015: ldloca.s 1
IL_0017: ldloc.2
IL_0018: call instance !0& valuetype [System.Runtime]System.Span`1<int32>::get_Item(int32)
IL_001d: ldc.i4.6
IL_001e: stind.i4
IL_001f: ldloc.2
IL_0020: ldc.i4.1
IL_0021: add
IL_0022: stloc.2
// span[num] = 7;
IL_0023: ldloca.s 1
IL_0025: ldloc.2
IL_0026: call instance !0& valuetype [System.Runtime]System.Span`1<int32>::get_Item(int32)
IL_002b: ldc.i4.7
IL_002c: stind.i4

In this case a List<int> object is created. The items are written through a span over the list's backing array. The C# compiler has a good understanding of the built-in collection types, hence it can produce highly optimized code for them. For example, it may use Array.Empty<T> for empty array expressions.
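Read back as C#, the IL above roughly corresponds to the following sketch. It is my approximation rather than the compiler's literal output; ListLoweringDemo is a hypothetical name, and CollectionsMarshal.SetCount requires .NET 8 or later:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hand-written approximation of the lowering for:
//     List<int> list = [6, 7];
public static class ListLoweringDemo
{
    public static List<int> Create()
    {
        var list = new List<int>();
        // Set Count to 2 in one step; the list grows its backing array as needed.
        CollectionsMarshal.SetCount(list, 2);
        // Span over the list's backing array (only valid until the list is resized).
        Span<int> span = CollectionsMarshal.AsSpan(list);
        int num = 0;
        span[num] = 6;
        num++;
        span[num] = 7;
        return list;
    }
}
```

Writing through the span avoids the per-item capacity checks that calling list.Add for each element would perform.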

Custom Collections with Collection Expressions

A custom collection can opt in to the collection expressions feature in two ways:

  1. Implement IEnumerable<T> and an accessible (public or internal) void Add(T item) method. Make sure the collection has an accessible parameterless constructor.

  2. Implement the methods required by IEnumerable<T> (note that the interface itself does not need to be declared). Apply the CollectionBuilder attribute to the collection, for example: [CollectionBuilder(typeof(MyCollectionBuilderType), "Create")]. Then implement the referenced MyCollectionBuilderType type with a static MyCollection<T> Create<T>(ReadOnlySpan<T> items) method that returns the custom collection in question:

internal static class MyCollectionBuilderType
{
    internal static MyCollection<T> Create<T>(ReadOnlySpan<T> items) =>
        new MyCollection<T>(items);
}

Applying this attribute is only possible when we have access to the source code of the collection. This means that 3rd party collections in NuGet packages cannot be opted in this way if we have no access to their source.

If any of the criteria mentioned in point 1 or 2 is not met, the compiler issues a build error, for example:

CS9174 A collection expression of this type cannot be used in this context because it may be exposed outside of the current scope.
CS9214 Collection expression type must have an applicable constructor that can be called with no arguments.

The official documentation mentions that all built-in collections are well-behaved. As I understand it, this means that if the code compiles, the collection will be created (unless some unexpected error occurs, such as an OOM). With custom collections such a guarantee might not exist. Imagine a collection that limits its size to 3 elements. When initialized with MyLimitedCollection<int> a = [0, 1, 2, 3], it would throw an exception at runtime. This is not a great developer experience, as there is no good way to express such limitations through this API. Hence only well-behaved collections should opt in to the collection expressions feature.
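A minimal sketch of such a collection, assuming the Add-based pattern (MyLimitedCollection<T> and its limit of 3 are hypothetical, taken from the example above):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection that accepts at most 3 items. It satisfies the
// Add-based pattern, so collection expressions compile against it, but an
// oversized expression only fails at runtime.
public sealed class MyLimitedCollection<T> : IEnumerable<T>
{
    private const int MaxCount = 3;
    private readonly List<T> _items = new();

    public void Add(T item)
    {
        if (_items.Count == MaxCount)
            throw new InvalidOperationException($"At most {MaxCount} items are allowed.");
        _items.Add(item);
    }

    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With this type, MyLimitedCollection<int> ok = [0, 1, 2]; compiles and runs, while MyLimitedCollection<int> a = [0, 1, 2, 3]; compiles just as well but throws InvalidOperationException when the fourth Add is invoked.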

Similarly, MyCollection<int> a = [] compiles and works; however, the compiler does not use MyCollection.Empty<T> (even when it is declared) to create an empty custom collection.

Implementation with Add() method

When a collection opts in to collection expressions with the void Add(T item) method, the compiler emits IL that adds the items of the expression one by one by invoking the Add method. The following C# code:

MyCollection<int> c = [1, 2, 3];

compiles to:

IL_000f: newobj instance void class MyCollection`1<int32>::.ctor()
IL_0014: dup
IL_0015: ldc.i4.1
IL_0016: callvirt instance void class MyCollection`1<int32>::Add(!0)
IL_001b: dup
IL_001c: ldc.i4.2
IL_001d: callvirt instance void class MyCollection`1<int32>::Add(!0)
IL_0022: dup
IL_0023: ldc.i4.3
IL_0024: callvirt instance void class MyCollection`1<int32>::Add(!0)
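This IL is equivalent to calling new MyCollection<int>() followed by c.Add(1); c.Add(2); c.Add(3);. The sketch below is a hypothetical Add-based MyCollection<T> that counts Add calls to make this observable. It also declares a static Empty property (a stand-in for the MyCollection.Empty<T> member mentioned earlier) to show that [] does not use it:

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection that opts in through the Add-based pattern:
// it implements IEnumerable<T>, has an implicit public parameterless
// constructor and a public void Add(T item) method.
public sealed class MyCollection<T> : IEnumerable<T>
{
    private readonly List<T> _items = new();

    // Cached empty instance; `MyCollection<T> c = [];` does not use it.
    public static MyCollection<T> Empty { get; } = new();

    // Number of Add calls on this instance, to observe the lowering.
    public int AddCalls { get; private set; }

    public int Count => _items.Count;

    public void Add(T item)
    {
        AddCalls++;
        _items.Add(item);
    }

    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With this type, MyCollection<int> c = [1, 2, 3]; leaves c.AddCalls at 3, and MyCollection<int> e = []; produces a new instance through the constructor rather than MyCollection<int>.Empty.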

Invoking the Add method for each item is not necessarily the most efficient way to initialize every collection. A custom collection type might expose some sort of void AddRange(...) method that adds multiple items to the collection more efficiently.

Implementation with CollectionBuilder attribute

When methods such as void AddRange(...) are available on the collection's API, or a parameterless constructor is not available, the collection can opt in to the collection expressions feature with the [CollectionBuilder(typeof(MyCollectionBuilderType), "Create")] attribute. Depending on the custom collection's implementation, this might have performance advantages, as the compiler can generate more efficient code. For the same C# code:

MyCollection<int> c = [1, 2, 3];

produces the following IL:

IL_000f: ldtoken field valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=12_Align=4' '<PrivateImplementationDetails>'::'4636993D3E1DA4E9D6B8F87B79E8F7C6D018580D52661950EABC3845C5897A4D4'
IL_0014: call valuetype [System.Runtime]System.ReadOnlySpan`1<!!0> [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::CreateSpan<int32>(valuetype [System.Runtime]System.RuntimeFieldHandle)
IL_0019: call class MyCollection`1<!!0> MyCollectionBuilderType::Create<int32>(valuetype [System.Runtime]System.ReadOnlySpan`1<!!0>)

In this case the actual values are written into the application binary. At runtime, this IL creates a ReadOnlySpan<int> over the corresponding segment of that data (via RuntimeHelpers.CreateSpan) and passes this span to the builder's Create method, which initializes the custom collection.

Note that while this is more efficient from the compiler's point of view, the final performance depends on how the collection implements its Add and Create/AddRange methods.
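To make the builder-based approach concrete, here is a sketch under stated assumptions: a hypothetical MyCollection<T> backed by a List<T>, with a hypothetical AddRange(ReadOnlySpan<T>) bulk method and a From helper. The builder type and method names follow the attribute used above; it requires .NET 8, where CollectionBuilderAttribute is available:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical collection opted in through CollectionBuilder. It has no
// accessible parameterless constructor, so the Add-based pattern cannot apply.
[CollectionBuilder(typeof(MyCollectionBuilderType), "Create")]
public sealed class MyCollection<T> : IEnumerable<T>
{
    private readonly List<T> _items;

    private MyCollection(int capacity) => _items = new List<T>(capacity);

    // Bulk add: ensures capacity once, then copies the items.
    public void AddRange(ReadOnlySpan<T> items)
    {
        _items.EnsureCapacity(_items.Count + items.Length);
        foreach (T item in items)
            _items.Add(item);
    }

    public int Count => _items.Count;

    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Called by the builder. The span is copied, never stored, because it may
    // point to data embedded in the binary or to a temporary buffer.
    internal static MyCollection<T> From(ReadOnlySpan<T> items)
    {
        var collection = new MyCollection<T>(items.Length);
        collection.AddRange(items);
        return collection;
    }
}

internal static class MyCollectionBuilderType
{
    // Static, a single ReadOnlySpan<T> parameter, returns the collection type.
    internal static MyCollection<T> Create<T>(ReadOnlySpan<T> items) =>
        MyCollection<T>.From(items);
}
```

Because the builder receives all items at once, the storage can be sized a single time instead of growing on each Add. For MyCollection<int> empty = [], the builder is called with an empty span.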

Conclusion

In this post I gave a quick introduction to the collection expressions feature of C# 12. I investigated the IL code generated by the compiler for different built-in collections. Then I outlined two ways custom collections can opt in to this feature. Finally, I explored the IL code generated for these two approaches.