Collection Expressions
09/15/2024
6 minutes
C# 12 introduced a new language feature: collection expressions, also known as collection literals. They provide a neat way to initialize collections: enumerables, lists, spans, etc.
An example of using the collection expressions:
int[] a = [6, 7];
In this case the compiler generates code for variable a
referencing an array with two elements: 6 and 7.
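As a minimal sketch (assuming a C# 12 / .NET 8 project with top-level statements; the variable names are illustrative only), the same literal syntax can target several of the built-in collection types mentioned above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The same literal is converted to whatever target type the declaration asks for.
int[] array = [6, 7];
List<int> list = [6, 7];
IEnumerable<int> sequence = [6, 7];
Span<int> span = [6, 7];

Console.WriteLine(array.Length);    // 2
Console.WriteLine(list.Count);      // 2
Console.WriteLine(sequence.Sum());  // 13
Console.WriteLine(span[1]);         // 7
```

The concrete type the compiler picks behind IEnumerable<int> is an implementation detail, so the sketch only relies on enumerating it.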
In this post I will investigate what the C# compiler generates behind the scenes and how a custom collection can opt in to this compiler feature.
Behind the scenes
In the above int[] a = [6, 7];
case the compiler emits the following IL code:
IL_0001: newarr [System.Runtime]System.Int32
IL_0006: dup
IL_0007: ldc.i4.0
IL_0008: ldc.i4.6
IL_0009: stelem.i4
IL_000a: dup
IL_000b: ldc.i4.1
IL_000c: ldc.i4.7
It creates an integer array, then stores the value 6 at index 0 and the value 7 at index 1.
However, if a were declared as a List<int>, the compiler would emit different instructions:
// List<int> list = new List<int>();
IL_0000: newobj instance void class [System.Collections]System.Collections.Generic.List`1<int32>::.ctor()
// CollectionsMarshal.SetCount(list, 2);
// Creates a span over the list
// span[num] = 6;
IL_0015: ldloca.s 1
IL_0017: ldloc.2
IL_0018: call instance !0& valuetype [System.Runtime]System.Span`1<int32>::get_Item(int32)
IL_001d: ldc.i4.6
IL_001e: stind.i4
IL_001f: ldloc.2
IL_0020: ldc.i4.1
IL_0021: add
IL_0022: stloc.2
// span[num] = 7;
IL_0023: ldloca.s 1
IL_0025: ldloc.2
IL_0026: call instance !0& valuetype [System.Runtime]System.Span`1<int32>::get_Item(int32)
IL_002b: ldc.i4.7
IL_002c: stind.i4
In this case a List<int> object is created. The items are set through a span that is created over the backing array of the list. The C# compiler has a good understanding of the built-in collections, hence it can produce the most efficient code. It also understands the built-in types; for example, it may use Array.Empty<T> for empty array expressions.
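The IL above corresponds roughly to the following hand-written C#. This is a sketch reconstructed from the comments in the listing, not the compiler's exact output; it assumes .NET 8, where CollectionsMarshal.SetCount is available:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hand-written approximation of the lowering for: List<int> list = [6, 7];
var list = new List<int>();
CollectionsMarshal.SetCount(list, 2);              // sets the list's Count to 2 up front
Span<int> span = CollectionsMarshal.AsSpan(list);  // span over the list's backing array
int num = 0;
span[num] = 6;
num++;
span[num] = 7;

Console.WriteLine(string.Join(", ", list));        // 6, 7
```

Writing through the span avoids the per-item bookkeeping of List<int>.Add, which is why the compiler can prefer this shape for a collection it knows well.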
Custom Collections with Collection Expressions
A custom collection can opt in to the collection expressions feature in two ways:
1. Implement IEnumerable<T> and a public or internal void Add(T item) method. Make sure the collection has a parameterless constructor.
2. Implement the methods required by IEnumerable<T> (note that the interface itself does not need to be declared). Declare a CollectionBuilder attribute on the collection, for example: [CollectionBuilder(typeof(MyCollectionBuilderType), "Create")]. Then implement the referenced MyCollectionBuilderType type containing a static MyCollection<T> Create<T>(ReadOnlySpan<T> items) method that returns the custom collection in question:
internal static class MyCollectionBuilderType
{
    internal static MyCollection<T> Create<T>(ReadOnlySpan<T> items) => new MyCollection<T>(items);
}
Applying this attribute is only possible when we have direct access to the source code of the collection. This means that 3rd party collections in NuGet packages cannot be enriched to work with collection expressions if we have no access to the source itself.
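To make the second approach concrete, here is a minimal end-to-end sketch. The body of MyCollection<T> (the array field, the Count property, and the span-taking constructor) is an assumption made for illustration; only the attribute and the shape of the Create method come from the text above. For simplicity the sketch declares IEnumerable<T> explicitly.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

MyCollection<int> c = [1, 2, 3];   // the compiler routes this through MyCollectionBuilderType.Create<int>
Console.WriteLine(c.Count);        // 3

[CollectionBuilder(typeof(MyCollectionBuilderType), "Create")]
internal sealed class MyCollection<T> : IEnumerable<T>
{
    private readonly T[] _items;

    // ReadOnlySpan<T> is a ref struct and cannot be stored in a field,
    // so the items are copied into an array once.
    public MyCollection(ReadOnlySpan<T> items) => _items = items.ToArray();

    public int Count => _items.Length;

    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)_items).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => _items.GetEnumerator();
}

internal static class MyCollectionBuilderType
{
    internal static MyCollection<T> Create<T>(ReadOnlySpan<T> items) => new MyCollection<T>(items);
}
```

Note that this collection has neither a parameterless constructor nor an Add method; the builder method is the only construction path the collection expression needs.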
If any of the criteria mentioned in point 1 or 2 is not met, the compiler will issue a build error:
CS9174 Cannot initialize type with a collection expression because the type is not constructible.
OR
CS9214 Collection expression type must have an applicable constructor that can be called with no arguments.
The official documentation mentions that all built-in collections are well-behaved. To my understanding, this means that if the code compiles, the collection will be created (unless an unexpected error occurs, such as running out of memory). With custom collections, such a guarantee might not exist. Imagine a collection that limits its size to 3 elements. When instantiated with MyLimitedCollection<int> a = [0,1,2,3], it would throw an exception at runtime. This would not be a great developer experience, as there is no good way to indicate such limitations through this API. Hence only well-behaved collections should opt in to the collection expressions feature.
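The size-limited collection can be sketched as follows. MyLimitedCollection<T> is the hypothetical type from the paragraph above; its members, the choice of InvalidOperationException, and the message text are assumptions made for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

bool threw = false;
try
{
    MyLimitedCollection<int> a = [0, 1, 2, 3];   // compiles without any diagnostic
    Console.WriteLine(a.Count);                  // not reached
}
catch (InvalidOperationException ex)
{
    threw = true;                                // the fourth Add call fails at runtime
    Console.WriteLine(ex.Message);
}

internal sealed class MyLimitedCollection<T> : IEnumerable<T>
{
    private const int MaxItems = 3;
    private readonly List<T> _items = new();

    public int Count => _items.Count;

    public void Add(T item)
    {
        if (_items.Count >= MaxItems)
            throw new InvalidOperationException($"At most {MaxItems} items are allowed.");
        _items.Add(item);
    }

    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Nothing in the Add signature tells the compiler about the limit, so the mistake only surfaces when the code runs.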
Similarly, MyCollection<int> a = [] will compile and work; however, the compiler will not use a MyCollection.Empty<T> member (even if one is declared) to instantiate an empty custom collection.
Implementation with Add() method
When the collection opts in to collection expressions with the void Add(T item) method, the compiler emits IL that adds the items from the expression one by one by invoking the Add method. The following C# code:
MyCollection<int> c = [1, 2, 3];
compiles to:
IL_000f: newobj instance void class MyCollection`1<int32>::.ctor()
IL_0014: dup
IL_0015: ldc.i4.1
IL_0016: callvirt instance void class MyCollection`1<int32>::Add(!0)
IL_001b: dup
IL_001c: ldc.i4.2
IL_001d: callvirt instance void class MyCollection`1<int32>::Add(!0)
IL_0022: dup
IL_0023: ldc.i4.3
IL_0024: callvirt instance void class MyCollection`1<int32>::Add(!0)
Invoking the Add
method for each item might not be the most efficient way to initialize all collections. A custom collection type might expose some sort of a void AddRange(...)
method that adds multiple items to the collection in a more efficient way.
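A minimal sketch of such an Add-based collection, together with a hand-written equivalent of what the IL above does. The List<T> backing field and the AddRange signature are assumptions made for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

MyCollection<int> c = [1, 2, 3];

// Roughly what the compiler generates for the line above:
var lowered = new MyCollection<int>();
lowered.Add(1);
lowered.Add(2);
lowered.Add(3);

Console.WriteLine(string.Join(", ", c));        // 1, 2, 3
Console.WriteLine(string.Join(", ", lowered));  // 1, 2, 3

internal sealed class MyCollection<T> : IEnumerable<T>
{
    private readonly List<T> _items = new();

    public MyCollection() { }                     // parameterless constructor
    public void Add(T item) => _items.Add(item);  // invoked once per element

    // Hypothetical bulk method; the expression [1, 2, 3] above does not call it.
    public void AddRange(IEnumerable<T> items) => _items.AddRange(items);

    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```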
Implementation with CollectionBuilder attribute
When a method such as void AddRange(...) is available on the collection's API, or a parameterless constructor is not available, the collection can opt in to the collection expressions feature with the [CollectionBuilder(typeof(MyCollectionBuilderType), "Create")] attribute. Depending on the custom collection's implementation, this might have performance advantages. In this case the compiler generates more efficient code. The same C# code:
MyCollection<int> c = [1, 2, 3];
produces the following IL:
IL_000f: ldtoken field valuetype '<PrivateImplementationDetails>'/'__StaticArrayInitTypeSize=12_Align=4' '<PrivateImplementationDetails>'::'4636993D3E1DA4E9D6B8F87B79E8F7C6D018580D52661950EABC3845C5897A4D4'
IL_0014: call valuetype [System.Runtime]System.ReadOnlySpan`1<!!0> [System.Runtime]System.Runtime.CompilerServices.RuntimeHelpers::CreateSpan<int32>(valuetype [System.Runtime]System.RuntimeFieldHandle)
IL_0019: call class MyCollection`1<!!0> MyCollectionInitializer::Create<int32>(valuetype [System.Runtime]System.ReadOnlySpan`1<!!0>)
In this case the actual values are written into the application binary. At runtime this IL creates a span over the corresponding segment of the binary and passes this span to the custom collection for initialization.
Note that while this is more efficient from the compiler's point of view, the final performance will depend on the implementation of the Add and Create/AddRange methods of the collection.
Conclusion
In this post I gave a quick introduction to the collection expressions feature of C# 12. I investigated the IL code generated by the compiler for different built-in collections. Then I outlined two ways custom collections can work with this feature. Finally, I explored the IL code generated for these two approaches.