Laszlo - Frozen Collections

Frozen Collections

08/05/2023

5 minutes

.NET 8 (Preview 1) introduces new collection types including FrozenDictionary and FrozenSet. Both types are frozen counterparts of Dictionary<TKey, TValue> and HashSet<T>. These types reside in the System.Collections.Frozen namespace in SystemCollections.Immutable.dll package.

The frozen semantics mean, that these collections resist to change once they become frozen: they are immutable. However, they are different to the existing immutable collections, as these are being even more restrictive to change. While immutable collections allow change by creating a new immutable collection frozen semantics discourage such operations. Immutable collections use 'clever' data structures in memory that make operations like Add and Remove (relatively) cheap given the underlying data may not change. Contrary, frozen collections can optimize for lookups (or for construction, but more on that later).

For example, ImmutableStack<T> uses a linked list implementation instead of using an array as backing data structure. It has two fields: a head which contains data stored by a user, and a tail which is a pointer to the next item on the stack. Pushing a new item to the stack creates a new instance of ImmutableStack<T> where the head field stores the new data pushed, and the tail references the previous instance of ImmutableStack<T>.

Note, that these are the actual field names in the implementation of ImmutableStack at the time of writing this blog post.

In a similar way the pop operation returns the tail. In these cases, adding and removing items to an immutable stack is cheap. However, enumerating all items in this stack is less optimal compared to reading through an array of items.

Frozen

Creating a frozen collection has different semantics compared to immutables. The BCL exposes extension methods: ToFrozenDictionary and ToFrozenSet, which lets converting a frozen collection from an existing one. There is also an empty frozen instance exposed through the static Empty property:

FrozenDictionary<int,int>.Empty

// Not supported:
FrozenDictionary<int, int>.Empty.TryAdd(1, 1);

However, if one tries to add items to the empty FrozenDictionary<TKey,TValue> a NotSupportedException is thrown.

Deep Frozen Dictionary

Looking into the implementation of FrozenDictionary reveals a few interesting details.

FrozenDictionary itself is an abstract class.
When creating a FrozenDictionary they can be optimized for construction or reading.
The specific implementation depends on the source dictionary's size and type of the key.

Let's take an actual example by freezing Dictionary<int, string> with 100 items. An overload of ToFrozenDictionary allows to make it optimized for reading:

DictionaryData.ToFrozenDictionary(optimizeForReading: true);

In this case type of the key of this dictionary is a value type. Based on the size of the dictionary and the actual value type, either a SmallComparableValueTypeFrozenDictionary or Int32FrozenDictionary or ValueTypeDefaultComparerFrozenDictionary is used. The implementation of ToFrozenDictionary also considers reference types, and in particular string keys.

In the case above with 100 items we get a Int32FrozenDictionary. The implementation of Int32FrozenDictionary is relatively short, the class can fit on my screen. It has two interesting readonly fields:

private readonly FrozenHashTable _hashTable;
private readonly TValue[] _values;

TValue[] stores the values of the original dictionary in an array. This array is built from the Dictionary upon construction. FrozenHashTable is responsible for translating a key into a start and an end index representing a range in TValue[] which has the value corresponding to the key. FrozenHashTable is shared by multiple other frozen collections. Its implementation is considerably more complex with calculating the number of buckets upon constructions.

Performance

Does all this manifest as a performance improvement? As outlined earlier, the performance characteristics will greatly depend on the concrete FrozenDictionary implementation. In this post focusing on FrozenDictionary<int, string> I performed the following measurements:

Using TryGet to lookup a specific key that exists in the dictionary.
Enumerate all items of the dictionary.

Number of keys in the dictionary: 4

Method	Mean	Error	StdDev
FrozenDictionaryTryGet	2.864 ns	0.0786 ns	0.0735 ns
DictionaryGet	3.223 ns	0.0475 ns	0.0421 ns
FrozenDictionaryEnumerate	8.704 ns	0.0745 ns	0.0697 ns
DictionaryEnumerate	17.695 ns	0.1689 ns	0.1580 ns

Number of keys in the dictionary: 10

Method	Mean	Error	StdDev
FrozenDictionaryTryGet	2.847 ns	0.0706 ns	0.0661 ns
DictionaryGet	3.195 ns	0.0735 ns	0.0651 ns
FrozenDictionaryEnumerate	19.413 ns	0.2199 ns	0.2057 ns
DictionaryEnumerate	28.225 ns	0.4060 ns	0.3599 ns

Number of keys in the dictionay: 11

Method	Mean	Error	StdDev
FrozenDictionaryTryGet	3.216 ns	0.1082 ns	0.1063 ns
DictionaryGet	3.270 ns	0.1014 ns	0.0948 ns
FrozenDictionaryEnumerate	21.197 ns	0.2632 ns	0.2333 ns
DictionaryEnumerate	30.961 ns	0.4135 ns	0.3868 ns

Number of keys in the dictionary: 4000

Method	Mean	Error	StdDev
FrozenDictionaryTryGet	3.079 ns	0.0674 ns	0.0630 ns
DictionaryGet	3.104 ns	0.0303 ns	0.0236 ns
FrozenDictionaryEnumerate	5,870.162 ns	25.6190 ns	22.7106 ns
DictionaryEnumerate	8,288.615 ns	58.9123 ns	52.2242 ns

Measurements on the early version of the preview show that there are performance gains achieved, however the size of the gain varies greatly on the data type, the number of dictionary items and the operations we perform.

Conclusion

I see Frozen data types are perfect for the use case where a collection is built up once but used for an extended period of time for read operations. For example, an application may create lookup tables during a startup phase, which lookup tables can be used throughout the lifetime of the application. Frozen types might also be beneficial in very tight loops in case the loop needs to read data from a collection. However, using it generally should be done with caution: freezing an existing dictionary or set is considerable amount of work, which can absorb the benefits on the read operations.

Frozen BCL performance collections

Hello, I am Laszlo

.NET developer