Lookups with Switch Expressions

In a recent PR I have optimized parsing known header values, such CacheControlHeaderValue or MediaTypeHeaderValue.

The original code used a Dictionary<> to look up a Func<string, T?> parsing method, with a string input and a generic type T return type. In this blog post I will summarize the pros and cons of different alternatives that were considered as part of the optimization.

FrozenDictionary

This dictionary contains known Func<string, T?> parsers. These known parsers do not change at runtime. Using a FrozenDictionary would bring performance benefits for the lookup operations. However, freezing the dictionary itself takes time at runtime, because when freezing the new dictionary builds an efficient lookup. This is usually worth the trade-off as freezing only needs to happen once during the lifetime of the application. However, in this case all the contents of this dictionary are known at compile time.

One suggestion could be to avoid the runtime costs by building the lookup data structure at compile time.

Switch

Using switch expression could move building the lookup data structure to compile time. C# compiler generates a decision tree to find the right parsing method based on the generic type parameter. Using switch expressions with Type Pattern Matching could an excellent tool. However, as only the return type of the method is generic, the C# 12 language does not allow pattern matching on this generic type parameter.

This means using the switch expression over the generic type parameter is not a feasible solution. Instead, the code could look up the name of the type with reflection and use switch statements build a decision tree based on the first character of the type name:

private static T? GetSwitch<T>(Dictionary<string, string> headers, string name)
{
    var value = headers[name];
        var tn = typeof(T).Name;

    object? temp = null;
    switch (tn[0])
    {
        case 'C' when typeof(T) == typeof(CacheControlHeaderValue):
            temp = ParseCacheControlHeaderValue;
            break;
        case 'C' when typeof(T) == typeof(ContentDispositionHeaderValue):
            temp = ParseCacheContentDispositionHeaderValue;
            break;
        // Other cases
    }
    // Invokking the parser
    // returning the results
}

Once the switch decides on the branch, we still need to compare the actual type of the generic type parameter with the known parser type. Otherwise, we would not be able to distinguish CacheControlHeaderValue and ContentDispositionHeaderValue or someone could create a type with the same name and different namespace. Also notice that switch statement uses a named method ParseCacheControlHeaderValue as a delegate instead of an anonymous method. This allows us to refer to the parser methods at compile time by its name.

If-Else

A suggestion has been raised by the .NET team to simply the proposal above by using if-else branches that compare actual type of the generic type parameter with the known types. This suggestion turns out to be beneficial for performance. The sample below shows the idea for the suggested code structure:

private static T? GetByTypeCast<T>(Dictionary<string, string> headers, string name)
{
    var value = headers[name].ToString();
    if (typeof(T) == typeof(CacheControlHeaderValue))
    {
        return (T?)(object?)ParseCacheControlHeaderValue(value);
    }
    else if (typeof(T) == typeof(ContentDispositionHeaderValue))
    {
        return (T?)(object?)ParseCacheContentDispositionHeaderValue(value);
    }
    // Other cases

Using the double casting is unavoidable to make the C# 12 compiler happy. More precisely it is avoidable only when the method delegates (like ParseCacheControlHeaderValue) are casted to Func<string, T> before at the callsite.

Comparison

The codebase refers to 9 known parsers. Most of these parsers return a reference type result, except two parsers that return DateTime and long value. In .NET 8 the JIT got smarter and even in Tier 0 it can apply dead code elimination. Given the JIT compiles separate code for generic value types, but the same canon for reference types, it can eliminate some of the branches of the method in question. For example, invoking the method GetByTypeCast with a long type parameter GetByTypeCast<long>(...) will only contain the parsing logic for the case that handles the long return types. Similarly, invoking GetByTypeCast<CacheControlHeaderValue>(...) will result code that eliminates the branches dedicated for the value types.

Value Types

This is key when further looking into the code jitted for the value types. For example, when the generic type parameter is long, the JIT can simplify the code further as shown below. The result assembly (machine) code of the method for the if-else solution is reduced to a simple type check and the parsing logic.

; Assembly listing for method BenchmarksAspNetCore:GetByTypeCast[System.Nullable`1[long]](System.Collections.Generic.Dictionary`2[System.String,System.String],System.String):System.Nullable`1[long] (Tier1)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; Tier1 code
; optimized code
; rsp based frame
; partially interruptible
; PGO data available, profile data was all zero
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data

G_M000_IG01:                ;; offset=0x0000
       push     rbx
       sub      rsp, 32
       mov      rbx, rcx

G_M000_IG02:                ;; offset=0x0008
       mov      rcx, rdx
       mov      rdx, r8
       cmp      dword ptr [rcx], ecx
       call     [System.Collections.Generic.Dictionary`2[System.__Canon,System.__Canon]:get_Item(System.__Canon):System.__Canon:this]
       cmp      byte  ptr [rax], al
       mov      rcx, rbx
       mov      rdx, rax
       call     [BenchmarksAspNetCore:ParseCacheInt64(System.String):System.Nullable`1[long]]
       mov      rax, rbx

G_M000_IG03:                ;; offset=0x0027
       add      rsp, 32
       pop      rbx
       ret

; Total bytes of code 45

The switch statement based implementation can also apply the dead code elimination optimization, however it still emits the code required for building the decision tree. Hence it results in a total of 347 bytes of code. At runtime it also needs to execute more code as it looks up the generic type parameters name using reflection and then walks through the decision tree:

call      qword ptr [7FF807706DA8]; System.RuntimeType.GetCachedName(System.TypeNameKind)
cmp       dword ptr [rax+8],0
jbe       near ptr M01_L09
movzx     r14d,word ptr [rax+0C]
cmp       r14d,45
jbe       short M01_L04
cmp       r14d,4D
je        short M01_L00
cmp       r14d,4E

Reference Types

The case for reference types is more interesting as less code can be eliminated, so the if-else based solution might need to evaluate more branch conditions. In the case of the if-else in the best case 1 and in the worst case 7 branch conditions are evaluated. In average that is 4 conditions.

In case the of the switch statements the code generates a lookup tree that evaluates in 2-4 conditions and an additional comparison of the type parameter and the known type. While in certain conditions this result less comparisons to the if-else solution, it comes at an extra cost of loading the type name of the generic type parameter and reading its first character:

call      qword ptr [7FF8076E6DA8]; System.RuntimeType.GetCachedName(System.TypeNameKind)
cmp       dword ptr [rax+8],0
jbe       near ptr M01_L23
movzx     r15d,word ptr [rax+0C]

Conclusion

Eventually the if-else solution provides better overall performance characteristics, hence that solution has been chosen as the accepted implementation in the pull request.

Note, that this does not provide a general rule of thumb for deciding whether an if-else like solutions are better to switch statements or switch expressions. This investigation is very specific to the described case: comparing generic type parameters with known types in C# 12. The actual types of the known types are also important factor. Given all known types were reference types or value types, one or the other solution would easily swing the pendulum.

While the purpose of this post was not to give a rule of thumb, it offers the background to better understand and replicate such analysis in tight loops that also required to be optimized.