Switch Expression or Visitor pattern
12/07/2024
5 minutes
A few months ago I looked into optimizing dictionary lookups using switch expressions.
In this post I investigate a seemingly similar problem, but with vastly different constraints, that eventually demand a vastly different solution.
Problem Definition
Given a set of polymorphic objects and a DTO. A method needs to iterate all the polymorphic objects and evaluate them against the DTO. Each polymorphic object evaluates a different (C#) property of the DTO. While there are a few designs to achieve this, in this case I seek for a design that does not introduce a coupling between the polymorphic objects and the DTO. I compare two possible solutions in this post.
First, let me show a minimal version of the polymorphic objects discussed above:
public abstract class Value { } public class AValue : Value { public double A { get; set; } } public class BValue : Value { public double B { get; set; } } // Similar for CValue, DValue, EValue, FValue, ...
In this post I use double
value in place of the more complex 'DTO' struct
used in the real-life example. Each object needs to compare its double
property (A, B, etc.) with the DTO. Note, that the Value
type or the derived AValue
, BValue
etc. types must not reference the DTO and vice-versa to maintain decoupling.
Possible Solutions
The first solution uses the C# switch expressions feature in conjunction with type pattern matching.
In the snippet below I created an enclosing type that has fields of _data
referring to the polymorphic collection of objects and _comperand
as the DTO. This type knows how to compare the DTO with the Value
types. For simplicity, in the example below all polymorphic objects are compared the same way with the DTO. In the real-life example these would be different properties and operations for each derived type.
// fields excluded from this sample public int SwitchExpressionType() { int sum = 0; for (int i = 0; i < _data.Length; i++) { sum += _data[i] switch { AValue a => a.A > _comperand ? 1 : 0, BValue b => b.B > _comperand ? 1 : 0, // ... _ => throw new NotImplementedException() }; } return sum; }
The second approach uses the visitor pattern. A visitor object encapsulates the comparisons logic. The polymorphic types can reference the visitor, while avoiding referencing the DTO:
public class Visitor(Value[] data, double comperand) { public int Sum { get; set; } public int Process() { Sum = 0; for (int i = 0; i < data.Length; i++) data[i].Visit(this); return Sum; } internal void Visit(AValue a) => Sum += a.A > comperand ? 1 : 0; internal void Visit(BValue b) => Sum += b.B > comperand ? 1 : 0; // ... } public abstract class Value { public abstract void Visit(Visitor visitor); } public class AValue : Value { public double A { get; set; } public override void Visit(Visitor visitor) => visitor.Visit(this); } // Similar for BValue, CValue, DValue, EValue, FValue, ...
In production level code the
Value
polymorphic types should reference an abstraction of the visitor instead of the concrete type.
Performance Comparison
The distribution of the derived Value
types significantly impacts the performance of these solutions. In this post I assume (and use) a discrete uniform distribution, meaning each types are equally probable. The DTO's value and each object's double
property is generated with the non-cryptographic Random
class.
I use .NET 9 Preview 3 and BenchmarkDotNet v0.13.12 to compare the performance. The input is an array of 1000 Value
objects. Each of them is an instance of one of the AValue
, ..., FValue
, ... types (matching the number of types indicated in the results).
In the initial comparison I used 6 derived types.
Method | Mean | Error | StdDev | Code Size |
---|---|---|---|---|
SwitchExpressionType | 5.481 us | 0.0639 us | 0.0598 us | 464 B |
Visitor | 1.510 us | 0.0298 us | 0.0366 us | 134 B |
The visitor pattern seems to offer a 3.5 times faster solution. Let me repeat the test with 4, 12, and 18 derived types:
4 derived types with uniform distribution:
Method | Mean | Error | StdDev | Code Size |
---|---|---|---|---|
SwitchExpressionType | 4.025 us | 0.0453 us | 0.0424 us | 369 B |
Visitor | 1.446 us | 0.0166 us | 0.0155 us | 134 B |
12 derived types with uniform distribution:
Method | Mean | Error | StdDev | Code Size |
---|---|---|---|---|
SwitchExpressionType | 10.665 us | 0.1170 us | 0.1037 us | 756 B |
Visitor | 3.280 us | 0.0607 us | 0.0790 us | 134 B |
18 derived types:
Method | Mean | Error | StdDev | Code Size |
---|---|---|---|---|
SwitchExpressionType | 14.937 us | 0.1835 us | 0.1716 us | 1,043 B |
Visitor | 3.809 us | 0.0708 us | 0.0627 us | 134 B |
Both the SwitchExpressionType
and Visitor
methods performance decrease when there are more types to handle. What drives the performance degradation?
Based on further testing, I observe that the performance of the switch expression depends on two things:
how many arms (patterns) the switch expression has
how many case guards are evaluated at runtime
For example, when the input collection has a single type only, the performance depends on whether it matches the first or the last type in the patterns of the switch expression. When observing the optimized (Tier-1) code using JITDiasm (environment variable: $env:DOTNET_JitDisasm
), a careful reader can observer that the possible types of the switch expression are compared one-by-one with the actual type. A fragment of the generated assembly code:
G_M000_IG04: ;; offset=0x0075 mov rdx, r14 mov rcx, 0x7FFB2138B3D8 call [CORINFO_HELP_ISINSTANCEOFCLASS] test rax, rax jne G_M000_IG16 G_M000_IG05: ;; offset=0x0091 mov rdx, r14 mov rcx, 0x7FFB2138B2D8 call [CORINFO_HELP_ISINSTANCEOFCLASS] test rax, rax jne G_M000_IG17
The performance of the Visitor pattern depends on the number of actual types used polymorphic collection. However, the JIT optimized Tier-1 code is the same with a single type or when using 18 derived types. The virtual method calls on the base type require the runtime's help to determine the exact method to be invoked:
mov rax, qword ptr [rax+0x40] call [rax+0x20]Value:Visit(Visitor):this
Here the results seem to be primarily driven by the performance of virtual method dispatch.