Understanding RuntimeHelpers.IsKnownConstant Method
02/04/2023
9 minutes
In this post my goal is to gain understanding of the RuntimeHelpers.IsKnownConstant
method. I have come across it in the Performance Improvements in .NET 7 blog post. There are two examples presented in the post:
bool StartsWith(char value)
Math.Round
with MidpointRounding mode is specified
I will learn about JIT optimization including RuntimeHelpers.IsKnownConstant
through a simple and slightly artificial example. A bool IsPalindromePosition(int position)
method is part of a type which has a static property ReadOnlySpan<char> Text
. The method returns true when the given position
from the beginning of the string returns the same character as one from the end. For the sake of simplicity this method omits all input validation.
public static ReadOnlySpan<char> Text => "hellobolleh"; [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)] public bool IsPalindromePosition(int position) { if (int.IsOddInteger(Text.Length) && position == Text.Length / 2 + 1) return true; return Text[position] == Text[Text.Length - position - 1]; } [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)] public bool Work() { var result = IsPalindromePosition(0); return result; }
The method is called from another method: Work()
.
This implementation builds upon the realization that if the Text
property's length is an odd number, and the callsite of IsPalindromePosition
requests the position for the median character of Text
- as it has no pair to compare with - the method can return true without indexing into the string. What is the benefit of having a special handling for this edge case?
Known Constants: 0
When IsPalindromePosition()
method is invoked with the value of 0 in the Work()
method, the following code is JITTED for the Work()
method:
; Assembly listing for method Program:Work():bool:this ; Emitting BLENDED_CODE for X64 CPU with AVX - Windows ; Tier-1 compilation ; optimized code ; rsp based frame ; partially interruptible ; No PGO data ; 5 inlinees with PGO data; 19 single block inlinees; 1 inlinees without PGO data G_M000_IG01: ;; offset=0000H G_M000_IG02: ;; offset=0000H 48B8D820804781010000 mov rax, 0x181478020D8 488B00 mov rax, gword ptr [rax] 4883C00C add rax, 12 488BD0 mov rdx, rax 0FB712 movzx rdx, word ptr [rdx] 0FB74014 movzx rax, word ptr [rax+14H] 3BD0 cmp edx, eax 0F94C0 sete al 0FB6C0 movzx rax, al G_M000_IG03: ;; offset=0023H C3 ret ; Total bytes of code 36
This is neat, as the whole IsPalindromePosition()
method got inlined. Even better, the first if
branch has been completely eliminated, as the JIT compiler knows that with the value 0
will never satisfy the conditions. Notice, the value of 14H (20 decimal), is used to point at last character of the string (Text
is 11 char long, which is 22 bytes). Comparing the first and last characters means to compare the first and last 2 bytes. A word ptr
is addressing these 2 bytes, word ptr [rdx]
addresses the first character and word ptr [rax+14H]
addresses the last character. This is even more evident, when I change the input parameter of IsPalindromePosition()
method to 1
. The corresponding generated assembly code looks as:
0FB75202 movzx rdx, word ptr [rdx+02H] 0FB74012 movzx rax, word ptr [rax+12H]
Known Constants: 6
However, changing the input parameter to a value that points to the middle character, generates significantly different code:
IsPalindromePosition(6);
; Assembly listing for method Program:Work():bool:this ; Emitting BLENDED_CODE for X64 CPU with AVX - Windows ; Tier-1 compilation ; optimized code ; rsp based frame ; partially interruptible ; No PGO data ; 5 inlinees with PGO data; 19 single block inlinees; 1 inlinees without PGO data G_M000_IG01: ;; offset=0000H G_M000_IG02: ;; offset=0000H B801000000 mov eax, 1 G_M000_IG03: ;; offset=0005H C3 ret ; Total bytes of code 6
This code is significantly shorter and simpler. It simply loads the value of 1 into the eax
register, which means it returns true. It is the complement branch of the previous if
condition. The JIT compiler notices that the text is known (as compiled into the assembly) as well as the input parameter is known too. These concrete values satisfy the first if
condition, hence it can inline the results of the int.IsOddInteger(Text.Length) && position == Text.Length / 2 + 1
. JIT also applies dead code elimination to remove the rest of the method. This is extremely powerful for an application that is known to use constant runtime arguments.
The General Case
However, it is more often the case that value is unknown at compile time. In these cases the JIT cannot apply the same optimizations. To demonstrate this, change the Work()
method to pass a value that is not constant, and this results a completely different assembly code. Here, I used DateTime.UtcNow.Ticks
to generate a number between 0 and 10 inclusive as the length of the Text
is 11 characters:
IsPalindromePosition((int)(DateTime.UtcNow.Ticks % 11));
The JITTED code is:
; Assembly listing for method Program:Work():bool:this ; Emitting BLENDED_CODE for X64 CPU with AVX - Windows ; Tier-1 compilation ; optimized code ; rsp based frame ; partially interruptible ; No PGO data ; 5 inlinees with PGO data; 20 single block inlinees; 1 inlinees without PGO data G_M000_IG01: ;; offset=0000H 4883EC28 sub rsp, 40 G_M000_IG02: ;; offset=0004H FF15C6E61000 call [DateTime:get_UtcNow():DateTime] 48B9FFFFFFFFFFFFFF3F mov rcx, 0x3FFFFFFFFFFFFFFF 4823C8 and rcx, rax 48BAA38B2EBAE8A28B2E mov rdx, 0x2E8BA2E8BA2E8BA3 488BC2 mov rax, rdx 48F7E9 imul rdx:rax, rcx 488BC2 mov rax, rdx 48C1E83F shr rax, 63 48D1FA sar rdx, 1 4803C2 add rax, rdx 486BC00B imul rax, rax, 11 482BC8 sub rcx, rax 8BC1 mov eax, ecx 83F806 cmp eax, 6 7507 jne SHORT G_M000_IG04 G_M000_IG03: ;; offset=0042H B801000000 mov eax, 1 EB39 jmp SHORT G_M000_IG05 G_M000_IG04: ;; offset=0049H 48BAD82040BABF010000 mov rdx, 0x1BFBA4020D8 488B12 mov rdx, gword ptr [rdx] 4883C20C add rdx, 12 488BCA mov rcx, rdx 83F80B cmp eax, 11 7325 jae SHORT G_M000_IG06 448BC0 mov r8d, eax 420FB70C41 movzx rcx, word ptr [rcx+2*r8] F7D8 neg eax 83C00A add eax, 10 83F80B cmp eax, 11 7313 jae SHORT G_M000_IG06 8BC0 mov eax, eax 0FB70442 movzx rax, word ptr [rdx+2*rax] 3BC1 cmp eax, ecx 0F94C0 sete al 0FB6C0 movzx rax, al G_M000_IG05: ;; offset=0082H 4883C428 add rsp, 40 C3 ret G_M000_IG06: ;; offset=0087H E8B485C35F call CORINFO_HELP_RNGCHKFAIL CC int3 ; Total bytes of code 141
This code is significantly larger with 141 bytes, but it also has the whole IsPalindromePosition()
method inlined, with code for both branches of the if
statement. The assembly code at label G_M000_IG02
tests the conditions of the if
statement.
There are interesting things to notice: cmp eax, 6
the second half of the condition (&& position == Text.Length / 2 + 1
) is still constant folded. While label G_M000_IG04
needs to bound check twice, for each indexing operation.
However, notice that the optimization for the edge case when the input value points to the median character is not necessary. IsPalindromePosition()
method works correctly, even without the following code:
if (int.IsOddInteger(Text.Length) && position == Text.Length / 2 + 1) return true;
Even better, this results a smaller assembly code:
; Assembly listing for method Program:Work():bool:this ; Emitting BLENDED_CODE for X64 CPU with AVX - Windows ; Tier-1 compilation ; optimized code ; rsp based frame ; partially interruptible ; No PGO data ; 3 inlinees with PGO data; 12 single block inlinees; 0 inlinees without PGO data G_M000_IG01: ;; offset=0000H 4883EC28 sub rsp, 40 G_M000_IG02: ;; offset=0004H FF15C6E61000 call [DateTime:get_UtcNow():DateTime] 48B9FFFFFFFFFFFFFF3F mov rcx, 0x3FFFFFFFFFFFFFFF 4823C8 and rcx, rax 48BAA38B2EBAE8A28B2E mov rdx, 0x2E8BA2E8BA2E8BA3 488BC2 mov rax, rdx 48F7E9 imul rdx:rax, rcx 488BC2 mov rax, rdx 48C1E83F shr rax, 63 48D1FA sar rdx, 1 4803C2 add rax, rdx 486BC00B imul rax, rax, 11 482BC8 sub rcx, rax 8BC1 mov eax, ecx 48BAD820C09C8F020000 mov rdx, 0x28F9CC020D8 488B12 mov rdx, gword ptr [rdx] 4883C20C add rdx, 12 488BCA mov rcx, rdx 83F80B cmp eax, 11 7325 jae SHORT G_M000_IG04 448BC0 mov r8d, eax 420FB70C41 movzx rcx, word ptr [rcx+2*r8] F7D8 neg eax 83C00A add eax, 10 83F80B cmp eax, 11 7313 jae SHORT G_M000_IG04 8BC0 mov eax, eax 0FB70442 movzx rax, word ptr [rdx+2*rax] 3BC1 cmp eax, ecx 0F94C0 sete al 0FB6C0 movzx rax, al G_M000_IG03: ;; offset=0076H 4883C428 add rsp, 40 C3 ret G_M000_IG04: ;; offset=007BH E8C085C25F call CORINFO_HELP_RNGCHKFAIL CC int3 ; Total bytes of code 129
The benefits of smaller assembly size are difficult to measure in this case. Although this code omits the cmp eax, 6
instruction the microbenchmarks I performed showed negligible improvements in execution time.
However, if I take this implementation and compare it with the original in a benchmark measuring the case of the input value being zero, we get a significant performance difference:
| Method | Mean | Error | StdDev | Median | |--------------- |----------:|----------:|----------:|----------:| | Work | 0.0042 ns | 0.0063 ns | 0.0083 ns | 0.0000 ns | | WorkNoEdgeCase | 0.2765 ns | 0.0393 ns | 0.0589 ns | 0.3000 ns |
The Best of Both
How can we achieve the best of both worlds: have a small code size for the generic case, but also leverage the performance gains for the known value? The .NET team uses the RuntimeHelpers.IsKnownConstant
method, which is a known JIT intrinsic. It means when the JIT compiles this method, it can replace IsKnownConstant(int t) => false
when the value is known, hence enabling optimizations.
Unfortunately, this method is internal, so we cannot use it in application code today. Although, it looks a powerful optimization, in practice I find it difficult to apply it for real use-cases. Here are the preconditions I found so far for choosing applicable methods:
the method must have a generic implementation
certain rare edge case must exist that we can optimize for with a custom implementation
the conditions for this edge case should depend on the input parameter
optimizing for this edge case would not bring benefit for the generic case
The overall benefit for applying this method, would be avoiding bloated the JITTED code for the generic case. I believe end user applications can optimize to their profile even without RuntimeHelpers.IsKnownConstant
, however libraries in performance sensitive tight loops could take advantage of this internal helper method.