Understanding RuntimeHelpers.IsKnownConstant Method

In this post my goal is to gain understanding of the RuntimeHelpers.IsKnownConstant method. I have come across it in the Performance Improvements in .NET 7 blog post. There are two examples presented in the post:

  • bool StartsWith(char value)

  • Math.Round with MidpointRounding mode is specified

I will learn about JIT optimization including RuntimeHelpers.IsKnownConstant through a simple and slightly artificial example. A bool IsPalindromePosition(int position) method is part of a type which has a static property ReadOnlySpan<char> Text. The method returns true when the given position from the beginning of the string returns the same character as one from the end. For the sake of simplicity this method omits all input validation.

public static ReadOnlySpan<char> Text => "hellobolleh";

[MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
public bool IsPalindromePosition(int position)
{
    if (int.IsOddInteger(Text.Length) && position == Text.Length / 2 + 1)
        return true;

    return Text[position] == Text[Text.Length - position - 1];
}

[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
public bool Work()
{
    var result = IsPalindromePosition(0);
    return result;
}

The method is called from another method: Work().

This implementation builds upon the realization that if the Text property's length is an odd number, and the callsite of IsPalindromePosition requests the position for the median character of Text - as it has no pair to compare with - the method can return true without indexing into the string. What is the benefit of having a special handling for this edge case?

Known Constants: 0

When IsPalindromePosition() method is invoked with the value of 0 in the Work() method, the following code is JITTED for the Work() method:

; Assembly listing for method Program:Work():bool:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 5 inlinees with PGO data; 19 single block inlinees; 1 inlinees without PGO data

G_M000_IG01:                ;; offset=0000H

G_M000_IG02:                ;; offset=0000H
       48B8D820804781010000 mov      rax, 0x181478020D8
       488B00               mov      rax, gword ptr [rax]
       4883C00C             add      rax, 12
       488BD0               mov      rdx, rax
       0FB712               movzx    rdx, word  ptr [rdx]
       0FB74014             movzx    rax, word  ptr [rax+14H]
       3BD0                 cmp      edx, eax
       0F94C0               sete     al
       0FB6C0               movzx    rax, al

G_M000_IG03:                ;; offset=0023H
       C3                   ret

; Total bytes of code 36

This is neat, as the whole IsPalindromePosition() method got inlined. Even better, the first if branch has been completely eliminated, as the JIT compiler knows that with the value 0 will never satisfy the conditions. Notice, the value of 14H (20 decimal), is used to point at last character of the string (Text is 11 char long, which is 22 bytes). Comparing the first and last characters means to compare the first and last 2 bytes. A word ptr is addressing these 2 bytes, word ptr [rdx] addresses the first character and word ptr [rax+14H] addresses the last character. This is even more evident, when I change the input parameter of IsPalindromePosition() method to 1. The corresponding generated assembly code looks as:

0FB75202             movzx    rdx, word  ptr [rdx+02H]
0FB74012             movzx    rax, word  ptr [rax+12H]

Known Constants: 6

However, changing the input parameter to a value that points to the middle character, generates significantly different code:

IsPalindromePosition(6);
; Assembly listing for method Program:Work():bool:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 5 inlinees with PGO data; 19 single block inlinees; 1 inlinees without PGO data

G_M000_IG01:                ;; offset=0000H

G_M000_IG02:                ;; offset=0000H
       B801000000           mov      eax, 1

G_M000_IG03:                ;; offset=0005H
       C3                   ret

; Total bytes of code 6

This code is significantly shorter and simpler. It simply loads the value of 1 into the eax register, which means it returns true. It is the complement branch of the previous if condition. The JIT compiler notices that the text is known (as compiled into the assembly) as well as the input parameter is known too. These concrete values satisfy the first if condition, hence it can inline the results of the int.IsOddInteger(Text.Length) && position == Text.Length / 2 + 1. JIT also applies dead code elimination to remove the rest of the method. This is extremely powerful for an application that is known to use constant runtime arguments.

The General Case

However, it is more often the case that value is unknown at compile time. In these cases the JIT cannot apply the same optimizations. To demonstrate this, change the Work() method to pass a value that is not constant, and this results a completely different assembly code. Here, I used DateTime.UtcNow.Ticks to generate a number between 0 and 10 inclusive as the length of the Text is 11 characters:

IsPalindromePosition((int)(DateTime.UtcNow.Ticks % 11));

The JITTED code is:

; Assembly listing for method Program:Work():bool:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 5 inlinees with PGO data; 20 single block inlinees; 1 inlinees without PGO data

G_M000_IG01:                ;; offset=0000H
       4883EC28             sub      rsp, 40

G_M000_IG02:                ;; offset=0004H
       FF15C6E61000         call     [DateTime:get_UtcNow():DateTime]
       48B9FFFFFFFFFFFFFF3F mov      rcx, 0x3FFFFFFFFFFFFFFF
       4823C8               and      rcx, rax
       48BAA38B2EBAE8A28B2E mov      rdx, 0x2E8BA2E8BA2E8BA3
       488BC2               mov      rax, rdx
       48F7E9               imul     rdx:rax, rcx
       488BC2               mov      rax, rdx
       48C1E83F             shr      rax, 63
       48D1FA               sar      rdx, 1
       4803C2               add      rax, rdx
       486BC00B             imul     rax, rax, 11
       482BC8               sub      rcx, rax
       8BC1                 mov      eax, ecx
       83F806               cmp      eax, 6
       7507                 jne      SHORT G_M000_IG04

G_M000_IG03:                ;; offset=0042H
       B801000000           mov      eax, 1
       EB39                 jmp      SHORT G_M000_IG05

G_M000_IG04:                ;; offset=0049H
       48BAD82040BABF010000 mov      rdx, 0x1BFBA4020D8
       488B12               mov      rdx, gword ptr [rdx]
       4883C20C             add      rdx, 12
       488BCA               mov      rcx, rdx
       83F80B               cmp      eax, 11
       7325                 jae      SHORT G_M000_IG06
       448BC0               mov      r8d, eax
       420FB70C41           movzx    rcx, word  ptr [rcx+2*r8]
       F7D8                 neg      eax
       83C00A               add      eax, 10
       83F80B               cmp      eax, 11
       7313                 jae      SHORT G_M000_IG06
       8BC0                 mov      eax, eax
       0FB70442             movzx    rax, word  ptr [rdx+2*rax]
       3BC1                 cmp      eax, ecx
       0F94C0               sete     al
       0FB6C0               movzx    rax, al

G_M000_IG05:                ;; offset=0082H
       4883C428             add      rsp, 40
       C3                   ret

G_M000_IG06:                ;; offset=0087H
       E8B485C35F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

; Total bytes of code 141

This code is significantly larger with 141 bytes, but it also has the whole IsPalindromePosition() method inlined, with code for both branches of the if statement. The assembly code at label G_M000_IG02 tests the conditions of the if statement.

There are interesting things to notice: cmp eax, 6 the second half of the condition (&& position == Text.Length / 2 + 1) is still constant folded. While label G_M000_IG04 needs to bound check twice, for each indexing operation.

However, notice that the optimization for the edge case when the input value points to the median character is not necessary. IsPalindromePosition() method works correctly, even without the following code:

if (int.IsOddInteger(Text.Length) && position == Text.Length / 2 + 1)
    return true;

Even better, this results a smaller assembly code:

; Assembly listing for method Program:Work():bool:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 3 inlinees with PGO data; 12 single block inlinees; 0 inlinees without PGO data

G_M000_IG01:                ;; offset=0000H
       4883EC28             sub      rsp, 40

G_M000_IG02:                ;; offset=0004H
       FF15C6E61000         call     [DateTime:get_UtcNow():DateTime]
       48B9FFFFFFFFFFFFFF3F mov      rcx, 0x3FFFFFFFFFFFFFFF
       4823C8               and      rcx, rax
       48BAA38B2EBAE8A28B2E mov      rdx, 0x2E8BA2E8BA2E8BA3
       488BC2               mov      rax, rdx
       48F7E9               imul     rdx:rax, rcx
       488BC2               mov      rax, rdx
       48C1E83F             shr      rax, 63
       48D1FA               sar      rdx, 1
       4803C2               add      rax, rdx
       486BC00B             imul     rax, rax, 11
       482BC8               sub      rcx, rax
       8BC1                 mov      eax, ecx
       48BAD820C09C8F020000 mov      rdx, 0x28F9CC020D8
       488B12               mov      rdx, gword ptr [rdx]
       4883C20C             add      rdx, 12
       488BCA               mov      rcx, rdx
       83F80B               cmp      eax, 11
       7325                 jae      SHORT G_M000_IG04
       448BC0               mov      r8d, eax
       420FB70C41           movzx    rcx, word  ptr [rcx+2*r8]
       F7D8                 neg      eax
       83C00A               add      eax, 10
       83F80B               cmp      eax, 11
       7313                 jae      SHORT G_M000_IG04
       8BC0                 mov      eax, eax
       0FB70442             movzx    rax, word  ptr [rdx+2*rax]
       3BC1                 cmp      eax, ecx
       0F94C0               sete     al
       0FB6C0               movzx    rax, al

G_M000_IG03:                ;; offset=0076H
       4883C428             add      rsp, 40
       C3                   ret

G_M000_IG04:                ;; offset=007BH
       E8C085C25F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3

; Total bytes of code 129

The benefits of smaller assembly size are difficult to measure in this case. Although this code omits the cmp eax, 6 instruction the microbenchmarks I performed showed negligible improvements in execution time.

However, if I take this implementation and compare it with the original in a benchmark measuring the case of the input value being zero, we get a significant performance difference:

|         Method |      Mean |     Error |    StdDev |    Median |
|--------------- |----------:|----------:|----------:|----------:|
|           Work | 0.0042 ns | 0.0063 ns | 0.0083 ns | 0.0000 ns |
| WorkNoEdgeCase | 0.2765 ns | 0.0393 ns | 0.0589 ns | 0.3000 ns |

The Best of Both

How can we achieve the best of both worlds: have a small code size for the generic case, but also leverage the performance gains for the known value? The .NET team uses the RuntimeHelpers.IsKnownConstant method, which is a known JIT intrinsic. It means when the JIT compiles this method, it can replace IsKnownConstant(int t) => false when the value is known, hence enabling optimizations.

Unfortunately, this method is internal, so we cannot use it in application code today. Although, it looks a powerful optimization, in practice I find it difficult to apply it for real use-cases. Here are the preconditions I found so far for choosing applicable methods:

  • the method must have a generic implementation

  • certain rare edge case must exist that we can optimize for with a custom implementation

  • the conditions for this edge case should depend on the input parameter

  • optimizing for this edge case would not bring benefit for the generic case

The overall benefit for applying this method, would be avoiding bloated the JITTED code for the generic case. I believe end user applications can optimize to their profile even without RuntimeHelpers.IsKnownConstant, however libraries in performance sensitive tight loops could take advantage of this internal helper method.