Laszlo

Hello, I am Laszlo

Software-Enginner, .NET developer

Contact Me

Division in Assembly

HTTP/3 specification reserves a range of identifiers for streams and frame types. The range as 0x1f * N + 0x21 for non-negative integer values of N.

A received identifier should be validated against the reserved range. This involves subtrackting 33 and then validating if the result is a multiple of 31. The number 31 holds special importance in this context as it is represented by 2N - 1, or 0x0001_1111. This property influences the approaches used for validation.

Several strategies were considered to verify whether a value is a multiple of 31:

  • bit manipulation (summing every 5 bits) value to check if the sum is a multiple of 31.

  • lookup table

  • using divrem Input % 31 == 0

  • using integer division and multiplication (Input / 31) * 31 == Input

  • multiplication and bit shifting (divide by 2).

A few of these approaches are applicable when the divisor is 31. I decided to compare the following solutions:

public bool Multiply() => (Input / 31) * 31 == Input;

public bool DivRem() => Input % 31 == 0;

Comparative Testing and Results

To compare these approaches, tests were conducted using inputs 62, 100, and 124. The results indicated that the DivRem method was marginally favored. However, the observed differences were minimal, and the accompanying error and standard deviation suggested that the measurements were accurate and not due to random error. Notably, the code size for both approaches was identical, prompting further investigation into the Just-In-Time (JIT) compiled code.

| Method   | Input | Mean      | Error     | StdDev    | Median    | Code Size |
|--------- |------ |----------:|----------:|----------:|----------:|----------:|
| Multiply | 62    | 0.2566 ns | 0.0041 ns | 0.0036 ns | 0.2564 ns |      40 B |
| DivRem   | 62    | 0.0015 ns | 0.0021 ns | 0.0020 ns | 0.0001 ns |      40 B |
| Multiply | 100   | 0.2569 ns | 0.0020 ns | 0.0019 ns | 0.2558 ns |      40 B |
| DivRem   | 100   | 0.0000 ns | 0.0000 ns | 0.0000 ns | 0.0000 ns |      40 B |
| Multiply | 124   | 0.2094 ns | 0.0025 ns | 0.0022 ns | 0.2089 ns |      40 B |
| DivRem   | 124   | 0.0016 ns | 0.0026 ns | 0.0022 ns | 0.0001 ns |      40 B |

JIT Compilation and Benchmarking Insights

Upon examining the JITted code, it was found that both the DivRem and Multiplication methods produced equivalent assembly code. BenchmarkDotNet, the tool used for measurement, is not always precise for instruction-level benchmarking, and there is an open issue addressing this limitation. Ultimately, the assembly code generated was an optimized implementation for division by 31, provided by the JIT compiler. In practice, it does not matter which solution is chosen, as both result in the same optimized code.

; Divison31.DivRem()
       mov       ecx,[rcx+8]
       mov       edx,84210843
       mov       eax,edx
       imul      ecx
       add       edx,ecx
       mov       eax,edx
       shr       eax,1F
       sar       edx,4
       add       eax,edx
       mov       edx,eax
       shl       edx,5
       sub       edx,eax
       sub       ecx,edx
       sete      al
       movzx     eax,al
       ret
; Total bytes of code 40

Conclusion

The analysis revealed that the JIT compiler optimizes division by 31 using multiplication and bit shifting (2^N / M), just as outlined in the earlier approaches. Therefore, regardless of the method selected in code, the underlying assembly remains efficient and consistent.