Calculating U.S. Treasury Notes Pricing

A few years ago, I worked on a type that could process double values received as real-time market data updates and helped to convert them to be displayed for humans on a UI.

The type was used to process market data updates of U.S. Treasury Notes. The pricing of U.S. Treasury Notes is detailed on the cmegroup site. In this post I will focus on the pricing of bond and note products while keeping futures aside.

The prices of these products are displayed in fractions. The price of such a product may look as 99-032. This means it is 99 full points plus 3/32s of a point plus 2/8 of 1/32s of a point. The last digit of the price can take up values of 0 2, + and 6, where plus denotes 4/8 or 1/2.Hence 99-032 equals 99 + 3/32 + 2/256 = 99.1015625. Note, that the last digit is always a multiple of 1/4th of a 1/32nd. Another example may be 100-23+ which equals 100.734375. However, this post will focus on displaying such data, converting it from a double such as 100.734375 to 100-23+.

A few years back .NET was less performant. In this post, I will revisit the original implementation as well as review 3 implementations I would consider building today to take advantage of the performance benefits of the modern .NET. Even the original implementation is significantly faster when run on .NET 8 due to the number of optimizations that have been built into the BCL and the JIT.

Please note, that the code samples in this post expect non-evil inputs. This post does not showcase production grade code snippets. Please don't use them as-is in real-world applications. The examples assume en-us as the default culture.

Original Implementation

I don't recall the exact original implementation and I omit edge cases that are not relevant for U.S. Treasury Notes, the original implementation could take shape as shown below.

public class FractionalClassic(int Integer, int Fraction32, int Fraction4)
{
    public double AsDouble() => Integer + (Fraction32 / 32.0) + (Fraction4 / 128.0);

    public override string ToString()
    {
        char lastChar = Fraction4 switch
        {
            0 => '0',
            1 => '2',
            2 => '+',
            3 => '6',
            _ => throw new ArgumentOutOfRangeException()
        };
        return $"{Integer}-{Fraction32:00}{lastChar}";
    }
}

Here I am using C# 12's primary constructor. The type accepts the Integer value which represents the full points, Fraction32 the 1/32s and Fraction4 is 1/4 of a 1/32nd. It has an additional method on top of the default methods inherited from object. AsDouble() returns the original value as a double. It also has the ToString() method overridden which uses C# interpolated string feature to format the full and fractional points into the desired string object. The original implementation probably concatenated the string as return Integer.ToString() + "-" + Fraction32.ToString("00") + lastChar; given at the time of implementation string interpolation was not yet available.

The difference between the interpolated string version above and the referenced string concatenation becomes apparent when one looks at the memory allocations. Based on my measurements with 100 fractional double values the string concatenation allocates 13 KB on the heap while the version with string interpolations allocates 7.31 KB.

Each of the following implementations shown in this post is benchmarked with BenchmarkDotNet. The benchmarks are given a double array with 100 values to be converted into string objects. The benchmark splits the double value into full and fractions points, then uses the showcased fractional type implementation to create a string.

The corresponding benchmark of the above method is implemented as:

[Benchmark]
public void ClassicToString()
{
    int i = 0;
    foreach (double price in Prices)
    {
        var integerPart = (int)price;
        var remainder = (price - integerPart) * 32;
        byte fraction32 = (byte)remainder;
        byte fraction4 = (byte)((remainder - fraction32) * 4);
        var fractional = new FractionalClassic(integerPart, fraction32, fraction4);
        DisplayPrices[i++] = fractional.ToString();
    }
}

Note, that the result string objects are not thrown away, but referenced by the DisplayPrices array. Double values that do not conform to a valid fractional number are rounded down to the nearest fraction.

Fractional Improvements

In the second part of this blog post I will incrementally improve the current solution.

Fractional

Let's introduce Fractional type. Compared to the original implementation the Fractional is a value type implemented as a record struct. This avoids an extra heap allocation, given this is typically a short lived. When passed around it gets copied by value. It has different type of fields too: it uses an int for the full points and uses byte for the fractions. Given fractional point values are limited to 32 and 4 respectively, the smaller data type is enough to represent them. The type uses required properties with essential validation in the setters. For example, it does not allow negative prices (note, that in rare real-life use-cases a negative price could be valid) and does not allow larger fractions to 31 and 4.

public record struct Fractional : ISpanFormattable
{
    private int _integer;
    private byte _fraction32;
    private byte _fraction4;

    public required int Integer
    {
        get => _integer; init
        {
            if (value < 0)
                throw new ArgumentOutOfRangeException();
            _integer = value;
        }
    }
    public required byte Fraction32
    {
        get => _fraction32; init
        {
            if (value > 31 || value < 0)
                throw new ArgumentOutOfRangeException();
            _fraction32 = value;
        }
    }
    public required byte Fraction4
    {
        get => _fraction4; init
        {
            if (value > 3 || value < 0)
                throw new ArgumentOutOfRangeException();
            _fraction4 = value;
        }
    }

    public double AsDouble() => _integer + (Fraction32 / 32.0) + (Fraction4 / 128.0);

    public string ToString(string? format, IFormatProvider? formatProvider)
    {
        Span<char> destination = stackalloc char[14];
        if (TryFormat(destination, out int written, format, formatProvider))
            return destination[0..written].ToString();
        throw new ArgumentOutOfRangeException();
    }

    public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format = default, IFormatProvider? provider = default)
    {
        charsWritten = 0;
        if (!Integer.TryFormat(destination, out int currentWritten))
            return false;
        charsWritten += currentWritten;
        if (charsWritten + 1 > destination.Length)
            return false;
        destination[charsWritten++] = '-';
        if (!Fraction32.TryFormat(destination[charsWritten..], out currentWritten, "00"))
            return false;
        charsWritten += currentWritten;
        if (charsWritten + 1 > destination.Length)
            return false;

        char lastChar = Fraction4 switch
        {
            0 => '0',
            1 => '2',
            2 => '+',
            3 => '6'
        };
        destination[charsWritten++] = lastChar;
        return true;
    }

    public override string ToString() => ToString(null, null);
}

Another key difference is how it creates the string object. The ToString() method allocates an array on the stack for 14 characters. In a real application this should be a larger buffer, or in case the latter invoked TryFormat() fails, it could allocate a regular array on the heap. For simplicity this code only uses the stack allocated array. The type implements the ISpanFormattable interface that comes with the premise of a TryFormat() method. This method writes the fields of the type as a string into the provided destination buffer. Both the int and byte types implement the ISpanFormattable interface, so the code defers formatting to their TryFormat() method. The - character and the final digit are written as regular chars.

This solution brings performance improvements in terms of mean execution time and allocated memory as well:

|                 Method |       Mean |    Error |   StdDev |   Gen0 |   Gen1 | Allocated |
|----------------------- |-----------:|---------:|---------:|-------:|-------:|----------:|
|     FractionalToString | 5,071.4 ns | 47.27 ns | 41.91 ns | 0.6561 | 0.0076 |   4.06 KB |
|        ClassicToString | 6,857.8 ns | 52.74 ns | 44.04 ns | 1.1902 | 0.0153 |   7.31 KB |

FastFractional

The next iteration is called creatively the FastFractional. This post inlines the discussed methods of the type, while the complete implementation can be found on GitHub.

One of the two key differences with this type is that instead of having Fraction32 and Fraction4 properties we only have a single Fractions128 property. This not just makes the struct smaller and inherently faster to copy by value, but it also simplifies parsing the double input values.

The second key difference is the ToString() method implementation:

public record struct FastFractional : ISpanFormattable
{
    private byte _fractions128;

    public required uint Integer { get; init; }
    public required byte Fractions128
    {
        // ...
    }

    public string ToString(string? format, IFormatProvider? formatProvider)
    {
        return string.Create(CountDigits(Integer) + 4, ((Integer, Fractions128)), static (destination, state) =>
        {
            if (state.Integer > 89 && state.Integer < 128)
                BinaryPrimitives.WriteUInt64LittleEndian(MemoryMarshal.AsBytes(destination), GetInteger((int)state.Integer));
            else
                state.Integer.TryFormat(destination, out _);
            BinaryPrimitives.WriteUInt64LittleEndian(MemoryMarshal.AsBytes(destination[^4..]), GetFractions(state.Fractions128));
        });
    }

    // ...

    private static ulong GetInteger(int value)
    {
        ReadOnlySpan<ulong> table = new ulong[] { 0x20002000300039, 0x20002000310039, ... };
        return table[value - 90];
    }

    private static ulong GetFractions(int index)
    {
        ReadOnlySpan<ulong> table = new ulong[] { 0x3000300030002d, 0x3200300030002d, ... };
        return table[index];
    }

    private static int CountDigits(uint value)
    {
        // Algorithm based on https://lemire.me/blog/2021/06/03/computing-the-number-of-digits-of-an-integer-even-faster.
        ReadOnlySpan<long> table = new long[]
        {
                4294967296,
                8589934582,
                //...                
        };
        long tableValue = Unsafe.Add(ref MemoryMarshal.GetReference(table), uint.Log2(value));
        return (int)((value + tableValue) >> 32);
    }
}

The ToString() method uses the string.Create with the SpanAction overload to create the result string.

CountDigits() counts the number of chars required for the full points. The fractional points always use 3 chars and an additional - concatenating full and fractional points. The CountDigits() method uses an algorithm described by Daniel Lemire, and also used by .NET 8. The idea behind the algorithm is to calculate the number of chars required when the number is formatted as a binary number and convert that to decimal using log2 to log10 conversion formula. Negative prices are not considered by this implementation.

The SpanAction of string.Create method provides a callback, which writes the full and fractional points as number with two operations. A key observation for these numbers is that most of the values in practice fall in the [90..128] range. There might be prices higher or lower, but most real-life examples fall within this range. This gives an opportunity for an interesting optimization: we may pre-calculate the underlying byte values of the formatted number instead of formatting the integer value every time. The second optimization is that the numbers within this range are 2 or 3 characters. The 3 chars require 6 bytes, so we may use a larger, 8 byte ulong type to encode the underlying bytes of the original number. This optimization is also used by .NET 7. Finally, instead of writing the 3 characters with instructions, we can copy the ulong value with WriteUInt64LittleEndian() method as a single operation.

When the full points of the price does not fall within the [90..128] range the regular int.TryFormat() method is used to format the number as a string.

The fractional part of the price also utilizes the same idea: all values between -000 and -316 can be pre-encoded ulong numbers. The results are kept in a ulong array of 128 items. Finding the right number is looking it up in an array. Then formatting the value into the string happens by writing it using WriteUInt64LittleEndian to the destination buffer.

This approach yields a significant jump in the mean execution time on the given input sample. Note that the values of the input sample fall within the optimized range.

|                 Method |       Mean |    Error |   StdDev |   Gen0 |   Gen1 | Allocated |
|----------------------- |-----------:|---------:|---------:|-------:|-------:|----------:|
| FastFractionalToString |   736.3 ns |  9.31 ns |  8.71 ns | 0.6628 | 0.0095 |   4.06 KB |
|     FractionalToString | 5,071.4 ns | 47.27 ns | 41.91 ns | 0.6561 | 0.0076 |   4.06 KB |
|        ClassicToString | 6,857.8 ns | 52.74 ns | 44.04 ns | 1.1902 | 0.0153 |   7.31 KB |

CopyFractional

The final iteration of the type is called CopyFractional. The sample below highlights the differences compared to the previous solutions, while the complete example can be found on GitHub.

public string ToString(string? format, IFormatProvider? formatProvider)
{
    return string.Create(CountDigits(Integer) + 4, ((Integer, Fractions128)), static (destination, state) =>
    {
        ReadOnlySpan<char> calculated = "090-000090-002...";
        var startIndex = (int)(state.Integer - 90) * (7 * 128) + state.Fractions128 * 7;
        if (state.Integer > 99 && state.Integer < 128)
        {
            calculated.Slice(startIndex, 7).CopyTo(destination);
        }
        else if (state.Integer > 89 && state.Integer <= 99)
        {
            calculated.Slice(startIndex + 1, 6).CopyTo(destination);
        }
        else
        {
            state.Integer.TryFormat(destination, out _);
            BinaryPrimitives.WriteUInt64LittleEndian(MemoryMarshal.AsBytes(destination[^4..]), GetFractions(state.Fractions128));
        }
    });
}

This approach uses a similar idea to the previous solution. However, instead of writing the full and fractional parts in two iterations it uses a single copy when the value falls within the [90..128] range. All values are pre-formatted and concatenated in a single string that is stored in the application binaries. This string can be accessed using the ReadOnlySpan<char> calculated = instruction. ToString() method finds the corresponding slice of the preformatted string and copies it to the destination buffer.

The benchmarks show that this approach as the fastest on my test machine:

|                 Method |       Mean |    Error |   StdDev |   Gen0 |   Gen1 | Allocated |
|----------------------- |-----------:|---------:|---------:|-------:|-------:|----------:|
| FastFractionalToString |   736.3 ns |  9.31 ns |  8.71 ns | 0.6628 | 0.0095 |   4.06 KB |
| CopyFractionalToString |   676.9 ns | 13.01 ns | 11.54 ns | 0.6628 | 0.0095 |   4.06 KB |
|     FractionalToString | 5,071.4 ns | 47.27 ns | 41.91 ns | 0.6561 | 0.0076 |   4.06 KB |
|        ClassicToString | 6,857.8 ns | 52.74 ns | 44.04 ns | 1.1902 | 0.0153 |   7.31 KB |

However, this approach comes with a significant trade-off: the size of the binary increases by more than 60 KB as the whole preformatted string is compiled into the IL.

SIMD

A careful reader may wonder if Single instruction, multiple data, SIMD could be leveraged for the above task. While it is possible to vectorize parts of this algorithm, the final string object has to be created with a non-SIMD instruction. So far, I have not found an efficient solution to vectorize the algorithm, but if there is a managed way achieving this, I would be more than happy to learn about it. The fact that eventually the vectorized loop needs to iterate all the values of the vector one-by-one to create the individual string objects diminishes the performance benefit of using SIMD. My best attempts to vectorize the algorithm reached a mean execution time within the range of ~1000 ns.