Laszlo

Hello, I am Laszlo

Software-Engineer, .NET developer


Object Stack Allocation in .NET 10

Object Stack Allocation is a performance optimization feature in .NET 10 that allows certain objects to be allocated on the stack instead of the heap. The JIT compiler identifies objects and object allocations that don't escape from a method and may decide to allocate these non-escaping objects on the stack. When these objects are stack-allocated, they get freed automatically when the stack frame is freed as the method returns, reducing the load on the Garbage Collector (GC).
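To make the idea of escaping concrete, here is a minimal sketch. The Point class and both method names are hypothetical and chosen only for illustration. Whether the JIT actually places the object in SumNonEscaping on the stack depends on the runtime version, tiered compilation, and inlining decisions, so treat that method as a candidate for the optimization rather than a guarantee.

```csharp
using System;

// Hypothetical small class used only for this illustration.
public sealed class Point
{
    public int X;
    public int Y;

    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public static class EscapeSketch
{
    private static Point _kept;

    // The Point is not returned, not stored in a field, and not handed to
    // other code, so it does not escape the method. This makes it a
    // candidate for stack allocation.
    public static int SumNonEscaping(int x, int y)
    {
        var p = new Point(x, y);
        return p.X + p.Y;
    }

    // Storing the reference in a static field makes the object reachable
    // after the method returns, so it has to live on the heap.
    public static int SumEscaping(int x, int y)
    {
        var p = new Point(x, y);
        _kept = p;
        return p.X + p.Y;
    }
}
```

One way to observe the difference is to compare GC.GetAllocatedBytesForCurrentThread() before and after many calls to each method. The numbers vary with the runtime and its configuration.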

Recent improvements in this area have been detailed in the Performance Improvements in .NET 10 blog post.

In this post, I will investigate the current state and limitations of this feature. Please note that the findings reveal some of the current internal limitations of the runtime, which may change or be adjusted in future releases.

Using Stopwatch



SIMD Sum 2

In this post, I'll explore how to optimize a simple array summing operation using SIMD (Single Instruction, Multiple Data) operations in C#. The example is inspired by Matt Godbolt's GOTO 2024 talk, What Every Programmer Should Know about How CPUs Work, which covers CPU architecture and branch prediction. We'll see how leveraging SIMD instructions can dramatically improve performance by reducing branch mispredictions and processing multiple elements in parallel.

Part of the talk describes the branch prediction feature of the CPU. It uses a simple task for demonstration: a method is given a large set of random numbers, sums all of them, and separately sums the numbers below 128. The talk shows a sample implementation in Python and C++ and explains the reasons for the observed performance difference.

In this post I will implement this example in C# with a single difference: the input numbers are bytes, not ints.
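The task can be sketched as follows. This is not the code from the full post: the ByteSumSketch class, its method names, and the flush interval are assumptions made for illustration. It shows a scalar baseline with one data-dependent branch per element, and a variant built on System.Numerics.Vector<T> that replaces the branch with a comparison mask.

```csharp
using System;
using System.Numerics;

public static class ByteSumSketch
{
    // Scalar baseline: one data-dependent branch per element.
    public static (long Total, long Below128) SumScalar(byte[] data)
    {
        long total = 0, below128 = 0;
        foreach (byte b in data)
        {
            total += b;
            if (b < 128) below128 += b;
        }
        return (total, below128);
    }

    // Vectorized variant: the "< 128" test becomes a mask, so the loop body
    // contains no data-dependent branch.
    public static (long Total, long Below128) SumVector(byte[] data)
    {
        // Each uint lane grows by at most 1020 per iteration, so draining
        // every 2^20 iterations keeps the lanes well below uint.MaxValue.
        const int FlushInterval = 1 << 20;

        long total = 0, below128 = 0;
        var limit = new Vector<byte>((byte)128);
        var accTotal = Vector<uint>.Zero;
        var accBelow = Vector<uint>.Zero;
        int pending = 0;
        int i = 0;

        for (; i <= data.Length - Vector<byte>.Count; i += Vector<byte>.Count)
        {
            var v = new Vector<byte>(data, i);
            // LessThan yields all-ones lanes where v < 128; AND keeps only those bytes.
            var small = Vector.BitwiseAnd(v, Vector.LessThan(v, limit));
            accTotal += WidenAndFold(v);
            accBelow += WidenAndFold(small);

            if (++pending == FlushInterval)
            {
                total += Drain(ref accTotal);
                below128 += Drain(ref accBelow);
                pending = 0;
            }
        }

        total += Drain(ref accTotal);
        below128 += Drain(ref accBelow);

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < data.Length; i++)
        {
            total += data[i];
            if (data[i] < 128) below128 += data[i];
        }
        return (total, below128);
    }

    // Widens bytes to ushort and then to uint, adding the halves together.
    private static Vector<uint> WidenAndFold(Vector<byte> v)
    {
        Vector.Widen(v, out Vector<ushort> lo, out Vector<ushort> hi);
        Vector.Widen(lo + hi, out Vector<uint> lo32, out Vector<uint> hi32);
        return lo32 + hi32;
    }

    // Adds the lanes into a long and resets the accumulator.
    private static long Drain(ref Vector<uint> acc)
    {
        long sum = 0;
        for (int k = 0; k < Vector<uint>.Count; k++) sum += acc[k];
        acc = Vector<uint>.Zero;
        return sum;
    }
}
```

Vector<T> was chosen here because it adapts to the vector width of the machine. Hardware-specific intrinsics may perform better, but they are outside the scope of this sketch.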

Naive Implementation



Lazy Properties in .NET 10

In this post, I explore a couple of ways to create lazy properties in C# and .NET 10. What does a lazy property mean in the context of this post? It is an instance property that gets initialized with a value the first time its getter is invoked. The getter of the property does not need to provide thread-safe initialization. Let's review a couple of solutions available before .NET 10.

All the examples below initialize a string property. The initializing method is static and extracted as a member of a separate class:

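using System.Threading;

public class Shared
{
    // Counts how many times Zeros() has been executed.
    public static int _counter = 0;

    public static string Zeros()
    {
        // Use the value returned by Interlocked.Increment so that a
        // concurrent caller's increment cannot change the length used here.
        int count = Interlocked.Increment(ref _counter);
        return new string('0', count);
    }
}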

The Zeros() method returns a new string object containing a number of 0 characters. The number of 0 characters corresponds to the number of times the method has been executed.
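As an illustration of what such a property can look like before .NET 10, here is a minimal sketch of two common patterns. The NullCoalescingSample and LazyOfTSample class names are hypothetical, and the Shared class is repeated only so that the example is self-contained. Neither variant is thread-safe, which matches the requirement stated above.

```csharp
#nullable enable
using System;
using System.Threading;

// Repeated from the post so that this sketch compiles on its own.
public class Shared
{
    public static int _counter = 0;

    public static string Zeros()
    {
        int count = Interlocked.Increment(ref _counter);
        return new string('0', count);
    }
}

// Pattern 1: explicit backing field with null-coalescing assignment.
public class NullCoalescingSample
{
    private string? _value; // stays null until the getter runs for the first time

    public string Value => _value ??= Shared.Zeros();
}

// Pattern 2: Lazy<T> without synchronization.
public class LazyOfTSample
{
    private readonly Lazy<string> _value =
        new Lazy<string>(Shared.Zeros, LazyThreadSafetyMode.None);

    public string Value => _value.Value;
}
```

In both cases Shared.Zeros() runs once per instance, on the first read of Value, and later reads return the same string instance. The first pattern needs only one extra field, while the second allocates an additional Lazy<string> object per instance.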



Span on Garbage Collection

A Span represents a contiguous region of memory. As it is implemented as a ref struct, the compiler makes sure that it does not escape to the heap (as of .NET 9). A Span may point to multiple types of memory segments: for example, unmanaged memory, stack-allocated memory, or heap-allocated memory. The Span and ReadOnlySpan types use an interior pointer, which allows them to point to an address that is not necessarily the object's MethodTable (MT) pointer, but an address inside the object's memory representation. For example, they can point to the nth element of an array.
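A short sketch of the two most common cases follows. The SpanSketch class and its Demo method are hypothetical names used only for illustration.

```csharp
using System;

public static class SpanSketch
{
    public static int[] Demo()
    {
        int[] numbers = { 10, 20, 30, 40, 50 };

        // Heap memory: the span starts at element 2, so the reference it
        // holds points into the middle of the array object.
        Span<int> tail = numbers.AsSpan(2);
        tail[0] = 99; // writes through to numbers[2]

        // Stack memory: the same Span<int> type over a stackalloc'ed buffer.
        Span<int> onStack = stackalloc int[3];
        onStack.Fill(7);
        numbers[0] = onStack[1];

        return numbers;
    }
}
```

After Demo() returns, the array contains 7, 20, 99, 40, 50: the write through tail landed in the original array, and the value copied from the stack buffer replaced the first element.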

From the garbage collector's point of view, the interior pointers need special handling: the interior pointer must be translated to an address that points to the corresponding object's MT so it can be 'marked' as used memory. This is needed as an otherwise unrooted object would get garbage collected. The GC uses the brick table for the address translation.

As a ref struct type lives on the stack, it should not cause additional allocations or pressure on the GC. Yet the address translation is extra work that the GC needs to do. The design decision for these types to be ref structs is driven by the additional work required for the address translation. This way the GC does not need to handle interior pointers within heap-allocated objects.

Does address translation have a measurable impact on garbage collection?



AsAsyncEnumerable Extension

This post explores a template implementation of a method that converts an IEnumerable<T> to an IAsyncEnumerable<T>. IAsyncEnumerable<T> was introduced to provide asynchronous iteration over values. The following implementation takes a synchronous enumerable and converts it to an asynchronous one, allowing async code to run between iterations of the source. One could achieve similar results by using a regular foreach loop with an awaited method in the loop body. However, one might want to create an extension method for a simpler syntax.

This sample extension method and type contain no 'real' async method invocation, but a developer can easily extend them with a tailored async method call or a delegate (Func<T,Task>) invocation. A good place for such a call would be the body of the MoveNextAsync method. As-is, this type is only useful for converting a synchronous enumerable to an asynchronous one, for example for mocking purposes in unit tests.

Design decisions for the following type follow the design of the built-in Iterator of the .NET class library.

  • The source argument is validated without the need of iterating the AsyncEnumerable<T> instance.
  • The iterator does not cancel the iteration itself; following co-operative cancellation, it could (should) pass the cancellation token to the async method it invokes.
  • The enumerable and the enumerator are implemented by the same type.
  • The state of the enumerator is captured by the _state variable.
  • The iterator is not thread-safe.
  • The inner iterator is initialized lazily.
  • The Current property returns default in non-iterating states, which may be null.
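The design points above can be sketched as follows. This is a minimal illustration written for this summary, not the type from the full post: the SyncToAsyncIterator<T> name, the state constants, and the extension class name are all assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncEnumerableSketchExtensions
{
    public static IAsyncEnumerable<T> AsAsyncEnumerable<T>(this IEnumerable<T> source)
    {
        // Validated eagerly, before anyone iterates the result.
        ArgumentNullException.ThrowIfNull(source);
        return new SyncToAsyncIterator<T>(source);
    }
}

// Hypothetical type: enumerable and enumerator in one class, not thread-safe.
public sealed class SyncToAsyncIterator<T> : IAsyncEnumerable<T>, IAsyncEnumerator<T>
{
    private const int Created = 0, Ready = 1, Iterating = 2, Done = 3;

    private readonly IEnumerable<T> _source;
    private IEnumerator<T>? _inner;   // created lazily on the first MoveNextAsync
    private T _current = default!;
    private int _state = Created;

    public SyncToAsyncIterator(IEnumerable<T> source) => _source = source;

    // Returns default (possibly null) outside the iterating state.
    public T Current => _current;

    public IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default)
    {
        // The token is not observed here; an extended version would pass it
        // to the async call made inside MoveNextAsync.
        if (_state == Created)
        {
            _state = Ready;
            return this; // the first enumeration reuses this instance
        }
        return new SyncToAsyncIterator<T>(_source) { _state = Ready };
    }

    public ValueTask<bool> MoveNextAsync()
    {
        switch (_state)
        {
            case Ready:
                _inner = _source.GetEnumerator();
                _state = Iterating;
                goto case Iterating;
            case Iterating:
                // An awaited call or a Func<T, Task> invocation could go here.
                if (_inner!.MoveNext())
                {
                    _current = _inner.Current;
                    return new ValueTask<bool>(true);
                }
                _current = default!;
                _state = Done;
                break;
        }
        return new ValueTask<bool>(false);
    }

    public ValueTask DisposeAsync()
    {
        _inner?.Dispose();
        _inner = null;
        _current = default!;
        _state = Done;
        return default;
    }
}
```

A caller would typically consume it with await foreach (var item in source.AsAsyncEnumerable()) { ... }. Because every MoveNextAsync call here completes synchronously, the sketch adds no real asynchrony of its own.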
