Laszlo

Hello, I am Laszlo

Software Engineer, .NET developer

Using ClrMD for string analysis

Analysis of a .NET application's memory dump often reveals that byte[] and string are among the most commonly allocated types. This is not surprising: most UI and web applications use strings to display data to users, and byte[] or string to pass data over the network in HTTP requests and responses. These string objects commonly represent configuration values, string literals, or data allocated by dependent libraries.

Analyzing strings in such a noisy environment is challenging. While common tools like Visual Studio, PerfView, and dotnet-dump provide high-level analysis, drilling into the details is often difficult. ClrMD is one of the tools that enables programmatic analysis of memory dumps. In my previous post Getting Started with ClrMD, I introduced how to analyze a full memory dump of a .NET application. This post explores a specialized analysis for strings: identifying application-defined types that hold references to large strings. These are the strings that developers have direct control over in their application code.

Execution Steps

First, collect a memory dump of a running application:

Find out more »


Object Stack Allocation in .NET 10

Object Stack Allocation is a performance optimization feature in .NET 10 that allows certain objects to be allocated on the stack instead of the heap. The JIT compiler identifies object allocations that do not escape the method that creates them and may decide to allocate these non-escaping objects on the stack. Stack-allocated objects are freed automatically when the stack frame is released as the method returns, which reduces the load on the Garbage Collector (GC).
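To make "non-escaping" concrete, here is a minimal sketch. The Point type and the method names are hypothetical and exist only for this illustration. The object created in SumCoordinates is never returned, stored in a field, or passed to another method, so it is a candidate for stack allocation. Whether the JIT actually places it on the stack depends on the runtime version, inlining, and tiering, so the measured number is something to observe, not something to rely on.

```csharp
using System;

public static class StackAllocationSketch
{
    // Hypothetical type used only for this illustration.
    private sealed class Point
    {
        public int X;
        public int Y;
    }

    // The Point instance never leaves this method: it is not returned,
    // not stored in a field, and not passed to another method.
    public static int SumCoordinates(int x, int y)
    {
        var p = new Point { X = x, Y = y };
        return p.X + p.Y;
    }

    // Returns the bytes allocated on the current thread while the loop runs.
    // The out parameter keeps the results observable so the loop is not
    // trivially removable.
    public static long MeasureAllocatedBytes(int iterations, out int sum)
    {
        sum = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            sum += SumCoordinates(i, 1);
        }
        long after = GC.GetAllocatedBytesForCurrentThread();
        return after - before;
    }
}
```

Calling MeasureAllocatedBytes with a large iteration count on different runtime versions is one way to see whether the allocation was moved to the stack on a given setup.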

Recent improvements in this area have been detailed in the Performance Improvements in .NET 10 blog post.

In this post, I will investigate the current state and limitations of this feature. Please note that the findings reveal some of the current internal limitations of the runtime, which may change or be adjusted in future releases.

Using Stopwatch

Find out more »


SIMD Sum 2

In this post, I'll explore how to optimize a simple array summing operation using SIMD (Single Instruction, Multiple Data) operations in C#. The example is inspired by Matt Godbolt's GOTO 2024 talk, What Every Programmer Should Know about How CPUs Work, which covers CPU architecture and branch prediction. We'll see how leveraging SIMD instructions can dramatically improve performance by reducing branch mispredictions and processing multiple elements in parallel.

Part of the talk describes the branch prediction feature of CPUs. It uses a simple task for demonstration: a method is given a large set of random numbers, computes the sum of all the numbers, and separately computes the sum of the numbers below 128. The talk shows sample implementations in Python and C++ and explains the reasons for the observed performance difference.

In this post, I will implement this example in C# with a single difference: the input numbers are bytes, not ints.
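The task described above can be sketched as a plain scalar loop. This is only an illustration of the problem statement, not necessarily the implementation used in the rest of the post. The SumSketch class and its Sum method are hypothetical names.

```csharp
using System;

public static class SumSketch
{
    // Returns the sum of all values and, separately, the sum of the values below 128.
    public static (long Total, long Below128) Sum(ReadOnlySpan<byte> values)
    {
        long total = 0;
        long below128 = 0;
        foreach (byte value in values)
        {
            total += value;
            // Data-dependent branch: with random input the CPU cannot
            // predict reliably which way this goes.
            if (value < 128)
            {
                below128 += value;
            }
        }
        return (total, below128);
    }
}
```

For the input { 1, 127, 128, 255 } this returns a total of 511 and a below-128 sum of 128.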

Naive Implementation

Find out more »


Lazy Properties in .NET 10

In this post, I explore a couple of ways to create lazy properties in C# and .NET 10. What does a lazy property mean in the context of this post? It is an object instance property that gets initialized with a value the first time its getter is invoked. The getter of the property does not need to provide thread-safe initialization. Let's review a couple of solutions available before .NET 10:

All the examples below initialize a string property. The initializing method is static and extracted as a member of a separate class:

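using System.Threading;

public class Shared
{
    // Number of times Zeros() has been called.
    public static int _counter = 0;

    // Returns a string of '0' characters, one per call made so far.
    public static string Zeros()
    {
        // Use the value returned by Interlocked.Increment so that a concurrent
        // caller cannot change the length between the increment and the read.
        int count = Interlocked.Increment(ref _counter);
        return new string('0', count);
    }
}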

The Zeros() method returns a new string object containing a number of 0 characters. The number of 0 characters corresponds to the number of times the method has been executed.
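As a rough sketch of what a lazy property can look like with long-standing language features, the two classes below initialize a string on first read. They are illustrations under stated assumptions and not necessarily the solutions the post goes on to compare. SharedSketch is a stand-in for the Shared class above, repeated so the block compiles on its own. WithBackingField and WithLazy are hypothetical names.

```csharp
#nullable enable
using System;
using System.Threading;

// Stand-in for the Shared class, repeated so the sketch is self-contained.
public static class SharedSketch
{
    private static int _counter;

    public static string Zeros() => new string('0', Interlocked.Increment(ref _counter));
}

public class WithBackingField
{
    private string? _value;

    // Initialized on the first read. Not thread-safe: concurrent first
    // reads may each call Zeros().
    public string Value => _value ??= SharedSketch.Zeros();
}

public class WithLazy
{
    // LazyThreadSafetyMode.None matches the stated requirement that the
    // getter does not need thread-safe initialization.
    private readonly Lazy<string> _value =
        new Lazy<string>(() => SharedSketch.Zeros(), LazyThreadSafetyMode.None);

    public string Value => _value.Value;
}
```

In both cases, repeated reads of Value on the same instance return the same string object, because the initializer runs only once per instance.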

Find out more »


Span on Garbage Collection

A Span represents a contiguous region of memory. Because it is implemented as a ref struct, the compiler makes sure that it does not escape to the heap (as of .NET 9). A Span may point to different kinds of memory: an unmanaged memory segment, stack-allocated memory, or heap-allocated memory. The Span and ReadOnlySpan types use an interior pointer, which allows them to point to an address that is not necessarily the object's MT (method table) pointer, but an address inside the object's memory representation. For example, they can point to the nth element of an array.
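A small sketch of the two cases mentioned above: a span that starts in the middle of a heap-allocated array, and a span over stack-allocated memory. SpanSketch and its method names are hypothetical and used only for illustration.

```csharp
using System;

public static class SpanSketch
{
    // Writes through a span that starts at the third element of a heap array.
    public static int[] WriteThroughSlice()
    {
        int[] numbers = { 0, 1, 2, 3, 4 };
        // The span refers to numbers[2], an address inside the array object.
        Span<int> slice = numbers.AsSpan(2, 3);
        slice[0] = 42;
        return numbers;
    }

    // The same Span<T> API over stack-allocated memory.
    public static int SumOnStack()
    {
        Span<int> buffer = stackalloc int[4];
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = i + 1;
        }

        int sum = 0;
        foreach (int value in buffer)
        {
            sum += value;
        }
        return sum;
    }
}
```

After WriteThroughSlice, the element at index 2 of the returned array is 42, which shows that the span refers to the array's own memory and not to a copy.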

From the garbage collector's point of view, the interior pointers need special handling: the interior pointer must be translated to an address that points to the corresponding object's MT so it can be 'marked' as used memory. This is needed as an otherwise unrooted object would get garbage collected. The GC uses the brick table for the address translation.

Because a ref struct lives on the stack, it does not cause additional allocations or pressure on the GC. Yet the address translation is extra work that the GC needs to do. The design decision for these types to be ref structs is driven by the additional work required for the address translation. This way, the GC does not need to handle interior pointers within heap-allocated objects.

Does address translation have a measurable impact on garbage collection?

Find out more »