Laszlo - Using ClrMD for string analysis

Using ClrMD for string analysis

04/05/2026 | 6 minutes to read

Analysis of a .NET application's memory dump often reveals that byte[] and string types are among the most commonly allocated objects. This is not surprising, as most UI and web applications use strings to display data to users, and byte[] or string values to pass data over the network in HTTP requests and responses. These string objects commonly represent configuration values, string literals, or data allocated by dependent libraries.

Analyzing strings in such a noisy environment is challenging. While common tools like Visual Studio, PerfView, and dotnet-dump provide high-level analysis, drilling into the details is often difficult. ClrMD is one of the tools that enables programmatic analysis of memory dumps. In my previous post Getting Started with ClrMD, I introduced how to analyze a full memory dump of a .NET application. This post explores a specialized analysis for strings: identifying application-defined types that hold references to large strings. These are the strings that developers have direct control over in their application code.

Execution Steps

First, collect a memory dump of a running application:

> dotnet-dump collect --type Full -p 5444 -o dump_full_heap.dmp

A full dump is the largest type of dump, containing all the memory of the executing process. It might contain sensitive information like PII or secrets. Handle dumps from production environments with great care!

The next step is to create a console application and add a reference to the Microsoft.Diagnostics.Runtime NuGet package.

Now, let's write the analysis code. The approach is to enumerate all objects and check whether their type name starts with the namespace of our application. For each matching object, we enumerate its string references and sum the size of the strings that reach a certain threshold (250 bytes, or approximately 114 characters).
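The byte-to-character conversion can be checked with a little arithmetic. The sketch below assumes a 64-bit process, where a string object takes 22 bytes of fixed overhead plus 2 bytes per UTF-16 character; this agrees with the sizes quoted in the Results section (1000 characters = 2022 bytes). StringSizeEstimate and its members are hypothetical names used only for illustration, they are not part of ClrMD.

```csharp
using System;

public static class StringSizeEstimate
{
    // Assumption: 64-bit process. Object header (8) + method table pointer (8)
    // + length field (4) + null terminator (2) = 22 bytes of fixed overhead.
    public const long OverheadBytes = 22;

    // Estimated size in bytes of a string with the given number of characters.
    public static long SizeInBytes(int length) => OverheadBytes + 2L * length;

    // Largest character count whose estimated size fits in the given byte size.
    public static int LengthForSize(long sizeInBytes) =>
        (int)Math.Max(0, (sizeInBytes - OverheadBytes) / 2);

    public static void Main()
    {
        Console.WriteLine(LengthForSize(250)); // 114
        Console.WriteLine(SizeInBytes(1000));  // 2022
        Console.WriteLine(SizeInBytes(300));   // 622
    }
}
```

If your definition of a large string is expressed in characters, SizeInBytes gives the matching byte threshold to compare against the object size.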

The main complexity arises from string sharing between objects. Sharing occurs when two objects (of different types) reference the same string instance. For example, an Entity Framework object typically shares string references with its corresponding data/API entity. Understanding which strings are shared is crucial for effective analysis, as optimizing string usage in one layer (e.g., the business layer) may not reduce memory usage if the strings are still referenced elsewhere (e.g., by Entity Framework entities).

To address this, our solution uses two additional data structures:

A HashSet<ulong> containing memory addresses of strings referenced by multiple application objects
A Dictionary<ulong, ClrType> tracking strings referenced by only a single application object

Here's how the tracking works:

When we first encounter a string, we add it to the Dictionary<ulong, ClrType>
If we encounter the same string again (same memory address), we mark it as shared:
- Remove it from the Dictionary
- Update the size calculations for the original type
- Add the string's address to the HashSet<ulong>

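Note: A single object can reference the same string multiple times through different fields or properties. The following code does not special-case this scenario: a repeated reference is treated like any other second reference to the string, so the string is counted as shared.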

// ... Opening the dmp file excluded for brevity
var heap = runtime.Heap;

Dictionary<ClrType, ReferencedStringSize> summary = new();
Dictionary<ulong, ClrType> firstReference = new();
HashSet<ulong> sharedStringReferences = new();

foreach (var obj in heap.EnumerateObjects())
{
    if (!obj.IsValid || obj.Type?.Name == null || !obj.Type.Name.StartsWith("MyApplication"))
        continue;

    // Consider testing if obj is rooted unless GC was triggered before dumping

    foreach (var referencedObj in obj.EnumerateReferences())
    {
        if (!summary.TryGetValue(obj.Type, out var currentParent))
        {
            currentParent = new ReferencedStringSize { Size = 0, SharedSize = 0, Name = obj.Type.Name };
            summary.Add(obj.Type, currentParent);
        }

        if (!referencedObj.IsValid || referencedObj.Type == null || !referencedObj.Type.IsString || referencedObj.Size < 250)
            continue;

        // Shared string reference, already seen multiple times, increment the shared size
        if (sharedStringReferences.Contains(referencedObj.Address))
            currentParent.SharedSize += referencedObj.Size;
        else
        {
            // Shared string but only seen once before.
            if (firstReference.TryGetValue(referencedObj.Address, out var firstReferencingType))
            {
                // Adjust the sizes for the first referencing type.
                // Increment the current parent's shared size
                summary[firstReferencingType].SharedSize += referencedObj.Size;
                summary[firstReferencingType].Size -= referencedObj.Size;
                currentParent.SharedSize += referencedObj.Size;
                firstReference.Remove(referencedObj.Address);
                sharedStringReferences.Add(referencedObj.Address);
            }
            else
            {
                // This string has not been seen yet: increment the size
                // and track the containing type as its first reference
                currentParent.Size += referencedObj.Size;
                firstReference.Add(referencedObj.Address, obj.Type);
            }
        }
    }
}

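// Print the three types with the largest total (own + shared) string size
foreach (var obj in summary.Values.OrderByDescending(x => x.Size + x.SharedSize).Take(3))
    Console.WriteLine($"{obj.Name}, Size = {obj.Size}, SharedSize = {obj.SharedSize}");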

public record class ReferencedStringSize
{
    public required string Name { get; init; }
    public required ulong Size { get; set; }
    public required ulong SharedSize { get; set; }
}

While this code provides a foundation for string analysis, it should be adapted to your specific needs. For instance, the definition of a 'large' string varies between applications (note that 'large' here doesn't necessarily mean Large Object Heap allocation). The code serves as a template that you can customize to analyze string usage patterns in your application.
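The listing above leaves out opening the dump file. For completeness, here is a minimal sketch of that step, assuming ClrMD 2.x or later and the dump file name from the dotnet-dump command earlier. It is not the code from the original analysis; the error handling is an illustrative choice.

```csharp
using System;
using System.Linq;
using Microsoft.Diagnostics.Runtime;

// Assumption: the dump produced earlier sits in the working directory.
using DataTarget dataTarget = DataTarget.LoadDump("dump_full_heap.dmp");

// A dump can contain more than one CLR; this sketch simply takes the first.
ClrInfo clrInfo = dataTarget.ClrVersions.FirstOrDefault()
    ?? throw new InvalidOperationException("No CLR was found in the dump.");

using ClrRuntime runtime = clrInfo.CreateRuntime();
ClrHeap heap = runtime.Heap;

// The heap may not be walkable if the dump was taken at an unlucky moment.
if (!heap.CanWalkHeap)
    throw new InvalidOperationException("The heap is not in a walkable state.");

// ... continue with heap.EnumerateObjects() as in the analysis code
```

Both DataTarget and ClrRuntime are disposable, so the using declarations release the dump file when the program ends.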

Results

Let's examine the results from analyzing a sample application's memory dump. The application contains three types:

MyType1: Contains 100 string references, each 1000 characters long (2022 bytes per string)
MyType2: Shares 52 string references with MyType1, plus it has additional string references of 100 characters each (these shorter strings aren't included in our analysis results due to the size threshold)
MyType3: Contains 100 unique string references, each 300 characters long (622 bytes per string)

Console Output

MyApplication.MyType1, Size = 97056, SharedSize = 105144
MyApplication.MyType2, Size = 0, SharedSize = 105144
MyApplication.MyType3, Size = 62200, SharedSize = 0
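To see how these numbers follow from the tracking rules, the bookkeeping can be replayed without a dump. The sketch below is a simulation, not ClrMD code: FakeObject, Totals, the addresses, one object per type, and the count of ten short strings in MyType2 are all assumptions made for illustration. The string sizes (2022 and 622 bytes) come from the description above, and the 100-character strings are taken as 222 bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharingSimulation
{
    // Hypothetical stand-in for a heap object: its type name and the
    // (address, size) pairs of the strings it references.
    public record FakeObject(string TypeName, (ulong Address, ulong Size)[] Strings);

    public sealed class Totals
    {
        public ulong Size { get; set; }
        public ulong SharedSize { get; set; }
    }

    public static Dictionary<string, Totals> Analyze(IEnumerable<FakeObject> objects, ulong threshold = 250)
    {
        var summary = new Dictionary<string, Totals>();
        var firstReference = new Dictionary<ulong, string>();
        var shared = new HashSet<ulong>();

        foreach (var obj in objects)
        {
            if (!summary.TryGetValue(obj.TypeName, out var current))
                summary[obj.TypeName] = current = new Totals();

            foreach (var (address, size) in obj.Strings)
            {
                if (size < threshold)
                    continue; // below the threshold, ignored

                if (shared.Contains(address))
                {
                    // Already known to be shared
                    current.SharedSize += size;
                }
                else if (firstReference.Remove(address, out var firstType))
                {
                    // Second sighting: move the size from Size to SharedSize
                    // for the first type, and count it as shared here too
                    summary[firstType].Size -= size;
                    summary[firstType].SharedSize += size;
                    current.SharedSize += size;
                    shared.Add(address);
                }
                else
                {
                    // First sighting
                    current.Size += size;
                    firstReference.Add(address, obj.TypeName);
                }
            }
        }
        return summary;
    }

    private static (ulong, ulong)[] Strings(ulong firstAddress, int count, ulong size) =>
        Enumerable.Range(0, count).Select(i => (firstAddress + (ulong)i, size)).ToArray();

    public static FakeObject[] Sample()
    {
        var type1 = Strings(0x1000, 100, 2022);
        // 52 strings shared with MyType1, plus an arbitrary 10 short strings
        var type2 = type1.Take(52).Concat(Strings(0x2000, 10, 222)).ToArray();
        var type3 = Strings(0x3000, 100, 622);
        return new[]
        {
            new FakeObject("MyApplication.MyType1", type1),
            new FakeObject("MyApplication.MyType2", type2),
            new FakeObject("MyApplication.MyType3", type3),
        };
    }

    public static void Main()
    {
        var ordered = Analyze(Sample()).OrderByDescending(p => p.Value.Size + p.Value.SharedSize);
        foreach (var (type, totals) in ordered)
            Console.WriteLine($"{type}, Size = {totals.Size}, SharedSize = {totals.SharedSize}");
    }
}
```

The 52 shared strings account for 52 × 2022 = 105144 bytes in both MyType1 and MyType2, leaving 48 × 2022 = 97056 bytes owned by MyType1 alone, while MyType3 keeps all of its 100 × 622 = 62200 bytes.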

Conclusion

Using ClrMD for string analysis provides powerful insights into how strings are allocated and shared across different types in a .NET application. The approach demonstrated in this post offers several benefits:

It helps identify which application types hold references to large strings, enabling targeted optimization efforts.
By distinguishing between shared and non-shared strings, it provides a more accurate picture of memory ownership and potential optimization opportunities.
The analysis can be customized to focus on strings above specific size thresholds relevant to your application.

This technique is particularly valuable when investigating memory issues in applications with complex string handling patterns, such as web applications or data processing systems.

The sample results clearly demonstrate how the analysis can reveal string sharing patterns between different types, which might not be immediately obvious through conventional memory profiling tools. This information can be crucial for making informed decisions about memory optimization strategies in your .NET applications.


Hello, I am Laszlo

Software-Engineer, .NET developer