Measuring Finalized Objects

This is a quick post on how to measure the number of finalized objects with PerfView and WinDBG. The reason to measure this is that many applications use thousands of objects with finalizers. If finalization has not been suppressed on these objects, it can cause a significant performance degradation, which does not show up in tail latency, but only in throughput tests. Focusing on a single operation can easily hide such an issue.

Even today I seem to run into libraries / code paths that make excessive use of objects with finalizers. The .NET runtime's garbage collector handles finalizable objects separately from other objects. Housekeeping for live finalizable objects requires more resources than for non-finalizable objects. Cleaning up dead finalizable objects requires even more resources, including running the finalizer method, which is ultimately user code.

When finalizers are combined with the dispose pattern, one should call GC.SuppressFinalize to exempt the objects from finalization, thus saving the resources otherwise required for the cleanup. Even so, allocating objects with finalizers is still slower than allocating regular objects.
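As an illustration of the pattern described above, here is a minimal sketch of a finalizable type that suppresses finalization when it is disposed. The type is not the code of my test application; the FinalizerRuns counter is a hypothetical helper added only to make the effect observable.

```csharp
using System;
using System.Threading;

namespace Finalizer
{
    // Sketch of a finalizable type that follows the dispose pattern.
    public sealed class A : IDisposable
    {
        // Hypothetical counter, present only to observe finalizer runs.
        private static int _finalizerRuns;
        public static int FinalizerRuns => Volatile.Read(ref _finalizerRuns);

        private bool _disposed;

        public void Dispose()
        {
            if (_disposed) return;
            _disposed = true;
            // Release resources here, then tell the GC that the
            // finalizer no longer needs to run for this instance.
            GC.SuppressFinalize(this);
        }

        ~A()
        {
            // Runs on the finalizer thread only when Dispose was never called.
            Interlocked.Increment(ref _finalizerRuns);
        }
    }

    public static class Program
    {
        public static void Main()
        {
            for (int i = 0; i < 1000; i++)
            {
                using var a = new A(); // disposed at the end of each iteration
            }

            GC.Collect();
            GC.WaitForPendingFinalizers();
            Console.WriteLine($"Finalizer runs: {A.FinalizerRuns}");
        }
    }
}
```

Because every instance is disposed, no finalizer is expected to run here. Removing the using keyword leaves the cleanup to the finalizer thread instead.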

Measuring the number of finalized objects

For my test, I created a .NET 5 application with a finalizable type A (referred to as Finalizer.A below).

Measuring the finalized objects is simple with PerfView: go to Collect -> Collect and set the GC Only checkbox under the Advanced section. After the measurement session, see the results under Memory Group / GC Stats. Select the measured process; the results have a section on the Finalized Object Counts.

Seeing a high number for the finalized object counts always raises the question: are some Dispose / GC.SuppressFinalize calls missing? As the table also shows the type name, this should be easy to check and fix.

In .NET 5, the number of objects ready for finalization is also available through the GC.GetGCMemoryInfo() API call.
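A minimal sketch of reading this value in-process could look like the following. The FinalizationStats class and the PendingAtLastGc method are hypothetical names; the value comes from the FinalizationPendingCount property of GCMemoryInfo and reflects what the last GC observed.

```csharp
using System;

public static class FinalizationStats
{
    // Number of objects that the last GC found ready for finalization.
    public static long PendingAtLastGc()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        return info.FinalizationPendingCount;
    }

    public static void Main()
    {
        Console.WriteLine($"Finalization pending count: {PendingAtLastGc()}");
    }
}
```

Logging this value periodically alongside throughput metrics might help to spot missing Dispose calls without attaching a profiler.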

Measuring the number of finalizable objects

When GC.SuppressFinalize is correctly called in the Dispose method of the finalizable objects and the objects are properly disposed, the Finalized Object Counts section will be missing from the GC Stats report.

The question of the 'number of finalizable objects' still matters, because just allocating objects of types with finalizer methods takes longer than allocating objects without finalizers. How can we measure the number of objects that have finalizers but were not finalized?

The simplest way to get this information is by using WinDBG with the SOS extension. The !finalizequeue command returns the number of objects that are finalizable. According to my testing, these numbers are the same whether or not the SuppressFinalize method has been called on the objects. Unlike the PerfView finalized object counts, the output of !finalizequeue shows a snapshot of the heap. This means that objects with finalizers that have already been collected do not accumulate in the output. So if we have many short-lived objects, we might see only a small number of finalizable objects for gen0 / gen1; for this reason, the Ready for finalization line should also be considered. If it is larger than 0 in consecutive snapshots, it might point to a lot of short-lived finalizable objects.

The results of the !finalizequeue command on the sample application can be seen below. The sample application has 10 live instances of type A, all of them suppressed from finalization:

0:009> !finalizequeue
SyncBlocks to be cleaned up: 0
Free-Threaded Interfaces to be released: 0
MTA Interfaces to be released: 0
STA Interfaces to be released: 0
----------------------------------

generation 0 has 18 finalizable objects (000001DC779AE780->000001DC779AE810)
generation 1 has 0 finalizable objects (000001DC779AE780->000001DC779AE780)
generation 2 has 0 finalizable objects (000001DC779AE780->000001DC779AE780)
Ready for finalization 0 objects (000001DC779AE810->000001DC779AE810)
Statistics for all finalizable objects (including all objects ready for finalization):
              MT    Count    TotalSize Class Name
00007ffeb7760b68        2           48 System.WeakReference`1[[System.Diagnostics.Tracing.EventSource, System.Private.CoreLib]]
00007ffeb7762f80        1          184 System.Diagnostics.Tracing.NativeRuntimeEventSource
00007ffeb7763960       10          240 Finalizer.A
00007ffeb7737798        1          368 System.Diagnostics.Tracing.RuntimeEventSource
00007ffeb77381b8        4          448 System.Diagnostics.Tracing.EventSource+OverideEventProvider
Total 18 objects
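A setup like the one described above, with 10 live instances that are all suppressed from finalization, could be reproduced with a sketch like this one. It is not the original sample application: the CreateSuppressed helper and the empty finalizer are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

namespace Finalizer
{
    // Minimal finalizable type; the finalizer body is intentionally empty.
    public class A
    {
        ~A() { }
    }

    public static class Program
    {
        // Hypothetical helper: allocates 'count' instances of A and
        // suppresses finalization on each of them.
        public static List<A> CreateSuppressed(int count)
        {
            var instances = new List<A>(count);
            for (int i = 0; i < count; i++)
            {
                var a = new A();
                GC.SuppressFinalize(a);
                instances.Add(a);
            }
            return instances;
        }

        public static void Main()
        {
            List<A> live = CreateSuppressed(10);

            Console.WriteLine($"Process {Environment.ProcessId}: attach WinDBG, then press Enter to exit.");
            Console.ReadLine();

            // Keep the instances reachable until the debugger session is over.
            GC.KeepAlive(live);
        }
    }
}
```

Attaching WinDBG while the process waits for input and running !finalizequeue should then list the instances of Finalizer.A, even though finalization has been suppressed on all of them.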

Note that, before the first GC, !finalizequeue shows all objects with finalizers under the Ready for finalization line.

Another approach is to use PerfView's .NET SampleAlloc checkbox under the Advanced section. This generates an event for roughly every 100 KB of allocated memory. Checking the Events view for Microsoft-Windows-DotNETRuntime/GC/AllocationTick will show the allocations that triggered the events, including the type name of the allocated object. Exporting this data to Excel allows you to create a histogram of the most allocated types during the performance measurement session. This will not give an exact number of objects, but rather a distribution. As the events include the type names, one can check whether the most allocated types are finalizable.