EventListeners for CPU metrics

Introduction

EventListeners can be used to subscribe to the built-in events of .NET Core. In this post I will show how to subscribe to the cpu-usage metric, and I will put the value of this counter into context on different platforms.

There can be many reasons to subscribe to built-in events, especially to the cpu-usage event counter. Although CPU usage can be monitored by a separate application (such as dotnet-counters), we might want to subscribe to this event within the process too. This way we can periodically push the current resource usage to an application metrics monitoring infrastructure (such as the ELK or TICK stack). The cpu-usage counter measures the percentage of system and user time used by the process.

The previous post showed how the eventing infrastructure works within .NET.

EventListener for CPU Usage

There are two ways to subscribe to the cpu-usage counter. In both, we start by deriving from the EventListener class. We can create either a generic event listener or one specific to a given EventSource. For built-in events, generic event listeners are more complex: to enable events for a given event source, the EnableEvents() method must be called with that EventSource object passed as an argument. By overriding the OnEventSourceCreated() method in an EventListener we can get a reference to EventSource objects, but OnEventSourceCreated() might be invoked before our listener's constructor completes, so we cannot reliably pass our filtering criteria for EventSources through the constructor. The only way for a generic event listener to get a reference to the EventSources is to re-iterate them (not detailed in this post).
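For reference, a minimal sketch of that re-iteration, assuming the static EventSource.GetSources() method is used to walk the already created sources once the constructor has stored its filter. The class name NamedSourceListener and its constructor parameter are hypothetical and only serve the illustration.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Hypothetical generic listener: the event source to enable is chosen by name at construction time.
public sealed class NamedSourceListener : EventListener
{
    private readonly string _sourceName;

    public NamedSourceListener(string sourceName)
    {
        _sourceName = sourceName;

        // The base constructor may already have called OnEventSourceCreated
        // while _sourceName was still null, so walk the existing sources again.
        foreach (EventSource source in EventSource.GetSources())
            EnableIfMatching(source);
    }

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Skip calls that arrive before the constructor body has run.
        if (_sourceName != null)
            EnableIfMatching(eventSource);
    }

    private void EnableIfMatching(EventSource eventSource)
    {
        if (eventSource.Name == _sourceName)
        {
            EnableEvents(eventSource, EventLevel.LogAlways, EventKeywords.All,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
        }
    }
}
```

Events can then be consumed through the EventWritten event that EventListener already exposes, for example listener.EventWritten += (sender, e) => { ... };. A source may be enabled twice when it is created while the constructor loop is running; in this sketch that is treated as harmless.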

For the purpose of this example, a simple non-generic EventListener serves our needs:

using System.Collections.Generic;
using System.Diagnostics.Tracing;

public sealed class SystemRuntimeEventListener : EventListener
{
  // Latest mean value of the cpu-usage counter.
  public double Value { get; private set; }

  protected override void OnEventSourceCreated(EventSource eventSource)
  {
    // Only the System.Runtime source is enabled; its counters are refreshed every second.
    if (eventSource.Name.Equals("System.Runtime"))
      EnableEvents(eventSource, EventLevel.LogAlways, EventKeywords.All, new Dictionary<string, string> { {"EventCounterIntervalSec", "1"} });
  }

  protected override void OnEventWritten(EventWrittenEventArgs eventData)
  {
    if (eventData.Payload == null || eventData.Payload.Count == 0)
      return;
    // Counter events carry a dictionary as their first payload item; pick the cpu-usage counter by name.
    if (eventData.Payload[0] is IDictionary<string, object> eventPayload && eventPayload.TryGetValue("Name", out var nameData) && nameData is string name && name == "cpu-usage")
    {
      if (eventPayload.TryGetValue("Mean", out var value))
      {
        if (value is double dValue)
        {
          Value = dValue;
          // Forward the event to subscribers of the EventWritten event.
          base.OnEventWritten(eventData);
        }
      }
    }
  }
}

This class subscribes to the System.Runtime event source. Then, for every event written, it checks whether the payload contains metrics from the cpu-usage counter. If the payload has a mean value of type double, it is stored in the Value property. For this code to work correctly, you need to build and run your application as x64; otherwise you may experience torn reads of the double value written to the property (alternatively, protect the underlying field with Interlocked/Volatile operations).
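One possible way to protect the value on 32-bit platforms is sketched below. It assumes the bit pattern of the double is kept in a long field that is read and written through Interlocked; the class name AtomicCpuUsageListener is hypothetical, and this is only one of several ways to achieve the same effect.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;
using System.Threading;

// Hypothetical variant of the listener whose Value can be read without tearing on any platform.
public sealed class AtomicCpuUsageListener : EventListener
{
    // Bit pattern of the latest mean value; Interlocked reads and writes a long atomically.
    private long _valueBits;

    public double Value => BitConverter.Int64BitsToDouble(Interlocked.Read(ref _valueBits));

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == "System.Runtime")
        {
            EnableEvents(eventSource, EventLevel.LogAlways, EventKeywords.All,
                new Dictionary<string, string> { ["EventCounterIntervalSec"] = "1" });
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        if (eventData.Payload == null || eventData.Payload.Count == 0)
            return;

        if (eventData.Payload[0] is IDictionary<string, object> payload
            && payload.TryGetValue("Name", out var nameData) && nameData is string name && name == "cpu-usage"
            && payload.TryGetValue("Mean", out var meanData) && meanData is double mean)
        {
            // Publish the new value in a single atomic write.
            Interlocked.Exchange(ref _valueBits, BitConverter.DoubleToInt64Bits(mean));
        }
    }
}
```

A single instance would typically be kept alive for the lifetime of the process, with Value read from a timer that pushes it to the metrics backend. Until the first counter event arrives, Value is 0.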

Windows

On Windows, cpu-usage is calculated by the RuntimeEventSourceHelper class. Both the user and the kernel processor time are considered.

The output value matches the CPU column on the Details tab of taskmgr.exe. The % value on the Processes tab might differ, because it is adjusted by the % Processor Performance counter. When comparing with perfmon, the % Processor Time of the process must be divided by the number of CPUs in the machine.
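To make the division by the number of CPUs concrete, here is a rough sketch that derives a comparable percentage from Process.TotalProcessorTime (user plus kernel time). It is not the runtime's own implementation; the CpuUsageSampler class and its method names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: estimates process CPU usage as a percentage of the whole machine.
public static class CpuUsageSampler
{
    // CPU time consumed between two samples, relative to the capacity of all CPUs.
    public static double Percent(TimeSpan cpuBefore, TimeSpan cpuAfter, TimeSpan elapsed, int cpuCount)
    {
        if (elapsed <= TimeSpan.Zero || cpuCount <= 0)
            return 0;

        double usedMs = (cpuAfter - cpuBefore).TotalMilliseconds;
        return usedMs * 100.0 / (elapsed.TotalMilliseconds * cpuCount);
    }

    // Samples the current process over the given window (blocks the calling thread).
    public static double Sample(TimeSpan window)
    {
        using var process = Process.GetCurrentProcess();
        TimeSpan cpuBefore = process.TotalProcessorTime;
        Stopwatch clock = Stopwatch.StartNew();

        Thread.Sleep(window);

        process.Refresh(); // discard cached process information before the second reading
        TimeSpan cpuAfter = process.TotalProcessorTime;
        return Percent(cpuBefore, cpuAfter, clock.Elapsed, Environment.ProcessorCount);
    }
}
```

For example, a process that keeps one core fully busy for one second on a 4-CPU machine gives Percent(TimeSpan.Zero, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1), 4) == 25, while the undivided figure for the same interval would be 100.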

Linux

On Linux, one might use the top command to compare the values of the EventCounter with the ones reported by the system. The numbers are slightly off again, in this case by a factor of the number of CPUs in the machine. The Linux version of the RuntimeEventSourceHelper divides the system-reported values by the number of CPUs (so we get a result similar to the one on Windows). Top, on the other hand, does not do the same. In the image below, we can see that the % reported for the process may go over 100%.

[Image: output of the top command]
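The relation between the two scales is simple arithmetic, sketched below under the assumption that top runs in its default mode, where 100% corresponds to one fully used CPU. The CpuPercentScale helper is hypothetical.

```csharp
using System;

// Hypothetical helper converting between the scale used by top and the scale of the cpu-usage counter.
public static class CpuPercentScale
{
    // top: percent of a single CPU; counter: percent of all CPUs in the machine.
    public static double TopToCounter(double topPercent, int cpuCount)
    {
        if (cpuCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(cpuCount));
        return topPercent / cpuCount;
    }

    public static double CounterToTop(double counterPercent, int cpuCount)
    {
        if (cpuCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(cpuCount));
        return counterPercent * cpuCount;
    }
}
```

On a machine with 4 CPUs, a process shown at 250% in top corresponds to CpuPercentScale.TopToCounter(250, 4) == 62.5 on the counter's scale. In recent procps versions of top, pressing I should toggle a mode that applies the same division, which makes the two numbers directly comparable.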

Conclusion

When working with cpu-usage, the reported numbers may look a bit off at first, but they are actually reported correctly and are in line with the measurements of other tools. A developer needs a good understanding of how the different CPU usage percentages relate to each other to gain confidence in their reliability. In the scope of this post, CPU percentages are reported with the same frequency and there is no distinction between kernel and user times; considering those would require further investigation of the available counters.