Introduction to APM 2

In this series of posts I look into how we can implement an APM solution for .NET applications. Application Performance Management (APM) helps to monitor and diagnose application performance. There are numerous libraries and tools that address this problem. This series focuses on implementing APM for .NET applications (.NET Framework 4.6.1 and above, .NET Core and .NET 5) with OpenTelemetry.

The series covers the following topics:

  • W3C correlation Id specification

  • Creating and recording spans with ActivitySource

  • Using OpenTelemetry and Jaeger

This post looks into creating and recording spans.

Introduction

There are two simple ways to create Activities:

  • Using Activity constructor

  • Using ActivitySource

Activity Constructor

When working with activities, we can use the Activity constructor to create a new instance. The constructor has one argument: the name of the operation associated with the activity. To use an activity, call its Start() and Stop() methods and run the measured operation between them. Many developers create an extension method that creates a new Activity, calls Start() and returns an IDisposable that calls Stop() on dispose. At this point the Activity may be reported through DiagnosticSource, which lets diagnostic listeners subscribe to activity lifetime events.
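The extension-method pattern described above could look like the following minimal sketch. The names StartScope, StopOnDispose and the "LoadOrders" operation are hypothetical and only serve the illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class ActivityExtensions
{
    // Hypothetical helper: starts the activity and returns a handle
    // that stops it when disposed.
    public static IDisposable StartScope(this Activity activity)
    {
        if (activity == null) throw new ArgumentNullException(nameof(activity));
        activity.Start();
        return new StopOnDispose(activity);
    }

    private sealed class StopOnDispose : IDisposable
    {
        private readonly Activity _activity;
        public StopOnDispose(Activity activity) => _activity = activity;
        public void Dispose() => _activity.Stop();
    }
}

public static class ConstructorExample
{
    public static Activity Run()
    {
        // The constructor takes the operation name.
        var activity = new Activity("LoadOrders");
        using (activity.StartScope())
        {
            // The measured operation goes here.
            Thread.Sleep(10);
        }
        return activity;
    }

    public static void Main()
    {
        var activity = Run();
        Console.WriteLine($"{activity.OperationName} took {activity.Duration.TotalMilliseconds} ms");
    }
}
```

Note that the handle in this sketch is a class, so every measurement allocates a small object on the heap.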

Activities allow attaching key-value pairs as tags or baggage. Typically string-string pairs are attached, but AddTag() also has an overload that takes a string key with an object value. This way an activity can be enriched with metadata for semantic logging and distributed tracing.
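As a small illustration of tags and baggage, the sketch below attaches both to a parent activity and reads the baggage back from a child. The operation names, keys and values are made up for the example, and the object overload of AddTag() assumes System.Diagnostics.DiagnosticSource 5.0 or later.

```csharp
using System;
using System.Diagnostics;

public static class TagsExample
{
    public static string Run()
    {
        var parent = new Activity("ProcessOrder");
        parent.AddTag("order.id", "42");         // string-string tag
        parent.AddTag("order.items", 3);         // string-object overload
        parent.AddBaggage("tenant", "contoso");  // baggage is always string-string
        parent.Start();

        // The child is started while the parent is Activity.Current,
        // so the parent becomes its Parent.
        var child = new Activity("ChargeCard").Start();

        // Baggage is looked up through the parent chain; tags are not.
        string tenant = child.GetBaggageItem("tenant");

        child.Stop();
        parent.Stop();
        return tenant;
    }

    public static void Main() => Console.WriteLine(Run());
}
```

Baggage is meant to travel with the request, while tags describe only the activity they are attached to.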

Using ActivitySource

An even simpler way to create activities is to use ActivitySource. With ActivitySource we can create custom activities by calling the StartActivity() method. To create an ActivitySource object, the suggested approach is to add a static readonly field to the class and pass the full name of the declaring type as the source name.

private static readonly ActivitySource Telemetry = new ActivitySource(typeof(Worker).FullName);

The name of the ActivitySource matters because it is used to filter activities during collection. To filter by types or namespaces in the instrumented application, the suggested approach is to use the containing type's full name as the name of the activity source.

Create a new Activity by calling the StartActivity() method on the ActivitySource, passing the operation name and an optional ActivityKind. The ActivityKind indicates the activity's relationship to other software components: whether it handles an incoming request, is an internal operation, or represents an outgoing request.

using (Telemetry.StartActivity("Request", ActivityKind.Server))
{
    await Task.Delay(100);
}

The StartActivity() method may return null if there are no listeners listening to this ActivitySource instance. Using the result with a using block should not cause an issue, because using blocks do not throw with null arguments. For example the code below compiles and runs without an error:

using(null) { }
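A minimal sketch of this behavior, assuming no ActivityListener in the process is listening to the hypothetical source name "Demo.NoListener":

```csharp
using System;
using System.Diagnostics;

public static class NullActivityExample
{
    public static bool StartWithoutListener()
    {
        // Hypothetical source name; nothing listens to it in this sketch.
        var source = new ActivitySource("Demo.NoListener");

        // StartActivity() returns null here, and the using block
        // simply skips the Dispose() call for a null resource.
        using (var activity = source.StartActivity("Work"))
        {
            return activity == null;
        }
    }

    public static void Main()
    {
        Console.WriteLine($"Activity is null: {StartWithoutListener()}");
    }
}
```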

However, when we attach tags or baggage, we must be aware that the value returned by StartActivity() may be null, so we must null check it before adding tags. The more tags we attach to an activity in separate statements, the more CPU cycles we may lose on repetitive null checks. In a single chained expression the ?. operator short-circuits the rest of the chain:

using (Telemetry.StartActivity("Work")?.AddTag("1", "hello").AddTag("2", "hello")) { }

The ?. operator also adds mental complexity, hurts the readability of the above code, and is easy to miss.

There are three approaches to tackle this issue:

  1. Create a Disposable struct

  2. Create an IOption<T> type

  3. Use null checks as shown above

The table below compares the three options. The benchmark method names are:

  • DisposableActivity is a disposable struct

  • NullDisposable uses vanilla null checks

  • JustActivity uses IOption<T> with two implementations, Just and None

The number suffix denotes how many AddTag() calls are chained onto the Activity inside the using statement.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i5-1035G4 CPU 1.10GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.202
  [Host]     : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT
  DefaultJob : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT


|              Method |     Mean |    Error |   StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |---------:|---------:|---------:|-------:|------:|------:|----------:|
|     NullDisposable0 | 856.6 ns | 16.67 ns | 14.78 ns | 0.2012 |     - |     - |     632 B |
| DisposableActivity0 | 865.1 ns |  9.16 ns |  8.57 ns | 0.2012 |     - |     - |     632 B |
|       JustActivity0 | 899.7 ns | 17.97 ns | 47.34 ns | 0.2089 |     - |     - |     656 B |
|     NullDisposable4 | 959.9 ns | 19.00 ns | 28.44 ns | 0.2613 |     - |     - |     824 B |
| DisposableActivity4 | 973.4 ns | 18.84 ns | 28.20 ns | 0.2613 |     - |     - |     824 B |
|       JustActivity4 | 976.9 ns | 19.48 ns | 28.55 ns | 0.2689 |     - |     - |     848 B |

Both DisposableActivity and NullDisposable avoid additional allocations on the heap. DisposableActivity uses a struct and an extension method so that the developer does not need explicit null checks on the Activity. Creating this struct incurs a small cost compared to NullDisposable.

Below is a reference implementation of DisposableActivity:

public readonly struct DisposableActivity : IDisposable
{
    private readonly Activity _activity;
    public DisposableActivity(Activity activity)
    {
        _activity = activity;
    }

    public DisposableActivity AddTag(string key, string value)
    {
        _activity?.AddTag(key, value);
        return this;
    }

    public void Dispose() => _activity?.Dispose();
}

DisposableActivity checks whether the activity is null on every AddTag() call. With the DisposableActivity type in place, one can add an extension method on ActivitySource for simpler usage:

public static DisposableActivity Measure(this ActivitySource source, string operationName, ActivityKind kind = ActivityKind.Internal)
{
    _ = source ?? throw new ArgumentNullException(nameof(source));
    var activity = source.StartActivity(operationName, kind);
    return new DisposableActivity(activity);
}

Use the Measure() extension method to start a new Activity:

using (Telemetry.Measure("Work")) { }
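Tags can be chained without any ?. operator, and the struct can be held across an await. The sketch below repeats compact versions of the DisposableActivity struct and the Measure() extension method only so that it compiles on its own. The DoWorkAsync method and its return value are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public readonly struct DisposableActivity : IDisposable
{
    private readonly Activity _activity;
    public DisposableActivity(Activity activity) => _activity = activity;

    public DisposableActivity AddTag(string key, string value)
    {
        _activity?.AddTag(key, value); // no-op when there is no listener
        return this;
    }

    public void Dispose() => _activity?.Dispose();
}

public static class ActivitySourceExtensions
{
    public static DisposableActivity Measure(this ActivitySource source, string operationName, ActivityKind kind = ActivityKind.Internal)
    {
        _ = source ?? throw new ArgumentNullException(nameof(source));
        return new DisposableActivity(source.StartActivity(operationName, kind));
    }
}

public static class Worker
{
    private static readonly ActivitySource Telemetry = new ActivitySource(typeof(Worker).FullName);

    // Hypothetical operation: the tags are chained without null checks
    // at the call site, whether or not a listener is registered.
    public static async Task<int> DoWorkAsync()
    {
        using (Telemetry.Measure("Work").AddTag("1", "hello").AddTag("2", "hello"))
        {
            await Task.Delay(10);
            return 42;
        }
    }
}
```

The call site reads the same whether StartActivity() returned an Activity or null, which is the main readability gain of this option.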

Using a struct might feel controversial, as it may incur a boxing allocation when Dispose() is called through the IDisposable interface. However, as long as the user avoids an explicit cast to IDisposable, no allocation occurs, even when using async-await. Because structs are implicitly sealed, the compiler can avoid boxing by emitting the constrained IL prefix at the invocation of Dispose().

JustActivity might incur an extra allocation on the heap. When StartActivity() returns null, a pre-allocated None type can be re-used. When a non-null Activity is returned a new JustActivity object is allocated. The performance table shows that an additional 24 bytes are allocated on the heap compared to the other solutions.
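The benchmarked IOption<T> implementation is not shown in this post. A minimal sketch of the idea, under the assumption that the option type exposes AddTag() directly, could look like the following. The member names, the NoneActivity type and the MeasureOption() extension method are all hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical option interface for a value that may be missing.
public interface IOption<T> : IDisposable
{
    T ValueOrDefault { get; }
    IOption<T> AddTag(string key, string value);
}

// "Just": wraps a real Activity. One instance is allocated per activity.
public sealed class JustActivity : IOption<Activity>
{
    private readonly Activity _activity;
    public JustActivity(Activity activity) => _activity = activity;

    public Activity ValueOrDefault => _activity;

    public IOption<Activity> AddTag(string key, string value)
    {
        _activity.AddTag(key, value); // no null check needed here
        return this;
    }

    public void Dispose() => _activity.Dispose();
}

// "None": a single pre-allocated instance that ignores every call.
public sealed class NoneActivity : IOption<Activity>
{
    public static readonly NoneActivity Instance = new NoneActivity();
    private NoneActivity() { }

    public Activity ValueOrDefault => null;
    public IOption<Activity> AddTag(string key, string value) => this;
    public void Dispose() { }
}

public static class OptionExtensions
{
    public static IOption<Activity> MeasureOption(this ActivitySource source, string operationName)
    {
        var activity = source.StartActivity(operationName);
        if (activity == null)
        {
            return NoneActivity.Instance; // re-used, no allocation
        }
        return new JustActivity(activity); // the extra heap allocation
    }
}
```

In this design the null check happens once, when the option is created, instead of on every AddTag() call.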

To reduce the cost of allocations, one could pool JustActivity objects using an ObjectPool<T>. With a large number of tags, the relative overhead of heap allocation (including garbage collection) or pooling diminishes compared to the previous solutions.

Finally, notice that the gap between the mean execution times of JustActivity and DisposableActivity is smaller with 4 tags (about 3.5 ns) than with no tags (about 34.6 ns). The 4-tag gap is well within the reported error of the measurements.

Using ActivityListener

To subscribe to activities, we can use the ActivityListener type. A developer can create their own type that holds an ActivityListener field. The activity listener exposes callbacks to control which activities we are interested in:

  • ActivityStarted is invoked when an activity is started

  • ActivityStopped is invoked when an activity is stopped or disposed

  • ShouldListenTo can decide if a given ActivitySource should be listened to. One can observe the ActivitySource's name and other properties for this decision.

  • The Sample and SampleUsingParentId callbacks give an opportunity to decide whether an activity should be sampled

ActivityListener callbacks are invoked when activities are created by ActivitySources. ActivityListener gives a simpler way to subscribe to activities than DiagnosticListeners.
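The callbacks listed above can be wired up as in the following minimal sketch. The source name "Demo.Worker", the "Demo." prefix filter and the always-sample policy are assumptions made for the example, not recommendations.

```csharp
using System;
using System.Diagnostics;

public static class ListenerExample
{
    // Hypothetical source name used only in this sketch.
    private static readonly ActivitySource Telemetry = new ActivitySource("Demo.Worker");

    public static int Run()
    {
        int stopped = 0;

        using (var listener = new ActivityListener
        {
            // Listen only to sources whose name starts with "Demo.".
            ShouldListenTo = source => source.Name.StartsWith("Demo.", StringComparison.Ordinal),

            // Sample every activity with all of its data.
            Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,

            ActivityStarted = activity => Console.WriteLine($"Started {activity.OperationName}"),
            ActivityStopped = activity =>
            {
                stopped++;
                Console.WriteLine($"Stopped {activity.OperationName} after {activity.Duration.TotalMilliseconds} ms");
            }
        })
        {
            ActivitySource.AddActivityListener(listener);

            // With the listener registered, StartActivity() returns an Activity.
            using (Telemetry.StartActivity("Work")?.AddTag("attempt", "1")) { }
        }

        return stopped;
    }

    public static void Main() => Run();
}
```

Disposing the listener unsubscribes it, after which StartActivity() on the same source may return null again.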

Conclusion

In this post I looked into how a developer can create activities to measure the duration of operations. I explored three different solutions for handling null activities returned from ActivitySource, and introduced ActivityListener for listening to activities. ActivitySources and ActivityListeners work hand in hand to create and listen to activities. In the next post I will show how OpenTelemetry leverages this mechanism to provide APIs, collectors and exporters to collect and publish performance data.