Introduction to APM 2
08/29/2021
7 minutes
In this series of posts I look into how we can implement an APM solution for .NET applications. Application Performance Management (APM) helps to monitor and diagnose application performance. There are numerous libraries and tools that address this problem. This series focuses on implementing APM for .NET applications (.NET Framework 4.6.1 and above, .NET Core and .NET 5) with OpenTelemetry.
The series covers the following topics:
W3C correlation Id specification
Creating and recording spans with ActivitySource
Using OpenTelemetry and Jaeger
This post looks into creating and recording spans.
Introduction
There are two simple ways to create Activities:
Using Activity constructor
Using ActivitySource
Activity Constructor
When working with activities we can use the Activity constructor to create a new instance. The constructor has one argument, the name of the operation associated with the activity. To use an activity, call the Start() and Stop() methods and run the measured operation between them. Most developers create an extension method which creates a new Activity, calls the Start() method and returns an IDisposable, which calls the Stop() method on dispose. At this point the Activity may be reported through DiagnosticSource. This way diagnostic listeners may subscribe to activity lifetime events.
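The start/stop pattern described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `ActivityScope` type and its `Start` factory are hypothetical names introduced here, and a static factory is used instead of an extension method for brevity. It does not report the activity through DiagnosticSource.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// The activity starts here and stops when the scope is disposed.
var scope = ActivityScope.Start("LoadOrders");
using (scope)
{
    Thread.Sleep(50); // stand-in for the operation being measured
}
Console.WriteLine($"{scope.OperationName} took {scope.Duration.TotalMilliseconds:F0} ms");

// Hypothetical helper type (not part of System.Diagnostics).
public sealed class ActivityScope : IDisposable
{
    private readonly Activity _activity;

    private ActivityScope(Activity activity) => _activity = activity;

    public string OperationName => _activity.OperationName;

    // Duration is populated by Activity.Stop().
    public TimeSpan Duration => _activity.Duration;

    public static ActivityScope Start(string operationName)
    {
        var activity = new Activity(operationName);
        activity.Start();
        return new ActivityScope(activity);
    }

    public void Dispose() => _activity.Stop();
}
```

The wrapper only exists to pair Start() with Stop() so that a using block delimits the measured operation.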
Activities allow attaching key-value pairs as tags or baggage. Typically string-string key-value pairs are attached, but AddTag() also has an overload that takes string-object key-value pairs. This way an activity may be enriched with metadata for semantic logging and distributed tracing.
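As a small sketch of attaching metadata, the snippet below adds two tags and one baggage item to an activity. The operation name, keys and values are made up for illustration, and the string-object `AddTag` overload and `TagObjects` assume version 5.0 or later of System.Diagnostics.DiagnosticSource.

```csharp
using System;
using System.Diagnostics;

var activity = new Activity("ProcessOrder");
activity.AddTag("order.id", "42");         // string-string tag
activity.AddTag("order.items", 3);         // string-object tag overload
activity.AddBaggage("tenant", "contoso");  // baggage values are strings
activity.Start();

// TagObjects enumerates every tag, including the non-string ones.
foreach (var tag in activity.TagObjects)
{
    Console.WriteLine($"{tag.Key} = {tag.Value}");
}

string tenant = activity.GetBaggageItem("tenant");
Console.WriteLine($"tenant = {tenant}");

activity.Stop();
```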
Using ActivitySource
An even simpler way to create activities is to use ActivitySource. With ActivitySource we can create custom activities by calling the StartActivity() method. To create an ActivitySource object, the suggested approach is to add a static readonly field to the class, passing the name of the declaring type.
private static readonly ActivitySource Telemetry = new ActivitySource(typeof(Worker).FullName);
The name of the ActivitySource is important for filtering during the collection of activities. To filter by types or namespaces in the instrumented application, the suggested approach is to use the containing type's full name as the name of the activity source.
Use an ActivitySource and create a new Activity by calling the StartActivity() method, passing the operation name and an optional ActivityKind. The ActivityKind indicates the activity's relationship to other software components: whether it handles an incoming request, is an internal operation, or represents an outgoing request.
using (Telemetry.StartActivity("Request", ActivityKind.Server)) { await Task.Delay(100); }
The StartActivity() method may return null if there are no listeners listening to this ActivitySource instance. Using the result with a using block should not cause an issue, because using blocks do not throw with null arguments. For example, the code below compiles and runs without an error:
using(null) { }
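The null return value can be observed directly. In this sketch the source name "Demo.NoListener" is a made-up name, and the snippet assumes no ActivityListener is registered anywhere in the process.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Demo.NoListener");

// Nothing listens to this source, so no Activity object is created.
Activity activity = source.StartActivity("Work");
Console.WriteLine(activity is null); // True when no listener is registered

// A using block accepts the null value without throwing.
using (activity)
{
}
```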
However, when we attach tags or baggage, we must be aware that the value returned by the StartActivity() method may be null. This means we must null check the value before we add tags. The more tags we attach to an activity, the more CPU cycles we may lose on repetitive null checks:
using (Telemetry.StartActivity("Work")?.AddTag("1", "hello").AddTag("2", "hello")) { }
The ?. operator also increases the mental complexity and reduces the readability of the above code, and it is easy to miss.
There are three approaches to tackle this issue:
Create a Disposable struct
Create an IOption<T> type
Use null checks as shown above
The table below compares the three options. The method names are:
DisposableActivity is a disposable struct
NullDisposable uses vanilla null checks
JustActivity uses IOption<T> with two implementations, Just and None
The number suffix denotes how many AddTag() method calls are chained on the activity in the using statement.
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i5-1035G4 CPU 1.10GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.202
  [Host]     : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT
  DefaultJob : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

|              Method |     Mean |    Error |   StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |---------:|---------:|---------:|-------:|------:|------:|----------:|
|     NullDisposable0 | 856.6 ns | 16.67 ns | 14.78 ns | 0.2012 |     - |     - |     632 B |
| DisposableActivity0 | 865.1 ns |  9.16 ns |  8.57 ns | 0.2012 |     - |     - |     632 B |
|       JustActivity0 | 899.7 ns | 17.97 ns | 47.34 ns | 0.2089 |     - |     - |     656 B |
|     NullDisposable4 | 959.9 ns | 19.00 ns | 28.44 ns | 0.2613 |     - |     - |     824 B |
| DisposableActivity4 | 973.4 ns | 18.84 ns | 28.20 ns | 0.2613 |     - |     - |     824 B |
|       JustActivity4 | 976.9 ns | 19.48 ns | 28.55 ns | 0.2689 |     - |     - |     848 B |
Both DisposableActivity and NullDisposable avoid additional allocations on the heap. DisposableActivity uses a struct and an extension method so that the developer does not need explicit null checks on the Activity. Creating this struct incurs a small cost compared to NullDisposable.
The code below gives a reference implementation for DisposableActivity:
public readonly struct DisposableActivity : IDisposable
{
    private readonly Activity _activity;

    public DisposableActivity(Activity activity)
    {
        _activity = activity;
    }

    public DisposableActivity AddTag(string key, string value)
    {
        _activity?.AddTag(key, value);
        return this;
    }

    public void Dispose() => _activity?.Dispose();
}
DisposableActivity checks whether the activity is null on every AddTag() method call. With the DisposableActivity type one can add an extension method for ActivitySource for simple usage:
public static DisposableActivity Measure(this ActivitySource source, string operationName, ActivityKind kind = ActivityKind.Internal)
{
    _ = source ?? throw new ArgumentNullException(nameof(source));
    var activity = source.StartActivity(operationName, kind);
    return new DisposableActivity(activity);
}
Use the DisposableActivity extension method to start a new Activity:
using (Telemetry.Measure("Work")) { }
Using a struct might feel controversial, as it may incur an allocation when the Dispose() method is called through the IDisposable interface. However, as long as the user avoids an explicit cast to the interface, no allocation occurs, even when using async-await. Because DisposableActivity is a struct, and therefore implicitly sealed, the compiler can avoid boxing by emitting the constrained IL prefix when invoking the Dispose() method.
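The difference between the two call shapes can be illustrated with a short sketch. It repeats a trimmed copy of the DisposableActivity struct so that it compiles on its own, and it assumes a package version in which Activity implements IDisposable. The reference comparison is only a way to make the boxing visible, not a measurement of allocations.

```csharp
using System;
using System.Diagnostics;

var scope = new DisposableActivity(null);

// A using statement on the struct type itself disposes it without boxing.
using (scope)
{
}

// Converting to the interface boxes the struct: each conversion
// produces a separate object on the heap.
IDisposable first = scope;
IDisposable second = scope;
Console.WriteLine(object.ReferenceEquals(first, second)); // False

first.Dispose();
second.Dispose();

// Trimmed copy of the struct shown earlier in this post.
public readonly struct DisposableActivity : IDisposable
{
    private readonly Activity _activity;

    public DisposableActivity(Activity activity) => _activity = activity;

    public void Dispose() => _activity?.Dispose();
}
```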
JustActivity might incur an extra allocation on the heap. When StartActivity() returns null, a pre-allocated None type can be re-used. When a non-null Activity is returned, a new JustActivity object is allocated. The performance table shows that an additional 24 bytes are allocated on the heap compared to the other solutions.
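The post does not list the code for the IOption<T> approach, so the following is only a guess at its general shape. The `Do` method, the `StartOption` extension method and the source name "Demo.Option" are hypothetical names introduced for this sketch, and it assumes a package version in which Activity implements IDisposable.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Demo.Option");

// No listener is registered here, so StartActivity() returns null
// and the shared None instance is handed back.
using (source.StartOption("Work").Do(a => a.AddTag("1", "hello")))
{
}

bool sameInstance = object.ReferenceEquals(source.StartOption("A"), source.StartOption("B"));
Console.WriteLine(sameInstance); // True: None is reused, nothing is allocated

// Hypothetical option abstraction with two implementations.
public interface IOption<T> : IDisposable
{
    IOption<T> Do(Action<T> action);
}

public sealed class Just<T> : IOption<T> where T : class, IDisposable
{
    private readonly T _value;

    public Just(T value) => _value = value;

    public IOption<T> Do(Action<T> action)
    {
        action(_value);
        return this;
    }

    public void Dispose() => _value.Dispose();
}

public sealed class None<T> : IOption<T>
{
    public static readonly None<T> Instance = new None<T>();

    private None() { }

    public IOption<T> Do(Action<T> action) => this; // nothing to act on

    public void Dispose() { }
}

public static class ActivitySourceOptionExtensions
{
    public static IOption<Activity> StartOption(this ActivitySource source, string operationName)
    {
        Activity activity = source.StartActivity(operationName);
        if (activity is null)
        {
            return None<Activity>.Instance;
        }
        return new Just<Activity>(activity); // the extra heap allocation
    }
}
```

In this shape the only per-call allocation is the Just<T> wrapper, which is the kind of object the extra bytes in the table would correspond to.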
To reduce the cost of allocations, one could pool JustActivity objects using an ObjectPool<T>. With a large number of additional tags, the relative overhead of heap allocation (including garbage collection) and pooling diminishes compared to the previous solutions.
Finally, notice that the difference between the mean execution times of JustActivity and DisposableActivity is smaller with 4 tags (about 3.5 ns) than with no tags (about 35 ns).
Using ActivityListener
To subscribe to activities we can use the ActivityListener type. A developer can create their own type which has an ActivityListener field. The activity listener provides convenient callbacks to control which activities we are interested in:
ActivityStarted is invoked when an activity is started
ActivityStopped is invoked when an activity is stopped or disposed
ShouldListenTo can decide if a given ActivitySource should be listened to. One can observe the ActivitySource's name and other properties for this decision.
SampleX callbacks (such as Sample and SampleUsingParentId) give an opportunity to decide if an activity should be sampled
ActivityListener callbacks are invoked when activities are created by ActivitySources. ActivityListener gives a simple way to subscribe to activities compared to DiagnosticListeners.
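A minimal sketch of wiring these callbacks together is shown below. The source name "Demo.Worker" and the "Demo." prefix filter are made up for illustration, and the listener samples every activity, which a production setup would likely not do.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

var source = new ActivitySource("Demo.Worker");
int started = 0;
int stopped = 0;

using var listener = new ActivityListener
{
    // Filter sources by name, here by a namespace-like prefix.
    ShouldListenTo = s => s.Name.StartsWith("Demo."),
    // Record all data for every activity of the selected sources.
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
    ActivityStarted = a =>
    {
        started++;
        Console.WriteLine($"Started {a.OperationName}");
    },
    ActivityStopped = a =>
    {
        stopped++;
        Console.WriteLine($"Stopped {a.OperationName} after {a.Duration.TotalMilliseconds:F0} ms");
    },
};

// The listener receives no callbacks until it is registered.
ActivitySource.AddActivityListener(listener);

// With a listener attached, StartActivity() returns a real Activity.
using (source.StartActivity("Work", ActivityKind.Internal))
{
    Thread.Sleep(50);
}
```

Disposing the listener unregisters it, after which StartActivity() on this source returns null again unless another listener is interested.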
Conclusion
In this post I looked into how a developer can create activities to measure the duration of operations. I explored three different solutions to handle the case of null activities returned from ActivitySource. I also introduced ActivityListener for listening to activities as they are started and stopped. ActivitySources and ActivityListeners work hand-in-hand to create and listen to activities. In the next post I will show how OpenTelemetry leverages this mechanism to provide APIs, collectors and exporters to collect and publish performance data.