HttpClient Diagnostics

HttpClient can propagate correlation IDs in the traceparent and tracestate HTTP headers. Every recent .NET release has brought changes in this area; over the last few releases the following has changed:

  • Automatic Id propagation

  • AspNet Core creates a new parent activity for each request (by default)

  • Activity's DefaultIdFormat changed to W3C in .NET 5

  • ActivitySource introduced

In .NET 6, HttpClient allows greater control over how trace IDs and span IDs are propagated on downstream HTTP calls. It accomplishes this with the help of DistributedContextPropagator, which comes with a few built-in propagator strategies. In this post I look into how these strategies work with OpenTelemetry and Jaeger.

Under the hood

HttpClient uses a DelegatingHandler called DiagnosticsHandler. This class is responsible for wrapping the sending of an HTTP request and the receiving of its response in an Activity. While in the past DiagnosticsHandler also injected the Request-Id or traceparent header (depending on the activity Id format), in .NET 6 it delegates this responsibility to DistributedContextPropagator.

DistributedContextPropagator itself is an abstract class, with three implementations relevant in this post:

  • LegacyPropagator

  • PassThroughPropagator

  • NoOutputPropagator

Each implementation encapsulates a different strategy for propagating the correlation headers downstream.
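To get a feel for the differences without sending any HTTP request, the built-in propagators can be asked to inject into a plain dictionary. The snippet below is a minimal sketch of that idea: the activity names and the Capture helper are my own illustrative assumptions, not part of the test setup described later. On .NET 6 I would expect the default propagator to emit a traceparent carrying the child's Id, the pass-through propagator to emit the root's Id, and the no-output propagator to emit nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Two nested activities; Start() makes each one Activity.Current in turn,
// so "child" automatically becomes a child of "root".
var root = new Activity("root").Start();
var child = new Activity("child").Start();

Console.WriteLine($"root.Id  = {root.Id}");
Console.WriteLine($"child.Id = {child.Id}");

var propagators = new (string Name, DistributedContextPropagator Propagator)[]
{
    ("default", DistributedContextPropagator.CreateDefaultPropagator()),
    ("pass-through", DistributedContextPropagator.CreatePassThroughPropagator()),
    ("no-output", DistributedContextPropagator.CreateNoOutputPropagator()),
};

foreach (var (name, propagator) in propagators)
{
    var headers = Capture(propagator, child);
    Console.WriteLine($"{name}: {headers.Count} header(s)");
    foreach (var (key, value) in headers)
        Console.WriteLine($"  {key}: {value}");
}

child.Stop();
root.Stop();

// Hypothetical helper: collects whatever the propagator injects into a dictionary
// that stands in for the HTTP request headers.
static Dictionary<string, string> Capture(DistributedContextPropagator propagator, Activity? activity)
{
    var headers = new Dictionary<string, string>();
    propagator.Inject(activity, headers, static (carrier, fieldName, fieldValue) =>
    {
        if (carrier is Dictionary<string, string> target)
            target[fieldName] = fieldValue;
    });
    return headers;
}
```

The dictionary plays the role of the carrier; HttpClient passes the HttpRequestMessage in that position instead.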

There are two ways to switch between these implementations. First, the propagator may be configured for the whole application by setting the static Current property of DistributedContextPropagator:

DistributedContextPropagator.Current = DistributedContextPropagator.CreateNoOutputPropagator();

A more granular approach is to configure it for each HttpClient:

var httpClient = new HttpClient(new SocketsHttpHandler()
{
    ActivityHeadersPropagator = DistributedContextPropagator.CreateDefaultPropagator()
});

A more elegant way to do this is with HttpClientFactory during DI registration.
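A registration along these lines might look like the sketch below. It assumes the Microsoft.Extensions.Http package is referenced; the client name "downstream" and the choice of the pass-through propagator are purely illustrative.

```csharp
using System.Diagnostics;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// "downstream" is a hypothetical name for the named client.
services.AddHttpClient("downstream")
    .ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
    {
        // Every client created under this name uses this propagator.
        ActivityHeadersPropagator = DistributedContextPropagator.CreatePassThroughPropagator()
    });

using var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();
var httpClient = factory.CreateClient("downstream");
```

In an ASP.NET Core application the same AddHttpClient call would go on builder.Services, and the factory would be injected rather than resolved by hand.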

Test Setup

For this post I created a test infrastructure. It consists of a client application that sends requests over HTTP, and a web API that serves responses to these requests. Both applications are instrumented with OpenTelemetry version 1.2.0-rc1. By default, no sources are filtered and an always-on sampler is used. The client application (sending the HTTP requests) is named ConsoleApp and the web API is named Service. All traces are exported to an instance of Jaeger running on localhost.

Client

Below is the code of the client application using .NET 6. The application first sets up the OpenTelemetry tracer. Then it creates two activities:

  • client-root is a root activity capturing all operations made by the client

  • fetch-resource wraps sending the HTTP request and reading all the content of the response

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var tracer = Sdk.CreateTracerProviderBuilder()
    .SetResourceBuilder(ResourceBuilder.CreateEmpty().AddService("ConsoleApp"))
    .SetSampler(new ParentBasedSampler(new AlwaysOnSampler()))
    .AddSource("*")
    .AddJaegerExporter(options =>
    {
        options.ExportProcessorType = ExportProcessorType.Simple;
        options.AgentHost = "localhost";
    })
    .Build();

var activitySource = new ActivitySource("ConsoleApp");
using (activitySource.StartActivity("client-root"))
{
    var httpClient = new HttpClient(new SocketsHttpHandler()
    {
        ActivityHeadersPropagator = DistributedContextPropagator.CreateDefaultPropagator()
    });
    using (activitySource.StartActivity("fetch-resource"))
    {
        var response = await httpClient.SendAsync(new HttpRequestMessage(HttpMethod.Get, "https://localhost:7247/test3"));

        var content = await response.Content.ReadAsStreamAsync();
    }
}

tracer.Dispose();

Server

The server-side application uses ASP.NET Core minimal APIs. It sets up a default web API application with one customization: AddOpenTelemetryTracing(). The application handles a GET request at the path /test3. The handler creates a child activity named server-work, which wraps the operation required to generate the response. In this case the service returns a new Person record.

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOpenTelemetryTracing(telemetryBuilder =>
{
    telemetryBuilder
    .SetResourceBuilder(ResourceBuilder.CreateEmpty().AddService("Service"))
    .SetSampler(new ParentBasedSampler(new AlwaysOnSampler()))
    .AddSource("*")
    .AddAspNetCoreInstrumentation()
    .AddJaegerExporter(options =>
    {
        options.ExportProcessorType = ExportProcessorType.Simple;
        options.AgentHost = "localhost";
    });
});


var app = builder.Build();
app.MapGet("/test3", () =>
{
    var activitySource = new ActivitySource("Server");
    using (activitySource.StartActivity("server-work"))
    {
        return new Person("noname", 1);
    }
});
app.Run();

public record Person(string Name, int Age);

Note that in my tests I instrumented the code with a few more await Task.Delay(100); calls, to better simulate real-life latency throughout the traces.

Default Propagator

With the default propagator and wildcard source filtering, AddSource("*"), on the client side, Jaeger displays a five-level diagram representing the user operation. At the top the client-root activity is captured, and each sub-activity is aligned as a sub-span. All spans are ordered into a single hierarchy, meaning they are all in a parent-child relationship.

Default Propagator

Notice that the middle span, named System.Net.Http.HttpRequestOut, is the result of the activity that was generated by HttpClient. This is a reasonable default behavior, but some developers may prefer to capture only the spans that are generated by their own code. In that case, in the client application one might replace the asterisk with the application name in the AddSource() method call: AddSource("ConsoleApp"). This suffices because the application has a single ActivitySource with a matching name. Re-running the applications, the following trace is captured in Jaeger:

Default Propagator Filtered

We managed to filter out the span generated by HttpClient. However, a side effect is that the trace is restarted on the Service side of the application. All spans are still displayed under a single trace, but the Service level is left-aligned with the ConsoleApp level. Someone might want such a view; I find it confusing, as it does not communicate clearly that the fetch-resource activity triggered the web API call on the Service.

No-op Propagator

The no-op propagator still creates an activity with HttpClient but, as the name suggests, it does not propagate correlation headers on the HTTP request. In Jaeger, the single user action is not displayed as a single trace, but as separate traces for the client side and the server side.

No-Op Propagator Client Side

ConsoleApp shows three levels of spans, including the activity created by HttpClient. The server-side trace is displayed on a separate diagram:

No-Op Propagator Server Side

Pass-Through Propagator

The pass-through propagator propagates the traceId of the application's root Activity. It needs to walk the parent-child hierarchy of Activities bottom-up, which could be a performance hit with a large hierarchy in a tight application loop. An activity is still generated by HttpClient, but it is not propagated downstream. In the screenshot below, OpenTelemetry is instructed to collect all sources.

Pass-Through Propagator

Jaeger links the client and server spans in a single trace. However, the Service spans are linked under the root activity of the ConsoleApp. This is even more visible when the client application filters to the custom ActivitySource:

Pass-Through Propagator Filtered

Custom Propagator

It is possible to create a custom propagator as well. For example, someone might like to propagate the parent activity's traceId instead of the one generated by HttpClient. To create such a propagator, we need to provide a custom implementation of DistributedContextPropagator. There are three methods to override:

  • ExtractTraceIdAndState() extracts the traceparent and the tracestate from the incoming request

  • ExtractBaggage() extracts any additional key-value pairs from the request

  • Inject() injects the trace values into a carrier object, which is the HttpRequestMessage in this case

We also need to provide a collection of strings that identifies the fields the propagator would get or set.

In this case the propagator is expected to only inject the trace context in the client app, thus I leave ExtractTraceIdAndState() and ExtractBaggage() unimplemented:

public class ParentIdPropagator : DistributedContextPropagator
{
    private const string TraceParent = "traceparent";
    private const string TraceState = "tracestate";

    // The header names this propagator reads or writes.
    public override IReadOnlyCollection<string> Fields => new[] { TraceParent, TraceState };

    // Extraction is not needed on the client side, so both extract methods are left unimplemented.
    public override IEnumerable<KeyValuePair<string, string?>>? ExtractBaggage(object? carrier, PropagatorGetterCallback? getter) => throw new NotImplementedException();

    public override void ExtractTraceIdAndState(object? carrier, PropagatorGetterCallback? getter, out string? traceId, out string? traceState) => throw new NotImplementedException();

    public override void Inject(Activity? activity, object? carrier, PropagatorSetterCallback? setter)
    {
        if (setter is null)
            return;

        // Propagate the Id of the current activity's parent instead of the current activity's own Id.
        var traceParent = Activity.Current?.ParentId;
        if (traceParent is null)
            return;
        setter(carrier, TraceParent, traceParent);

        // The trace state is optional; only emit it when it has a value.
        var traceState = Activity.Current?.TraceStateString;
        if (!string.IsNullOrEmpty(traceState))
            setter(carrier, TraceState, traceState);
    }
}

The W3C standard propagates the trace context through the traceparent and tracestate headers. To set these values, the propagator calls the setter callback with the carrier, a field name and a field value. In this case the traceparent header is set only when there is a parent activity Id, and the tracestate header is set only when traceparent was set and the trace state has a value.

ParentId Propagator Filtered

This propagator (with AddSource("ConsoleApp") filtering) leaves out the activity generated by HttpClient completely and propagates its parent's Id instead. Jaeger links the Service span into a single trace, as a child of the fetch-resource span. This way we may omit HttpClient's autogenerated activity from the traces, but still link the surrounding spans in a parent-child relationship.
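One caveat with leaving the extract methods throwing: if such a propagator were ever assigned to DistributedContextPropagator.Current in an application that also receives requests, extraction would fail. A possible variant, sketched below under my own assumptions, forwards extraction to the default propagator and only customizes Inject(). The class name DelegatingParentIdPropagator and the activity names are hypothetical, and the dictionary again stands in for the request headers.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// "fetch-resource" stands in for the application's activity,
// "http-out" for the one HttpClient would create around the request.
var parent = new Activity("fetch-resource").Start();
var current = new Activity("http-out").Start();

var headers = new Dictionary<string, string>();
new DelegatingParentIdPropagator().Inject(current, headers,
    static (carrier, name, value) => ((Dictionary<string, string>)carrier!)[name] = value);

Console.WriteLine($"parent.Id   = {parent.Id}");
foreach (var (key, value) in headers)
    Console.WriteLine($"{key}: {value}");

current.Stop();
parent.Stop();

// Hypothetical variant of the propagator above: same Inject() strategy,
// but extraction is delegated to the built-in default propagator.
public sealed class DelegatingParentIdPropagator : DistributedContextPropagator
{
    private static readonly DistributedContextPropagator Fallback = CreateDefaultPropagator();

    public override IReadOnlyCollection<string> Fields => Fallback.Fields;

    public override IEnumerable<KeyValuePair<string, string?>>? ExtractBaggage(
        object? carrier, PropagatorGetterCallback? getter)
        => Fallback.ExtractBaggage(carrier, getter);

    public override void ExtractTraceIdAndState(
        object? carrier, PropagatorGetterCallback? getter, out string? traceId, out string? traceState)
        => Fallback.ExtractTraceIdAndState(carrier, getter, out traceId, out traceState);

    public override void Inject(Activity? activity, object? carrier, PropagatorSetterCallback? setter)
    {
        if (setter is null)
            return;

        // Same strategy as above: send the parent's Id rather than the current activity's Id.
        var parentId = Activity.Current?.ParentId;
        if (parentId is null)
            return;
        setter(carrier, "traceparent", parentId);

        var traceState = Activity.Current?.TraceStateString;
        if (!string.IsNullOrEmpty(traceState))
            setter(carrier, "tracestate", traceState);
    }
}
```

I have not run this variant against the Jaeger setup above, so treat it as a starting point rather than a tested replacement.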