Constructing URLs

The HttpClient type is one of the most widely used types in .NET for making network calls over HTTP. One question I face every time: how should I build the Uri that is passed to HttpRequestMessage or HttpClient to identify the requested resource? The API surface of these types accepts both strings and Uris, and Uris can be built from strings.

Many line of business (LOB) applications also need to present links (URIs) that point from the current application to some external resource. These URIs are typically concatenated from a well-known host and port, a path that may vary with the resource, and a query string.

In this post I am going to explore different ways of creating URIs, focusing on creating the path segments. While schemes, hosts, ports, and query strings are going to be part of the final URI, I am going to treat them as well-formed constants for the purpose of this blog post. I do this because most applications define the base URI in the settings, including the scheme, the host and the port, such as https://localhost:5000/. The path typically varies with the resource to be referenced. The query can also vary, but many RESTful APIs tend to choose path parameters and request bodies instead. However, the findings and concepts described in this post for paths can generally be applied to the other parts of the URI as well.

I am writing this blog post in the .NET 8 timeframe. This post uses the API surface of the preview versions of .NET 9.

Creating URIs

There are many built-in types and libraries to create URIs.

HttpRequestMessage requires a Uri. There are many ways to create Uris: one can use the built-in UriBuilder or the well-known NuGet package Flurl. I find that these solutions offer more than what I really need in most cases. Hence, in this post I explore other efficient ways of creating URI paths.

The most common poor man's solution is using some sort of a concatenation. Here are a few options:

  • string.Concat

  • Composing baseUri and relativeUri with Uri constructor. Referred later as WithBaseUrl.

  • Using interpolated string with DefaultInterpolatedStringHandler

  • Using CompositeFormat
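As a rough illustration, the four options could look like the sketch below. The class name, the method names and the sample base URL are assumptions made for this example; they are not taken from the benchmark code.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class UriConcatOptions
{
    // Hypothetical sample value; a real one would come from application settings.
    private const string BaseUrl = "https://localhost:5000";

    // Parsed once and reused, so the format string is not re-parsed on every call.
    private static readonly CompositeFormat PathFormat =
        CompositeFormat.Parse("{0}/path/entity/{1}/type/{2}");

    // Option 1: plain string.Concat.
    public static string WithConcat(string id, string type) =>
        string.Concat(BaseUrl, "/path/entity/", id, "/type/", type);

    // Option 2: base Uri combined with a relative path by the Uri constructor.
    public static Uri WithBaseUrl(string id, string type) =>
        new Uri(new Uri(BaseUrl), $"path/entity/{id}/type/{type}");

    // Option 3: interpolated string.
    public static string WithInterpolation(string id, string type) =>
        $"{BaseUrl}/path/entity/{id}/type/{type}";

    // Option 4: pre-parsed CompositeFormat (available since .NET 8).
    public static string WithCompositeFormat(string id, string type) =>
        string.Format(CultureInfo.InvariantCulture, PathFormat, BaseUrl, id, type);
}
```

With id "42" and type "basic", all four variants describe the same resource, https://localhost:5000/path/entity/42/type/basic.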

Note that the current C# compiler may choose to use string.Concat even for interpolated strings. When the concatenation can be done using one of the overloads of the string.Concat method, it chooses the overload for better performance.

For example, the code snippet below uses string interpolation with the HttpRequestMessage constructor that accepts the URI as a string parameter. In this case, all interpolated variables (BaseUrl, Id, TypeValue) reference string objects.

return new HttpRequestMessage(HttpMethod.Get, $"{BaseUrl}/path/entity/{Id}/type/{TypeValue}");

A performance comparison of these options reveals interesting characteristics of these approaches. While the implementations of the methods can be found in this repository, the key finding is that the solution named CompleteUri looks the most promising.

| Method          | Mean     | Error   | StdDev  | Gen0   | Code Size | Allocated |
|---------------- |---------:|--------:|--------:|-------:|----------:|----------:|
| AbsolutString   | 108.9 ns | 1.33 ns | 1.24 ns | 0.0408 |   3,409 B |     256 B |
| WithBaseUrl     | 229.1 ns | 2.16 ns | 2.02 ns | 0.0827 |     724 B |     520 B |
| CompleteUri     | 104.0 ns | 1.38 ns | 1.23 ns | 0.0408 |   2,923 B |     256 B |
| UriBuilder      | 355.5 ns | 3.76 ns | 3.52 ns | 0.1016 |   2,141 B |     640 B |
| CompositeFormat | 120.8 ns | 1.54 ns | 1.44 ns | 0.0408 |   2,342 B |     256 B |

CompleteUri uses the following implementation: it creates an interpolated string that is passed to the constructor of Uri. In the test above, this is slightly more efficient than passing a string URI parameter to HttpRequestMessage:

new HttpRequestMessage(HttpMethod.Get, new Uri($"{BaseUrl}/path/entity/{Id}/type/{TypeValue}"));

Path Segment Source Concerns

String interpolation seems to provide a good base for concatenating paths. However, there are a few common issues to solve:

  • what if the BaseUrl in the example above already has a trailing / character?

  • what if TypeValue already contains a / prefix or suffix?

  • what if the interpolated string excludes slashes, for example ...entity{Id}type...?

BaseUrl is typically defined in application settings. The person defining a base URI will likely do so somewhere other than the code repository. Its value is usually stored in a YAML or JSON configuration file among other settings. Without the context of the source code, this person may or may not use a trailing slash. The Id and TypeValue variables may be sourced from a database or from master data. These data sources have no context on the way the path is built. Finally, these parameters might hold multiple segments, i.e. TypeValue could return /something/else, where it feels very natural to start the configured value with a leading /.

Using string interpolation to concatenate URI segments is a ubiquitous approach. However, because of the above concerns, paths often end up with double slashes (//) or no slashes at all. Both can be a problem.

  • missing / characters will result in an invalid path when accessing a resource

  • double // characters can result in routing issues, depending on the server. I have found servers that rewrite the URI or redirect to a URL that uses a single / instead. Some other servers serve the content as if the path contained a single / character only. Others return HTTP 400 (Bad Request) or 404 (Not Found) as a response.
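To make both failure modes concrete, here is a small sketch. The class name, the method names and the sample values are assumptions for illustration only.

```csharp
using System;

public static class SlashPitfalls
{
    // The template assumes that no input carries its own slashes.
    public static string WithSeparators(string baseUrl, string id, string typeValue) =>
        $"{baseUrl}/path/entity/{id}/type/{typeValue}";

    // The template assumes that every input brings its own slashes.
    public static string WithoutSeparators(string baseUrl, string id, string typeValue) =>
        $"{baseUrl}path/entity{id}type{typeValue}";
}

// SlashPitfalls.WithSeparators("https://localhost:5000/", "42", "/something/else")
//   returns "https://localhost:5000//path/entity/42/type//something/else"
// SlashPitfalls.WithoutSeparators("https://localhost:5000/", "42", "basic")
//   returns "https://localhost:5000/path/entity42typebasic"
```

Neither template is wrong on its own; the problem only appears once the configured values and the template disagree about who owns the separator.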

I have seen multiple production outages caused by invalid concatenation and configuration data resulting in a // or in missing slash separators.

Path-aware Concatenation

While it is generally not true that every appended part will be a complete path segment (or multiple segments), in most LOB applications I have not seen path concatenation that was not aligned to these segments (meaning every substitution in the interpolated string is aligned to /). In the rest of this post, I will build on this assumption.

One plausible solution for the above issues is to reach for Path's Combine or Join methods. While these seemingly address some of the problems, they concatenate paths using the platform-specific / or \ separator character. Moreover, they may still produce paths containing //, and in extreme cases the Combine method drops the whole initial segment (this happens when a later argument is rooted, for example when it starts with /).
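The following sketch shows both behaviors with the built-in System.IO.Path methods. The wrapper class and the sample values are assumptions for illustration.

```csharp
using System;
using System.IO;

public static class PathApiPitfalls
{
    // Path.Combine restarts from any rooted argument, so the base URL is lost.
    public static string Combine(string baseUrl, string path) =>
        Path.Combine(baseUrl, path);

    // Path.Join keeps both parts as they are; it only inserts the platform
    // separator when neither side already has one.
    public static string Join(string baseUrl, string path) =>
        Path.Join(baseUrl, path);
}

// PathApiPitfalls.Combine("https://localhost:5000/", "/path/entity")
//   returns "/path/entity"
// PathApiPitfalls.Join("https://localhost:5000/", "/path/entity")
//   returns "https://localhost:5000//path/entity"
// PathApiPitfalls.Join("path", "entity")
//   returns "path\entity" on Windows and "path/entity" on Unix-like systems
```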

Another solution could be validating the substituted values: validating the application settings, the database records, the master data and so on. These data sources could live in entirely different systems. Hence, there is no single place where all validations could take place, other than the place where the URI is created. For some applications this might be too late for validation, and it is better to try to handle the missing or superfluous slashes.

One such approach could use a defensive trim operation to remove the leading and trailing / character from each interpolated value, then re-add the correct slash characters in the interpolated string, such as: $"{BaseUrl.Trim('/')}/path/entity/{Id.Trim('/')}/type/{TypeValue.Trim('/')}". The downside of this solution is that it results in extra string allocations, as the Trim method may create a new string object. See the corresponding allocations in the benchmarks table below under the method name StringConcat.

| Method                        | Id      | TypeValue | BaseUrl              | Mean     | Error    | StdDev   | Code Size | Gen0   | Allocated |
|------------------------------ |-------- |---------- |--------------------- |---------:|---------:|---------:|----------:|-------:|----------:|
| StringConcat                  | /someId | /part     | https(...):5000 [22] | 44.50 ns | 0.438 ns | 0.410 ns |   3,291 B | 0.0318 |     200 B |
| StringInterpolation           | /someId | /part     | https(...):5000 [22] | 34.31 ns | 0.260 ns | 0.231 ns |   3,050 B | 0.0204 |     128 B |
| CustomInterpolationStackalloc | /someId | /part     | https(...):5000 [22] | 25.49 ns | 0.258 ns | 0.216 ns |   1,924 B | 0.0204 |     128 B |
| CustomInterpolation           | /someId | /part     | https(...):5000 [22] | 32.94 ns | 0.251 ns | 0.234 ns |   2,691 B | 0.0204 |     128 B |

We can achieve similar results using the DefaultInterpolatedStringHandler without the extra allocations by using the AsSpan method: $"{BaseUrl.AsSpan().Trim('/')}/path/entity/{Id.AsSpan().Trim('/')}/type/{TypeValue.AsSpan().Trim('/')}". In this case each input is referenced by a span before trimming. The trim operation on the span avoids the additional allocation. The downside of this approach is that it is slightly more verbose, and it faces a problem with non-string inputs. Non-string inputs would first need to be formatted into a string, as we cannot be sure whether they produce a leading or trailing slash. Most built-in types are safe in this regard (they do not output leading or trailing separators by default), but any custom type implementing IFormattable or ISpanFormattable is still prone to errors.

.NET allows creating a custom InterpolatedStringHandler, which offers an interesting option to explore next. In the table above, two such solutions are named CustomInterpolation and CustomInterpolationStackalloc.

Custom Interpolated String Handler

The recipe for creating a custom InterpolatedStringHandler is pretty much given by the comments in DefaultInterpolatedStringHandler. There is an excellent tutorial that explains the basics of creating custom handlers. Moreover, there is an excellent comment in DefaultInterpolatedStringHandler that provides directions for creating performance sensitive implementations.

Note that while creating a custom InterpolatedStringHandler is possible in .NET 9, achieving the same level of compiler support that DefaultInterpolatedStringHandler enjoys is not. We cannot instruct the compiler to substitute an alternative method in the way it substitutes string.Concat. This unfortunately results in a minor performance penalty when there are only a few (<4) segments to concatenate.

For a custom interpolation handler, I created two types:

  • The actual interpolation handler: [InterpolatedStringHandler] public ref struct UriPathInterpolatedStringHandler,

  • and a static type that uses the interpolation handler as one of its parameters on a method: public static string Create(UriPathInterpolatedStringHandler handler). The end users will only use this second type.

The compiler then rewrites the call site of the Create method by appending the parts of the interpolated string to the handler. The handler needs to expose a few methods that are going to be invoked at runtime by compiler-emitted code. A handler requires at minimum a void AppendLiteral(string value) method; however, the above linked source of DefaultInterpolatedStringHandler has a great suggestion on which AppendFormatted overloads to implement.
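To show the overall shape of the two types, here is a deliberately simplified skeleton. It is only a sketch under several assumptions: the static type name UriPath is hypothetical, the buffer is a plain growing array rather than a pooled or stack-allocated one, and the slash handling is left out entirely so that only the plumbing is visible.

```csharp
using System;
using System.Runtime.CompilerServices;

[InterpolatedStringHandler]
public ref struct UriPathInterpolatedStringHandler
{
    private char[] _chars;
    private int _pos;

    // The compiler passes the combined length of the literal parts and the
    // number of interpolation holes; both only serve as a capacity hint here.
    public UriPathInterpolatedStringHandler(int literalLength, int formattedCount)
    {
        _chars = new char[literalLength + formattedCount * 16];
        _pos = 0;
    }

    // Called by compiler-emitted code for each literal part.
    public void AppendLiteral(string value) => Append(value);

    // Called by compiler-emitted code for each interpolation hole.
    public void AppendFormatted(string? value) => Append(value);
    public void AppendFormatted(ReadOnlySpan<char> value) => Append(value);
    public void AppendFormatted<T>(T value) => Append(value?.ToString());

    private void Append(ReadOnlySpan<char> value)
    {
        // Grow the buffer when the new part does not fit.
        if (value.Length > _chars.Length - _pos)
            Array.Resize(ref _chars, Math.Max(_chars.Length * 2, _pos + value.Length));
        value.CopyTo(_chars.AsSpan(_pos));
        _pos += value.Length;
    }

    public override string ToString() => new string(_chars, 0, _pos);
}

// Hypothetical entry point; end users only ever call this method.
public static class UriPath
{
    public static string Create(UriPathInterpolatedStringHandler handler) =>
        handler.ToString();
}
```

A call such as UriPath.Create($"{baseUrl}/path/entity/{id}") is then lowered by the compiler into a constructor call followed by AppendFormatted(baseUrl), AppendLiteral("/path/entity/") and AppendFormatted(id), and the filled handler is passed to Create.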

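public void AppendFormatted(scoped ReadOnlySpan<char> value)
{
    // Fast path: fix up the separator, then copy into the remaining buffer.
    if (TryResolveSlashes(value) && value.TryCopyTo(_chars.Slice(_pos)))
        _pos += value.Length;
    else
        // Slow path: the buffer is too small, so grow it and retry the append.
        GrowAndAppend(value);
}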

In the above code snippet, the compiler will pass an interpolated string part to the AppendFormatted method. The method tries to resolve the slashes, then tries to append the new part. Appending may fail because the internal buffer of the handler might not have enough space for the new part. In this case the handler grows the internal buffer, then retries the append operation.

One custom implementation to resolve / separators can follow as:

  • when the already provided parts end with a slash and the currently appended part starts with a slash: then reduce the length of the already written value by one to exclude the trailing slash, then append as normal

  • when the already provided parts do not end with a slash and the currently appended part does not start with a slash: then insert a slash at the end of the existing parts and continue appending the new part as normal

  • otherwise append the new part

The code snippet below outlines such an implementation:

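private bool TryResolveSlashes(scoped ReadOnlySpan<char> value)
{
    // Ignore slashes at the beginning of string. An empty part needs no
    // separator either, and reading value[0] would throw for it.
    if (_pos <= 0 || value.IsEmpty)
        return true;

    var lastSlash = _chars[_pos - 1] == Slash;
    var firstSlash = value[0] == Slash;
    if (firstSlash && lastSlash)
        // Drop the already written trailing slash to avoid a double slash.
        _pos = int.Max(0, _pos - 1);
    else if (!firstSlash && !lastSlash)
    {
        // No room left for the separator: let the caller grow the buffer.
        if (_chars.Length <= _pos + 1)
            return false;
        _chars[_pos++] = Slash;
    }
    return true;
}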
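To see the effect of the three rules on concrete inputs, the same logic can be sketched as a standalone helper on top of StringBuilder. The SlashRules type and its Join method are hypothetical names used only for this illustration; the handler itself works on its own buffer as described above.

```csharp
using System;
using System.Text;

public static class SlashRules
{
    // Hypothetical helper: joins the parts in the order in which a handler
    // would receive them (literals and formatted values alternating).
    public static string Join(params string[] parts)
    {
        var sb = new StringBuilder();
        foreach (var part in parts)
        {
            if (string.IsNullOrEmpty(part))
                continue;

            if (sb.Length > 0)
            {
                bool lastSlash = sb[sb.Length - 1] == '/';
                bool firstSlash = part[0] == '/';
                if (lastSlash && firstSlash)
                    sb.Length--;        // rule 1: drop the duplicate slash
                else if (!lastSlash && !firstSlash)
                    sb.Append('/');     // rule 2: insert the missing slash
                // rule 3: exactly one slash is present, append as is
            }
            sb.Append(part);
        }
        return sb.ToString();
    }
}

// SlashRules.Join("https://localhost:5000/", "/path/entity/", "/someId", "type", "/part")
//   returns "https://localhost:5000/path/entity/someId/type/part"
```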

Appending a non-string value is a slightly more involved task. In this case if the object appended implements ISpanFormattable or IFormattable then it cannot be checked if it starts with a leading / before being formatted. I assume that most types implementing these interfaces do not format the values with a leading or trailing slash. So, in this case, the handler can make sure that the already written parts end with a /.

If a // is detected after appending an ISpanFormattable or IFormattable value, it can be resolved by shifting the newly written part one position to the left.
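A minimal sketch of this idea follows, written as a free-standing method over a caller-provided buffer instead of the handler's internal state. The names FormattableAppend and SlashId are hypothetical, and the sketch assumes the buffer is large enough, so the growing logic is omitted.

```csharp
using System;

public static class FormattableAppend
{
    // Appends value after the first pos characters of buffer and returns the new length.
    public static int Append<T>(Span<char> buffer, int pos, T value) where T : ISpanFormattable
    {
        // Make sure the already written part ends with a slash.
        if (pos > 0 && buffer[pos - 1] != '/')
            buffer[pos++] = '/';

        int start = pos;
        if (!value.TryFormat(buffer.Slice(pos), out int written, default, null))
            throw new InvalidOperationException("Buffer too small for this sketch.");
        pos += written;

        // The value formatted itself with a leading slash: shift it left by one.
        if (start > 0 && written > 0 && buffer[start] == '/')
        {
            buffer.Slice(start + 1, written - 1).CopyTo(buffer.Slice(start));
            pos--;
        }
        return pos;
    }
}

// Hypothetical type that formats itself with a leading slash, e.g. "/7".
public readonly struct SlashId : ISpanFormattable
{
    private readonly int _value;
    public SlashId(int value) => _value = value;

    public bool TryFormat(Span<char> destination, out int charsWritten,
        ReadOnlySpan<char> format, IFormatProvider? provider)
    {
        charsWritten = 0;
        if (destination.IsEmpty)
            return false;
        destination[0] = '/';
        if (!_value.TryFormat(destination.Slice(1), out int digits, format, provider))
            return false;
        charsWritten = digits + 1;
        return true;
    }

    public string ToString(string? format, IFormatProvider? formatProvider) =>
        "/" + _value.ToString(format, formatProvider);
}
```

Span<char>.CopyTo handles overlapping source and destination ranges, which is what makes the one-character shift safe within the same buffer.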

As shown above a custom string interpolation handler provides a reasonable performance while safeguarding the path segment separators, as long as the inputs can be trusted not to intentionally contain malformed segments. The string interpolation handler could be further extended to prevent such cases.

This custom interpolation handler can also be simpler than the DefaultInterpolatedStringHandler, because it avoids dealing with certain edge cases, like alignment or a custom formatter.

Canonical URI Paths

While many parts of a URI can be validated, in this post I will focus on the path. To validate whether a URI is well-formed (scheme, port, query, etc.), I suggest using the built-in Uri type. Here, I will extend the interpolated string handler with a feature that canonicalizes dot segments.

URIs may contain dot segments, such as /../, /.., /./, /. denoting the parent or the current segment. Well-behaved servers typically handle such paths and resolve the dot segments before matching a route. Many HTTP clients also create canonical URIs; for example, the built-in Uri does so when using the default options. To disable canonicalization with the Uri type, a developer can pass new UriCreationOptions() { DangerousDisablePathAndQueryCanonicalization = true } to its constructor.
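The difference between the two modes can be observed with a short sketch. The wrapper class and the sample URI are assumptions for illustration; Uri and UriCreationOptions are the built-in types.

```csharp
using System;

public static class UriCanonicalizationDemo
{
    // Default behavior: dot segments are resolved while the Uri is parsed.
    public static string CanonicalPath(string uri) =>
        new Uri(uri).AbsolutePath;

    // Opt-out: the path is kept exactly as it was provided.
    public static string RawPath(string uri)
    {
        var options = new UriCreationOptions
        {
            DangerousDisablePathAndQueryCanonicalization = true
        };
        return new Uri(uri, in options).AbsolutePath;
    }
}

// UriCanonicalizationDemo.CanonicalPath("https://localhost:5000/path/./entity/../type")
//   returns "/path/type"
// UriCanonicalizationDemo.RawPath("https://localhost:5000/path/./entity/../type")
//   returns "/path/./entity/../type"
```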

When someone requires a Uri object, using the Uri type itself is a reasonable way to canonicalize the given path. However, when a string URI without dot segments is required, the above interpolated string handler can be extended to perform dot segment removal. While this won't perform the same level of validation as the Uri type, it is an improvement over raw string concatenation. It also performs well, because it avoids the extra object allocation for the Uri itself, and because the dot segment removal can be done in-place, using the same buffer that is already used by the interpolation handler.

The table below compares creating a string URI with the enhanced interpolated string handler and with the Uri type. The measurements prefixed with Happy cover the case when there is no dot segment in the input; the other measurements involve removing 2 dot segments.

| Method                               | Mean      | Error    | StdDev   | Code Size | Gen0   | Allocated |
|------------------------------------- |----------:|---------:|---------:|----------:|-------:|----------:|
| Uri                                  | 362.08 ns | 3.796 ns | 3.365 ns |   3,378 B | 0.0625 |     392 B |
| UriCanonicalCustomInterpolation      |  81.98 ns | 0.709 ns | 0.664 ns |   5,548 B | 0.0191 |     120 B |
| HappyUri                             | 216.57 ns | 1.234 ns | 1.031 ns |   3,346 B | 0.0420 |     264 B |
| HappyUriCanonicalCustomInterpolation |  48.16 ns | 0.514 ns | 0.456 ns |   4,955 B | 0.0204 |     128 B |

The dot-segment removal algorithm is based on RFC 3986. The current implementation utilizes SIMD operations to vectorize the algorithm steps. A recipe for this process can be found in the linked GitHub source. The referenced code is not production ready; please make sure it is well-tested before using it in a production app.

The dot segment removal can be done in-place, without the need for allocating a new buffer. The following steps give an overview of the algorithm:

  1. The URI is searched for the next occurrence of the character pair /. (a slash followed by a dot).

  2. When there is a match, the characters searched up to the read position are copied to the write position.

  3. Then the succeeding characters after the match are tested for being a dot segment

  4. When a .. segment is found the write position is updated to remove the last written segment.

  5. When a . segment is found the read position is updated to skip ahead to the next segment.

  6. The read and write positions are updated
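The in-place idea can be sketched with a plain scalar loop. This is not the SIMD implementation referred to above: it is a simplified variant that walks the path segment by segment instead of searching for /., and it assumes the input is an absolute path starting with /. The type and method names are hypothetical.

```csharp
using System;

public static class DotSegments
{
    // Removes "." and ".." segments in place and returns the new length.
    public static int Remove(Span<char> path)
    {
        int read = 0, write = 0;
        while (read < path.Length)
        {
            // Each iteration handles one "/segment" that starts at read.
            int next = path.Slice(read + 1).IndexOf('/');
            int end = next < 0 ? path.Length : read + 1 + next;
            ReadOnlySpan<char> segment = path.Slice(read + 1, end - read - 1);
            bool isLast = end == path.Length;
            bool isDot = segment.Length == 1 && segment[0] == '.';
            bool isDotDot = segment.Length == 2 && segment[0] == '.' && segment[1] == '.';

            if (isDot || isDotDot)
            {
                if (isDotDot)
                {
                    // Step back over the last written segment, if there is one.
                    int lastSlash = path.Slice(0, write).LastIndexOf('/');
                    write = lastSlash < 0 ? 0 : lastSlash;
                }
                // A trailing dot segment leaves a trailing slash, as in RFC 3986.
                if (isLast)
                    path[write++] = '/';
            }
            else
            {
                // Regular segment: copy it, including its leading slash.
                // write never exceeds read, so unread input is not overwritten.
                path.Slice(read, end - read).CopyTo(path.Slice(write));
                write += end - read;
            }
            read = end;
        }
        return write;
    }
}

// var buffer = "/path/./entity/../type".ToCharArray();
// int length = DotSegments.Remove(buffer);
// new string(buffer, 0, length) is "/path/type"
```

Because the write position never overtakes the read position, the same buffer serves as both input and output, which is what allows the handler to reuse its existing buffer.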

Conclusion

In this post I have reviewed different ways of creating URIs in .NET 8. I have explored the most common approaches from a performance point of view. Using string interpolation is one of the most efficient approaches, however it gives no guarantees on the correctness of the URI.

I have outlined the most common mistakes I have seen bring down production applications. Then I created a custom string interpolation handler that has good performance characteristics and provides basic validation of the path segments of the URI.

Finally, I have shown how the current implementation can be extended with dot segment removal in the path. This should give a general direction on how the implementation can be extended further. However, before implementing all possible validations (on the scheme, host, port, query, etc.), I suggest using the well-tested built-in types of .NET. The approach outlined in this post is only viable for application code on hot paths that requires more guarantees than bare concatenation.