Regex and Faster

In the series of these posts, I take a regular task, and implement it multiple times in an iteratively manner, while also improving performance.

The task is the following, given an input string with placeholders in it, and a map of placeholders to actual values. Replace the placeholders in the input string with the actual text values from the map and return the final string. For example, given input string Hello [%placeholder%]! and map [%placeholder%] => World should return Hello World!. The placeholder's [%placeholder%] part is referred as the key, and World part as the value in this post.

To further clarify the task, these constraints are also true:

  • placeholder keys are delimited with [% at the beginning and %] at the end

  • placeholders are non-recursive (a placeholder's value may contain another (or the same) placeholder's key, but it should not be processed recursively)

  • input text contains no placeholder that are embedded by other placeholders (such as hello [%outer[%inner%]example%])

  • the key of a placeholder contain letters and underscores

  • the same fixed set of placeholders are used on multiple input strings

  • placeholders are used at most once per input text

To place the task in a real-life example, imagine a system handling order requests. For each request the system needs to create an e-mail based on a template, a pdf containing billing information, a pdf for shipping information, notification for end users etc. The same set of order parameters (placeholder values) need to be replaced in the template (input string) of the e-mail, pdf, notification etc.

In the series of these posts, I present multiple, iterative implementations. The first part of the series established a common set of unit tests to be passed by all presented implementations. It also presented four solutions using regular expressions to achieve to goal of this task. This post discusses alternative implementations without using regular expressions.

Attached code samples may be too large for a blog post to present effectively, the actual code is linked from GitHub.

Solution 4

Solution 4 uses a different approach compared to previous implementations. Instead of using a regular expression to search for the placeholders, it searches for the begin [% and end %] sequences of the placeholder key. It has two helper methods:

  • FindSectionStart() method searches forward for the placeholder begin sequence and returns if that is found. It also returns a span of the input string from its beginning up until the placeholder.

  • FindEndSection() method searches forward for the placeholder end sequence. It returns true if that is found, otherwise returns false. It also returns false if a [ or ] is found without the corresponding % of the end sequence. This method returns a span as an out parameter, which contains a string up until its termination point.

Notice, that FindEndSection() method does not terminate in all possible cases (a placeholder may contain letters or numbers), but it rather focuses on characters that are present in the placeholder begin or end sequence. For the current implementation it does not matter what characters are in between, as a worst-case scenario a placeholder key is matched that is not present in the placeholder's map (assuming the placeholders map contains valid keys).

Substitute() method uses the above two helper methods to sequentially process the input string. If no begin or end sequence is found, the processed span of characters are appended to a StringBuilder. If a possible placeholder key is matched, the key is looked up in the placeholders map. When the key is found in the placeholders' map, the corresponding value is appended to the string builder, otherwise the matched key.

When a possible ReadOnlySpan<char> key is matched, how can it be looked up in a Dictionary<string, string> map?

One possible way to achieve this is to call the ToString() method on the span, but this occurs an extra string allocation, which would defeat the performance gains. However the type string has a static GetHashCode method, which calculate the hash code of a ReadOnlySpan<char>. If same hashing is done on the placeholders' map, we can look up the placeholder key-value pairs from a Dictionary<int, string> based on this hash value. The constructor of Solution 4 builds this dictionary by iterating over the elements of the input map and calculating the hashcode of each placeholder's key. The constructor also calculates the extra length required by the placeholder key-value replacements. This code is for demonstration purpose, in production one might want to handle the case, when the generated hashcodes collide.

Note, that the constraint 'placeholders are used at most once per input text' allows to estimate on the maximum length of the returned string. This allows an optimization for the StringBuilder type, by setting the capacity parameter in its constructor. With that the string builder can allocate a large enough buffer upfront for the appended parts, without the need for intermittent allocations.

This approach yields further performance improvements, Solution 4 so far has the best execution time compared to all previous implementations discussed in the first post.

|    Method |        Mean |     Error |      StdDev |  Gen 0 | Allocated |
|---------- |------------:|----------:|------------:|-------:|----------:|
| Solution0 |  7,494.9 ns | 298.81 ns |   866.91 ns | 3.6850 |     11 KB |
| Solution1 | 24,029.0 ns | 763.83 ns | 2,252.18 ns | 8.4381 |     26 KB |
| Solution2 |  1,681.4 ns |  79.69 ns |   234.97 ns | 0.8831 |      3 KB |
| Solution3 |  1,047.4 ns |  12.76 ns |    11.94 ns | 0.8831 |      3 KB |
| Solution4 |    540.8 ns |  10.81 ns |    10.62 ns | 1.0071 |      3 KB |

Solution 5

Solution 5 is based on the same design as Solution 4. The key difference between Solution 4 and Solution 5 is that Solution 5 uses a char buffer instead of a StringBuilder. The buffer is rented from an ArrayPool<char> as the first operation of the Substitute() method and returned as the last operation. The idea here is to avoid throwing away and re-allocating the temporary memory that is required by the StringBuilder.

Note, that this idea could be also implemented by utilizing a pool of StringBuilder objects, so that each StringBuilder could be rented upon use, cleared and returned for future re-use.

Because the buffer char array is reused in future Substitute() method calls, the overall memory usage of this method is reduced to 1 KB, which is required for the string allocation of the result string.

|    Method |        Mean |     Error |      StdDev |  Gen 0 | Allocated |
|---------- |------------:|----------:|------------:|-------:|----------:|
| Solution0 |  7,494.9 ns | 298.81 ns |   866.91 ns | 3.6850 |     11 KB |
| Solution1 | 24,029.0 ns | 763.83 ns | 2,252.18 ns | 8.4381 |     26 KB |
| Solution2 |  1,681.4 ns |  79.69 ns |   234.97 ns | 0.8831 |      3 KB |
| Solution3 |  1,047.4 ns |  12.76 ns |    11.94 ns | 0.8831 |      3 KB |
| Solution4 |    540.8 ns |  10.81 ns |    10.62 ns | 1.0071 |      3 KB |
| Solution5 |    360.4 ns |   7.13 ns |     9.99 ns | 0.3490 |      1 KB |

Using the ArrayPool<char>, the Substitute() method still remains thread-safe.

Solution 6

Solution 6 is a step sideways compared to previous solutions. Based on the current benchmark inputs, it is not faster than Solution 5. It has minor changes to FindSectionStart() and FindEndSection() methods. The main changes are in the Substitute() method, which replaces the buffer rented from the ArrayPool<char> with a collection of segments. The idea behind this implementation is to avoid duplicate copying of the input and placeholder values into a temporary buffer and then to the result string. In Solution 5 a copy occurred when the found segment was copied into the temporary buffer, and a copy occurred when the final string was created from the buffered value. In Solution 6 when a segment is found an entry is added to a list. The entry is a tuple of a reference to the source string, a start and an end index. For the type of the entry spans may not be used, as the list to lives on the heap. An alternative approach would be to use ReadOnlyMemory<T>, but my benchmark showed that a value tuple yields better performance results.

While previous approaches have been using a large enough buffer to assemble the string, Solution 6 must explicitly calculate the exact length of the result string throughout the matching operation. To assemble the result string, the static string.Create method is used. The SpanAction delegate iterates over the list of tuples and copies the given slice of the referenced string into the result.

|    Method |        Mean |     Error |      StdDev |  Gen 0 | Allocated |
|---------- |------------:|----------:|------------:|-------:|----------:|
| Solution0 |  7,494.9 ns | 298.81 ns |   866.91 ns | 3.6850 |     11 KB |
| Solution1 | 24,029.0 ns | 763.83 ns | 2,252.18 ns | 8.4381 |     26 KB |
| Solution2 |  1,681.4 ns |  79.69 ns |   234.97 ns | 0.8831 |      3 KB |
| Solution3 |  1,047.4 ns |  12.76 ns |    11.94 ns | 0.8831 |      3 KB |
| Solution4 |    540.8 ns |  10.81 ns |    10.62 ns | 1.0071 |      3 KB |
| Solution5 |    360.4 ns |   7.13 ns |     9.99 ns | 0.3490 |      1 KB |
| Solution6 |    387.4 ns |   7.29 ns |     7.16 ns | 0.3490 |      1 KB |

Further investigation could tell how this solution scales on different inputs compared to previous solutions, but that goes beyond the scope of this post.

Solution 7

Solution 7 is the last non-recursive version discussed in these posts. This implementation is based on Solution 5. Solution 7 uses a large enough array instead renting one from ArrayPool<T>. The buffer array is stored in a private field, hence making this solution not thread-safe. The initial operation of the Substitute() checks if the current temporary array is large enough for the input string. If it is not, a new one is allocated with a large enough size, and the old one becomes garbage. This implementation assumes that templates (input strings) are repetitive, and over time the required space grows large enough to accommodate all future templates (for the lifetime of the process running this code).

Another minor tweak of this implementation is the usage of [SkipLocalsInit] attributes, which also requires that the project has allow unsafe blocks enabled. This tells the compiler to omit the .locals init flag which might leave local variables uninitialized. Based on my analysis, this is a safe operation for this code.

|    Method |        Mean |     Error |      StdDev |  Gen 0 | Allocated |
|---------- |------------:|----------:|------------:|-------:|----------:|
| Solution0 |  7,494.9 ns | 298.81 ns |   866.91 ns | 3.6850 |     11 KB |
| Solution1 | 24,029.0 ns | 763.83 ns | 2,252.18 ns | 8.4381 |     26 KB |
| Solution2 |  1,681.4 ns |  79.69 ns |   234.97 ns | 0.8831 |      3 KB |
| Solution3 |  1,047.4 ns |  12.76 ns |    11.94 ns | 0.8831 |      3 KB |
| Solution4 |    540.8 ns |  10.81 ns |    10.62 ns | 1.0071 |      3 KB |
| Solution5 |    360.4 ns |   7.13 ns |     9.99 ns | 0.3490 |      1 KB |
| Solution6 |    387.4 ns |   7.29 ns |     7.16 ns | 0.3490 |      1 KB |
| Solution7 |    336.2 ns |   6.55 ns |     9.18 ns | 0.3490 |      1 KB |

Benchmarks indicate this solution to be the fastest overall, 22 times faster to the initial approach and using 11 times less memory. Is this good enough result to stop further optimization? In real-life scenarios performance targets must be set, so engineers may work towards reaching the given goal. Such performance targets might demand a maximum memory / heap usage, a mean execution time, or a 95th percentile target for given resource usage along with typical input parameters.

For this post I had no such performance target, but I stop further optimization after Solution 7, marking it as the final solution.

Conclusion

With this post I have covered all solution where I used non regex based implementation to find and replace placeholders. Throughout these iterations, the goal has been to gradually improve the performance both for the mean execution time and for the allocated memory as well. Solution 7 reaches good enough results both memory and performance wise.

In the next and final post of this series, I present implementations allowing recursive placeholders.