ValueTask<T> versus Task<T>

Introduction

ValueTask<T> was introduced to .NET to cover use cases where the allocation of a Task<T> would be a problem. When it comes to deciding whether to use Task<T> or ValueTask<T>, the general advice is to always measure the given application code path.

When an async method is compiled, the compiler generates a state machine for it. This state machine (a struct) steps through the asynchronous states of the method and preserves the method's internal state. When we hit a state where we need to await an incomplete task, this struct gets boxed, so that the state can be preserved across the different threads that might execute different states of the method. When we execute synchronously though, no boxing is required. However, the Task we return still needs to be allocated. The class library tries its best to use cached tasks, but we might still hit a case where a new Task is allocated. To avoid this allocation we can use ValueTask. Here are some remarks from the ValueTask documentation:

A method may return an instance of this value type when it's likely that the result of its operation will be available synchronously, and when it's expected to be invoked so frequently that the cost of allocating a new Task for each call will be prohibitive.

There are tradeoffs to using a ValueTask instead of a Task. For example, while a ValueTask can help avoid an allocation in the case where the successful result is available synchronously, it also contains two fields, whereas a Task as a reference type is a single field. This means that a method call returns two fields worth of data instead of one, which is more data to copy. It also means that if a method that returns a ValueTask is awaited within an async method, the state machine for that async method will be larger, because it must store a struct containing two fields instead of a single reference.
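To make the intended use case concrete, here is a minimal sketch of the pattern ValueTask<T> targets: a method with a synchronous fast path that is hit most of the time and an asynchronous slow path that is hit rarely. The _cache field and the LoadAndCacheAsync method are hypothetical names used only for illustration.

private int? _cache;

public ValueTask<int> GetValueAsync()
{
  // Fast path: the value is already available, so no Task is allocated;
  // the result is wrapped directly in the ValueTask<int> struct.
  if(_cache.HasValue)
  {
    return new ValueTask<int>(_cache.Value);
  }
  // Slow path: fall back to a genuinely asynchronous operation,
  // which allocates a Task<int> under the covers.
  return new ValueTask<int>(LoadAndCacheAsync());
}

private async Task<int> LoadAndCacheAsync()
{
  await Task.Delay(10); // stand-in for real asynchronous work
  _cache = 42;
  return _cache.Value;
}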

Test Case

This got me thinking about this tradeoff, and I wanted to see it in action. Hence, I created four methods which have different signatures but identical implementations:

public async Task<int> TaskAwaited(double probability, int syncWork)
{
  double randomValue = _rand.NextDouble();
  // Simulate some synchronous work before the decision point.
  if(syncWork > 0)
  {
    Thread.SpinWait(syncWork);
  }
  // With the given probability, complete synchronously.
  if(_rand.NextDouble() < probability)
  {
    return (int)(randomValue * 20000 + 50000);
  }
  // Otherwise give up the current thread and complete asynchronously.
  await Task.Yield();
  return (int)(randomValue * 20000 + 20000);
}

It receives two parameters: probability, which indicates the probability of the method executing synchronously, and syncWork, which indicates the amount of synchronous work to be done. I used Thread.SpinWait to do the synchronous work and Task.Yield to give up execution on the current thread. The method returns a random integer.

I added very similar implementations with the following signatures:

async Task<decimal> TaskAwaited(double probability, int syncWork)
async ValueTask<int> ValueTaskAwaited(double probability, int syncWork)
async ValueTask<decimal> ValueTaskAwaited(double probability, int syncWork)

I used BenchmarkDotNet to benchmark these methods' execution time and heap allocation. These benchmarks had two parameters:

  • the probability to return synchronously

  • the synchronous work to execute

[Params(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)]
public double Probability { get; set; }

[Params(0, 3)]
public int Work { get; set; }
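For completeness, the benchmark methods themselves are thin async wrappers that await the methods under test. The class name ValueTaskTesterInt and the method name BenchmarkTaskAwaited appear in the BenchmarkDotNet output quoted below; the rest of this sketch (the MemoryDiagnoser attribute and the exact wiring) is my assumption of how such a benchmark class might look:

using System;
using System.Threading;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // reports heap allocations alongside execution time
public class ValueTaskTesterInt
{
  private readonly Random _rand = new Random();

  [Params(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)]
  public double Probability { get; set; }

  [Params(0, 3)]
  public int Work { get; set; }

  // Each benchmark is itself an async Task method, so every invocation
  // allocates one extra Task<int> on top of what the tested method allocates.
  [Benchmark]
  public async Task<int> BenchmarkTaskAwaited() => await TaskAwaited(Probability, Work);

  [Benchmark]
  public async Task<int> BenchmarkValueTaskAwaited() => await ValueTaskAwaited(Probability, Work);

  // The methods under test, as shown earlier.
  public async Task<int> TaskAwaited(double probability, int syncWork)
  {
    double randomValue = _rand.NextDouble();
    if(syncWork > 0) Thread.SpinWait(syncWork);
    if(_rand.NextDouble() < probability) return (int)(randomValue * 20000 + 50000);
    await Task.Yield();
    return (int)(randomValue * 20000 + 20000);
  }

  public async ValueTask<int> ValueTaskAwaited(double probability, int syncWork)
  {
    double randomValue = _rand.NextDouble();
    if(syncWork > 0) Thread.SpinWait(syncWork);
    if(_rand.NextDouble() < probability) return (int)(randomValue * 20000 + 50000);
    await Task.Yield();
    return (int)(randomValue * 20000 + 20000);
  }
}

Running this with BenchmarkRunner.Run<ValueTaskTesterInt>() produces both the timing and the allocation figures used in the charts below.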

These tests were executed on CoreCLR with .NET Core 2.2. Note that, as the methods complete synchronously with a certain probability, BenchmarkDotNet issued the following notification upon execution:

MultimodalDistribution
  ValueTaskTesterInt.BenchmarkTaskAwaited: Core -> It seems that the distribution can have several modes

Execution time

Integer

Let's start by examining Task<int> with a spinning of 0.

[Chart: execution time in nanoseconds for Task<int> vs. ValueTask<int>, spinning of 0]

The vertical axis shows execution time in nanoseconds, while the horizontal axis shows the probability used in the given test case.

As we can see, Task is faster and ValueTask is slower. Things even out a little with a higher probability of synchronous execution, but even so Task remains faster.

When we increase the spinning to 3, Task<int> and ValueTask<int> become a lot more even; there is no significant difference:

[Chart: execution time in nanoseconds for Task<int> vs. ValueTask<int>, spinning of 3]

Decimal

So let's compare this to decimals. Remember that decimal is a larger type, which means more copying work when it comes to ValueTask<decimal>.

Spinning of 0:

[Chart: execution time in nanoseconds for Task<decimal> vs. ValueTask<decimal>, spinning of 0]

Looking at the trends, they are really similar to the integer use case with spinning of 0. Note that the values are shifted up though, which can be explained by the fact that operations on decimal are generally slower than operations on int.

When we repeat the benchmarks with spinning of 3, we get trends similar to those seen for integers with spinning of 3.

Looking at these results, the advantage of ValueTask does not show up: it is more difficult to use, there are more things to pay attention to, and it is slightly slower.

So where is my benefit?

Memory Allocation

First, let's examine Task<int> and Task<decimal>. What is their cost? I am using a 64-bit machine, which results in the following sizes:

  • Task<int>: 72 B

  • Task<decimal>: 80 B

An int is a 32 bit (4 byte) structure and decimal is a 128 bit (16 byte) structure, so how is this possible? The runtime rounds each object's size up to the nearest multiple of 8 bytes. To give this some context:

  • Task<double>: 72 B

  • Task<(int,int)>: 72 B

  • Task<(int,int,int)>: 80 B

  • Task<(long,long)>: 80 B

Note that double is a 64 bit structure, yet it uses the same space within a Task as an integer. (int,int) is a ValueTuple structure containing two integers, and so on.
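As a rough back-of-the-envelope check (the ~64 byte base size below is my estimate for the fixed part of a Task<T> instance on a 64-bit CLR, not something measured here), the numbers line up if we take that base, add the size of TResult and round up to the next multiple of 8:

  • Task<int>: ~64 + 4 = 68 → rounds up to 72 B

  • Task<double>: ~64 + 8 = 72 → 72 B

  • Task<(int,int,int)>: ~64 + 12 = 76 → rounds up to 80 B

  • Task<decimal>: ~64 + 16 = 80 → 80 B

This is also why the gap between Task<int> and Task<decimal> is 8 bytes rather than the 12 bytes the field sizes alone would suggest.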

Integer

With that in mind, let's compare Task<int> and ValueTask<int> with spinning of 0 from a memory point of view:

[Chart: heap allocations in bytes for Task<int> vs. ValueTask<int>, spinning of 0]

The vertical axis shows bytes allocated on the heap, while the horizontal axis shows the probability used in the given test case.

The trend shows that when most calls complete asynchronously, Task is more efficient, but as soon as roughly 20% of the calls complete synchronously, ValueTask<int> starts to pay off by allocating fewer bytes on the heap.

Note that with spinning of 3 we end up with the same trends, hence those values are omitted.

Decimal

Let's see the same test cases for decimal:

[Chart: heap allocations in bytes for Task<decimal> vs. ValueTask<decimal>, spinning of 0]

As decimal is significantly bigger (at least from this test's point of view), we see that a lot more memory is allocated on the heap in the asynchronous cases, and the benefits start to pay off only when more than 30% of the calls complete synchronously.

Note that there is no difference in the trends when using a synchronous spinning of 3.

Some might wonder why Task<int> at a probability of 1 shows 144 B while Task<decimal> shows 160 B. In my test case there were two async Task methods per invocation: the method highlighted above, and the one marked with the [Benchmark] attribute that awaits it, hence the values are doubled. This means Task<int> uses 72 bytes and Task<decimal> uses 80 bytes. Why this difference is only 8 bytes instead of 12 is explained at the beginning of this section.

Conclusion

When deciding on ValueTask<T> we should always measure our code. Among many factors, we should pay particular attention to the probability of our asynchronous task completing synchronously and to the size of the generic type argument used.