The wrong answer for a performance question

At a recent event I challenged a design and its implementation from a performance point of view. My concern was the use of the C# decimal type over double when nothing about the precision, calculation, or representation of the numbers required it. I was curious whether the developers had done their "due diligence" and compared the performance impact of choosing one type over the other.

In short, the developers had made no measurements. To address my curiosity, I was told that the operation in question "only takes a few seconds", and this was demonstrated by clicking a button that triggers the operation; a few seconds later the results were displayed. All good, business stakeholders were happy, and the event continued. It was not the forum for a deeper discussion of this answer. I have seen similar responses in the past, but unfortunately I cannot think of a single aspect of this response that answers the question.

First, a single click and a demonstration of the operation is not a performance measurement. No numbers were involved, no resource utilization was measured, and neither throughput nor latency was calculated. Moreover, you cannot even reason about these numbers from a single request. Second, the business has not set any performance goals. Third, the demonstration happened in a test environment with minimal test data and a single user putting load on the system. Finally, I asked for a performance comparison between two implementations, but neither the answer nor the demonstration touched on this aspect.

Because I do not have the environment and the code to run and measure such an operation, I will have to rely on generic reasoning and assumptions to explain why I was eager to see such a performance comparison.

Let's assume

Let us assume that a web API service implements this operation. To serve a request, three things must be done. I denote the duration of each step with x, y, and z:

  • fetch data from a database: x

  • do the calculations: y1 when performed with decimal, y2 when performed with double

  • process the request: z. In a typical ASP.NET Core application this involves running the middleware, routing, filters, and the serialization and deserialization of the request data.

The complete duration of the operation is x + yk + z, where k = 1 for decimal and k = 2 for double. A user must wait this long to receive a response. What is the ratio of the durations for k = 1 and k = 2? One might argue that it is close to 1, as x usually takes significantly longer than the other terms combined: x >> yk + z. In that case, choosing one implementation or the other will not make a substantial difference to latency. This answers one of the questions. What can we tell about the ratio of the throughputs?
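The latency argument can be checked with a few lines of arithmetic. This is a minimal sketch: the function name and the millisecond values are made up for illustration and are not measurements of any real system.

```python
def latency_ratio(x, y1, y2, z):
    """Ratio of the decimal latency (x + y1 + z) to the double latency (x + y2 + z)."""
    return (x + y1 + z) / (x + y2 + z)


# Hypothetical durations in milliseconds: a database call that dominates
# the request (x), a slow and a fast calculation (y1, y2), and request
# processing (z).
ratio = latency_ratio(x=500.0, y1=20.0, y2=1.0, z=1.0)
print(f"latency ratio: {ratio:.3f}")
```

With these assumed values the ratio stays within a few percent of 1, even though the calculation itself is 20 times slower. The larger x is relative to the other terms, the closer the ratio gets to 1.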

From the application's point of view, querying a database does not (and should not) use a significant amount of resources on the web server. While the database executes the query, the web application is free to serve other requests. This means we can substitute 0 for x, as it is irrelevant to the discussion of throughput.

Note that this also assumes the database can scale indefinitely, while in a real-world scenario a database will become a bottleneck at some point. This post will not dive into that issue, as read database operations scale very well due to intensive caching.

With x set to 0, each request occupies the web server for yk + z. Let us assume that the double implementation serves n times as many requests as the decimal implementation during a unit of time:

y1 + z = n(y2 + z)

What is the value of n?

The next thing to look at is the calculation step. It has been shown that calculations may be ~21 times slower with decimals than with doubles. This is a generic number; the exact performance difference would need to be measured for each algorithm, OS, hardware, etc. For now, based on the linked post, I will make a very generic assumption that y1 = 21 * y2.
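The shape of such a measurement can be sketched in a few lines. This is only an analogy: Python's float and decimal.Decimal stand in for C# double and decimal, the workload (a sum of squares) is hypothetical, and the ratio it prints will not match the C# figure. The point is that the slowdown is something you measure for your own calculation rather than assume.

```python
import timeit
from decimal import Decimal


def sum_of_squares(values, zero):
    """Hypothetical workload: accumulate v * v in the numeric type of `zero`."""
    total = zero
    for v in values:
        total += v * v
    return total


def seconds_per_call(values, zero, repeats=5, number=50):
    """Best-of-`repeats` average time of one sum_of_squares call, in seconds."""
    timings = timeit.repeat(
        lambda: sum_of_squares(values, zero), repeat=repeats, number=number
    )
    return min(timings) / number


# The same 1,000 made-up inputs in both numeric types.
float_values = [i / 7 for i in range(1, 1001)]
decimal_values = [Decimal(i) / Decimal(7) for i in range(1, 1001)]

t_float = seconds_per_call(float_values, 0.0)
t_decimal = seconds_per_call(decimal_values, Decimal(0))

print(f"float:   {t_float * 1e6:.1f} us per call")
print(f"Decimal: {t_decimal * 1e6:.1f} us per call")
print(f"measured slowdown: {t_decimal / t_float:.1f}x")
```

Taking the minimum over several repeats reduces noise from other processes. The result still depends on the machine, the runtime version, and the algorithm being timed.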

21 * y2 + z = n(y2 + z)

The z term denotes processing the incoming request and serializing the data. Let us make an assumption: this step takes about the same time as the calculation itself. It should not be significantly longer, otherwise one would look for a different application architecture. It means y2 ~ z. This is an assumption; concrete measurements must be made in a real application.

21 * y2 + y2 = n(y2 + y2)

After further transformations:

22 y2 = 2 * n * y2

11 y2 = n * y2

11 = n.

This means that using the same resources (CPU, memory), the double implementation could serve 11 times as many of the same request per unit of time. While the latency of each request would be nearly the same, the more efficient calculation leaves the CPU free to serve more requests.
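The result depends heavily on the two assumptions made along the way. Writing y1 = s * y2 and z = r * y2, the equation y1 + z = n(y2 + z) gives n = (s + r) / (1 + r). The sketch below evaluates this for a few values of r; the function and parameter names are my own, and only the s = 21, r = 1 case corresponds to the assumptions above.

```python
def throughput_gain(slowdown, z_over_y2):
    """Solve y1 + z = n * (y2 + z) for n, given y1 = slowdown * y2
    and z = z_over_y2 * y2 (y2 cancels out)."""
    return (slowdown + z_over_y2) / (1 + z_over_y2)


# r = 0: request processing is free; r = 1: the assumption used above;
# r = 10: request processing dominates the calculation.
for r in (0.0, 1.0, 10.0):
    print(f"z = {r:>4} * y2  ->  n = {throughput_gain(21, r):.2f}")
```

With r = 1 this reproduces n = 11. When request processing is negligible the gain approaches the full factor of 21, and when it dominates (r = 10) the gain drops below 3. This is why the ratio of z to y2 must be measured in the real application.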

Conclusion

This is a theoretical thought exercise. Make real performance measurements in real applications. Do not be shortsighted by measuring latency only; in a server application throughput is just as important. On the other hand, before doing premature optimizations, always aim to fulfill the performance goals set by the stakeholders.