07/27/2018
I have seen a Pluralsight course recently on performance by Sasha Goldstein. At one point, he mentioned how he had used processor pipelining to find a maximum number of an array of integers even faster the than the regular for/while loop. This got me thinking, can we get the presented solution even faster? (the catch: he also presented some other features of the clr/framework which will help with this)
So how can get even faster (in certain situations) finding a maximum integer in an unordered array?
In this investigation, I am going to use micro-benchmarking, which is certainly a way of measuring one or a couple of instructions, but I find the results difficult to generalize: things being measured are correct, but small changes to the code or the environment can have significant impact on the results.
In my tests I will use an array of 2MB, which easily fits into the processor cache of my machine. The array is filled with integers [0..10000). It is important to have it the whole array fit in the processor cache, so we can avoid measuring the latency for bringing a chunk of data from memory to the processor.
Find out more