How do I measure the execution time of code in Visual Studio 2017?

I’m trying to measure the execution time of some code, but I keep getting incorrect values: whichever test runs first always has the worst time, and the second test usually reports 0.

#include <iostream>
#include <math.h>
#include <intrin.h>
#include <chrono>

using namespace std;

#define MAX_LOOP                100000
#define NUM                     10000.f

auto sse_sqrt( float n )
{
    __m128 reg = _mm_load_ss( &n );
    return _mm_mul_ss( reg, _mm_rsqrt_ss( reg ) ).m128_f32[ 0 ];
}

auto stl_sqrt_timer()
{
    auto start = std::chrono::high_resolution_clock::now();

    for ( auto i = 0; i < MAX_LOOP; i++ )
    {
        auto v = std::sqrt( NUM );
    }

    auto end = std::chrono::high_resolution_clock::now();

    return ( end - start ).count();
}

auto sse_sqrt_timer()
{
    auto start = std::chrono::high_resolution_clock::now();

    for ( auto i = 0; i < MAX_LOOP; i++ )
    {
        auto v = sse_sqrt( NUM );
    }

    auto end = std::chrono::high_resolution_clock::now();

    return ( end - start ).count();
}

int main()
{
    cout << "sse_sqrt: " << sse_sqrt_timer() << "\n";
    cout << "stl_sqrt: " << stl_sqrt_timer() << "\n";

    cin.ignore();

    return 0;
}

First execution: sse_sqrt: 12461 stl_sqrt: 0

Second execution: sse_sqrt: 2643 stl_sqrt: 378

Reversing the order of tests:

stl_sqrt: 23032 sse_sqrt: 378

stl_sqrt: 2265 sse_sqrt: 0

I am compiling in Release x86 with /Ox optimization.

1 answer


Keep in mind that certain optimizations can affect performance measurements more than you might expect. For example, if a variable is defined, initialized, and modified, but its value is never actually used, the compiler may omit the variable along with the instructions that compute its new value. This is one of the most severe sources of error in performance measurements.

Therefore, a calculation whose result is unnecessary may be absent from the executable altogether, and even a loop can be simplified to the point that it never runs. To measure correctly with optimizations enabled, you must write code that takes this into account. For example, accumulate the result of the calculation so that every operation contributes to the value of a variable, and then use that variable in a way that guarantees it is needed, such as printing its value (or pretending to print it, by passing it as an argument to printf without including it in the format string). What does that mean? See the following example.

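  // Fragment: goes inside a function; needs <stdio.h> and <time.h>.
  int index ;
  long long sum = 0 ;                                // Initialized, and wide enough to hold the total.
  time_t chrono = time(0) ;                          // Record the starting point.
  for( index=0 ; index<999999 ; index++ ){
      sum += index ;                                 // The instruction you want to measure.
  }
  chrono = time(0)-chrono ;                          // Compute the elapsed interval.
//printf( "Finished in %d seconds.\n" , (int)chrono ) ;
  printf( "Finished in %d seconds.\n" , (int)chrono , sum ) ;  // sum is passed but not formatted.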
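The same idea can be sketched with std::chrono, which the question already uses. This is only a minimal illustration under stated assumptions: the name time_sqrt_ns and the sink parameter are hypothetical, and the volatile input is an extra precaution (not part of the original example) so that the compiler cannot compute the constant square root once and skip the loop.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper: times `iterations` calls to std::sqrt and returns the
// elapsed time in nanoseconds. The accumulated sum is written to *sink so the
// optimizer cannot treat the loop as dead code.
long long time_sqrt_ns(int iterations, float* sink)
{
    volatile float input = 10000.f;   // volatile: forces a fresh read each iteration
    float sum = 0.f;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        sum += std::sqrt(input);      // every result feeds the final value
    }
    auto end = std::chrono::steady_clock::now();

    *sink = sum;                      // publish the result so it counts as used
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

A caller would pass the address of a float, print the returned nanoseconds, and also print (or otherwise consume) the float, which plays the same role as passing sum to printf above.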

Also keep in mind that a program may not be running on the CPU 100% of the time, which stretches the measured time in places. There are several ways around this, but none is perfect. The measured time is also inflated by the instructions that drive the loop (condition, increment) in addition to the calculation you actually want to measure; you can handle this by measuring those extra instructions separately to find out how much time they add. Finally, if the first measurement still gives strange results, run a throwaway first measurement and discard it.

Any questions?
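One possible way to put the warm-up advice into practice is sketched below. The helper best_time_ns is hypothetical, and keeping the minimum of several runs is an assumption of this sketch (one common way to reduce the effect of the program being descheduled), not something the answer prescribes.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: runs `work` once as a discarded warm-up, then `repeats`
// more times, and returns the fastest timed run in nanoseconds.
// Assumes repeats >= 1.
template <typename Work>
long long best_time_ns(Work work, int repeats)
{
    using clock = std::chrono::steady_clock;

    work();  // warm-up run: executed but not timed, its cost is discarded

    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < repeats; ++r)
    {
        auto start = clock::now();
        work();
        auto end = clock::now();
        long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
        best = std::min(best, ns);  // keep the run least disturbed by other activity
    }
    return best;
}
```

To estimate the loop overhead mentioned above, the same helper could be called a second time with a callable containing only the loop bookkeeping, and that baseline subtracted from the first result. Both callables still need to produce a value that is actually used, or the optimizer may remove them.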

  • Thanks for the clear answer. I tested your code and it always finishes in 0 seconds. Any idea why?

  • Actually, it’s just an example to illustrate what I meant in that context. time(0) has one-second granularity, i.e. it is not a high-resolution clock/timer. Keep using std::chrono to get the precision you need.
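To see the difference in granularity, the tick length that a std::chrono clock counts in can be read from its period member. The function name below is hypothetical, and the value is the unit of the clock's representation, which is not necessarily the true resolution of the underlying hardware timer.

```cpp
#include <chrono>

// Hypothetical helper: length of one std::chrono::steady_clock tick in
// nanoseconds, derived from the clock's compile-time period (a std::ratio
// expressed in seconds).
double steady_clock_tick_ns()
{
    using period = std::chrono::steady_clock::period;
    return 1e9 * static_cast<double>(period::num) / static_cast<double>(period::den);
}
```

Printing this value next to the one-second step of time(0) shows why a short loop timed with time(0) reports 0 seconds.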
