Discrepant running times in parallel programming

I wrote a parallel program in C in order to measure its execution time. It deals with:

  • Threads;
  • Mutex;
  • False Sharing;
  • Synchronization.

Running the code under Linux's time command, most executions produced something like this:

Resultado da soma no modo concorrente = 36028797153181696  

real    0m0.340s  
user    0m1.300s  
sys     0m0.000s  

However, roughly once every 5 runs or so, the time was dramatically different:

Resultado da soma no modo concorrente = 36028797153181696  

real    0m1.863s
user    0m5.584s
sys     0m0.000s  

What did I fail to take into account?
What do the times real, user and sys mean?

Here is the code, which computes the sum of the integers from 1 to N:

#include <stdio.h>
#include <pthread.h>
#define QTD 268435456 //1024//16384 //8192
#define QTD_N 4

unsigned long long int soma = 0;
unsigned long int porcao = QTD/QTD_N;

struct padding{
    unsigned long long int s; // partial sum
    unsigned int i,start,end;
    unsigned int m; // which chunk of the range this thread sums
    char p[40]; // pads the struct to 64 bytes so threads do not share a cache line (false sharing)
};

pthread_mutex_t mutex_lock;


void *paralelo(void *region_ptr){
    struct padding *soma_t;
    soma_t = region_ptr;
    soma_t->s = 0;    
    soma_t->start = soma_t->m * porcao + 1;
    soma_t->end = (soma_t->m + 1) * porcao; 
    for(soma_t->i = soma_t->start; soma_t->i <= soma_t->end ; soma_t->i++){
        soma_t->s += soma_t->i;
    }

    pthread_mutex_lock(&mutex_lock);
    soma += soma_t->s;
    pthread_mutex_unlock(&mutex_lock);
    pthread_exit(NULL);
}

int main(void){ 
    pthread_t thread[QTD_N];
    struct padding soma_t[QTD_N];
    int i;
    void *status;
    pthread_mutex_init(&mutex_lock,NULL);
    for(i = 0 ; i < QTD_N ; i++){
        soma_t[i].m = i;
        pthread_create(&thread[i], NULL, paralelo, &soma_t[i]);
    }

    for(i = 0 ; i < QTD_N ; i++){
        pthread_join(thread[i],&status);
    }
    pthread_mutex_destroy(&mutex_lock);

    printf("Resultado da soma no modo concorrente = %llu\n",soma);
    return 0;
}
  • I ran your code a few dozen times and could not reproduce the discrepancy in the times. Can you give more details about your test environment?

  • On Linux: [gcc codigo.c -o executavel -lpthread] to generate the binary. Then run [time ./executavel], which prints the program's result followed by the execution times.

1 answer

Regarding time, the result is reported as three values:

  • real: This is simply the time you would measure on a clock (often called wall-clock time). It is the difference between the start time and the end time.
  • user: This is the total amount of time your process spent running on some core. Note that if it is running on more than one core at the same time, that time is counted n times, so this value can be greater than real.
  • sys: This is how much time the system spent doing work on behalf of your process, such as printing to the screen or reading files. It is processing done indirectly.
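
To make the difference between real and user concrete, here is a minimal sketch (not from the original post; the file name, thread count and iteration count are arbitrary) that measures both directly with clock_gettime: CLOCK_MONOTONIC gives the wall-clock (real) time, and CLOCK_PROCESS_CPUTIME_ID gives the CPU time consumed by the whole process across all of its threads (roughly user + sys).

/* Sketch: compile with something like gcc -O2 demo.c -o demo -lpthread */
#include <stdio.h>
#include <time.h>
#include <pthread.h>

static void *burn(void *arg){
    volatile unsigned long long s = 0;
    for(unsigned long long i = 0; i < 400000000ULL; i++)
        s += i; /* pure CPU work on one core */
    return NULL;
}

int main(void){
    struct timespec w0, w1, c0, c1;
    pthread_t t[4];
    int i;

    clock_gettime(CLOCK_MONOTONIC, &w0);          /* wall clock -> "real"           */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0); /* total CPU time -> "user"+"sys" */

    for(i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, burn, NULL);
    for(i = 0; i < 4; i++)
        pthread_join(t[i], NULL);

    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);

    double wall = (w1.tv_sec - w0.tv_sec) + (w1.tv_nsec - w0.tv_nsec) / 1e9;
    double cpu  = (c1.tv_sec - c0.tv_sec) + (c1.tv_nsec - c0.tv_nsec) / 1e9;

    /* With 4 threads on 4 free cores, cpu comes out roughly 4x wall,
       just like user being larger than real in the question. */
    printf("wall (real) = %.3f s, cpu (user+sys) = %.3f s\n", wall, cpu);
    return 0;
}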

As for the difference measured during execution: in the slow runs one thread finishes its processing quickly while the other three spend far more time, for whatever reason. You can see it here:

[graph: per-thread execution timeline showing one thread finishing quickly while the other three keep running]

This phenomenon seems to disappear when the compiler optimizes the code and turns soma_t->s and soma_t->i into registers. That way there are no more memory reads and writes inside the loop (i.e. the problem may have something to do with the processor cache).
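
To illustrate what the optimizer effectively does here, below is a sketch (not the answerer's code) of the thread function rewritten by hand so that the partial sum and loop index are local variables, which the compiler can keep in registers, and the struct is written only once at the end. It reuses the globals soma, porcao, mutex_lock and struct padding from the question's listing.

/* Sketch: same logic as paralelo(), but accumulating in locals so the hot
   loop does not have to read and write memory on every iteration.         */
void *paralelo_local(void *region_ptr){
    struct padding *soma_t = region_ptr;
    unsigned long long s = 0;                      /* can live in a register */
    unsigned int start = soma_t->m * porcao + 1;
    unsigned int end   = (soma_t->m + 1) * porcao;
    unsigned int i;

    for(i = start; i <= end; i++)
        s += i;

    soma_t->s = s;                                 /* single write at the end */

    pthread_mutex_lock(&mutex_lock);
    soma += s;
    pthread_mutex_unlock(&mutex_lock);
    pthread_exit(NULL);
}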

Note that it is the operating system that decides when and for how long each thread runs. Why it happened in your specific case eludes me, but it is a common occurrence. The behavior of this kind of code is not deterministic; do not expect it to be.

  • In my opinion, it seems to be more than mere chance, because the time varies only in some executions and by such a large margin. I wondered whether there was some architectural detail I failed to consider. Anyway, I'm delighted with your reply. Thank you!

  • The least unlikely cause I could think of was address space randomization combined with the cache alignment of the stack variables. And maybe some interference from another process on the core. It is hard to pin down a good cause, but turning on optimization seemed to solve it. Maybe someone will come up with a good explanation later.
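
If cache-line alignment of the stack array is the suspect, one way to test that hypothesis (a sketch only, assuming 64-byte cache lines) is to force each per-thread slot onto its own line with C11 alignas, so its placement on the stack no longer matters:

#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch: aligning the first member to 64 bytes forces the whole struct to
   64-byte alignment and a 64-byte size, so each array element occupies its
   own (assumed 64-byte) cache line wherever the array happens to land.    */
struct slot {
    alignas(64) unsigned long long s; /* partial sum  */
    unsigned int m;                   /* thread index */
};

int main(void){
    struct slot soma_t[4];
    printf("sizeof = %zu, alignof = %zu, address mod 64 = %zu\n",
           sizeof(struct slot), alignof(struct slot),
           (size_t)((uintptr_t)&soma_t[0] % 64));
    return 0;
}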
