Hello, everybody!
I have one more question about matrices, this time about how each one occupies memory.
I'm trying to reproduce what the guy in this video, Performance x64: Cache 2 Cache Blocking, did, but I'm not beating his execution times. He used an extra library to generate the matrices.
I need to show in the code that when I divide the memory into blocks the execution is faster, that is, there are fewer cache misses. Does anyone have any idea where I'm going wrong? Thank you.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BLOCO 10
#define DIMENSAO 100

int main(){
    double matrizA[DIMENSAO][DIMENSAO];
    double matrizB[DIMENSAO][DIMENSAO];
    double matrizC[DIMENSAO][DIMENSAO] = {{0.0}}; /* result of the plain product  */
    double matrizD[DIMENSAO][DIMENSAO];           /* transpose of matrizB         */
    double matrizE[DIMENSAO][DIMENSAO] = {{0.0}}; /* result of the blocked product*/

    srand(time(NULL));

    /* fill matrizA with random values in [0, 10] */
    for(int i = 0; i < DIMENSAO; i++){
        for(int j = 0; j < DIMENSAO; j++){
            matrizA[i][j] = (double)rand()/RAND_MAX*10.0;
        }
    }

    /* fill matrizB with random values in [0, 10] */
    for(int i = 0; i < DIMENSAO; i++){
        for(int j = 0; j < DIMENSAO; j++){
            matrizB[i][j] = (double)rand()/RAND_MAX*10.0;
        }
    }

    /* matrizD = transpose of matrizB (the extra k loop was redundant) */
    for(int i = 0; i < DIMENSAO; i++){
        for(int j = 0; j < DIMENSAO; j++){
            matrizD[j][i] = matrizB[i][j];
        }
    }

    /* product matrizA x matrizB, no blocking */
    clock_t t1 = clock();
    for(int linha = 0; linha < DIMENSAO; linha++){
        for(int coluna = 0; coluna < DIMENSAO; coluna++){
            for(int k = 0; k < DIMENSAO; k++){
                matrizC[coluna][linha] += matrizA[k][linha] * matrizB[coluna][k];
            }
        }
    }
    t1 = clock() - t1;
    printf("Execution time A x B: %.2lf ms\n", (double)t1 / (CLOCKS_PER_SEC / 1000.0));

    /* product matrizA x matrizD (transpose of B), with blocking;
       the timer covers all the blocks instead of printing one time per block */
    clock_t t2 = clock();
    for(int linha = 0; linha < DIMENSAO; linha += BLOCO){
        for(int coluna = 0; coluna < DIMENSAO; coluna += BLOCO){
            for(int blocoLinha = linha; blocoLinha < linha + BLOCO; blocoLinha++){
                for(int blocoColuna = coluna; blocoColuna < coluna + BLOCO; blocoColuna++){
                    for(int k = 0; k < DIMENSAO; k++){
                        matrizE[blocoColuna][blocoLinha] += matrizA[k][blocoLinha] * matrizD[blocoColuna][k];
                    }
                }
            }
        }
    }
    t2 = clock() - t2;
    printf("Execution time A x B (transpose): %.2lf ms\n", (double)t2 / (CLOCKS_PER_SEC / 1000.0));

    return 0;
}
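
For reference, here is a minimal sketch of a fully blocked kernel: it tiles all three loop dimensions and keeps the innermost accesses row-contiguous. It assumes the same DIMENSAO and BLOCO defines as above, with DIMENSAO a multiple of BLOCO; the function name is only illustrative.

/* Minimal sketch: tile all three loop dimensions so each BLOCO x BLOCO tile
   of A, B and C stays hot in cache, and keep the innermost accesses
   contiguous in memory (row-major). Assumes DIMENSAO is a multiple of BLOCO. */
void multiplica_blocos(const double A[DIMENSAO][DIMENSAO],
                       const double B[DIMENSAO][DIMENSAO],
                       double C[DIMENSAO][DIMENSAO]){
    for(int i = 0; i < DIMENSAO; i++)
        for(int j = 0; j < DIMENSAO; j++)
            C[i][j] = 0.0;                       /* accumulator must start at zero */

    for(int ii = 0; ii < DIMENSAO; ii += BLOCO){
        for(int kk = 0; kk < DIMENSAO; kk += BLOCO){
            for(int jj = 0; jj < DIMENSAO; jj += BLOCO){
                /* multiply one tile of A by one tile of B into one tile of C */
                for(int i = ii; i < ii + BLOCO; i++){
                    for(int k = kk; k < kk + BLOCO; k++){
                        double a = A[i][k];          /* reused for the whole j loop */
                        for(int j = jj; j < jj + BLOCO; j++){
                            C[i][j] += a * B[k][j];  /* B and C walked row by row   */
                        }
                    }
                }
            }
        }
    }
}

With DIMENSAO = 100 each matrix is only about 80 KB, so all of them likely fit in the processor's cache anyway; the benefit of blocking tends to show up only with much larger matrices and an optimized build.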
"but it’s not beating the execution times"...what times you are getting and what would be the expected result? What is the processor and memory you are using? Which operating system? Which compiler? What build options? Are there other programs running while you are testing? All these details (and perhaps some others) can greatly influence the outcome! I recommend you edit the question and add as much detail as possible to get a more assertive answer
– Gomiero
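
For instance, an unoptimized and an optimized build can behave very differently in this kind of test; a typical pair of build lines (the file name here is only illustrative) would be:

gcc -O0 -o matmul matmul.c
gcc -O2 -o matmul_otimizado matmul.c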
Another important detail: besides the fact that, in the video linked in the question, the "guy" does not give any of the details above, the code he uses for the test appears to have other serious problems. The Matrix::MUL method that he times allocates memory, builds the result in a temporary matrix, copies the result to the target matrix, deallocates the temporary matrix, calls other methods, and so on; to test cache performance, none of that can be included in the timed region of the code snippet under test. – Gomiero
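
A rough sketch of that point, just to show where the timer should go (clock_gettime with CLOCK_MONOTONIC is POSIX; the kernel function here is a placeholder for whatever multiplication is being measured):

#include <time.h>

/* Illustrative only: everything except the measured kernel (allocation,
   initialization, copies) must happen before the first clock_gettime call. */
double mede_ms(void (*kernel)(void)){
    struct timespec ini, fim;
    clock_gettime(CLOCK_MONOTONIC, &ini);   /* start right before the kernel  */
    kernel();                               /* only the measured work in here */
    clock_gettime(CLOCK_MONOTONIC, &fim);   /* stop right after it returns    */
    return (fim.tv_sec - ini.tv_sec) * 1000.0
         + (fim.tv_nsec - ini.tv_nsec) / 1.0e6;
}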
Hi, @Gomiero, thanks for the replies. There are some details I didn't pay attention to, but I deliberately avoided dynamic allocation so I could really observe the performance in the two cases. I am using Debian, GCC, a Core i7-5500U CPU @ 2.40GHz and 8 GB of RAM. I will go over the code again, post it and improve the question. The "guy" in the video is Chris Rose; this is the only video of his that I know, sorry for not including his name, my fault.
– tux