Parallelizing n queens problem with Openmp

Asked

Viewed 167 times

1

I’m having trouble parallelizing my C code with Openmp and getting good performance on the n queens problem.

The results are always considerably slower with parallelization enabled and I do not know the reason.

The code is this below:

    #include <stdio.h>
#include <stdlib.h>
#include "omp.h"

int checkQueen(int **queens, int linha, int col, int n);
void printTabuleiro(int **queens, int n);
void play(int **queens, int col, int n, int *sol, int *maxQueens, int *count);


//MAIN
int main(){
    int n = 10, sol = 0, maxQueens = 0, count = 0;

    //alocando rainhas
    int **queens = (int**) malloc(n  * sizeof(int *));
    for(int i = 0; i < n; ++i)
        queens[i] = (int*) malloc(n  * sizeof(int));

    #pragma omp parallel for
    for(int i = 0; i < n; ++i){
        for(int j = 0; j < n; ++j){
            queens[i][j] = 0;
        }
    }

    play(queens, 0, n, &sol, &maxQueens, &count);

    //imprime resultados
        printf("Maximo de rainhas: %d \n", maxQueens);
        printf("Numero de solucoes: %d \n", sol);


    return 0;
}

//testando se é possivel colocar rainha em determinada posição

int checkQueen(int **queens, int linha, int col, int n){

    int a = 0;

    if(queens[linha][col] == 1)
        return 1;
    #pragma omp parallel num_threads(n)
    {
        for(int i = linha, j = col; j >= 0 && i >= 0; --i, --j){
            if(queens[i][j] == 1){
                #pragma atomic
                a = 1;
            }
        }


        if(!a){
            for(int i = linha, j = col; j >= 0 && i < n ; ++i, --j){
                if(queens[i][j] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = linha, j = col; j < n && i >= 0 ; --i, ++j){
                if(queens[i][j] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = linha, j = col; j < n && i < n ; ++i, ++j){
                if(queens[i][j] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = linha; i < n ; ++i){
                if(queens[linha][i] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = linha; i >= 0 ; --i){
                if(queens[linha][i] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = col; i < n ; ++i){
                if(queens[i][col] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }


        if(!a){
            for(int i = col; i >= 0 ; --i){
                if(queens[i][col] == 1){
                    #pragma atomic
                    a = 1;
                }
            }
        }

    }

    return a;
}

//Imprime tabuleiro
void printTabuleiro(int **queens, int n){
    printf("//////////////////////////////////////////////");
    printf("\n//// 1 - rainhas / 0 - posicoes vazias ////\n");
    printf("//////////////////////////////////////////////");
    for(int i = 0; i < n; ++i){
        printf("\n");
        for(int j = 0; j < n; ++j){
            printf("%d - ", queens[i][j]);
        }
    }
    printf("\n");
}

//executa testes
void play(int **queens, int col, int n, int *sol, int *maxQueens, int *count){
    if (col == n){
        printf("\nSolucao #%d\n", ++*sol);
        printTabuleiro(queens, n);
        return;
    }

    for(int i = 0; i < n; ++i){
        if(!checkQueen(queens, i, col, n)){
            queens[i][col] = 1;
            ++*count;
            play(queens, col+1, n, sol, maxQueens, count);

            if(*count > *maxQueens){
                *maxQueens = *count;
            }

            --*count;
            queens[i][col] = 0;
        }
    }


}

1 answer

4

Your parallelization strategy is wrong. The problem is the block delimited with that within the function checkQueen

    #pragma omp parallel num_threads(n)

It turns out that the code of the contents of this block checks the entire board. How you’re flipping n threads there, each one of those n threads will check the entire board, which means a lot of redundant work. This block will not be finished until the last of these threads is finished. Add this to the complexity and performance impact to keep the atomic assignments, and the result will be much slower than a single-thread version.

I mean, the problem is that instead of you divide the work in n threads, you the multiplied for n threads.

To make matters worse, the function checkQueen is called multiple times within the recursive function play. This means that this overhead of creating and synchronizing threads will be multiplied by the number of times that checkQueen is called, making everything even slower.

I strongly suggest to you, parallelize the function play, and not the function checkQueen. To benefit from parallelism, it is important that you rely on as few parts as possible that are sequential single-thread, and that’s exactly how the function play is.

Also, to get a good performance gain when parallelizing, it’s important that you prevent threads from having to interact with each other. It’s perfectly possible to let them interact and it’s often inevitable, but in doing so, you’re likely to have some performance cost, and so that’s something to avoid/minimize. The way you implemented the checkQueen has a lot of interactions between threads through the variable a.

The for within the main to initialize everything with zero also has no appreciative performance gain when being parallelized. On the contrary, the overhead that thread management should impose will probably be greater than the gain. In fact, for there to be performance gains, the amount of workload that each thread has to earn has to be big enough for it to surpass that overhead, and none of the parts that you’ve parallelized have that feature.

I also have two comments:

  • Do the checkQueen return 1 when it is not possible to put a queen and 0 when it is sounds a little strange, as it implies that 1=no and 0=yes, the opposite of the convention. By changing that, you can remove the operator ! who is in the if of function play. Better yet, you can change the name of this function to queen_ok to make it clearer.

  • You can allocate a single memory area with n * n cells and use queen[linha * n + coluna] to access the elements. This way, you only have to allocate memory once to assemble the board and also ensures that it is contiguous, which offers better performance.

  • To prevent one thread from interfering with the other when changing the board, create one board per thread and use a function to copy the boards.

  • To avoid having to work with int *queen and int n in all locations, I suggest you create a struct for that reason.

  • To avoid causing threads to compete for printf when displaying solutions, use Locking. The solution below uses the concept of that reply in Soen.

  • Perhaps by using schedule(dynamic,1), you end up having a good performance gain when considering that not all partially filled trays have the same weight to be analyzed.

  • These variables sol, count and maxQueens will hinder you in the parallelization. But it is not very difficult to eliminate them. The count can be placed in the struct that I mentioned. The sol is only used in a context where you have the lock due to the need to use the printf. The maxQueens can be updated lazily just at this point.

  • Don’t forget to give free in everything you give malloc.

  • In C, most functions use the notation separada_por_underscores, and not the rating camelCase.

I suggest your code stays that way, parallelizing the play:

#include <stdio.h>
#include <stdlib.h>
#include "omp.h"

typedef struct Tabuleiro {
    int n;
    int *casas;
    int queens_count;
} Tabuleiro;

// Funções básicas sobre tabuleiro.
Tabuleiro *criar_tabuleiro(int n);
Tabuleiro *copiar_tabuleiro(Tabuleiro *original);
void destruir_tabuleiro(Tabuleiro *tabuleiro);
void print_tabuleiro(Tabuleiro *tabuleiro);

// Implementação do n queens.
int queen_ok(Tabuleiro *tabuleiro, int linha, int coluna);
void play(Tabuleiro *tabuleiro, int *solucoes, int *max_queens, omp_lock_t *lock);
void play_in(Tabuleiro *tabuleiro, int coluna, int *solucoes, int *max_queens, omp_lock_t *lock);

// Função main.
int main() {
    int n = 10, solucoes = 0, max_queens = 0;

    omp_lock_t lock;
    omp_init_lock(&lock);

    Tabuleiro *tabuleiro = criar_tabuleiro(n);

    play(tabuleiro, &solucoes, &max_queens, lock);

    printf("Maximo de rainhas: %d \n", max_queens);
    printf("Numero de solucoes: %d \n", solucoes);

    omp_destroy_lock(&lock);
    destruir_tabuleiro(tabuleiro);
    return 0;
}

Tabuleiro *criar_tabuleiro(int n) {
    Tabuleiro *tabuleiro = malloc(sizeof(Tabuleiro));
    tabuleiro->n = n;
    tabuleiro->queens_count = 0;
    tabuleiro->casas = malloc(n * n * sizeof(int));

    for (int i = 0; i < n * n; n++) {
        tabuleiro->casas[i] = 0;
    }
    return tabuleiro;
}

Tabuleiro *copiar_tabuleiro(Tabuleiro *original) {
    int n = original->n;
    Tabuleiro *tabuleiro = malloc(sizeof(Tabuleiro));
    tabuleiro->n = n;
    tabuleiro->queens_count = original->queens_count;
    tabuleiro->casas = malloc(n * n * sizeof(int));

    for (int i = 0; i < n * n; n++) {
        tabuleiro->casas[i] = original->casas[i];
    }
    return tabuleiro;
}

void destruir_tabuleiro(Tabuleiro *tabuleiro) {
    free(tabuleiro->casas);
    free(tabuleiro);
}

void print_tabuleiro(Tabuleiro *tabuleiro) {
    int n = tabuleiro->n;
    for (int i = 0; i < n; ++i) {
        printf("\n");
        for (int j = 0; j < n; ++j) {
            printf("%c", tabuleiro->casas[i * n + j] ? 'X' : '.');
        }
    }
    printf("\n\n");
}

int queen_ok(Tabuleiro *tabuleiro, int linha, int coluna) {
    int n = tabuleiro->n;
    int *queens = tabuleiro->casas;

    if (queens[linha * n + coluna] == 1) return 0;

    // Diagonal para cima e para a esquerda.
    for (int i = linha, j = coluna; j >= 0 && i >= 0; --i, --j) {
        if (queens[i * n + j]) return 0;
    }

    // Diagonal para baixo e para a esquerda.
    for (int i = linha, j = coluna; j >= 0 && i < n; ++i, --j) {
        if (queens[i * n + j]) return 0;
    }

    // Diagonal para cima e para a direita.
    for (int i = linha, j = coluna; j < n && i >= 0; --i, ++j) {
        if (queens[i * n + j]) return 0;
    }

    // Diagonal para baixo e para a direita.
    for (int i = linha, j = coluna; j < n && i < n; ++i, ++j) {
        if (queens[i * n + j]) return 0;
    }

    // Para baixo.
    for (int i = linha; i < n ; ++i) {
        if (queens[i * n + coluna]) return 0;
    }

    // Para cima.
    for (int i = linha; i >= 0; --i) {
        if (queens[i * n + coluna]) return 0;
    }

    // Para a direita.
    for (int j = coluna; j < n; ++j) {
        if (queens[linha * n + j]) return 0;
    }

    // Para a esquerda.
    for (int j = coluna; j >= 0; --j) {
        if (queens[linha * n + j]) return 0;
    }

    return 1;
}

void play(Tabuleiro *tabuleiro, int *solucoes, int *max_queens, omp_lock_t *lock) {
    int n = tabuleiro->n;

    #pragma omp parallel num_threads(n) schedule(dynamic,1)
    for (int i = 0; i < n; ++i) {
        Tabuleiro *copia = copiar_tabuleiro(tabuleiro);
        copia->casas[i * n] = 1;
        copia->queens_count++;

        play_in(copia, 0, solucoes, max_queens, lock);

        destruir_tabuleiro(copia);
    }
}

void play_in(Tabuleiro *tabuleiro, int coluna, int *solucoes, int *max_queens, omp_lock_t *lock) {
    int n = tabuleiro->n;

    if (coluna == n) {
        omp_set_lock(lock);
        printf("\nSolucao #%d\n", ++*solucoes);
        print_tabuleiro(tabuleiro);
        if (*max_queens < tabuleiro->queens_count) {
            *max_queens = tabuleiro->queens_count;
        }
        omp_unset_lock(lock);
        return;
    }

    for (int i = 0; i < n; ++i) {
        if (queen_ok(tabuleiro, i, coluna)) {
            Tabuleiro *copia = copiar_tabuleiro(tabuleiro);
            copia->casas[i * n + coluna] = 1;
            copia->queens_count++;

            play_in(copia, coluna + 1, solucoes, max_queens, lock);

            destruir_tabuleiro(copia);
        }
    }
}

Note: I haven’t tested this code, so I don’t know if it’ll work out at first, maybe there’s some silly mistake somewhere. However, even if there is something wrong, the correct result is certainly something close to that.

Still could give an improved trying to avoid the repeated allocations and displacements when preallocating a pool of trays and get/return them to that pool when necessary. However, I’m not sure what the size of the pool is, but an estimate of the size of the upper limit of this pool would be something in the order of n²/4. Note also that this pool would be a pile.

  • I did some tests with your code and the time with Openmp enabled is longer than with Openmp off. The higher the N, the greater is the distance between times in favor of no parallelization.

  • @DJM_JM I edited the answer. See if it works now.

  • Still having the same problem. Even it does not run with "#pragma omp Parallel num_threads(n) Schedule(Dynamic,1)", I need to divide into "#pragma omp Parallel num_threads(n){ #pragma omp for Schedule(Dynamic,1) }" to work. Running the same error persists.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.