Loop to perform calculation in C Pagerank

Asked

Viewed 334 times

4

Using the starting notes 0.15 for all and then add note as explained in the article http://cmup.fc.up.pt/cmup/mecs/googlePR.pdf

If we consider pages 1, 2 and 3 to be:

1: link1
2: link2
3: link3

The.txt page file of this example would be:

1 link1 
2 link2
3 link3

Whereas the.txt links file would be:

1 2 
2 1
2 3

Your program should then calculate the Pagerank of each of the pages contained in the.txt page file (based on the.txt links file), and output a file called Pages.txt, where the pages are ordered from the largest to the smallest Pagerank. If two pages have the same Pagerank, the one with the smallest id should come first.

Would anyone know why this loop of mine is giving the wrong results? My.txt link looks like this :

1 3
2 1
3 1
3 2
3 4

My.txt page looks like this:

1 link1
2 link2
3 link3
4 link4

my Loop:

void calculoPageRank(struct page *pageRank, int numLinhas){
    int h=1,g=1;
    int j=1,i=1;
    float soma = 0.0, difer= 0.2;

    while (difer>=0.05) {
        while (h<=numLinhas) {
            soma= 0.0;
            i=pageRank[h].numeroDeLinksRecebe;
            while (i!=0) {
                g = pageRank[h].quaisLinks[i];
                soma += 0.85 * (pageRank[g].rank/pageRank[g].numeroDeLinksEnviados);
                i--;
                difer = (soma+0.15) - pageRank[h].rank;
            }

            pageRank[h].rank=soma+0.15;
            printf("%f ",difer);
            h++;
        }
        j++;
    }
}

And the complete algorithm:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct page{
    char pagina[20];
    int numeroDeLinksRecebe;
    int numeroDeLinksEnviados;
    int quaisLinks[30];
    float rank;
}page;

int numeroLinhas(){
    char caracter = '\0';
    int numLinhas = 0;
    FILE *arq;
    arq = fopen("paginas.txt", "r");
    while (!feof(arq)) {
        fread(&caracter, 1, 1, arq);
        if (caracter=='\n') {
            numLinhas++;
        }
    }
    fclose(arq);
    return numLinhas;
}

void calculoPageRank(struct page *pageRank, int numLinhas){
    int h=1,g=1;
    int j=1,i=1;
    float soma = 0.0, difer= 0.2;

    while (difer>=0.05) {
        while (h<=numLinhas) {
            soma= 0.0;
            i=pageRank[h].numeroDeLinksRecebe;
            while (i!=0) {
                g = pageRank[h].quaisLinks[i];
                soma += 0.85 * (pageRank[g].rank/pageRank[g].numeroDeLinksEnviados);
                i--;
                difer = (soma+0.15) - pageRank[h].rank;
            }

            pageRank[h].rank=soma+0.15;
            printf("%f ",difer);
            h++;
        }
        j++;
    }
}

void leitura(struct page *pageRank, int numLinhas){
    int i=1,j=1;
    FILE *arq;
    arq = fopen("paginas.txt", "r");
    while (fscanf(arq, "%d %s", &j, pageRank[i].pagina)!=EOF) {
        i++;
    }
    fclose(arq);

    int g = 1;
    for (i=1; i<=numLinhas; i++) {
        pageRank[i].numeroDeLinksRecebe = 0;
        pageRank[i].numeroDeLinksEnviados=0;
        pageRank[i].rank = 0.15;
    }

    int aux;

    FILE *arq1;
    arq1=fopen("links.txt", "r");
    while (fscanf(arq1, "%d %d", &i, &aux)!=EOF) {
        if (i==j) {
            g++;
            pageRank[i].numeroDeLinksRecebe++;
            pageRank[aux].numeroDeLinksEnviados++;
        }
        else{
            j=i;
            g=1;
            pageRank[i].numeroDeLinksRecebe++;
            pageRank[aux].numeroDeLinksEnviados++;
        }
        pageRank[i].quaisLinks[g] = aux;
    }
    fclose(arq1);
}

int main(){
    int numLinhas = numeroLinhas()+1;

    page pageRank[80];
    int i=1;
    leitura(pageRank, numLinhas);
    for (i=1; i<=numLinhas; i++) {
        printf("i=%d recebe=%d envia=%d rank=%f\n", i, pageRank[i].numeroDeLinksRecebe, pageRank[i].numeroDeLinksEnviados,pageRank[i].rank);
    }

    calculoPageRank(pageRank, numLinhas);

    for (i=1; i<=numLinhas; i++) {
        printf("\n %d rank: %f\n", i, pageRank[i].rank);
    }
}

Within this context the loop is returning an output that I did not expect. The correct return would be: PR1 = 1.4901 PR2 = 0.7833 PR3 = 1.5766 PR4 = 0.1500 and delivering R1 = 0.277500 PR2 = 0.267938 PR3 = 0.623184 PR4 = 0.150000.

Does anyone know why my loop is delivering inconsistent results?

  • What a result your program is generating?

  • PR1 = 0.277500 PR2 = 0.267938 PR3 = 0.623184 PR4 = 0.150000

  • 1

    Too bad the people who voted negative didn’t come back to check on you now.

2 answers

2

Some notes on style:

  • Utilize for when the number of iterations is known beforehand, and let while for when you must reach a condition in an unknown number of steps.

  • State the variables at the nearest point of use, and at the narrowest scope required. Some teachers believe that one should declare the variables in a block at the beginning of the function, a habit that should be abandoned. For example, i is not used outside the loop of h, then it is worth declaring it only within the loop.

  • If possible, use C99 or C11, which allows you to use the variable declaration within the for.

  • Code constants lose their fast meaning. What does 0.85 mean? Of course, you scribbled on a piece of paper to reach that value, but that information wasn’t in the code. Consider declaring p = 0.15 within the function, and replace the places where this value is used.

  • Speaking a little more about this: there is a joke that it is immoral to write a function destruir_bagdah(): the correct is to make a function destruir(cidade) and then pass Baghdad as parameter. In your case, consider that Pagerank can work with other values of p desired, so that it should come from outside.


Quite possibly, the problem with your code is that you are changing the rank array as you go through it. You need to make a copy of the structure in order to change the new.

This is evident if you see the Pagerank calculation as an iteration of matrix multiplication. If A is the matrix that gives the links, where A[i][j] = 1 if the page i has a link to the page j, and v is the rank vector, so Pagerank is obtained by iterating

v_novo = A * v_prev + v_aleatório

until v_prev ~ v_novo. I haven’t tested the following program, but I hope it’s correct. Next time, consider making the work of the respondent easier, countless times in the effort to create a minimal example you already solve the problem.

// Nao precisa colocar o tipo de pageRank como struct page, vc ja
// fez um typedef
void calculoPageRank(page *pageRank, int numLinhas){    
    // Se nao tem motivo pra usar float, use double, que eh mais preciso
    double p = 0.15;
    double soma = 0.0, difer= 0.2; 

    int contador = 0; // j nao parecia um bom nome pra essa variavel
    while (difer >= 0.05) {
        page copia[numLinhas]; //C99 - se nao funcionar, use copia[80]

        // memcpy copia pedacos de memoria. Necessita do header <string.h>
        memcpy(copia, pageRank, numLinhas * sizeof(page));

        for (int h = 1; h <= numLinhas; h++) {
            soma= 0.0;
            int i = copia[h].numeroDeLinksRecebe;
            for (; i > 0; i--) {
                int g = copia[h].quaisLinks[i];
                soma += (1 - p) * (copia[g].rank / copia[g].numeroDeLinksEnviados);
                difer = (soma + p) - copia[h].rank;
            }

            pageRank[h].rank = soma + p;
            printf("%f ",difer);
        }
        contador++; // Pra que voce esta usando isso?
    }
}

2

I confess that I cannot, at first reading, understand how Pagerank works and express it by means of code. But looking at your attempted solution, I’ve identified three potential issues that may be contributing to an incorrect outcome:

  1. Potential infinite loop on while external

    When you leave the while internal - which indicates that your condition is no longer satisfied - and simply does j++ and continues the external loop, the condition of the while intern has not changed. This means that if after the first iteration the difer still bigger than 0.05, it will enter an infinite loop. Lucky for you, this did not happen, otherwise instead of an incorrect output you would have a program locked...

    I suggest you resolve this before moving to item 2. If what you want is to go through all the lines again, what is missing is to go back to h to zero. By the way...

  2. Why h begins with 1, and i to zero?

    Is that intentional? Or are you not aware that in C, the indexes begin at zero (i.e. the first element is array[0], the second array[1], the third array[2] and so on)? I’m sorry if I said a basic thing, but I saw no reason to look at the code to assume that you actually wanted to "jump" the first element. (and if that’s really wrong, I’m surprised you didn’t find a segfault...)

  3. The while external only takes into account the last visited link.

    Each time you analyze a link you assign the difer several times (this is correct, but not really necessary - you could do it once, just after leaving the while more internal; but this is detail). However, when you will analyze the next link you are discarding the values of the previous links, so that at the end only the last link visited contributes to the difer - and therefore defines whether the external loop will continue or not.

    I guess that’s not what you want, or am I mistaken? Maybe what you want is to continue the loop in case the difer maximum is greater than 0.05.

Putting it all together, I suggest you change your code to:

int h=1,g=1; // (Estou mantendo essas linhas iguais,
int j=1,i=1; //  pois elas não importam agora...)
float soma = 0.0, difer= 0.2, max_difer = 0.2; // Variável extra para a diferença máxima

while (max_difer>=0.05) { // Agora compara a diferença máxima
    max_difer = 0.2; // ... resetando ela no início de cada loop
    h = 0; // h também tem de ser resetado, para zero, não para 1

    while (h<numLinhas) { // h vai de 0 a numLinhas-1, então é < e não <=
        soma= 0.0;
        i=pageRank[h].numeroDeLinksRecebe-1; // idem para o i
        while (i>=0) { // i pode ir até zero, então é >= em vez de !=
            g = pageRank[h].quaisLinks[i];
            soma += 0.85 * (pageRank[g].rank/pageRank[g].numeroDeLinksEnviados);
            i--;
        }
        difer = (soma+0.15) - pageRank[h].rank; // Movi pra fora do loop interno

        if ( difer < max_difer ) // Atualiza a diferença máxima
            max_difer = difer;

        pageRank[h].rank=soma+0.15;
        printf("%f ",difer);
        h++;
    }
    j++;
}

I don’t know if it’ll lead you to the expected outcome, but I hope it’ll be of some help!

  • I updated the font code so you can understand some of the loops, see if you have a time :D

  • 2

    @user2984406 In addition to copy my answer to your question, Did you do anything else? Ok, you posted the struct, the code you read from the file, etc, but this won’t help me better understand the page rank... (by the way, why do you read the file paginas.txt twice?)

  • So, man, I didn’t just copy it and I made some corrections to see if it changed something and continued with the same answer, so I left yours which I thought was better structured. PAGERANK is like this, all pages start with 0.15 note, so someone has a link pointing to the site he earns the site note multiplied by 0.85, the only problem is that when the sites change the note constantly this difference reflects in all the sites that it points and so on, when the difference starts to get too small the loop has to stop, there is something wrong in this algorithm and I do not see.

  • @user2984406 blz, the problem is not copying (this site is kind of a wiki, so copy at will, just quote the source). The case is that I identified a few minor problems in your code, but I didn’t give a complete solution - I don’t know how! And when you repeated my code and asked "what now?" I simply had nothing else to contribute; my answer is already at the limit of my ability. The issue was just not to misread the question (as I put it in the edit comment), but if you have some advance and want to ask here for a new problem [specific], there is no problem!

  • @user2984406 By the way, I and 2 other users have already voted to reopen your question (now that it is more objective), if 2 more do this you will be able to receive answers (then maybe someone more experienced that I can help you...)

  • Anyway thank you very much, thank you very much for your time and for having tried :D

Show 1 more comment

Browser other questions tagged

You are not signed in. Login or sign up in order to post.