How to create binary tree with iterative algorithm?

Asked

Viewed 568 times

3

I’m making a program that allocates a dictionary in memory to check for misspelled words in text files. The function below converts dictionary words into integers and stores them in tree structures.

  • My question is how to write correctly and iteratively part of code that stores the numbers in the tree(from line 24).

I saw many examples, but all recursive and did not understand very well, feel free to give any kind of example to other users who may have the same doubt.

typedef struct arvore
{
    unsigned int n;
    struct arvore *esq, *dir;
}arvore;

arvore *raiz;

unsigned int
BPHash(const char* str, unsigned int len)
{
    unsigned int hash;
    for (unsigned int i = 0; i < len; str++, i++)
        hash = hash << 7 ^ (*str);
    return hash;
}

bool
load(const char *dict)
{
    FILE *fp = fopen(dict, "r"); // dict = arquivo que serve como dicionário
    if (fp == NULL)
        return false;

    char str[LENGTH+1]; // LENGTH = tamanho máximo da string (45)
    unsigned int strtam, hash; // Tamanho da string e string convertida em inteiro com função hash
    struct arvore dicio = {0, NULL, NULL};
    raiz = &dicio; // Ponteiro global

    while (fgets(str, LENGTH, fp) != NULL)
    {
        strtam = strlen(str);
        hash = BPHash(str, strtam); // BPHash = função hash

        if (raiz->n == 0) // Minha dúvida se refere as linhas abaixo
            raiz->n = hash;

        else if (raiz->n < hash)
        {
            raiz->esq = (arvore*)malloc(sizeof(arvore));
            raiz->dir = NULL;
            raiz->n = hash;
        }

        else
        {
            raiz->dir = (arvore*)malloc(sizeof(arvore));
            raiz->esq = NULL;
            raiz->n = hash;
        }
    }

    return true;
}
  • You could put your job BPHash also?

1 answer

2


Your first problem is that the root of the tree (dicio) is declared within the function itself load, statically allocated on the runstack (local variables are allocated on the runstack). This means, that when the function is finished, the corresponding part of the execution stack will be cleared and with it the root of the tree. The root could still be recovered by the global variable raiz, but this will be a pointer that will point to a part of the stack that no longer exists and that will soon be overwritten, then becoming junk.

The best solution for this would be to return the root of the tree or NULL if it cannot be created. Avoid using global variables, as doing so is a bad programming practice. But as you’re being forced to work that way, so unfortunately there’s no other way.

Another problem is that your tree has at most three knots: raiz, raiz->dir and raiz->esq. In fact, there will only be two, because when you create the left node, you erase the right and vice versa. Also, you erase these nodes without shifting them, causing memory leakage. You are never creating or accessing nodes on deeper levels.

Your revised code is this:

// LENGTH = tamanho máximo da string
#define LENGTH 45

// função hash
unsigned int BPHash(const char *str, int len) {
    unsigned int hash = 0;
    for (unsigned int i = 0; i < len; str++, i++) {
        hash = hash << 7 ^ (*str);
    }
    return hash;
}

typedef struct arvore {
    unsigned int n;
    struct arvore *esq, *dir;
} arvore;

arvore *raiz = NULL;

bool load(const char *dict) {

    // Primeiro, destroi qualquer árvore que já houvesse antes, para evitar vazamento de memória.
    destruir_arvore(raiz);
    raiz = NULL;

    FILE *fp = fopen(dict, "r"); // dict = arquivo que serve como dicionário
    if (fp == NULL) return false;

    char str[LENGTH + 1];

    while (fgets(str, LENGTH, fp) != NULL) {
        arvore *novo = (arvore *) malloc(sizeof(arvore));
        novo->n = BPHash(str, strlen(str));
        novo->esq = NULL;
        novo->dir = NULL;

        arvore *no = raiz;

        while (1) {
            if (no == NULL) {
               raiz = novo;
               break;
            } else if (no->n < hash) {
                if (no->esq == NULL) {
                    no->esq = novo;
                    break;
                }
                no = no->esq;
            } else {
                if (no->dir == NULL) {
                    no->dir = novo;
                    break;
                }
                no = no->dir;
            }
        }
    }

    return true;
}

Notice that we have two whiles here. The while external scrolls through the file reading its strings, and at each iteration, a node novo is created. The while internal uses a pointer no, which begins in the raiz and down the tree (no = no->esq; and no = no->dir;) until you find an empty place where the knot novo can be inserted. When the node novo is inserted into the tree, a break; in the while internal. The if (no == NULL) is to create the root of the tree. This approach still has the advantage that zero is not special and the function works even if BPHash return zero. Recursion is not used here.

A function to destroy the tree(s) created(s) is also required in order to prevent memory leaks (this time with recursion, since the version without recursion is much more complicated):

void destruir_arvore(arvore *no) {
    if (no == NULL) return;
    destruir_arvore(no->esq);
    destruir_arvore(no->dir);
    free(no);
}

Of course this all depends on the function BPHash have been properly implemented (missing initialize the hash with zero in it). I don’t know if the implementation is adequate or not. I also don’t know if this tree is really useful to what you want, but if it is, it’s the way I put it up that you will build it.

  • I added the function BPHash and a few more comments. "Your first problem is that the raiz tree is declared within the function itself load" The pointer raiz is a global variable, is declared outside any function. "The best solution for this is to return the root of the tree or NULL if it cannot be created" In this exercise I cannot change the statement of load, have to return a bool, so I created the global root pointer, to access the arvore dicio of any function.

  • "In fact, you only have two, because when you create the left node, you erase the right one and vice versa" This is my greatest difficulty, creating new nodes iteratively, and not overwriting them. "Also, you erase these nodes without shifting them, causing memory leakage. You are never creating or accessing nodes on deeper levels." I’m aware of that, I didn’t bother to dislocate because I haven’t even been able to create rsrs.

  • @Matheus Reply edited. As for the first item, I had chosen to say that I referred to the variable dicio.

  • I managed to climb one more step in the ladder of the data structure, thank you! Just one more question: the tree type pointer raiz has four bytes (int) which store address, and the following bytes tree struct data(n, *Esq, *dir)? I didn’t understand if raiz stores only one pointer for memory or all struct.

  • @Matheus The size of the pointer depends on the platform. If I’m not mistaken, in programs compiled with 32-bit architectures it has 4 and ccom 64-bit architectures it has 8 bytes, but I may be mistaken. The pointer points somewhere in the heap. In this place is allocated the structure. The size of the structure can be computed with sizeof(arvore), so much so that it is this amount of memory that you allocate with the malloc for exactly that reason. [continue]

  • 1

    @Matheus [continuation] It is believed that sizeof(arvore) == sizeof(unsigned int) + sizeof(arvore *) + sizeof(arvore *), but the compiler is allowed to add more bytes than padding to take advantage of data alignments in memory. It is also expected that all pointers have the same size, be it 4 or 8 bytes (on 16-bit architectures this did not happen, there were two types of pointers, near and far of different sizes). [continue]

  • 1

    @Matheus [continuation] According to this test, the unsigned int has 4 bytes, each pointer has 8 bytes, the structure components add 20 bytes, but the structure size is 24 bytes, so there are 4 bytes left over (padding). But again, this is something that the compiler has power to decide and depends on the platform, the compiler in question, applied optimizations, the execution environment, etc.

  • 1

    @Matheus https://ideone.com/ft87Kj

Show 3 more comments

Browser other questions tagged

You are not signed in. Login or sign up in order to post.