Scatter table (Hash Tables)


I have a closed hash table (open addressing) with linear probing, holding a dictionary of words taken from a text.

The input operation works like this:

1) The program reads the first line of text (limited to 100 characters), splits it into words, and inserts the words into the hash table; then it reads the second line, and so on, until it finds a line containing $STOP.

2) After the $STOP, the program reads words (strings) to be searched for in the hash table; the result of each search is the number(s) of the line(s) where that word occurs in the text.

//EXAMPLE:
$START
o mundo era bonito demais
o mundo jah foi muito belo 
$STOP
mundo
bonito
//returns:
mundo: 1 2
bonito: 1

Considering that the number of distinct keys can vary from 1 to n (10000, for example), how should I choose my table size? Should I just declare a table vector[10001]?

Won't that waste a lot of memory?

How should I deal with this sizing problem for a closed hash table with open addressing and linear probing?

  • This looks like homework...

    @Viníciusgobboa.deOliveira: It's fine for it to be a homework question as long as the question is well asked. pcccj asked a clear question about one detail of his problem, without asking anyone to do the whole task for him.

  • @Viníciusgobboa.deOliveira: Actually, it is. But the question covers maybe 5% of the assignment. And even if it weren't, one day I would have this curiosity anyway (and maybe many people already do), so help without judging. I'm not asking you for a program, just for advice/tips. Hug

    It was just an innocent comment, I swear. I’m sorry if it sounded critical.

1 answer

One very important thing to keep in mind when using hash tables, especially with open addressing, is the load factor: the ratio between the number of distinct keys stored and the maximum capacity of the table. If the load factor is very low, you waste memory on an overly sparse table. On the other hand, if the load factor is very high, you become very prone to hash collisions and to the formation of clusters in the table (long runs of occupied indices, which in turn make the linear probes long...). For example, imagine what happens in a table of capacity 10001 holding 10000 keys, like the one you proposed in the question: it will have 10000 consecutive occupied indices and only one free index left. When inserting the 10001st element, a hash collision is almost certain, and on average you will have to probe about 5000 indices before reaching the empty bucket!

Because of all this, the ideal is for your load factor to be neither too large nor too small. Wikipedia says that if you are using linear probing, it is good to avoid load factors above 70% or 80% (but I could not find the source for this specific number). Of course, it all depends on the hash function you are using and on the text you use as input. I suggest you run tests with different capacities and compare the results.

Other than that, if you do not know the required capacity a priori, one possibility is to resize the table dynamically: every time the load factor passes a pre-established limit, you allocate a new table with double¹ the capacity and transfer all the keys from the old table to the new one.


¹ What matters here is to always multiply the capacity by a constant greater than 1 (so that the capacity grows exponentially) instead of adding a fixed number of elements (which would grow the capacity linearly). This makes the amortized asymptotic cost of resizing constant. The higher the growth factor, the less time you spend resizing, but in return your tables use more memory.
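The resize-and-rehash step could be sketched like this. The struct layout, the names (`ht_grow`, `ht_put`), and the 70% threshold are illustrative assumptions; note that every key must be re-inserted, because each index depends on `hash % capacity`:

```c
#include <stdlib.h>
#include <string.h>

/* A growable open-addressing table; names here are illustrative. */
typedef struct {
    char **slots;     /* NULL = empty bucket */
    size_t capacity;
    size_t used;      /* number of occupied buckets */
} HashTable;

static unsigned long hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

static void ht_put(HashTable *t, char *word);   /* forward declaration */

/* Double the capacity and re-insert every key from the old array. */
static void ht_grow(HashTable *t) {
    char **old = t->slots;
    size_t old_cap = t->capacity;
    t->capacity *= 2;                /* exponential growth (see footnote) */
    t->slots = calloc(t->capacity, sizeof *t->slots);
    t->used = 0;
    for (size_t i = 0; i < old_cap; i++)
        if (old[i] != NULL)
            ht_put(t, old[i]);       /* keys land at new indices */
    free(old);
}

/* Insert with linear probing, growing first if the load factor
   would exceed 70% (i.e. 10 * used > 7 * capacity). */
static void ht_put(HashTable *t, char *word) {
    if (10 * (t->used + 1) > 7 * t->capacity)
        ht_grow(t);
    size_t i = hash(word) % t->capacity;
    while (t->slots[i] != NULL) {
        if (strcmp(t->slots[i], word) == 0)
            return;                  /* duplicate: nothing to do */
        i = (i + 1) % t->capacity;
    }
    t->slots[i] = word;
    t->used++;
}
```

The load-factor check uses integer arithmetic (`10 * used > 7 * capacity`) to avoid floating point; the recursion between `ht_grow` and `ht_put` is safe because, right after doubling, the table is always below the threshold.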

  • First of all, thank you; your advice has shown me the way. However, I am having trouble with the function that is supposed to move the contents of table1 into table2 (the larger table);

  • I posted my follow-up question here: http://answall.com/questions/38489/duvida-hash-table
