One very important thing to keep in mind when using hash tables, especially with open addressing, is the load factor: the ratio of the number of distinct keys in the table to the maximum capacity of the table. If the load factor is very low you will be wasting memory with an overly sparse hash table. On the other hand, if the load factor is very high, you are very prone to collisions in your hash function and formations clusters in its table (long segments of occupied indices, which mean linear probes also long...). For example, imagine what happens with a 10001 capacity table with 10000 keys used, like the one you proposed in the question: it will have 10000 consecutive indexes used and only one free index left. When trying to insert 10001the element, it is almost certain that there is a collision of the hash function and on average you will have to do a survey of 5000 indices until you reach the empty bucket!
Because of all this, the ideal is that your load factor not too big or too small. The Wikiedia says that if you’re using linear addressing, it’s good to avoid load factors above 70% or 80% (but I couldn’t find the source for this specific number). Of course, it all depends on the hash function you are using and the text you use as input. I suggest you do tests with different capabilities and see the results.
Other than that, if you do not know a priori the table capacity you will need a possibility is to resize the table dynamically. Every time the load factor passes a pre-established limit, you want a new table with double1 capacity and transfers all keys from the old table to the new table.
1 What matters here is to always multiply the capacity by a constant greater than 1 (to grow the capacities exponentially) instead of increasing by a fixed number of elements (to grow the capacity linearly). This makes the asymptotic amortized cost of resizing constant. The higher the resize factor, the less time you will spend with resizing but in compensation you will have tables spending more memory.
This has a homework face...
– Vinícius Gobbo A. de Oliveira
@Wineusgobboa.deOliveira: It’s okay to be a homework question if the question is still well asked. pcccj Asked a clear question and is asking about a detail of his problem, without asking someone to do the whole task for him.
– hugomg
@Viníciusgobboa.deOliveira: Actually it is. But the question involves, maybe, 5% of this lesson. But even if it wasn’t, one day I would have this curiosity (and maybe many people already have it), so help without judging. I’m not asking you for a program but for advice/tip. Hug
– pcccj
It was just an innocent comment, I swear. I’m sorry if it sounded critical.
– Vinícius Gobbo A. de Oliveira