Dealing with collisions in Dictionary C#

Asked

Viewed 3,125 times

3

I need to map a string array to a dictionary so that I can later test whether an informed string is valid by checking if it is part of the dict, additionally would like to recover some other information stored on dict.

For this I am using the following structure:

struct reservedWord
    {
        public String command;
        public type token;      //token é um enumerador
    };

When trying to store string string through code (modifications trying to follow @mgibsonbr’s recommendations):

Dictionary<string, string> palavras = new Dictionary<string, string>();
for (int i = 0; i < reservedWords.Length; i++)
        (...)
        palavras.Add(reservedWords[i], word);

at runtime a duplicate key insertion exception is generated. Even when I replace this insertion code with

palavras.Add(reservedWords[i].getHashCode(), word);    

I get the same error message.

  • I understand the mistake you’re making, but don’t understand exactly what you want to do? You have a set of keywords and also another set of words to be tested?

  • Exactly that

4 answers

3

Techniques for handling collisions

The class Dictionary<TKey, TValue> do. Net does not allow null or duplicate keys. You will have to deal with these occurrences in the most appropriate way for their use. There are several techniques that depend on the behavior you want. In case of use of duplicate keys, I highlight the following:

  • Substituting: if you want to add an item to the dictionary, replacing the key if it already exists:

    dict[chave] = valor;
    
  • Add if there is no:

    if (!dict.ContainsKey(chave))
         dict.Add(chave, valor);
    
  • Recover if existing or added: if you want to add a key/value, only if the key is not yet occupied, and if you are busy then retrieve the existing value:

    TValor original;
    if (!dict.TryGet(chave, out original))
        dict.Add(chave, valor);
    
  • Adding multiple items to the same key: if you want to add multiple elements in the same key, you would actually have to use a list dictionary along with the add or recover technique:

    List lista;
    if (!dict.TryGet(chave, out lista))
        dict[chave] = lista = new List();
    lista.Add(valor);
    

Some potential issues I noticed in your code

  • Do not use the method GetHashCode() as dictionary key. This method will already be used internally by the class Dictionary<TChave, TValor> in all keys, and serves to streamline the process of locating elements within the dictionary.

  • Your struct does not implement IEquatable<T>, and nor is a past IEqualityComparer<T> for the dictionary. Because of this, the hash will be extracted from the first field in your struct, that is the field command... Is that really what you want? It seems very reasonable to me, but still it’s good to note it.

  • Your struct has fields that can be changed after inserted in the dictionary. A type that is used as a dictionary key, cannot have its hash changed, and the recommendable is that the struct be immutable.

  • Miguel, I want the hash to be generated only by the first field of the struct, because I’m using this hash to validate commands. I would like to know how to make the first field of the struct immutable. I haven’t programmed in C for a long time#

1

Well, one way to check collisions before you include them in the dictionary is to use the method ContainsKey de Dictionary. This method allows you to check whether there is such a key for each dictionary item. Example with an extension method:

public static bool AddSafe<TKey, TValue>(this Dictionary<TKey, TValue> dictionary, TKey key, TValue value)
{
    if (!dictionary.ContainsKey(key))  
    {
        dictionary.Add(key, value);
        return true;
    }  
    return false;
}

0

There are some misconceptions here, we need to solve them before thinking about the most appropriate solution:

  1. Is there even a "collision", or your problem is "duplicate keys"?

    If your data mass has for example the key "a" mapping both the 1 as to 2, then the problem is not hash collision. Collision would be if we had "a" mapping to 1 and "b" mapping to 2, but hash(a) == hash(b).

    3 collisions at 59 inputs does not seem "very few" to me, but an excessive number, especially if the hash function is well done (as I believe to be the case with getHashCode). Which is why I suspect your case is duplicate keys, but that’s only you can confirm. If that is so - and it is normal in your application for a key to point to the same value - then in fact a solution like your proposal (mapping key to a list) would be correct.

  2. If there’s a collision, whose responsibility is to treat it?

    Collisions can occur in any hash function, and the libraries that make use of them are (or should be) prepared to handle them. In the above example, if you insert "a" and "b" into a Dictionary and there is collision between their hashes, it is obligation of the Dictionary do something about it, not yours. Unless that library is much broken (I have no practical experience with C#, but I doubt this is the case) this will be handled transparently for the programmer-user [of the library, i.e. you].

    Your example above leads me to wonder if it was really a Dictionary<int, int> that you wanted. If your keys are strings, it wouldn’t be the case to use Dictionary<string, int>? For if you call the method getHashCode of the string manually, and uses its result as key (and not the original string) so you are transferring to you a responsibility that would be from the library (dealing with collisions). As far as I know, there’s no reason to do that...

If after reading the above you still need to handle collisions manually, update your question in more detail and I will try to guide you better about your options (chaining, rehashing...).

  • As you commented, I edited the post adding more details about the issue. In this case, the hashcode is generated according to the contents of the strings, as I have no duplicate strings in my data mass, I have collision problems and no duplication. I really found it bizarre that Dictionary launches an Exception every time I try to insert a duplicate key (collision) and not treat it for me, but I also did not find any better solution. Even the above colleagues have pointed out that my approach is satisfactory.

  • @lfelix I don’t understand. The code you posted shouldn’t even compile! The generic type of palavras is string, so it should accept only strings. You didn’t mean Dictionary<reservedWords, string>? (or maybe palavras.Add(reservedWords[i].command, word)). Also, it would be nice to have a list of strings that are colliding for us to test - otherwise it is difficult to reproduce the problem (you have few strings, but they are too big?). You could create an example on ideone that shows the error happening?

0

Another suggestion was to build the Dictionary the return of the own reservedWord struct, after running it with IEquatable<reservedWord>

public struct reservedWord : IEquatable<reservedWord>
{
  public string command;
  public MyType token;

  public override int GetHashCode(){ return command.GetHashCode() ^ (int)token; }
  public bool Equals(reservedWord other)
  {
    // seja qual for o teu criteria de egualdade...'case-sensitive'?
    return String.Equals(command, other.command) && token == other.token;
  }
}

and then

var palavras = new Dictionary<reservedWord, string>();
var reservedWords = new reservedWord[] { .... }; // lista de struct 'reservedWord'
for (int i = 0; i < reservedWords.Length; i++)
{
    (...)
    if(!palavras.ContainsKey(reservedWords[i])
      palavras.Add(reservedWords[i], word);
}

Browser other questions tagged

You are not signed in. Login or sign up in order to post.