What is the difference between a map, a dictionary, an associative array and a hash table?

Asked

Viewed 764 times

17

These terms refer to structures that have very similar, if not equal, characteristics. They often seem to be used interchangeably. This would just be different nomenclature depending on the technology or has a real difference between map (in general unordered, makes a difference?), Dictionary, associative array and hash table?

Can one be specialization or implementation of the other? Which?

If they, or some of them, are equal and only the name changes, do you know if there is any reason to have this fragmentation of nomenclature? It is possible to explain the names being different?

Wikipedia seems to indicate that it is the same thing (almost all), but I do not know if it was not a poorly written article, so just copy what is there will not have much value. But to base with different sources, even demonstrating incoherence in use is very useful and creates knowledge.

See:

  • 6

    Only answer if you understand the difference well, an answer that "thinks" is not good. If nothing good comes up I will answer later.

  • 4

    The nomenclature "associative array" was used to access variables in non-numeric "positions". For example, in Bash, ${var[1]} accesses the position 1 of the variable var, while ${var[pos]} accesses the position pos of the variable var; and this is called the "associative array". Out of curiosity, there is a hardware with associative memory, in which the memory is not accessed by a "numerical address", but by an index that reflects the desired content.

  • 4

    Funny that Harbour’s "hash" guy {"nome"=>"Alceu", "idade"=>32} is the associative array. I would prefer that they had called even associative array... although because it is Xbase they probably preferred to choose a name for the type with "unoccupied initial" for not to be confused with other types

2 answers

10

Associative array (Javascript, PHP), dictionary (Python) and Map (Java, C++) are the same thing.

They are also called "hash", but this is a metonym, the hash function is the heart of an efficient associative array implementation. But you can implement associative array of other ways.

In Java, Map is the associative array interface. Hashmap is a Map implementation that uses hash, Enummap is an impl. using a vector (sequential array), Linkedhashmap allows access of keys in the order they were inserted, and Treemap allows access of keys in alphabetical/numerical order (implementation is tree-based, not hashed).

  • For those who have also been floating about metonymy: https://pt.wikipedia.org/wiki/Meton%C3%Admia

7


As the question is , let’s focus first on definitions that... do not depend on a specific language, and then move on to the specifics of some languages that are relevant to the discussion.


Definitions (independent of language)

Dictionary x Map

From the sources I consulted, only one gives a definition that goes against the "rest of the world". It’s a subtle difference, but I think it’s worth mentioning.

The "of against"

According to the Dictionary of Computer Science of the University of Oxford 1, the term "dictionary" is defined as:

Any data Structure Representing a set of Elements that can support the Insertion and Deletion of Elements as well as a test for Membership.

Note: the version online only has this phrase. To have the full definition, only acquiring the paid version (ps: I did not acquire).

That is, according to this definition, a dictionary is a set of elements in which you can add and remove elements, in addition to checking if an element is contained in it. It is important to note that no keys and values are mentioned, so commonly associated with the terms of the question.

In this reply from Soen is also given the definition of Mapping (which does not appear in the online version of the Oxford Dictionary then I believe the author of the answer should have the book):

Mapping An Operation that Associates each element of a Given set (the Domain) with one or more Elements of a Second set (the range).

It is worth noting that the term is mapping (to operation to map an element of a given set to one or more elements of another set). Already the map would be a structure that uses this operation to maintain key-value pairs(s). Remembering that implementations may or may not require keys to be unique (however, definitions seem to leave this open).

The same answer already cited defends - based on Oxford definitions - this distinction between terms, and states that a dictionary is only one map if it supports the following operations (which would not be mandatory for every dictionary):

  • store elements with different key/value values
  • query and delete values, using only the key

Therefore, according to the Oxford definition, dictionary would be a more generic structure, which allows adding and removing elements, in addition to verifying whether a given element belongs to the structure, but without necessarily having key-value pairs.

Already one map would be a structure that maps a set of keys to their respective values, in addition to allowing the same operations of the dictionary (add, remove, query), but using the key to do so. It would therefore be a special case of dictionary.

However, the same answer already cited recognizes that this distinction is not the "common understanding", since most seem to believe that dictionary and map are synonyms, and therefore the terms would be interchangeable (including other responses to it question follow this line).


The "common sense"

I also consulted the Dictionary of Algorithms and Data Structures of NIST 2 (quoted in this answer) and there the term "dictionary" is defined as:

An Abstract data type storing items, or values. A value is accessed by an Associated key. Basic Operations are new, Insert, find and delete.

Important to note that it is said that "dictionary" is a ADT (Abstract Data Type), that is, it only says what the structure does (it stores values that are accessed by their respective keys, in addition to being able to add, remove and search for values), but not how it is done (it gives no detail about the implementation). And contrary to Oxford’s definition, it says that a dictionary associates (or maps) keys with their respective values. Therefore, there is no separation between Dictionary and map, so much so that the definition of map of NIST is:

map: see Dictionary

Consulting other materials (1, 2, 3, 4, among many others similar), the impression I get is that the vast majority seem to disagree with Oxford and define that the dictionary obligatorily does the key-value mapping, and that values are always added, updated, removed and searched from the key.

Many fonts still add the fact that the order of the keys is not guaranteed (in this case it would be a Dictionary/map unordered) - that is, if you iterate for the keys, they can come in any order, which varies depending on the implementation. However, in many definitions there is also the separation of dictionaries into ordered and unordered, indicating whether keys should be kept in a certain order, which also depends on the implementation: it may be the order in which they were inserted (Insertion order), alphabetical order or any other criterion (in this case it would be Sorted - "classified" instead of "ordered"), etc. In case it has some order or classification defined, it is called ordered/Sorted Dictionary/map (some do not distinguish between ordered and Sorted and end up choosing only one of the terms).

Anyway, the idea of which dictionary is map are the same thing is so widespread that it seems to be the "common sense". Based on the sources already cited (and several others that I didn’t mention because they were redundant), it seems to me that both refer to an abstract data structure (ADT, just says what it does, but does not impose implementation details), that maps keys to their values, and has operations to add, update, remove and search elements (or check if an element exists), using the keys to do so.


There is yet another term: the book Algorithms, sedgewick and wayne, describes the term Symbol Table (Symbol Table), which by definition, is basically "the same" as dictionary/map:

The Primary purpose of a Symbol table is to Associate a value with a key. The client can Insert key-value pairs into the Symbol table with the expectation of later being Able to search for the value Associated with a Given key.

That is, in addition to having the key-value mapemanet, the book also mentions the basic operations (add, remove, search/check if it exists), in addition to describing several options that can be chosen by the implementations, such as the type of keys, if it allows duplicate keys, null values, if keys are in a given order (Ordered Symbol Tables, which would allow additional operations, such as obtaining the value of the larger or smaller key, or of keys that are in a certain range), etc.

Finally, other sources also mention the terms "table", or "associative container" as synonyms (although "table" gives the impression that it is exposing an implementation detail, according to the next item).


Table Hash

A table hash is a way to implement a dictionary. NIST, for example, defines hash table thus:

A Dictionary in which Keys are Mapped to array positions by hash functions.

That is, it is one of the possible implementations of a dictionary (since it can also be done with trees, for example). In this case, it uses a table hash, which in turn uses a function of hash (which calculates a value from the key, and this value is called hash, see here for more details).

Of course, implementation can also vary, as there are many ways to create a hash, of treat collisions, etc..


Associative array

In the wikipedia article, the terms "Dictionary", "map" and "associative container" redirect to "Associative array", which gives the impression that this would be the term "correct" and that the others would be synonymous.

However, there has been discussion about the name of the article, suggesting to change it to Map or Dictionary, arguing that are more commonly used terms than "associative array" (which in turn would be "less popular" because "only PHP uses" complain to those who started the argument, not to me). But apparently it did not go forward, since the name of the article remains "Associative array".

After all, it would be the same as a dictionary/map, just one more term to refer to the same thing. However, one can argue that "array" actually is exposing an implementation detail, and that therefore would not be an ADT, or that having "array" in the name is confusing, since it is another completely different data structure (this was one of the arguments to defend the change in the name of the article, incidentally).

THE NIST define "associative array" in this way:

A Collection of items that are Randomly accessible by a key, often a string.

Which seems similar to a dictionary to me, although there’s only one explicit mention of the query operation. But I believe the others are implicit, after all, if you couldn’t even add it, it would be a very limited utility structure.

Apart from Wikipedia, the only source I found that mentions this term (as an abstract and language-independent concept) is the quoted book Algorithms, when talking about the specific case where a symbol table does not allow duplicate keys:

Duplicate keys. Only one value is Associated with each key (no Duplicate Keys in a table). When a client puts a key-value pair into a table already containing that key (and an Associated value), the new value Replaces the old one. These Conventions define the associative array abstraction, Where you can think of a Symbol table as being just like an array, Where Keys are indices and values are array Entries.

That is, even if the term is mentioned, it refers to a specific type of dictionary (and even gives me an impression that there is leakage of abstraction, by explicitly citing a table and its correlation with the array).

Therefore, I am not sure that, conceptually speaking, "associative array" is an equivalent term to "dictionary" or "map" (unless certain implementation ensures that it is, for its specific context).


Given everything that was said above, it seems to me that "common sense" is that the concepts of dictionary and map are accepted as synonyms (ADT, map key-value, basic operations), while associative array is somewhat more debatable. In the background, it depends on who you ask, but the links are there and each one who draws their own conclusions.

The term "table hash" refers to the implementation.


Implementations

I am not going to put all languages, I will just cite some cases that I found pertinent. If you want a more complete comparison, you have here.

In general, I see that each language gives the name it wants, and does not always correspond to the above definitions. And as the definitions of the concepts themselves also differ at certain points, it seems to me - unfortunately - natural that the implementations also do so.

Remembering that everything that is said below about languages refers to current implementation (except where earlier versions are mentioned), consulted on the date of this reply. Because they are implementation details, they can perfectly change - often without warning - in future versions.


Java

In Java there is a abstract class Dictionary (obsolete) and a subclass Hashtable, which shows that at least conceptually there was a concern with nomenclature (since a dictionary is an ADT and one of the ways to implement it is with tables hash).

However, the recommendation is that any current implementation use the interface Map and its various implementations. The most common (at least is the one I see being used most frequently) is the HashMap, that according to the documentation (and as the name says) is implementing with a table hash and does not guarantee the order of the keys. But there are also implementations that guarantee the order, such as TreeMap (which is implemented with a red-black tree and uses the natural order of the keys, but you can also customize the ordering using a Comparator) or LinkedHashMap, which keeps the keys in the order they were inserted (using a table hash for the dictionary and a double-linked list to maintain key order).

That is, Java opted for the term Map (<speculation>perhaps because Dictionary had already been used, and after this class became obsolete, they ended up having to use another term</speculation>), whose implementation conforms to the definitions seen above. Of course there are some specific details (some classes allow keys and/or null values, others not, etc), but in general, it seems to me not to deviate from the definitions.


Python

In Python there is the dictionary (dict), which seems to be in accordance with the above definitions (has the operations to add/remove/update/search keys, which are mapped to the respective values).

One detail is that from Python 3.7 it is guaranteed that the keys will be in the order in which they were inserted (Obs: in Cpython 3.6 this was an implementation detail, but from Python 3.7 this is guaranteed - and approved by the BDFL itself - for all language implementations - see more here).

It is worth remembering that there is also OrderedDict, which, in addition to keeping the keys in order of insertion, has other functionalities, such as the option of move a key to the end, in addition to taking into account the order in the comparisons:

d1 = { 'a': 1, 'b': 2 }
d2 = { 'b': 2, 'a': 1 }
# chaves em ordem diferente
print(d1.keys()) # dict_keys(['a', 'b'])
print(d2.keys()) # dict_keys(['b', 'a'])
# apesar disso, os dicionários são iguais
print(d1 == d2) # True

# com OrderedDict, é diferente
from collections import OrderedDict
d1 = OrderedDict({ 'a': 1, 'b': 2 })
d2 = OrderedDict({ 'b': 2, 'a': 1 })
print(d1.keys()) # odict_keys(['a', 'b'])
print(d2.keys()) # odict_keys(['b', 'a'])
# chaves não estão na mesma ordem, dicionários não são iguais
print(d1 == d2) # False

Internally, one dict is implemented with a table hash.


C#

In C# we have the class Dictionary, representing a collection of keys and values, does not guarantee the order of keys and is implemented with a table hash. There is also the SortedDictionary, which sorts keys and is implemented with a binary tree.


Javascript

Historically, objects have been used as dictionaries/maps, since they have the necessary functionalities: I can add, delete and get values from a key, and test its existence. Ex:

const obj = { 'a': 1, 'b': 2 };
// atualiza a chave e obtém o valor
obj.a = 3;
console.log(obj.a); // 3
// se a chave existe, apaga
if ('b' in obj) delete obj['b'];
console.log(obj); // { a: 3 }

It is worth remembering that the use of objects as dictionaries has some points: the prototype of the object may have properties (i.e., "keys") that may conflict with those defined in the object itself:

Object.prototype.c = 3;
const obj = { 'a': 1, 'b': 2 };
// verifica se a chave "C" está no objeto
console.log('c' in obj); // true (pois a propriedade "c" está no prototype)

In this case, the solution to avoid this problem would be to use a Map, which is a specific object for storing key-value pairs, in addition to having all the operations of a dictionary and keeping the keys in the insertion order (the Map did not exist in the early versions of the language, which may explain the preference for using objects as dictionaries, very common until today). See the difference:

Object.prototype.c = 3;
const obj = new Map([ ['a', 1], ['b', 2] ]);
// verifica se a chave "C" está no objeto
console.log(obj.has('c')); // false (não é afetada pelo prototype)

// se usar o operador "in", ele ainda olha no prototype
// por isso, para ver se a chave existe no Map, o recomendado é usar has()
console.log('c' in obj); // true

There are other differences between a Map and a "common object", see here for more details. But in general, the Map seems to me more appropriate, because the impression I have is that it was made especially to be an implementation of a dictionary.

See also:


PHP

In PHP there is the array, which is actually a "all-rounder" structure. According to documentation (with my emphasis):

An array in PHP is Actually an ordered map. A map is a type that Associates values to Keys. This type is Optimized for several Different uses; it can be treated as an array, list (vector), hash table (an implementation of a map), Dictionary, Collection, stack, Queue, and probably more

That is, in the background the array is a dictionary/map, but which can be treated as the "classical" array of other languages (sequential structure with numerical indices), with numerical indices acting as keys. Interesting that the excerpt above treats the table hash (an implementation of map) and the dictionary as two distinct items.

It is also worth remembering that the documentation mentions that "PHP arrays can contain int and string Keys at the same time as PHP does not distinguish between Indexed and associative arrays". Note the use of the term "associative array". In fact, as argued in the Wikipedia discussion - and at least in the languages I know best - PHP seems to be one of the few languages that use this term officially, to refer to a structure that implements the ADT dictionary concept/map.

In Javascript, for example, we find the term "associative array" in the MDN documentation ("If you want to use an ordered associative array in a cross-browser Environment, use the Map Object"), but the language specification does not contain any mention of the term. Other languages seem to prefer dictionary or map, but like I said, I don’t know all of them, so (always) there’s a chance that I’m biased/wrong.

See also:


Ruby/Perl

In Ruby there is the class Hash, that maps key-value pairs and has the operations already mentioned (add, remove, etc). In Perl, the structure that has these characteristics is also called Hash), although there is mention of "associative array" in documentation.

Anyway, this example was just to register another of the many names used to refer to the implementation of a dictionary.


Completion

The terms "dictionary", "map" and "associative array" may or may not be interchangeable, depending on the context. In some circles/communities, some may be more accepted than others, and the names chosen by language may have a strong influence on this.

The most, shall we say, academic definition considers that dictionary and map refer to the ADT (may be identical, or one may be a variation of the other), while "associative array" seems to be more restricted and controversial. Characteristics such as the order/classification or uniqueness of the keys may or may not be considered "mandatory". As for implementation, each language gives the name it wants (some are more "faithful" to the definitions, others not so much).

Already the table hash I would not say that it is interchangeable, because it is a specific detail of implementation.

Anyway, what remains is the tip to read the documentation of each language to know what in fact is each structure, since only by the names can not be 100% sure about what they are. I don’t know why each of you chose a particular name, and I don’t want to speculate, I can only regret that it’s so messy. It shows that indeed It’s hard to name things.


1 - in this case, it is both a dictionary in the noncomputational sense (a book containing the definition of several terms), and in the computational sense (maps each term - which would be the keys - to its respective definition - which would be the values) :-)
2 - idem 1 (but instead of book, it is a website)

  • 2

    One of the things I think, since it has no absolute consensus, is that the dictionary has unique key (dict<word, definitons>) and the map not necessarily (map<word, definiton>, note the subtlety), although it may have. The table and the array are even details of implementation, being the array associative a wrong name to call the table hash (PHP being PHP and others going in the wave). Good languages choose whether they want an ADT or an exposed detail, just to confirm your answer, I think it happens to be one of the best sources on the subject on the Internet, even if it does not close the subject.

  • I think some definitions "formal" cited are very shameless, it seems not well thought out, but it is what has, the research here was good.

  • 1

    I would like it to be: hashtable => unordered (Unique); associative array => ordered (non Unique); Dictionary => Sorted (Unique, but it should have another name for the sorted structure with key duplicity); map => implementation and undefined guarantees, has only key and value, and manipulation of inputs as requirements, and can have more and one input. The map I get more in doubt, who knows it should be the multi Sorted and not have a more abstract. It makes sense?

  • 1

    @Maniero In an ideal world we would have structures with well-defined names that do not vary according to language/technology. It could be your idea, as well as any other variation, as long as it was general consensus, not the mess it is today. But since the chance for someone to fix this and get a consensus common to all is very small, we can only accept that each language will use the terms it wants, and we will turn around :-/

Browser other questions tagged

You are not signed in. Login or sign up in order to post.