Hash, Object or Dictionary?

Asked

Viewed 1,099 times

4

I have been studying python and ruby and have seen that several other languages as well as these two make use of different terminologies to determine a ' Json object.

var objeto = { 'name' : 'Darth Vader' }.

The point is, I’ve seen them call it in ruby hash(Which reminds me of cryptography ), javascript of objeto and python of dicionário.

  • The terminology changes ?
  • Why not approach a universal terminology since the structure is practically the same?

Obs: I gave the example in json, but also understand how primitive type of each language in the context of the question.

1 answer

5


I cannot answer why the terminology is not universal (can anyone?). But I can try to explain where each term used comes from. I don’t have a preferred term for this data structure, and it varies depending on what language I’m using, but I’ll call it here "dictionary".

In Javascript (I will use Javascript in the answer because JSON means "Javascript Object Notation"), it is called object because the language designers wanted to implement a type of object orientation, and the simplest way to do this in a dynamic language like Javascript is to use a dictionary, which relates the name of the members of the object (its functions and attributes) to its values. Javascript objects are restricted to using strings or numbers as a key, but can have anything as a value. By the way, the terms "key" and "value" are universal, no matter the language.

Python, another dynamic language, followed an idea similar to that of Javascript to implement object orientation: its objects use a dictionary to store the relationship between the names and values of its attributes, but did so explicitly (which is more organized, in my opinion). Instead of every dictionary being a Python object, there are dictionaries (dict and notation { 'name' : 'Darth Vader' }), and any dynamic class defined in Python will contain a special member called __dict__, which is the dictionary that stores the dynamic members of the objects of that class. So I can access a member of an object in two different ways:

class Character:
    def __init__(self):
        self.name = 'Darth Vader'

dv = Character()
print(dv.name) # Imprime "Darth Vader" na tela
print(dv.__dict__['name']) # Também imprime "Darth Vader" na tela

I explained this about Python to make it clear why it is called an "object" in Javascript. But if Javascript merges the concepts of object and dictionary into one thing, in Python they are explicitly separated, and there are so many objects that do not depend on dictionaries (search for the attribute __slots__ if interested) how much dictionaries independent of other objects, and can be used explicitly:

>>> d = {'banana': 'amarela'}
>>> d['maçã'] = 'vermelha'
>>> d
{'maçã': 'vermelha', 'banana': 'amarela'}

In Python, the key is not restricted to strings and numbers, any object hashable can be key. And then we enter the next term: hash. An object hashable is one that has defined a hash function: a function that receives the object, and returns an integer value corresponding to that object. Hashes are not necessarily tied to encryption, as you said. MD5, SHA-256, etc. are cryptographic hash functions. They are too slow to use in dictionaries, and are designed to be difficult to invert (given a hash value, it is difficult to figure out the value that generated that hash). For dictionaries, much simpler hashes are usually used, which are not necessarily difficult to invert. They involve some bit-a-bit operations and some multiplications with some constant prime number and you’re done. The important thing is to be fast, and the result is quite random, with few chances of different values result in equal hashes.

Another very common dictionary name is "hash table". Strictly speaking, hash tables are dictionaries, but not every dictionary is a hash table, although hash tables are the most common type of dictionary. Python dictionaries, Javascript and (apparently from what you said) Ruby dictionaries are implemented as hash tables. This term makes explicit the role of the hash function in the dictionary construction.

A dictionary needs to offer a fast mechanism to get a value from a key, and in the hash table this is done as follows: everything is stored in a size vector M > N, where N is the number of elements stored. To store a key value k, just do:

vetor[hash(k) % M] = valor

Remember that k may be a string, but hash(k) is a whole. The operation % (whole split remainder) ensures that the value will be between 0 and M - 1 (you can add up to 1, if you prefer to think of vectors going from 1 to M). The bigger it is M and better (more "random") for the function hash, less chance of two equal keys ending up in the same vector position. But this happens, it’s called "hash collision". When it happens, more than one value is stored at the same position (usually each element of the vector is a linked list that contains all elements with that hash). In such cases, a search for the key element k requires that this list be traveled in search of the right element. For this reason it is said that the average complexity of a hash table is O(1) (constant time, a few arithmetic operations are enough to find an element, in the average case where there is no collision), and O(n) in the worst case of the unlikely situation that all elements of the hash table fall at the same vector position, situation where the hash table degenerates to a linked list.

Another common dictionary term is "map" (map, in Portuguese), because as a dictionary, it gives the idea of allowing something to be found quickly. This term comes from C++, which has the structure "Std::map" (do not confuse with map of Python and Haskell, which is something else), and unlike the other dictionaries, it ensures that the elements will be stored in an orderly fashion, and so is usually implemented as some kind of binary tree (the most common implementation is with a red-black tree), which in practice is not as fast a dictionary as a hash table (it has O(log n) complexity), but has the advantage that the elements are sorted by the key. In 2011 the "Std::unordered_map" structure was included in the C++ language, which does not guarantee the order of the elements, and therefore can be (and is) implemented as a hash table.

I hope I have clarified the origin of all these terms.

  • 3

    Another name used for the same data structure is "array associative" or "table (associative)", which is used in Moon.

  • Cool way they found to get object orientation in Javascript. Python in turn contains very elegant modules that would do what Js did best ( In my opinion - with the idea / implementation of Named Tuple for example ). Made very clear and thank you for the excellent response !

Browser other questions tagged

You are not signed in. Login or sign up in order to post.