What are the differences between the HASH and BTREE algorithms used in an index?

I noticed that in HeidiSQL I can create two types of index on a given field, using either the HASH or the BTREE algorithm, as shown below:

[Image: index type]

Here is the CREATE statement of an example table, for illustration:

CREATE TABLE `pessoa` (
    `nome` VARCHAR(50) NOT NULL,
    `email` VARCHAR(50) NOT NULL,
    INDEX `index_nome` (`nome`) USING BTREE,
    INDEX `index_email` (`email`) USING HASH
)

So I was left with some questions about these two algorithms.

Questions

  1. What is the HASH algorithm?
  2. What is the BTREE algorithm?
  3. What are the differences between HASH and BTREE?
  • HASH refers to a hash table: a function maps each value to some number X, so a lookup only needs to check whether the table holds an element with that number X. BTREE, in turn, only applies to elements that have a total order; instead of consulting a table, the lookup walks a B-tree.

  • Ah, don't confuse "B-tree" with "binary tree"...

  • So in this case, is BTREE related to a binary tree?

  • Related: https://answall.com/q/220409/64969

  • The relationship between B-trees and binary trees is the same as the relationship between date palms and pines: both are trees. The idea of the B-tree involves pages of arbitrary size and node balancing, so that every non-root node has at least half of its page filled. There are more details in the question (and its answers) that I linked.

1 answer


This has already been answered in the context of PostgreSQL.

I have already given details about B-trees (OK, I still need to finish that answer).

I have already talked about hash codes.

Hash tables have already been covered in another answer.

The documentation shows the difference. From the way it is written, it seems they don't want you to use hash indexes: it only lists disadvantages, which is quite realistic.

A hash index is very rarely useful, even less so on disk, where it usually forces far more reads. The Portuguese name for a hash table translates literally as "scatter table", and anything that gets scattered is bad for access on certain media, or hurts the cache and causes more thrashing.

The main disadvantage is that it can only compare for equality. That has further implications, such as not being able to maintain order, which in turn brings a cascade of consequences (no range searches and no sorted reads from the index, for example).
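To make the equality-versus-order point concrete, here is a minimal Python sketch. It is not how MySQL implements either index: a `dict` stands in for the hash index and a sorted list searched with `bisect` stands in for the ordered keys of a B-tree. The e-mail values and the function names are made up for the illustration.

```python
import bisect

# Hypothetical rows: position in the list plays the role of the row id.
emails = ["carla@example.com", "ana@example.com",
          "bruno@example.com", "davi@example.com"]

# Hash-style index: value -> row id. It can only answer "is it equal to X?".
hash_index = {email: row for row, email in enumerate(emails)}

# Ordered index: sorted (key, row id) pairs, standing in for B-tree keys.
ordered_index = sorted((email, row) for row, email in enumerate(emails))
ordered_keys = [email for email, _ in ordered_index]


def find_equal(email):
    """Equality lookup: one probe into the hash index."""
    return hash_index.get(email)


def find_range(low, high):
    """Range lookup (low <= key < high): two binary searches, then a slice."""
    start = bisect.bisect_left(ordered_keys, low)
    end = bisect.bisect_left(ordered_keys, high)
    return ordered_index[start:end]


def find_prefix(prefix):
    """Prefix lookup, similar in spirit to LIKE 'b%', built on the range lookup."""
    return find_range(prefix, prefix + "\uffff")


print(find_equal("bruno@example.com"))  # row id of an exact match
print(find_range("a", "c"))             # every key that sorts between "a" and "c"
print(find_prefix("b"))                 # every key starting with "b"
```

With only `hash_index` available, `find_range` and `find_prefix` would have to scan every entry, because the hash of a key says nothing about where its neighbours are stored.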

But if the index is in memory, you only need to test for equality, elements are usually accessed one at a time, and there are few key collisions (either because the original data does not repeat or because the hash function's results rarely repeat), then it can be faster than any binary tree or B-tree. If there are many writes, there will almost certainly be gains in that operation (except in the rare worst cases).
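The effect of collisions can be seen in a toy hash table. This is only a sketch under my own assumptions (a fixed number of buckets, chaining with lists, and two hand-picked hash functions); `ToyHashTable` is a hypothetical class and not something MySQL exposes.

```python
class ToyHashTable:
    """Fixed-size hash table with chaining; counts key comparisons on lookup."""

    def __init__(self, bucket_count, hash_fn):
        self.buckets = [[] for _ in range(bucket_count)]
        self.hash_fn = hash_fn

    def _bucket(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def lookup(self, key):
        """Return (value, comparisons); value is None when the key is absent."""
        comparisons = 0
        for existing_key, value in self._bucket(key):
            comparisons += 1
            if existing_key == key:
                return value, comparisons
        return None, comparisons


# A function that spreads the keys well versus one where every key collides.
spread = ToyHashTable(128, hash_fn=lambda k: k)
collide = ToyHashTable(128, hash_fn=lambda k: 0)

for key in range(100):
    spread.insert(key, key * 10)
    collide.insert(key, key * 10)

print(spread.lookup(99))   # found after a single comparison
print(collide.lookup(99))  # found only after walking the whole chain
```

When the function spreads the keys, each lookup touches one entry. When everything lands in the same bucket, the lookup degrades to a linear scan, which is the rare worst case mentioned above.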

A B-tree can require a lot of internal maintenance in tables with heavy writes, but reads are always well optimized.
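As a rough picture of that maintenance, the sketch below models only the leaf level of a B-tree-like structure: sorted pages with a fixed capacity that split in half when they overflow. A real B-tree also keeps internal nodes and handles deletions; `LeafPages` and `PAGE_CAPACITY` are names I made up, and the capacity of 4 keys is arbitrary.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical page size, measured in keys


class LeafPages:
    """Sorted pages that split when full; a simplified B-tree leaf level."""

    def __init__(self):
        self.pages = [[]]
        self.splits = 0  # how many times a write triggered maintenance

    def _page_index(self, key):
        # The first key of every page after the first acts as a separator.
        separators = [page[0] for page in self.pages[1:]]
        return bisect.bisect_right(separators, key)

    def insert(self, key):
        idx = self._page_index(key)
        page = self.pages[idx]
        bisect.insort(page, key)
        if len(page) > PAGE_CAPACITY:
            # Overflow: move the upper half into a new page to the right.
            mid = len(page) // 2
            self.pages.insert(idx + 1, page[mid:])
            del page[mid:]
            self.splits += 1

    def contains(self, key):
        page = self.pages[self._page_index(key)]
        pos = bisect.bisect_left(page, key)
        return pos < len(page) and page[pos] == key


index = LeafPages()
for key in range(1, 11):
    index.insert(key)

print(index.pages)   # pages stay sorted, each at least half full
print(index.splits)  # extra work caused by the writes
print(index.contains(7), index.contains(11))
```

The writes pay for the splits, while a read only needs one binary search over the separators and one inside a single page.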

A hash index is only useful if you understand all the implications well and have run tests that demonstrate a clear gain. That is why it is almost hidden and limited to certain MySQL engines (such as MEMORY).

There is also an internal hash index that the database uses when it decides that is the best way to organize query results, but that is a detail that does not matter to those who use MySQL.
