What are INDEX, B-Tree, hash, Gist, and GIN?

Asked

Viewed 5,885 times

7

In the the Postgresql Manual has the following excerpt :

Postgresql provides the index methods B-Tree, hash, Gist, and GIN.
Postgresql provides the B-Tree, hash, Gist and GIN index methods.

But after all what are these methods? and what are their differences?

1 answer

13


B-Tree

The B-Tree, or some variation of it, is the most common in all database systems. It is very efficient for almost all common use cases. It is a balanced tree which allows all types of access (reading, inserting removal, anywhere) in time O(log n) (is a little more complicated than this, but so gives an idea), which is very fast always, with very large or very small volume of data, but it matters even when it is large. It is reasonably space efficient and keeps data orderly, allowing sequential and track reading efficiently.

Hash

In cases where you don’t need the order or the sequential search and the keys are unique (where the programmer guarantees this, the index cannot), avoid collision and have large volume of data, the index can be more efficient using a calculation function of hash. They have time O(1). But this does not mean clearly faster. The function calculation time may be longer than some iterations in a binary tree (and there are usually few). Does not usually work well on disk or other secondary storage.

To structure of hash may still have some difficulties when there are collisions and linear searches (O(n)) are necessary (although rare, unless the keys are not unique). But you can also "complain" about wasting extra time on B-Tree when you need a rebalancing although this occurs under more specific circumstances. In all cases it depends on the algorithm used and the data pattern.

There are cases, but it is rare, have gained expressive by its use. There may be gains in certain types of JOIN very complex. Just testing to make sure it’s worth it.

He is not crash-safe, nor is it replicated, which makes more cases impossible.

GIN

The GIN (Generalized Inverted Index) allows a key to have multiple values, i.e., to point to multiple lines in the data table. This is important when the same key can be present in multiple data items. In some cases having such an index (B-Tree allows this too) can be more efficient in space and access time. Do not expect miraculous gains.

A good use is for searching free texts (indicate where determinas words are present.

Another use is when a specific type of data created by the user (programmer) in the database has a multiplicity feature.

In a way we can consider a lower level index. Something almost internal.

Gist

The Gist (Generalized Search Tree) is a more abstract, more internal form where several access methodologies can be implemented. It is a balanced tree, but does not have many rules. These should be specified at a higher level, probably for some specific type created by the user.

Completion

In short, if you don’t have a very strong reason, stick to the traditional index.

  • What would be crash-safe?

  • If you stop in the middle is no problem.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.