Why does bitmap indexing work well for low cardinality domains?

Asked

Viewed 115 times

3

I have just read chapter 29 of the book Overview of Data Warehousing and OLAP

On page 725 the author talks about indexing techniques to support high performance access, this technique is called bitmap.

That is how the doubt arose and consequently the question.

Source: Database Systems 6th edition, authors: Elmasri, Ramez Navathe, Shamkant B. Year: 2011 Chapter 29, page 725.

  • Did the answer resolve what was in doubt? Do you need something else to be improved? Do you think it is possible to accept it now?

1 answer

4

In fact it depends on the implementation and the objective of the index.

Generally the goal is to reduce the space occupied in the index and give better performance, in addition to maintaining a structure similar to a array for key access, which also increases performance.

Ideally such an index should be only two possible values, so a single bit is sufficient for each key entered in the index. This is very efficient.

If you have 4 possible values you already need 2 bits. For 256 possible values you need 8 bits.

As it increases and approaches the key size of a B-Tree index, the efficiency gain becomes smaller. At any given time the key will be equal to or greater than if you used a B-Tree.

But it has implementations that need 1 bit for every possible value. This is because each key has a complete structure with 1 bit for each table entry, so if it has 256 possible values there will be 256 keys, each with a size equivalent to the number of table rows divided by 8, that is to say 32 bytes for each entry, is clearly worse since an entry in the B-Tree index will hardly have more than 25 bytes even in a complex implementation and that meets an absurd amount of lines (usually will be much smaller than this).

There are ways to compress this, but the gain will depend on the distribution of the values, and you lose the positional readability as if it were a array, complexity O(1).

Depending on the implementation and distribution it can still compensate even in very high cardinality, for example have all unique values.

Another gain is when you need to confront two or more indexes mapped per bit. Simply apply simple Boolean algebra to the whole index and you get the expected result.

As the query is always made according to the position of the row in the table it is complicated to use for other classifications and may not have won direct. It has optimizations for this combining with B-Tree that minimize this problem.

The greatest gain of this type of index is when it stores the result of a Boolean expression of a column-based condition. It is rare to need this for use in applications. But the JOIN and other database operations can benefit from an internally created index to control what it should take in that query based on the results of two or more selections made before. The existence of such an index can optimize the query because it does not have to generate the index on time. On the other hand it makes writing worse by having to update one more index.

A good database will know when to use an index bitmap even if it already exists and when you prefer another way.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.