What’s a B tree?

Asked

Viewed 601 times

11

What types of problem is it suitable for? How and why should it be used? It is possible to include an example of code?

B-arvorismo

  • 2

    This is the legitimate tree B :D

  • 1

    No more to answer, the photo says it all :P

1 answer

7


A tree is a data structure in a hierarchy system with a root, branches and leaves, that is, starting a point point will open different paths according to the data, until arriving at the data you want. This is very useful because it uses what is called divide and conquer. With a certain criterion it is possible to organize the data in groups in such a way that it is possible to select these groups in a structured way, leaving behind the parts that do not matter.

A list is still a tree, totally hanging to one side. A tree cannot indicate a node at a previous level or more than one node point to another.

The most commonly used tree in computation is the binary, where there will always be one or two nodes attached to the node at the previous level. Then goes to one side everything that is greater or equal to the die of that alone and goes to another side everything that is smaller, this way to each analysis can be eliminated much of what has to analyze.

Árvore binária

The problem is that it is possible for the tree to hang more to one side and in the extreme case the tree to be a sequential list. Hence the tree B (B Tree) which is a structure that is made a balance. More than that the data nodes, whether branches or leaves are stored in groups, often called clusters or pages.

Balancing ensures that complexity is O(logN) even in the worst case in all read or write operations. This is usually very close to the constant time (O(1)), because it always maintains few levels in the tree and almost all with two connected nodes not letting it hang to one side. So both the search and the updates on it are very fast, maintaining a classification, as opposed to a structure based on hash which gives slightly better performance in most cases, but without a rating or even order.

The grouping allows a more effective search to be done even with data of variable size and variable number of data nodes, and allows less need for rebalancing, in addition to fitting well with the computer’s memory and disk organization, giving better location to the data.

Rebalancing, when necessary, is not something trivial to be done efficiently.

The structure is not the most space-efficient, but the loss is usually small in most implementations.

Árvore B

In a simple binary structure we can put 4 billion nodes in maximum 32 levels, so in only 32 queries we get what we want (except for queries of data overload, but this is another matter), since properly balanced. With groupings it is possible to reduce this enough, in extreme cases (only boolean) in 17 queries, at most.

When the access time to the data is worse than the time to process it the gain of this type of structure is considerable. Even because reading in mass storage only occurs in clusters. Even though they may have small groupings (4KB), it is common to prefer larger groupings on disks since the initial access time, latency, tends to be large and it pays to read more things at the same time and try to find in this large grouping which is the next grouping to be read more accurately, arriving in fewer steps where you want to reach.

For all this it is the ideal structure to organize databases, mainly the indexes, although the table is an index. Not that it has no flaws, but in general the advantages of this structure outweigh the disadvantages, because the options that do not bring them are much worse. There’s essentially no way to get a better result. You can do some optimizations to reduce the number of levels, because each level is certainly a mass storage reading that is slower than memory.

Obviously anything that looks like a database and goes into mass storage also benefits, for example the operating system file system.

In memory there are advantages of its use in many cases, but having low locality other options may have gained even if you have to analyze more information.

There are tree variations that seek the same goal, but with very large complexity of implementation and other problems. Binary tree variations can improve the way of balancing or the way of storing the nodes meeting a more specific need.

It has variations that work better with the data is already sorted, typical case of sequential primary key, some work better with data completely scrambled.

There are controversies about what B means. We usually understand it as balanced, but it’s just intuition. In fact the terms used in the general definition of the structure are not always consistent.

I’m still going to improve because it’s something that I really like and I’m going to try to set an example, although a legal code will be kind of large and complex.

  • Ainda vou melhorar até porque é algo que gosto muito e vou tentar colocar exemplo, ainda que um código legal será meio grande e complexo. Please! I love your answers :)

  • @Renan and I love yours :)

  • 1

    Still Waiting for the code...

  • 1

    @Jeffersonquesado an hour comes :D, but it’s good to warn, I’ll try this week, but I’m on a tight schedule

Browser other questions tagged

You are not signed in. Login or sign up in order to post.