What is and what is a Bloom Filter for?

Asked

Viewed 382 times

10

I was reading an answer here at Sopt (yes, it’s funny) and I saw about Bloom Filter, I wanted more information about him. Besides what it is and what it’s for, where I can use in practice?

  • You read this https://www.ic.unicamp.br/~francesquini/mc202/files/aula26.pdf

  • @Leocaracciolo yes.

  • I had imagined that yes :-)

  • until I got curious and went looking for it. the link https://www.psafe.com/blog/o-que-e-bloom-filter/ seems to take 99% of the doubts about Loom filter well didactically. da uma olhada la

  • It’s expensive, I went in search of the subject too. Initially it makes no sense to have a structure that uses an inaccurate probabilistic hash and with a loss of performance in increasing the hash. Just doing the asymptotic study to see if it brings benefits in small hashes. Either this, or it assists specific applications that need to eliminate false negatives at any cost. It is to be thought.

  • is a data filtering by deletion where the priority is to discard negatives

Show 1 more comment

1 answer

6


Bloom filter is a data structure created in 1970 by Burton Howard Bloom (hence the name Bloom filter)

It is a probabilistic structure with the following properties:

  • Has space efficiency

  • Allows false positives

  • Does not allow false-negatives

  • Elements can be added to the filter

  • Once added to the filter, elements cannot be removed

  • The more items inside the filter, the greater the chance of false positives

  • Are based on:

    • Hash functions
    • Bit Fields
  • For a good functioning are essential:

    • Several functions of hash
    • Even spreading

Example of practical application:

Your browser checks all the addresses you access to determine if it’s a safe address and to be able to give you a warning if it is not.

  • There are billions of addresses on the Internet.
  • We can assume with a good margin of security that among them there are a few million fraudulent websites.
  • It would be inefficient to make every computer in the world to download a complete list of websites fraudulent.
  • It would also not be very efficient to check a bank of central data for each address accessed.
  • The browser downloads a Bloom filter containing all sites recognized as unsafe
  • At each address accessed, the browser checks if it is contained in the filter.
  • If the filter query indicates that the address is not contained, then surely the address is not on the list of insecure websites and access continues normally.
  • If the filter indicates that it is contained, then it may be that website is in the list of unsafe addresses. In this case your browser makes an additional query to a database on the Internet (this yes with the complete list)
  • Note that if it is a false positive, the only consequence is a little longer delay to access the desired website.

Other applications:

  • Preliminary verification of access rights
  • Avoid recommending articles/pages that the user does not liked or already read
  • Avoid disk readings in databases
  • Check for weak passwords
  • Spell checkers
  • Among others

Source: https://www.ic.unicamp.br/~francesquini/mc202/files/aula26.pdf

  • In passing, this slide quoted is very bad... the content may not be, but it is difficult to read with that background image and color scheme. Other than that I found the subject interesting... Very good answer.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.