What is required to achieve maximum entropy?

I’ve been studying a bit about random numbers and hashes, yet something still confuses me. In several groups related to cryptography, I’ve read people discussing the flaws of algorithms and the difficulty of generating "chaotic" numbers, i.e., truly random ones.

The answer by Onosendai helped me understand the meaning of entropy a little better. I also read about the importance of "white noise" in generating truly random numbers.

As referenced in that answer, a post by Bo Allen illustrates the difference between pseudo-random and true random numbers, using as an example PHP's rand() function and its "very bad" combination with the Microsoft Windows operating system. It also shows that the same function on Linux produces a far less predictable result than the previous experiment.

Is it necessary to use "noise" to generate true randomness?

Is it possible to achieve "absolute entropy" within a system? Or would maximum entropy behave like an asymptotic curve?


I have migrated some of these questions into their own posts:

Algorithmic predictability in random number generation

  • 4

    Not intending to answer, just to get to the point: entropy is directly associated with unpredictability; that is, you reach total entropy when all of the entropy bits are truly random (unpredictable). The problem with some generators is precisely that they rely on algorithms and/or sources where there is some expectation about the values generated over time (such as pseudo-random numbers, which are generated by simple formulas). I hope someone with experience in the field has time to elaborate something nice; this is a subject where one cannot just "parachute in".

  • 1

    White noise is generated by a random function within predetermined limits, so when generating noise you are adding more randomness. If you look at the equation for calculating entropy in this answer here, and work through the calculations for a totally heterogeneous vector, you will see the result. Just out of curiosity, I tried the vector [1 0] here; you might think its entropy is zero, since the elements are totally different, right? Using the equation you get an entropy of 1.4430e-12, which is equivalent to 0.000000000001443; it is hard to reach absolute 0.

  • 1

    @ederwander interesting, that is what I imagined about "absolute entropy" at first: that maybe it would be like an asymptotic function, approaching zero as it tends to infinity but never crossing the axis... complex, or impossible? :D

  • 1

    I had never asked myself about absolute entropy until this very moment; I use entropy for other purposes. I’m tempted to say that absolute entropy is impossible. Just out of curiosity again, I tested random inputs with the equation and it behaved asymptotically every time, sometimes positively, sometimes negatively; the fact is that all responses to the inputs hugged the x axis without ever reaching absolute 0...

1 answer



There’s a big difference between perfect security (or perfect secrecy) and semantic security. The first is mostly of theoretical interest, and in that context you cannot "generate" randomness: either you have truly random numbers or you don't (and if you do, you can only use them once and must then discard them). The second, of practical interest, concerns only what can reasonably be expected of a computational process operating in polynomial time. The concept of entropy is the same in both cases, but the way entropy is used is quite different; and in the second case, yes, one can achieve sufficient security from a small amount of entropy.

(Note: replace "security" with "unpredictability" if your focus is something other than encryption, for example ensuring randomness in a scientific simulation.)
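
To make the "use once, then discard" rule of perfect secrecy concrete, here is a minimal one-time pad sketch (Python is my choice here; the thread itself only mentions PHP's rand()):

    import os

    def otp_encrypt(message: bytes) -> tuple:
        # The pad must be truly random, as long as the message, and never reused
        key = os.urandom(len(message))
        ciphertext = bytes(m ^ k for m, k in zip(message, key))
        return ciphertext, key

    def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
        return bytes(c ^ k for c, k in zip(ciphertext, key))

    msg = b"attack at dawn"
    ct, key = otp_encrypt(msg)
    assert otp_decrypt(ct, key) == msg
    # Reusing `key` for a second message would leak the XOR of the two
    # plaintexts: the key's entropy is "spent" after a single use.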

In theory

A good way to illustrate what entropy means is through an example. Consider the following bit sequence:

01001101010011010100110101001101010011010100110101001101010011010100110101001101

I used 80 characters to describe it, but I could "compress" it, for example, as follows:

01001101 repeated 10 times

Which takes only 26 characters. I could keep looking for more succinct ways of describing this sequence, until reaching a point where it cannot be compressed any further, because it is already in the most compact form possible that still describes only this same sequence (i.e., unambiguously: a form that cannot also describe a different sequence). If that form takes, say, 10 characters, then I can say the sequence has 10 characters of entropy.

(You can convert this measure to bits if you want: log₂(37¹⁰) ≈ 52 bits of entropy, assuming a "character" is a letter, number, or space.)

What does this mean? Why is this the entropy of the sequence? Simple: if someone wants to arrive at this same sequence from scratch, all they need to do is generate every possible arrangement of 10 characters, and one of them will describe the sequence.
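
To make the compressibility intuition tangible, here is a rough experiment (my own sketch, in Python): compare how well zlib compresses the repetitive sequence above against random bytes of the same length. Compressed size is only a crude upper bound on entropy, not an exact measure:

    import math
    import os
    import zlib

    repetitive = b"01001101" * 10    # the 80-character sequence above
    random_bytes = os.urandom(80)    # 80 bytes from the OS entropy source

    print(len(zlib.compress(repetitive)))    # small: the structure compresses away
    print(len(zlib.compress(random_bytes)))  # ~90: incompressible, high entropy

    # The bit conversion mentioned above: 10 characters from a
    # 37-symbol alphabet carry log2(37**10) bits of entropy
    print(math.log2(37 ** 10))  # 52.08...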

Intuitively, one can understand why entropy is linked to the concept of "unpredictability". Suppose I showed you just a piece of that sequence:

010011010100110101001101...

Observing it closely, you can see a pattern emerging, and although there is no guarantee that the sequence keeps following this pattern (the next bit could be a 1), it is still a good guess: it is preferable to test this hypothesis first instead of trying all possible 80-bit sequences by brute force.

In practice

A pseudo-random number generator usually starts from a random seed and then goes on to "produce" new numbers through a well-defined process, in the hope that these numbers will prove unpredictable. But are they really? Suppose you know that the seed is a 32-bit integer, and that the sequence is generated by encrypting the natural numbers in order, using that seed as the key (as stream ciphers do). Then, by observing the first generated number, it is already possible to predict the next one (and, similarly, all the others); a toy version is sketched in code after this list:

  1. Create a list of all 2³² possible seeds;
  2. Encrypt the number 0 using each of these seeds as the key;
  3. Compare each result with the observed number, thereby discovering the right seed;
  4. Encrypt the number 1 using the correct seed; you have just predicted the next number in the sequence with 100% certainty.
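
Here is a toy version of that attack (my sketch; it shrinks the seed to 16 bits so the loop finishes quickly, and uses Python's random.Random as a stand-in for the stream cipher):

    import random

    SEED_BITS = 16  # toy size; the scenario above assumes 32 bits (2**32 seeds)

    def keystream_number(seed: int, index: int) -> int:
        # Stand-in for "encrypt the natural number `index` under `seed`".
        # A Mersenne Twister is NOT a stream cipher; it is just a placeholder.
        rng = random.Random(seed)
        for _ in range(index):
            rng.getrandbits(32)
        return rng.getrandbits(32)

    secret_seed = 40961                          # unknown to the attacker
    observed = keystream_number(secret_seed, 0)  # attacker sees one output

    # Steps 1-3: try every seed until one reproduces the observed number
    # (assumes no other seed collides on the first output, which is very likely)
    recovered = next(s for s in range(2 ** SEED_BITS)
                     if keystream_number(s, 0) == observed)

    # Step 4: predict the next output with certainty
    assert keystream_number(recovered, 1) == keystream_number(secret_seed, 1)
    print("seed recovered:", recovered)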

In other words, from the point of view of perfect security, after generating the first number you have already "spent" all the entropy of the seed, and therefore should not use it again (totally or partially) to generate new numbers; otherwise they will not really be unpredictable. That is, the entropy of the entire sequence is only as large as the entropy of the seed (perhaps smaller, but never larger), and you cannot increase it by combining the seed with white noise or any other source of randomness (you would need to replace the seed with that white noise; the original is no longer good for anything).

What about semantic security? Well, in practice, testing 2³² possibilities is quite costly, especially since each test involves a large number of operations. Therefore, even if an opponent observes a number generated from a 32-bit seed, the next number is still considered to have an entropy of approximately 32 bits. Only after observing a very large part of the sequence (see birthday attack) does one consider a reduction in the entropy of the process, on the assumption that fewer and fewer operations are needed to predict the remainder of the sequence.

Repeat cycles

I mentioned that when using a 32-bit key the entropy of the sequence would be at most 32 bits, but could be less. This is related to the quality of the PRNG itself. If a 32-bit seed is used to generate 32-bit numbers as well, then by the pigeonhole principle the longest possible sequence that can be generated without repetition has length 2³². However, if the generation procedure is not perfect, the numbers may begin to repeat long before a cycle of that size is reached. The example of PHP's rand() function on Windows shows premature repetition of the generated numbers (or at least premature repetition of part of them), revealing a pattern. In the worst case, one may even be able to partition the solution space, achieving perfect predictability after a very small number of observations.
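
As an illustration, here is a small sketch (mine, not from the original example) that measures the cycle length of a deliberately tiny 8-bit linear congruential generator with good and bad parameters:

    def make_lcg(a: int, c: int, m: int):
        return lambda x: (a * x + c) % m

    def cycle_length(step, seed: int) -> int:
        # Iterate until a state repeats; return the length of the cycle entered
        seen = {}
        state, i = seed, 0
        while state not in seen:
            seen[state] = i
            state = step(state)
            i += 1
        return i - seen[state]

    good = make_lcg(5, 3, 256)  # satisfies the Hull-Dobell full-period conditions
    bad = make_lcg(6, 3, 256)   # does not; it degrades almost immediately

    print(cycle_length(good, 1))  # 256: the pigeonhole maximum for 8-bit state
    print(cycle_length(bad, 1))   # 1: collapses to a fixed point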

Whether the PRNG is good or bad, the fact is that it will eventually start repeating unless more entropy is added to the system. For semantic security, the amount of entropy needed generally does not have to be very large, since the "loss" after each new observation is small. For perfect security, as I mentioned, the loss is always total, and added white noise would be the sole source of the system's security. But for practical purposes, one can consider that as new entropy is added, taking lost entropy into account, the total keeps growing, tending to infinity.
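
One common way to add entropy is to fold fresh noise into the generator state with a hash; here is a simplified sketch (mine; real designs such as Fortuna are far more careful):

    import hashlib
    import os

    def reseed(state: bytes, noise: bytes) -> bytes:
        # Mix fresh noise into the state; SHA-256 acts as the mixing function
        return hashlib.sha256(state + noise).digest()

    def next_output(state: bytes) -> tuple:
        # Derive one output and a new state (a sketch, not a vetted CSPRNG)
        out = hashlib.sha256(b"out" + state).digest()
        new_state = hashlib.sha256(b"next" + state).digest()
        return out, new_state

    state = os.urandom(32)                 # initial seed from the OS pool
    state = reseed(state, os.urandom(32))  # periodically inject new entropy
    value, state = next_output(state)
    print(value.hex())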

In short

Yes, an external source of randomness ("noise") is required to obtain, rather than "generate", true randomness, because that source is solely responsible for any randomness beyond the seed itself (and, from a theoretical point of view, restricted to the very first use of that seed). And it is not possible to obtain "absolute entropy" in any way, because from the moment one stops adding entropy to the system, it already begins to decrease with use (slowly, in the semantic sense, or very quickly, in the theoretical sense) and will eventually reach zero.


P.S. I am assuming that any PRNG, cryptographically secure or not, is periodic. I may be wrong about this, but it does not change the fact that, for a 32-bit seed, at most 2³² distinct sequences can be generated. And although adding noise does not change this periodic nature, a good mixing algorithm can greatly lengthen the period, while a bad mix may "re-seed" the sequence but keep its period unchanged.

P.P.S. I interpreted "absolute entropy" as eternal, inexhaustible entropy, which is what I understood from your comment. If the concept is different, please clarify. In any case, even without going into the merits of the "calculation" (personally, I would say "estimation") of entropy, I can still state, following the reasoning above, that entropy is always "spent" and will eventually reach zero unless new entropy is continuously added to the system.

  • 2

    +1 perfection & semantics

  • 1

    I knew you’d answer. +1
