Compute secure data randomly

Asked

Viewed 568 times

26

Random functions are not entirely random in computation. I wonder if there is a safe way to generate a salt, or any other random string safely, without using external hardware.

Can randomization be achieved without the use of external hardware, for example by using the time it takes for a process to be done? Isn’t that something that’s intrinsic to the quantum states of the processor materials? If not, theoretically, in the smallest computable unit of time by a computer, an exactly equal process is always calculated at the same speed.

3 answers

16

Before directly answering your question, I would like to establish some parallels that will help in understanding the answer.

First, evaluate the sequences below and their corresponding formulas:

01010101010101010101010101   f(X) = NOT X
AAAAAAAAAAAAAAAAAAAAAAAAAA   f(X) = "A"
ABCDEFGHIJKLMNOPQRSTUVWXYZ   f(X) = CHAR(ASCII(X) + 1)

Without much work, we come to the conclusion that the predictability (or deterministic definition) of these sequences is very high - or, conversely, that entropy is very low.

But what is entropy? It is the measure of chaos in a system. The term was originally created to describe thermodynamic systems, but the concept is also applicable to other domains - data, for example.

When we talk about the generation of random or random content by computers, we are talking about formulas that generate values that have a distribution similar to that found in a system with high entropy and continuous uniform distribution.

An example of easy-to-see uniform continuous distribution is white noise, where the distribution is seemingly impossible to describe with a deterministic formula - but where we can use statistics to describe density. This is a white noise bitmap generated in Random.org:

inserir a descrição da imagem aqui

For comparison, this is the bit map of PHP’s Rand() function as demonstrated by developer Bo Allen in a 2012 post on his personal blog entitled Pseudo-Random vs. True Random. Note how easily you detect the generation pattern:

inserir a descrição da imagem aqui

While in nature systems lose order and gain entropy, the reverse occurs in data systems. Whenever you 'generate' random numbers, you are stealing the entropy system, and entering order.

As an example, let’s assume that I have the following random string of letters via the Random.org Random String Generator:

Chave
JPVPUUWWJAZEEUMLXDVT

Which I use in a simple encryption formula, where I 'add' the variation relative to the letter A of each position when applying to a letter of my payload at the same position.

Payload               Conteúdo encriptado
AAAAAAAAAAAAAAAAAAAA  JPVPUUWWJAZEEUMLXDVT
BBBBBBBBBBBBBBBBBBBB  KQWQVVXXKBAFFVNMYEWU

But note that if my payload is equal to or greater than the key, I will reset the system entropy. So, assume I concaten my key, for the following payload:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

My encrypted content would be:

JPVPUUWWJAZEEUMLXDVTJPVPUUWWJAZEEUMLXDVT
^^^                 ^^^

I can easily detect the repetition and predict the rest of the sequence.

From a safety point of view, safe random functions are those that periodically recharge with entropy, in order to prevent predictability.

You can recharge entropy in many ways; The best source of entropy is the real world. Some examples, which can be used in conjunction with a pseudo-random function in the form of Seeds (seeds):

  • Access the trending Topics from Twitter. Take the last 128 tweets generated. Extract the day and time of each, convert to a byte array.
  • Capture images from 2 or more public webcams around the world. Extract MD5 from all. Convert to one byte array.
  • Let your cat walk on the keyboard. Convert the generated characters to a bytearray. (Add a hamster to the system for more data. Prevent the abandonment of the system’s scope with a box around the three.)

Each of these examples provides different sample size and rate. The more random source data you enter into a hybrid system with a connected pseudo-random generator, the smaller the pattern detection changes.

The answer, therefore, is nay: You need to import entropy from an external system.

Sources:

http://www.random.org/strings/? num=10&Len=10&upperalpha=on&unique=on&format=html&rnd=new

http://boallen.com/random-numbers.html

Edit-Disclaimer: Added, directly in the text, the mention to the post where the function image RAND() was withdrawn.

  • 1

    I withdrew my vote to close (as duplicate) because this answer made much worth the question. :)

  • 1

    I really liked the answer,clarified many things to me, opened a door to the solution of the question, don’t computers have a little entropy? Because your internal clocks usually go out of sync. This dyssynchrony is a sign of an even low entropy (given its unpredictability) or some effect unknown to me?

  • Functions for randomness generation are made to be fast. My question ignores this need.

  • @Weslleycxsardinha You’re right - yes, some entropic features are present in modern computers. However, several suffer some kind of treatment in the hardware layer, not being exposed to any type of consumption or inference by the software. Processing time, however, is a possible inference, as you pointed out. So is the photoelectric effect. Still, you are importing entropy from an external system.

  • 1

    Yes, external system, but no external hardware. Clarified my doubts on the subject.

  • 1

    Good answer, I only have two things to comment on: 1) Your example of "simple encryption formula" is a vigénere cipra, totally "broken" in the present day; I believe you tried to use it as a one-time-pad (and if so, his comment on the size of the key being smaller than the message is totally correct), but ended up making a "two-time-pad" when it combined so much with A* as to B*... In my opinion your answer would be better without this example, it adds nothing, and the rest of the answer is already great as it is.

  • 3
    1. Contrary to what our common sense says, add entropy to a system can get worse your safety!!! I do not have enough knowledge to opine on the subject, but anyway Ugero avoid sources of entropy that may be under the control of an opponent (depending on your level of "paranoia", avoid collecting entropy from the internet, no matter how innocuous the consulted data seems).
  • @mgibsonbr I appreciate the comments - regarding (1), the focus was on demonstrating the introduction of entropy, while the second part demonstrates the exhaustion of entropy, making the system perfectly predictable. The given examples should not be followed as algorithm recommendations. The need for introduction from an external system is still necessary, and its caveat in (2) is coherent; the examples assume that entropy is being guaranteed for the mentioned system.

Show 3 more comments

16


Use external hardware

Cannot generate fully random numbers without external hardware.

As already indicated in the comments of your reply, could use either the Lavarand

If you enter the sites that use Really Random services (Random.org or Hotbits) use external hardware to ensure that the numbers are random:

Random.org: uses noises in atmospheric signals (reference).

Hotbits: uses the decay of radioactive particles (reference).

I can’t see why use external hardware, like the Random.org API could bring you security problems.

If you insist on not using them:

As you requested (examples in PHP or C++), I will post below the best way to get pseudo-random numbers in PHP:

<?php
mt_srand((double)microtime()*1000000);

echo "<b>mt_rand() com mt_srand()</b><br><br>";


for($i = 0; $i != 5; $i++)
{
    echo mt_rand(0, 100)."<br><br>";
}
?> 

The mt_rand() is much higher than srand() for using the Marsenne Twister is probably one of the best pseudo-random number implementations that exists.

If you want to compare with other forms in php:

<?php

echo "<b>rand() sem srand() (semente/alimentação)</b><br><br>";

for($i = 0; $i != 5; $i++)
{

echo rand(0, 100)."<br><br>";

}

srand((double)microtime()*1000000);

echo "<b>rand() com srand()</b><br><br>";


for($i = 0; $i != 5; $i++)
{

echo rand(0, 100)."<br><br>";

}

echo "<b>mt_rand() sem mt_srand()</b><br><br>";

for($i = 0; $i != 5; $i++)
{

echo mt_rand(0, 100)."<br><br>";

}

mt_srand((double)microtime()*1000000);

echo "<b>mt_rand() com mt_srand()</b><br><br>";


for($i = 0; $i != 5; $i++)
{

echo mt_rand(0, 100)."<br><br>";

}

?>

EDITED:

I found a bit suspicious the images of the answer accepted and decided to take the test:

Random.org

inserir a descrição da imagem aqui

Rand()

inserir a descrição da imagem aqui

mt_rand()

inserir a descrição da imagem aqui

The code of this test is in gist the experiment was performed with PHP 5.3 and can also run online (without having to install anything on that website).

  • I liked the reply @Kyllopardium. Let me travel a little: We could use as input a variable available in the environment, which is unpredictable, for example, the third part of the client IP multiplied by the fourth part of the server IP ?

  • 1

    Yeah, by doing that will target data entropy which will make your numbers more random but even so I suggest combine with the microtime in something like mt_srand((double)microtime()*$IP);

  • Yes, exactly as I thought: mt_srand((double)microtime()*($IpClient[2] * $IpServer[3]));

  • PS: I suggest not to keep the server ip in multiplication if it is static.

  • 1

    How about mt_srand((double)microtime()*($IpClient[rand(0,3)] * $IpServer[rand(0,3)])); ?

  • Beware. If you use the server time to generate the seed, then the client can find out what that seed is (if you give a 5 second (long) error margin there is only 10,000,000 seed possibilities (since you used microtime), just test them all). Use a hardware source, such as reading the /dev/random linux, which can use network traffic to randomize. I imagine a server has enough.

  • 2

    "I can’t see why using external hardware like the Random.org API would cause you security problems." Use a hardware you own (and trusts) is one thing, using a third-party service for something that should be confidential is another. Random.org itself says in its FAQ: "...anyone genuinely concerned with security should not trust anyone (including RANDOM.ORG) to generate their Cryptographic Keys." (my emphasis)

  • Every time we consume a third-party service, which usually involves traffic over third-party networks, there are various dangers depending on the implementation, be it yours or the third. The more 'people' are involved in a process, the more complexity to manage security.

  • When the image "suspect" the rand, the cited article explains that this result was obtained only in Windows, which in Linux there was no similar pattern. Which shows that the random number generator of that version of Windows was bad. I don’t know if this applies to recent versions. Anyway, none of this "proof" anything, as one can create results that appear random but still hiding a pattern. The quality of a PRNG should be tested using specific methods.

Show 4 more comments

11

Random data can only be obtained from random procesoss. Physically, only quantum processes are really random and so there are external devices that generate random data: Geiger counters, reversally polarized PN junctions etc.

Without using external equipment, you can:

  • obtain data from sites that generate random numbers: if you trust these sources and the process they use, you can obtain this data externally

  • use the pseudo-random numbers generated on your own computer: the best sources are those of your operating system, as they must have been built following the best practices in force at the time of its construction.

  • In general, if you use the functions/methods/whatever is in your programming language, it will look for those of the operating system. And will be the best sources available to you.

  • make your own source: in general, it is very easy to make a mistake and end up generating less random numbers than those obtained by the operating system itself. Unless you suspect/know that your OS numbers are unreliable, which in this case means that your problems are worse than just trusting or not the numbers it provides.

[Edited by changing the question]

Randomization can be achieved without the use of external hardware, for example by using the time it takes a process to be done?

Yes, the time that a process takes can change, since the processing frequency of a processor can vary with temperature, voltage etc. But this variation can be extremely small, perhaps imperceptible. Could it be checked with the use of a watch? Yes, it could, with an extremely precise watch, e.g. an atomic watch or GPS. But this is not "internal" to a computer. A program, running inside a computer, does not know how long it took to run, without using an external precision clock.

Isn’t that something that’s intrinsic to the quantum states of the processor materials? If not, theoretically, in the smallest computable unit of time by a computer, an exactly equal process is always calculated at the same speed.

Exactly. If you have a processor running at 3.2GHz and each instruction takes a cycle to run, it means that each instruction will run at 0.3125 nanoseconds to run. If your processor is running at 3.199GHz, each instruction will take 0.3126 nanoseconds. It is not simple to own a watch that can detect this time difference.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.