Deep down, what you want is to "generate N random things without repeating" (in this case, several different arrays), so an alternative is to follow this idea:
- generate all the possibilities (assuming the total amount is X)
- choose N random indices between zero and X, and take the possibilities that are at those indices

For the first step you can use itertools.product, and for the second step you can use random.sample to generate the indices and itertools.islice to take the element at each index. Something like this:
import numpy as np
from itertools import product, islice
from random import sample

# numbers that can appear in the arrays
nums = range(0, 3)
# size of each array
tamanho_arrays = 16
# generate 20 different arrays
quantidade = 20
# total amount of possible arrays
total_arrays_possiveis = len(nums) ** tamanho_arrays

# sample takes 20 random indices between zero and the total of possible arrays
for indice in sample(range(total_arrays_possiveis), k=quantidade):
    # generate the possibilities (recreated each time, since islice consumes the iterator)
    todos_possiveis = product(nums, repeat=tamanho_arrays)
    # take only the one at the index position
    lista = next(islice(todos_possiveis, indice, indice + 1))
    array = np.array(lista)  # build the np.array
    print(array)
I used itertools because it would be very costly to generate all the possibilities and keep them in memory. In your specific case, there are three possible values (the numbers 0, 1 and 2) in an array with 16 elements, so the total number of possibilities is 3^16 (i.e., 43,046,721 - more than forty-three million possible arrays).
Using itertools, the elements are only produced when needed, saving memory (and also time, because generating everything up front would take a long while).
With sample, I guarantee that there will be no repeated indices, and therefore I guarantee that each array obtained will never be the same as the ones taken before.
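A quick way to check that property of sample (again, an illustrative snippet only):

from random import sample

# sample draws without replacement, so the 20 indices are guaranteed distinct;
# range itself is lazy, so range(3 ** 16) costs almost no memory
indices = sample(range(3 ** 16), k=20)
print(len(indices) == len(set(indices)))  # True: no repeated index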
Storing already-generated arrays in a list and checking whether each new one is already there (as suggested in another answer) is also an option, but it may not scale well if the number of arrays to generate is too large. For example, if you want to generate 10 thousand different arrays, at some point the list will have, say, 9 thousand arrays, and you will have to scan those 9 thousand to check for a repeat; then 9001, then 9002, and so on. It is a very inefficient algorithm (also jokingly called Shlemiel the Painter's Algorithm) - of course, for small values the difference will be tiny, but remember that "for small values, everything is fast". (See the sketch below for a variation that at least avoids the linear scan.)
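If you do go the "store and check" route anyway, a set makes each membership test O(1) instead of a scan of the whole list. A minimal sketch of that idea (the function name and signature are mine, not from the other answer):

import numpy as np
from random import choices

def gerar_diferentes(quantidade, tamanho, nums=(0, 1, 2)):
    # hypothetical helper: rejection sampling with O(1) duplicate checks
    vistos = set()
    while len(vistos) < quantidade:
        candidato = tuple(choices(nums, k=tamanho))  # random tuple, may repeat
        vistos.add(candidato)  # the set silently ignores duplicates
    return [np.array(t) for t in vistos]

for arr in gerar_diferentes(20, 16):
    print(arr)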
And remember that itertools.product generates all possible arrays (including those where all the numbers are equal). But this is not a "bug", since with randint this can also happen (it just has a smaller chance, but being "random", it is not impossible).
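To put a number on that chance: there are only 3 all-equal arrays (all zeros, all ones, all twos) among the 3 ** 16 possibilities:

p = 3 / 3 ** 16  # 3 all-equal arrays out of 43,046,721 possibilities
print(p)         # ~6.97e-08: very unlikely with randint, but not impossible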