How to generate random characters or string?

Asked

Viewed 3,505 times

2

How to specify the number of times a character can repeat?

Ex: I have ABC, where C should repeat 4 times in a new string 6-character:

( AB CC CC ) or (AC CB CC) or ( AC CC CB )

To can only enter position 1, 3, 5.
B can only enter position 2, 4, 6.
C can enter any position.

  • You’re not allowed to show up AA or BB, only AB.

Example not allowed:

( AA CC CC ) or ( CC AA CC ) or ( CC CC AA ) or
( BB CC CC ) or ( CC BB CC ) or ( CC CC BB ).

I expect all these combinations:

AC AC CC
AC CC CB
AC CB CC
AC CC AC
AB CC CC
CB AC CC
CB CC AC
CB CC CB
CB CB CC
CC AC AC
CC AC CB
CC CB AC
CC CB CB
CC CC AB
CC AB CC

(It doesn’t have to be in that order, but it has to have all these exits)

  • https://pt.meta.stackoverflow.com/a/6425/69359

  • @Rovannlinhalis I made many edits on the other question, then I thought to redo.

  • 1

    yes, but the correct is to use the same editions, as you can see in the response of Renan that I shared, the system can automatically lock your account.

  • Could you elaborate a little more on your problem and comment on the code so we can see your line of thought?

  • I would like to make a 6 character permutation with ABC;

  • I created two string told the number of times I put in a new string and had the random printed, only it was to join the two string together and then did the random and printed.. I’ll still see how to make the restrictions.

  • Maybe this question of Stackoverflow in English help you.

  • I can solve this question through a regular grammar/finite state machine. There are other alternatives, thought in Markov chain, but the limitation of at least 4 Cs becomes difficult to model in traditional Markov chains

Show 3 more comments

4 answers

5

I will address the specific case of your question. Among the many possible ways to generate random information chains, I found these to be the most significant. But first, a little concept.

It’s all information

First of all, I would like to point out that letters, numbers, and what else is being dealt with here is just information that is being generated to be consumed by another process.

In the case of strings (or words), the information that composes them is:

  1. its characters
  2. the position of each of its characters

So for each letter, two pieces of information are needed. We can then represent strings of letters as a set of character/position tuples. For example, the word banana:

{
  ('b',0),
  ('a',1),
  ('n',2),
  ('a',3),
  ('n',4),
  ('a',5)
}

Since we have the guarantee that the second term of the tuple will always be unique (if it was not, then we would assume that a word could have two or more letters in the same position, which is not the case)then each tuple formed is unique and the cardinality of the set that generates a given word is equal to the length of the word.

Each letter a different information? Maybe not

We have seen that a word is nothing more than a set of letter/position tuples. For your specific case, we have 3 possible letters for 6 positions. The letters are:

  • A
  • B
  • C

But they obey certain restrictions. For example, in positions 0, 2 and 4 we can have A or C, never B. For positions 1, 3 and 5 we can have B or C. This implies to say that for a position i informed, the letter can be C or não-C. Just like that.

Why this? Because the maximum possible amount of distinct values at each position is 2: it can be {A,C}, or may be {B,C}. Therefore, we can represent with only 1 bit of information beyond the positional information which is the character in the specific position. How C is common to both groups of options, or whether it is C or whether it is distinct from C (therefore não-C).

Therefore, for your case in question, it is sufficient that we generate a sequence of 0s and 1s (with 0 representing não-C and 1, C), applied the restriction of owning exactly 4 times bit 1 (the C needs to appear exactly 4 times).

But we can still compress more!

Behold What is NOT Random? (video in English) for more details.

In your list of expected outputs, the first word is ACACCC. In the tuple set representation letter/position:

{
  ('A', 0),
  ('C', 1),
  ('A', 2),
  ('C', 3),
  ('C', 4),
  ('C', 5)
}

As has already been discussed, the letter be A or B does not carry information, so either she is C or não-C, 1 or 0:

{
  ('0', 0),
  ('1', 1),
  ('0', 2),
  ('1', 3),
  ('1', 4),
  ('1', 5)
}

Have you noticed how the 1s are in the majority? Well, the difference between that word and this other word also valid is just a change of position between letters:

{
  ('1', 0),
  ('0', 1),
  ('0', 2),
  ('1', 3),
  ('1', 4),
  ('1', 5)
}

What does that mean then? That I would only need a single piece of information to differentiate between ACACCC and CBACCC. But our representational model is not minimal enough. The only real information exchanged was the position of the letter/position tuple ('0', 0) for ('0', 1). How can that be? Because all positions that do not have 0 necessarily has 1, then I need to report only the positions that contains 0. As I already know what character to be informed, it is no longer information, being despicable. Therefore, the words ACACCC and CBACCC are represented by, respectively:

{ 0, 2 }
{ 1, 2 }

TL;DR

The representation of the character string is summarized only in the description of the position of the two different letters of C that are in the word. All the other 4 positions are then C. If the index is even, the letter não-C is A, otherwise it is B. Note that the two positions need to be distinct, otherwise we will be offering a word with five Cs, which is not what we want.

Generating one random word with data

Knowing how to represent the information, generating the answer now consists in generating two distinct integers between 0 and 5 inclusive. An alternative would be to throw dice:

dois dados de 6 faces com resultados distintos, neste caso o resultado da rolagem foi 2 e 4

In this case, the chain obtained would be 101011, which amounts to CBCBCC. If the result were two equal numbers, it would be enough to throw the dice again: of the 21 possible results, 15 are of different numbers, and each of these 15 has double the chances of happening when a result of two equal numbers.

But it would not be very practical for your program to throw dice to generate the string... or would it be?

To create a "dice", we need a random number generator. In this case, the generator is a class object Random. So just generate the whole random number in the range [0, 5], using Next(6) (Next(m) generates an integer in the range [0, m) open in m; as [0, 5] is equivalent in integers to [0, 6), the last number was 6). To respect the randomness predicted by the random number generator, always the same generator needs to be called over and over to get "reasonably" random numbers.

So your algorithm consists of finding two distinct random integers in the interval [0, 6). Then we can summarize it like this:

// cria o gerador de números aleatórios
// você também pode prover esse gerador, no lugar de ficar criando um novo a cada instante
// reutilizar o mesmo gerador de números aleatórios nos garante a entropia máxima possível entre duas gerações de números 
Random r = new Random();

int a, b;

// gira os dados...
a = r.Next(6);
b = r.Next(6);

// insiste na rolagem de um dado até obter um resultado distinto
while (a == b) {
  b = r.Next(6);
}

Okay, now we have two random integer numbers between [0, 6). This is all the information needed to get the string.

string cadeia = "";
for (int i = 0; i < 6; i++) {
  if (i == a || i== b) {
    // caso seja um dos números aleatórios, então é não-C
    if (i % 2 == 0) {
      // posições pares não-C são dedicadas a A
      cadeia += "A";
    } else {
      // caso ímpar, B
      cadeia += "B";
    }
  } else {
    // caso contrário, preencha com C
    cadeia += "C";
  }
}

Then, after performing this procedure, we have a string in the desired format.

Word frequency

To know how often each word will appear, we have a chance to rotate thousands of times and average or try a more theoretical approach. The theoretical seems to be more fun.

The strategy used to generate random numbers by class Random is a subtractive generator (source). According to the Source of the Rosetta Code, the developers of Freeciv imply that the calculation using this subtractive strategy and a module generates low entropy in the least significant bits.

So, the entropy expected for our case is not the largest. Then it would be necessary to rotate several times to get the 15 different words. Since entropy is possibly low, you can expect to rotate in a few words until you get a new.

Walkabout

Another alternative is to use a "wandering walk" to assemble the desired word. What is this?

Well, the correct term would be random walk, but walking sounds more natural. Imagine you have the following grid:

Grid 4x2 com C no eixo horizontal e não-C no eixo vertical, com os pares de inteiros devidamente marcados e com o ponto mais extremo circulado

The best explanation I think there is for wandering walks is an Infinite Series video: What is a Random Walk?.

In the particular case, I will take a 4x2 grid (4 horizontal and 2 vertical) for our walking. So, I just have to define the odds of taking the steps. Ah, I also have to say what these steps are. I go back to the meaning of the steps further, I will finish the rest of the modeling of the walk before I get into the meanings, but I’ll just say that I dedicate a whole section to explain this.

Since my grid is finite, if the walker is on the edge, he can’t go that way anymore. For example, starting from the origin, if in the first two steps the walker chose to go up and up, the third step will necessarily be to the right or down, because he cannot go up or move to the left, because he would fall from the grid.

I don’t care either back-tracking, then the walker will always go towards the final destination, which is the position (4,2). He cannot therefore move either downward or to the left, for both would be one back-track of the way, closer to the origin than to the destination after the step.

Note that on my floor, I am not using the "drunk walk" as was the example that Kelsey used in the video. My movement options are more limited to better represent the problem I’m attacking.

So, I already know that movements to the left or down always have a possibility of 0. I also know that, after any two steps up, the third step in this same direction also has a 0 chance, and analogous to 4 steps to the right, with the fifth step with a 0 chance. But the other steps?

I would like each vertical transition of the same level (like (x, 0) for (x, 1)) have the same chance global to occur. I don’t directly care about the chance of being in each state, so I can manipulate these values at will.

To get out of the first level and into the second, I have 5 possibilities:

  • (0, 0) ==> (0, 1)
  • (1, 0) ==> (1, 1)
  • (2, 0) ==> (2, 1)
  • (3, 0) ==> (3, 1)
  • (4, 0) ==> (4, 1)

So in all, each transaction needs to be triggered 20% of the time.

Starting from the source, before the first step, I have a 100% chance that the walker is in that specific position. So I need a 20% chance for the vertical transaction.

To the point (1, 0), I now have 80% (4/5) chance that the hiker is at this point. So, I need that, of that 80%, 25% (1/4) trigger the vertical transaction (getting a global total of 80% x 25% = 20%). So, being at the point (1, 0), we have:

  • 1/4 possibility to climb
  • 3/4 possibility to go right

Similarly, only 3/5 chance the hiker will reach the point (2, 0), so the vertical transaction must be triggered 1/3 of the times. At the position (3, 0), will be 2/5 chance to be there, so half the shots should be done up. So, in the end, we have 1/5 chance the hiker is in (4, 0), then we need 100% of the shots up. As of any luck that point is at the edge of the grid, it’s compelled to go up even.

The general form in this case is:

  • chance of vertical firing in position (i, 0): 1/(5 - i)
  • chance of horizontal firing in position (i, 0): 1 - 1/(5 - i) = (4 - i)/(5 - i)

In the case of the second shooting... well, I have no encouraging news regarding the balancing. Let’s take the position (4, 1). It will always shoot up, so the overall chance of this transaction being triggered is equal to the overall chance of the hiker being in position (4, 1) after 5 steps. Computing only the chance of leaving the state (4, 0), we have a 20% chance. Then we have the 20% + a value S*X of the walker being in position (3, 1) (likelihood of S) and move horizontally (shooting probability X). As the expected resulting output value is 20%, we also know that 20% = 20% + S*X, then we have to S*X = 0. Since there is a non-zero chance that the hiker will pass through this position, then X must be void.

That is, if I wish the probability of global triggering of vertical transactions to reach the third level, it is necessary to override the possibility of horizontal shots at the second level. This result can be achieved from an extrapolation of what was obtained with X in the previous paragraph. But this is not desired at all. Maybe we should reverse the horizontal and vertical shooting probabilities of the previous level shots?

So, according to this methodology, this is my scoreboard:

Ponto  V   H
-------------
(0,0) 1/5 4/5
(1,0) 1/4 3/4
(2,0) 1/3 2/3
(3,0) 1/2 1/2
(4,0)  1   0
(0,1) 4/5 1/5
(1,1) 3/4 1/4
(2,1) 2/3 1/3
(3,1) 1/2 1/2
(4,1)  1   0
(0,2)  0   1
(1,2)  0   1
(2,2)  0   1
(3,2)  0   1
(4,2) DESTINO

In this table, the column Ponto indicates the trigger probability from that point, the column V the probability of vertical firing and H vertical firing.

Wandering

For this walk, just play a probability and counteract with the table above. To generate a number worthy of this probability, simply call the method NextDouble() of an object of the class Random, if it is less than the table value for the vertical transaction, then it shoots vertically. Otherwise, it shoots horizontally.

For example:

  • if the value 0.2 is obtained, leaving the origin, then the transaction followed will be to the right
  • if 0.49 is obtained from position (3,0), then it will go up

With this, in a loop of 6 steps, we obtain a path that leaves the origin and arrives at the destination (4,2).

Path example:

um exemplo de sequência de disparos, direita-cima-direita-direita-cima-direita

Semantics of the wandering path

I promised I’d explain this after I finished modeling, I didn’t promise?

Let’s go back to the first image:

a primeira imagem de grid novamente

Did you notice her axles? The horizontal axis is C, the vertical is already C with the negation bar on top, so não-C. This means that a step taken vertically consists of creating/consuming a type letter não-C, already horizontal would be the consumption of a C proper. Returning to the binary representation, a vertical step equals 0, whereas a horizontal equals 1.

So, what would be the sequence generated by the above example?

recapitulando o desenho

Seria 101101, or CBCCAC transforming binary information into the final chain.

And if you wanted to get the word whose binary is 101011?

caminho equivalente a <code>CBCBCC</code>

That would give the floor CBCBCC.

With these paths I can also represent two não-Cconsecutive s, as in ABCCCC. The equivalent torque is 001111, then the path drawing would look like this:

caminho equivalente a <code>ABCCCC</code>

Curiosities of this modeling

As explained in video about wandering walks, this processing is nothing less than a Markov chain. And Chomsky in one of his seminal articles on formal languages Three models for the Description of language described how a simple finite Markov process can generate a language of infinite words (section 2). By the way, in the very first paragraphs of this section, Chomsky describes how to create Markov processes for the generation of words.

In another article (On Certain Formal Properties of Grammars), Chomsky demonstrates in Theorem 6 the Definition 9, the definition implies that finite Markov processes generate a regular language when emitting characters when firing a transaction.

Note on the reading of On Certain Formal Properties of Grammars]14: what Chomsky defined in that article as regular grammar today we know how normal Chomsky form for context-free grammars

Now, aren’t we shooting characters while making transactions? Is this character 0 or 1? So we have here a Markov process, with finite number of states, that fires characters. So without even realizing it, we ended up creating the deterministic finite automaton which recognizes the words of its language. Note that, at the word recognition level, the probability of triggering the Markov process can be summarily ignored. The fact of being represented by a regular language means that to recognize the desired words also has an underlying regular expression.

It seems too complicated this story of the wandering walk, no?

Appearances can be deceiving. In this case, the space available to be wandering has more possibility for variations in what you are wanting to recognize/generate than the previous solution.

For example, if you need to recognize with at least one não-C and at most two não-C, with 6 letters in all? Then we would have added to the grid the following points:

  • (5,0)
  • (5,1)

But it makes no sense to add (5,2), because it would take 7 letters (limit is 6 letters).

Should this structure be repeated? For example, after finding a sequence this way, it can have one, two or infinite sequences that follow the same logic of formation. So we’re dealing with Kleene star about this language of As, Bs and Cs, limited to 6 characters in all etc.

Not to mention that, graphically it gets more beautiful.

So, why not just present this answer option? Because this, although more complete and widespread, is not the simplest. So I decided to leave both options. And also start by talking about the simplest to introduce the concepts to be used in this most complete part sounds more natural to the reader.

Enumerating the strings

I presented two generation abstractions. You can use both to generate the enumeration of all possible words. But I find it particularly easier the alternative of the two dice.

Since there are only two dice on 6 different sides, I could simply iterate through all 36 possibilities and, in each of them, check whether it is a valid release or not. Or else I can be smarter and only generate valid possibilities.

To be valid, the second number must happen after the first and necessarily be greater. Why this difference now, since the algorithm shown above made so much? Why (3,1) is equivalent to (1,3) and assuming this limitation makes it easier to program.

To recap, I can encapsulate the generation routine described in the previous section into a function:

public static string geraPalavra(int a, int b) {
    string cadeia = "";
    for (int i = 0; i < 6; i++) {
      if (i == a || i== b) {
        // caso seja um dos números aleatórios, então é não-C
        if (i % 2 == 0) {
          // posições pares não-C são dedicadas a A
          cadeia += "A";
        } else {
          // caso ímpar, B
          cadeia += "B";
        }
      } else {
        // caso contrário, preencha com C
        cadeia += "C";
      }
    }
    return cadeia;
}

And then it would be enough to call this function with the duly completed arguments to enumerate all the possible strings;

for (int a = 0; a < 5; a++) {
    for (int b = a + 1; b < 6; b++) {
        string novaPalavra = geraPalavra(a, b);
        // faz algo com essa string
    }
}

0

Try this solution:

private static string Seleciona() {
    var origem = new List<char>() { 'A', 'B', 'C', 'C', 'C', 'C' };
    var resultado = new StringBuilder();
    int indice;
    var rnd = new Random(GerarSeed());
    for (int i = (origem.Count - 1); i >= 0; i--) {
        indice = rnd.Next(0, i);
        resultado.Append(origem[indice]);
        origem.RemoveAt(indice);
    }
    return resultado.ToString();

    int GerarSeed() {
        using (var rgn = new RNGCryptoServiceProvider()) {
            var rno = new byte[20];
            rgn.GetBytes(rno);
            return BitConverter.ToInt32(rno, 0);
        }
    }
}

An array containing all characters is already predefined. The job is to randomly select the elements and, at each selection, the element is removed from the vector.

To test the code:

public static void Main(string[] args) {
    for (int i = 0; i < 100; i++) {
        Console.WriteLine(Seleciona());
    }
}
  • There are other ways to tell how many times 'C' should repeat?

  • The code solved your quest?

  • I edited the question of what I expect.

0

I think the trick for these questions is to create a vector with the letters and draw them inside the vector and checking the previous character and if C has already appeared four times, see if the code below meets you:

//vetor de char com as letras
            var letras = new char[3];
            letras[0] = 'A';
            letras[1] = 'B';
            letras[2] = 'C';
            var resposta = "";
            int numC = 0;
        while (resposta.Length < 6)
        {
            //gerando um aleatório dentro do intervalo do vetor
            var random = new Random();
            var i = random.Next(letras.Length);

            //caso a resposta tenha tamanho 0 qualquer letrar serve
            if (resposta.Length == 0)
            {
                resposta += letras[i];
                continue;
            }

            if (letras[i] == 'C')
                numC++;

            //o C sempre pode ser usado sempre até quatro vezes, mas o A só pode quando o anterior for B e vice versa
            if ((letras[i] == 'C' && numC < 4) || resposta[resposta.Length - 1] != letras[i])
                resposta += letras[i];
        }

        Console.Write(resposta); 
  • C is required to repeat 4 times. A and B are to complement the 6 characters.

0

That should solve

public static Random random = new Random();

public static string Sortear()
{
    // posição 0, 2, 4
    int a = random.Next(6);

    //posição 1, 3, 5
    int b;
    do 
    {
        b = random.Next(6);
    } while(b == a);

    //string res = "a: "+a+" b: "+b +" -> ";
    string res = "";
    for (int i = 0; i < 6; i++)
    {
        if (i == a && (a % 2 == 0))
        {
            res += "A";
        }
        else if (i == a && (a % 2 == 1))
        {
            res += "B";
        }
        else if (i == b && (b % 2 == 0))
        {
            res += "A";
        }
        else if (i == b && (b % 2 == 1))
        {
            res += "B";
        }
        else 
        {
            res += "C";
        }
    }
    return res;
}

If you want me to show everyone, all you have to do is change the rule for one

public static string MostrarTodas()
{
    string res = "";
    for (int a = 0; a < 6; a++)
    {
        for (int b = a + 1; b < 6; b++)
        {
            res += "\n";
            for (int i = 0; i < 6; i++)
            {
                if (i == a && (a % 2 == 0))
                {
                    res += "A";
                }
                else if (i == a && (a % 2 == 1))
                {
                    res += "B";
                }
                else if (i == b && (b % 2 == 0))
                {
                    res += "A";
                }
                else if (i == b && (b % 2 == 1))
                {
                    res += "B";
                }
                else 
                {
                    res += "C";
                }
            }
        }
    }
    return res;
}
  • It didn’t generate all of them. I also don’t know how to validate that each group of 6 characters or groups of 3 pairs are different from each other. Ex: AC CC CB has no other AC CC CB result.

  • I had misunderstood the question, now it is correct

Browser other questions tagged

You are not signed in. Login or sign up in order to post.