I will address the specific case of your question. Among the many possible ways to generate random strings, I found these to be the most significant. But first, a bit of theory.
It’s all information
First of all, I would like to point out that letters, numbers, and whatever else we deal with here are just information, generated to be consumed by another process.
In the case of strings (or words), the information that composes them is:
- its characters
- the position of each of its characters
So for each letter, two pieces of information are needed. We can then represent a string of letters as a set of character/position tuples. For example, the word banana:
{
('b',0),
('a',1),
('n',2),
('a',3),
('n',4),
('a',5)
}
Since we have the guarantee that the second term of the tuple is always unique (if it were not, a word could have two or more letters in the same position, which is not the case), each tuple is unique, and the cardinality of the set that represents a given word equals the length of the word.
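As an illustration (the answer's code later is C#; this is just a quick Python sketch with a hypothetical helper name), the tuple-set view of a word can be written as:

```python
# Represent a word as a set of (character, position) tuples.
# Positions are unique, so the set has exactly len(word) elements.
def to_tuple_set(word):
    return {(ch, i) for i, ch in enumerate(word)}

banana = to_tuple_set("banana")
# banana contains ('b', 0), ('a', 1), ('n', 2), ('a', 3), ('n', 4), ('a', 5)
```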
Is each letter a distinct piece of information? Maybe not
We have seen that a word is nothing more than a set of letter/position tuples. For your specific case, we have 3 possible letters (A, B, and C) for 6 positions. But they obey certain restrictions: in positions 0, 2, and 4 we can have A or C, never B; in positions 1, 3, and 5 we can have B or C. This means that, for a given position i, the letter is either C or non-C. Just like that.
Why is this? Because the maximum number of distinct values at each position is 2: the options are either {A,C} or {B,C}. Therefore, beyond the positional information, a single bit is enough to identify the character at a specific position. Since C is common to both groups of options, that bit says whether the letter is C or distinct from C (hence non-C).
Therefore, for the case in question, it is sufficient to generate a sequence of 0s and 1s (with 0 representing non-C and 1 representing C), subject to the restriction of containing exactly four 1 bits (C needs to appear exactly 4 times).
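A quick sanity check, as an illustrative Python sketch (not part of the original answer): enumerating all 6-bit sequences with exactly four 1s confirms how many valid words exist.

```python
from itertools import product

# All 6-bit sequences with exactly four 1 bits (C appears exactly 4 times).
valid = ["".join(bits) for bits in product("01", repeat=6)
         if bits.count("1") == 4]

# Choosing the 2 positions of the 0 bits among 6 gives C(6, 2) = 15 sequences.
print(len(valid))  # 15
```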
But we can still compress more!
See What is NOT Random? (video in English) for more details.
In your list of expected outputs, the first word is ACACCC. In the letter/position tuple-set representation:
{
('A', 0),
('C', 1),
('A', 2),
('C', 3),
('C', 4),
('C', 5)
}
As already discussed, whether the letter is A or B carries no information; all that matters is whether it is C or non-C, 1 or 0:
{
('0', 0),
('1', 1),
('0', 2),
('1', 3),
('1', 4),
('1', 5)
}
Have you noticed how the 1s are in the majority? Well, the difference between that word and this other valid word is just a change in the position of one letter:
{
('1', 0),
('0', 1),
('0', 2),
('1', 3),
('1', 4),
('1', 5)
}
What does that mean? That I would need only a single piece of information to differentiate ACACCC from CBACCC. But our representational model is not yet minimal. The only real information that changed was the tuple ('0', 0) becoming ('0', 1). How can that be? Because every position that does not hold a 0 necessarily holds a 1, so I only need to record the positions that contain 0. Since I already know which character goes there, it is no longer information and can be discarded. Therefore, the words ACACCC and CBACCC are represented by, respectively:
{ 0, 2 }
{ 1, 2 }
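This minimal representation can be sketched as a small Python function (a hypothetical helper, just to illustrate the mapping back to a word):

```python
def word_from_zero_positions(zeros):
    # zeros: the set of the two positions holding non-C.
    # Even index -> A, odd index -> B; every other position is C.
    return "".join(
        ("A" if i % 2 == 0 else "B") if i in zeros else "C"
        for i in range(6)
    )

print(word_from_zero_positions({0, 2}))  # ACACCC
print(word_from_zero_positions({1, 2}))  # CBACCC
```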
TL;DR
The representation of the string reduces to the positions of the two letters different from C in the word. The other 4 positions are all C. If the index is even, the non-C letter is A; otherwise, it is B. Note that the two positions must be distinct, otherwise we would produce a word with five Cs, which is not what we want.
Generating a random word with dice
Knowing how to represent the information, generating the answer now consists of generating two distinct integers between 0 and 5, inclusive. One alternative would be to throw dice:
In this case, the string obtained would be 101011, which amounts to CBCBCC. If the result were two equal numbers, it would be enough to throw the dice again: of the 21 possible unordered results, 15 consist of two distinct numbers, and each of those 15 is twice as likely to happen as any result of two equal numbers.
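This re-rolling idea is plain rejection sampling; here is an illustrative Python sketch of it (using `random.Random` in place of the C# `Random` used later in the answer):

```python
import random

def roll_two_distinct(rng):
    # Roll two dice; re-roll the second until it differs from the first.
    a = rng.randrange(6)
    b = rng.randrange(6)
    while a == b:
        b = rng.randrange(6)
    return a, b

rng = random.Random()
a, b = roll_two_distinct(rng)
# a and b are guaranteed to be distinct integers in [0, 6)
```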
But it would not be very practical for your program to throw physical dice to generate the string... or would it?
To create a "die", we need a random number generator. In this case, the generator is an object of the class Random. Just generate a random integer in the range [0, 5] using Next(6) (Next(m) generates an integer in the range [0, m), open at m; since [0, 5] over the integers is equivalent to [0, 6), the argument passed is 6). To respect the randomness promised by the generator, the same generator must be called over and over to obtain "reasonably" random numbers.
So your algorithm consists of finding two distinct random integers in the interval [0, 6). We can summarize it like this:
// create the random number generator
// you can also provide this generator from outside, instead of creating a new one each time
// reusing the same random number generator gives us the maximum possible entropy between two generated numbers
Random r = new Random();
int a, b;
// roll the dice...
a = r.Next(6);
b = r.Next(6);
// keep re-rolling one die until a distinct result is obtained
while (a == b) {
    b = r.Next(6);
}
Okay, now we have two distinct random integers in [0, 6). This is all the information needed to build the string.
string cadeia = "";
for (int i = 0; i < 6; i++) {
    if (i == a || i == b) {
        // if i is one of the random numbers, the letter is non-C
        if (i % 2 == 0) {
            // even non-C positions are dedicated to A
            cadeia += "A";
        } else {
            // odd case: B
            cadeia += "B";
        }
    } else {
        // otherwise, fill with C
        cadeia += "C";
    }
}
Then, after performing this procedure, we have a string in the desired format.
Word frequency
To know how often each word will appear, we can either run the generator thousands of times and average the results, or try a more theoretical approach. The theoretical one seems more fun.
The strategy the class Random uses to generate random numbers is a subtractive generator (source). According to the Rosetta Code source, the Freeciv developers suggest that this subtractive calculation combined with a modulus produces low entropy in the least significant bits.

So the entropy expected for our case is not the greatest, and it may take quite a few runs to obtain all 15 different words. Since the entropy is possibly low, you can expect to generate several words before seeing a new one.
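The empirical route can also be sketched: generate many words with the two-dice method and count them. Under a reasonable generator, each of the 15 words should appear close to 1/15 of the time (an illustrative Python sketch, not the original C#):

```python
import random
from collections import Counter

def gen_word(rng):
    # two distinct dice pick the non-C positions
    a = rng.randrange(6)
    b = rng.randrange(6)
    while a == b:
        b = rng.randrange(6)
    zeros = {a, b}
    return "".join(
        ("A" if i % 2 == 0 else "B") if i in zeros else "C"
        for i in range(6)
    )

rng = random.Random(0)
counts = Counter(gen_word(rng) for _ in range(15000))
# 15 distinct words, each appearing roughly 1000 times
```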
Wandering walk
Another alternative is to use a "wandering walk" to assemble the desired word. What is that?

Well, the correct term would be random walk, but wandering walk sounds more natural. Imagine you have the following grid:
The best explanation I know of random walks is an Infinite Series video: What is a Random Walk?.
In this particular case, I will take a 4x2 grid (4 horizontal, 2 vertical) for our walk. Next I have to define what the steps are and the odds of taking each of them. I will come back to the meaning of the steps later; I will finish modeling the walk first, and I dedicate a whole section below to explaining what the steps mean.
Since my grid is finite, if the walker is on the edge, he can’t go that way anymore. For example, starting from the origin, if in the first two steps the walker chose to go up and up, the third step will necessarily be to the right or down, because he cannot go up or move to the left, because he would fall from the grid.
I also don't allow backtracking, so the walker always moves toward the final destination, the position (4,2). He can therefore move neither downward nor to the left, since either step would be a backtrack, taking him back toward the origin and away from the destination.
Note that my grid is not using the "drunk walk" from the example Kelsey used in the video. My movement options are more limited, to better represent the problem I'm attacking.
So I already know that movements to the left or down always have probability 0. I also know that, after any two steps up, a third step in that direction also has probability 0, and analogously, after 4 steps to the right, a fifth has probability 0.
What about the other steps?
I would like every vertical transition at the same level (like (x, 0) to (x, 1)) to have the same global chance of occurring. I don't directly care about the chance of being in each state, so I can manipulate those values at will.
To get out of the first level and into the second, there are 5 possibilities:
- (0, 0) ==> (0, 1)
- (1, 0) ==> (1, 1)
- (2, 0) ==> (2, 1)
- (3, 0) ==> (3, 1)
- (4, 0) ==> (4, 1)
So, in all, each of these transitions needs to be fired 20% of the time.
Starting from the origin, before the first step, there is a 100% chance that the walker is at that specific position. So I need a 20% chance for the vertical transition there.
At the point (1, 0), there is now an 80% (4/5) chance that the walker is at this point. So I need that, out of that 80%, 25% (1/4) fire the vertical transition (yielding a global total of 80% x 25% = 20%). So, at the point (1, 0), we have:
- 1/4 chance of going up
- 3/4 chance of going right
Similarly, there is only a 3/5 chance the walker reaches the point (2, 0), so its vertical transition must fire 1/3 of the time. At the position (3, 0), there is a 2/5 chance of being there, so half of the firings should go up. Finally, there is a 1/5 chance the walker is at (4, 0), so 100% of its firings must go up. Since that point is at the edge of the grid anyway, the walker is forced to go up regardless.
The general form in this case is:
- chance of vertical firing at position (i, 0): 1/(5 - i)
- chance of horizontal firing at position (i, 0): 1 - 1/(5 - i) = (4 - i)/(5 - i)
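It is easy to check that this general form really yields a 20% global chance for every first-level vertical transition: the chance of being at (i, 0) is (5 - i)/5, and multiplying by the firing chance telescopes to 1/5. A quick check with exact fractions (illustrative Python):

```python
from fractions import Fraction

for i in range(5):
    p_here = Fraction(5 - i, 5)   # chance the walker is at (i, 0)
    p_up = Fraction(1, 5 - i)     # vertical firing chance at (i, 0)
    # global chance of the transition (i, 0) -> (i, 1) is always 1/5
    assert p_here * p_up == Fraction(1, 5)
```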
As for the second level of firings... well, I have no encouraging news regarding the balancing. Take the position (4, 1). It always fires up, so the global chance of that transition being fired equals the global chance of the walker being at (4, 1) after 5 steps. Counting only the arrival from the state (4, 0), we already have a 20% chance. To that we add a value S*X, for the walker being at position (3, 1) (probability S) and moving horizontally (firing probability X). Since the desired global value is 20%, we know that 20% = 20% + S*X, hence S*X = 0. As there is a non-zero chance that the walker passes through that position, X must be zero.

That is, if I wanted the global firing probabilities of the vertical transitions into the third level to be balanced as well, it would be necessary to suppress horizontal firings on the second level entirely, a result that follows by extrapolating what was obtained for X in the previous paragraph. But that is not desirable at all. Maybe we should instead mirror the previous level, swapping its horizontal and vertical firing probabilities?
So, according to this methodology, this is my probability table:
Point  V    H
--------------
(0,0)  1/5  4/5
(1,0)  1/4  3/4
(2,0)  1/3  2/3
(3,0)  1/2  1/2
(4,0)  1    0
(0,1)  4/5  1/5
(1,1)  3/4  1/4
(2,1)  2/3  1/3
(3,1)  1/2  1/2
(4,1)  1    0
(0,2)  0    1
(1,2)  0    1
(2,2)  0    1
(3,2)  0    1
(4,2)  DESTINATION

In this table, the column Point identifies the position from which the firing happens, the column V gives the probability of vertical firing from it, and H the probability of horizontal firing.
Taking the walk
For this walk, just draw a probability and check it against the table above. To generate a number for this purpose, simply call the method NextDouble() on an object of the class Random: if the result is less than the table value for the vertical transition, fire vertically; otherwise, fire horizontally.
For example:
- if the value 0.2 is obtained when leaving the origin, then the transition taken will be to the right
- if 0.49 is obtained at position (3,0), then the walker goes up
With this, in a loop of 6 steps, we obtain a path that leaves the origin and arrives at the destination (4,2).
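The walk can be sketched in Python, driving the steps with the table above (the original uses C#'s NextDouble(); note the answer itself admits the second-level probabilities are a compromise, so the resulting distribution over words is not claimed to be uniform):

```python
import random

# Vertical-firing probabilities from the table, indexed by point (x, y).
V = {(0, 0): 1/5, (1, 0): 1/4, (2, 0): 1/3, (3, 0): 1/2, (4, 0): 1.0,
     (0, 1): 4/5, (1, 1): 3/4, (2, 1): 2/3, (3, 1): 1/2, (4, 1): 1.0,
     (0, 2): 0.0, (1, 2): 0.0, (2, 2): 0.0, (3, 2): 0.0}

def walk(rng):
    x = y = 0
    bits = []
    for _ in range(6):
        if rng.random() < V[(x, y)]:
            y += 1                  # vertical step: emits 0 (non-C)
            bits.append("0")
        else:
            x += 1                  # horizontal step: emits 1 (C)
            bits.append("1")
    return "".join(bits)            # a path from (0,0) to (4,2)
```

Because the table forces vertical steps at x = 4 and horizontal steps at y = 2, every 6-step path ends at (4,2) and carries exactly four 1 bits.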
Path example:
Semantics of the wandering walk

I promised I'd explain this after finishing the modeling, didn't I?
Let’s go back to the first image:
Did you notice its axes? The horizontal axis is C; the vertical one is C with a negation bar on top, that is, non-C. This means that a step taken vertically consists of creating/consuming a non-C letter, while a horizontal step consumes a C proper. Going back to the binary representation, a vertical step corresponds to 0, while a horizontal one corresponds to 1.
So, what would be the sequence generated by the above example?
It would be 101101, or CBCCAC after transforming the binary information into the final string.
And if you wanted to get the word whose binary is 101011?

That would give the word CBCBCC.
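The binary-to-word translation used in these examples can be sketched as an illustrative Python function:

```python
def binary_to_word(bits):
    # '1' at position i -> C; '0' -> A at even positions, B at odd ones.
    return "".join(
        "C" if b == "1" else ("A" if i % 2 == 0 else "B")
        for i, b in enumerate(bits)
    )

print(binary_to_word("101101"))  # CBCCAC
print(binary_to_word("101011"))  # CBCBCC
```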
With these paths I can also represent two consecutive non-C letters, as in ABCCCC. The equivalent binary is 001111, so the path drawing would look like this:
Curiosities of this modeling
As explained in the video about random walks, this process is nothing less than a Markov chain. And Chomsky, in one of his seminal articles on formal languages, Three Models for the Description of Language, described how a simple finite Markov process can generate a language of infinitely many words (section 2). Indeed, in the very first paragraphs of that section, Chomsky describes how to create Markov processes for word generation.
In another article (On Certain Formal Properties of Grammars), Chomsky demonstrates in Theorem 6, via Definition 9, that finite Markov processes which emit characters when firing a transition generate a regular language.

A note on reading On Certain Formal Properties of Grammars: what Chomsky defined in that article as a regular grammar is what we know today as the Chomsky normal form for context-free grammars.
Now, aren't we emitting characters while firing transitions? Characters 0 and 1? So we have here a Markov process, with a finite number of states, that emits characters. Without even realizing it, we ended up creating the deterministic finite automaton that recognizes the words of this language. Note that, at the word-recognition level, the firing probabilities of the Markov process can simply be ignored. Being representable by a regular language also means there is an underlying regular expression that recognizes the desired words.
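Since the language is finite (15 words), one admittedly unminimized regular expression is simply the alternation of all valid words. An illustrative Python sketch of that observation (the helper name is hypothetical):

```python
import re
from itertools import combinations

def word_from_zeros(zeros):
    # even non-C index -> A, odd -> B, everything else -> C
    return "".join(
        ("A" if i % 2 == 0 else "B") if i in zeros else "C"
        for i in range(6)
    )

# Every valid word, one per choice of the two non-C positions.
words = [word_from_zeros(set(pair)) for pair in combinations(range(6), 2)]

# A naive regular expression for the language: the alternation of all words.
pattern = re.compile("|".join(words))

print(bool(pattern.fullmatch("ACACCC")))  # True
print(bool(pattern.fullmatch("ACCCCC")))  # False (five Cs)
```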
Does this wandering-walk story seem too complicated?

Appearances can be deceiving. In this case, the space available for the walk allows more room for variations of what you want to recognize/generate than the previous solution.
For example, what if you needed to recognize words with at least one non-C and at most two non-Cs, still 6 letters in all? Then we would add the following points to the grid:
But it makes no sense to add (5,2), because that would require 7 letters (the limit is 6 letters).
What if this structure should repeat? For example, after finding one such sequence, there could be one, two, or infinitely many sequences following the same formation rule. Then we would be dealing with the Kleene star over this language of As, Bs, and Cs limited to 6 characters, and so on.
Not to mention that, graphically, it looks nicer.
So why not present only this option in the answer? Because, although more complete and general, it is not the simplest. So I decided to keep both options; besides, starting with the simplest one to introduce the concepts used in the more complete part reads more naturally.
Enumerating the strings
I presented two generation abstractions. You can use either of them to enumerate all possible words, but I find the two-dice alternative particularly easier.

Since there are only two 6-sided dice, I could simply iterate through all 36 possibilities and, for each one, check whether it is a valid roll. Or I can be smarter and generate only the valid possibilities.
To be valid, the second number must come after the first and necessarily be greater. Why this restriction now, when the algorithm shown above did without it? Because (3,1) is equivalent to (1,3), and assuming this ordering makes the programming easier.
To recap, I can encapsulate the generation routine described in the previous section into a function:
public static string geraPalavra(int a, int b) {
    string cadeia = "";
    for (int i = 0; i < 6; i++) {
        if (i == a || i == b) {
            // if i is one of the chosen numbers, the letter is non-C
            if (i % 2 == 0) {
                // even non-C positions are dedicated to A
                cadeia += "A";
            } else {
                // odd case: B
                cadeia += "B";
            }
        } else {
            // otherwise, fill with C
            cadeia += "C";
        }
    }
    return cadeia;
}
And then it would be enough to call this function with the appropriate arguments to enumerate all the possible strings:
for (int a = 0; a < 5; a++) {
for (int b = a + 1; b < 6; b++) {
string novaPalavra = geraPalavra(a, b);
// do something with this string
}
}
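The same enumeration in Python form (a direct transliteration of the C# above, for quick experimentation):

```python
def gera_palavra(a, b):
    # Positions a and b receive the non-C letter (A if even, B if odd);
    # every other position receives C.
    cadeia = ""
    for i in range(6):
        if i == a or i == b:
            cadeia += "A" if i % 2 == 0 else "B"
        else:
            cadeia += "C"
    return cadeia

palavras = [gera_palavra(a, b) for a in range(5) for b in range(a + 1, 6)]
print(len(palavras))  # 15
```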
Could you elaborate a little more on your problem and comment your code so we can see your line of thought?
– Paz
I would like to make a 6-character permutation with ABC.
– Diego Well
I created two strings, repeated the letters the required number of times into a new string, joined the two strings, shuffled, and printed the result. I still have to see how to apply the restrictions.
– Diego Well
Maybe this Stack Overflow question in English will help you.
– LP. Gonçalves
I can solve this question through a regular grammar/finite state machine. There are other alternatives, such as Markov chains, but the restriction of at least 4 Cs is difficult to model in traditional Markov chains.
– Jefferson Quesado