At first I considered applying the algorithm of Detection of Floyd cycles, also known as the "turtle and rabbit" algorithm, but the problem seems to be simpler than that.
Logarithmic complexity features usually involve a comparison that decreases over time. For example, the binary search, which divides the vector always in two.
The fact that it is squared could also be associated with comparisons between the elements of the vector, as occurs with the bubble ordering algorithm (Bubble Sort).
All in all, plus the definition of the problem:
S[K] = S[K+P] for all K, where 0 K < Q P
I thought of a simple algorithm that works basically as follows:
- Calculates the longest period by dividing the bit string size by 2, making
P = T / 2
, being T
the bit string size.
Calculates the number of strings to compare by Q = LEN / P
, always rounding up
For each K = 1..P
and J = 2..K
, compares each element in position K
with the correspondent in position K * J + P
, only when K * J + P <= T
If you find a different element
- If
P > 1
, decreases the period P
and returns to the step 2
again
- Otherwise, just keep going
4
If P <= 0
returns -1
, else, return P
Practical example
Given the example number 955
, whose binary value is 1110111011
, we can calculate the initial values of Steps 1 and 2 as follows:
T = 10
P = 10 / 2 = 5
Q = 10 / 5 = 2
Then the comparison in Step 3 occurs as follows:
K = 1
: 1 1 1 0 1 | 1 1 0 1 1 (equal)
K = 2
: 1 1 1 0 1 | 1 1 0 1 1 (equal)
K = 3
: 1 1 1 0 1 | 1 1 0 1 1 (different)
When comparing the third element of each period, a different element was found. Then, updated the variables like this:
P = P - 1 = 4
Q = 10 / 3 = 4 (arredondando para cima)
And we continue the comparison:
K = 1
: 1 1 1 0 | 1 1 1 0 | 1 1 (equal)
K = 2
: 1 1 1 0 | 1 1 1 0 | 1 1 (equals)
K = 3
: 1 1 1 0 | 1 1 1 0 | 1 1 (equal)
K = 4
: 1 1 1 0 | 1 1 1 0 | 1 1 (iguals)
As there is no difference, then we return the value of P
, that is 4
.
Java implementation
public static int periodo( int n ) {
// validação inicial: precisa ser positivo e com pelo menos duas casas binárias
if( n < 2 ) return -1;
// inicialização de variáveis
String s = Integer.toBinaryString( n );
byte[ ] bytes = s.getBytes();
int t = bytes.length;
// período inicial
int p = t / 2;
// verifica se o período se repete pelo menos uma vez
boolean diferente;
do {
diferente = false;
int qp = t / p; // quantidade de períodos
if (t % p > 0) qp++; //somar 1 para comparar os caracteres de resto no final
// verifica se a repetição ocorre
for2: for( int k = 0 ; k < p ; k++ ) {
for (int j = 1; j < qp; j++) {
if( k + p * j < t && bytes[ k ] != bytes[ k + p * j ] ) {
diferente = true;
break for2;
}
}
}
//se não encontrou repetição, tenta um período menor
if( diferente ) {
p--;
}
} while( diferente && p > 0 );
return p <= 0 ? -1 : p;
}
Some Considerations
I’m not very good at judging the complexity of an algorithm, but I believe in the worst case the above algorithm will do N / 2
iterations and, in each, it makes iterations in shorter periods, which leads to a logarithmic temporal complexity.
As for the special (memory) complexity, the algorithm is O(1)
because it does not add new data structures beyond the original, although the implementation is O(N)
because it creates an auxiliary vector of the same bit string size.
Something that leaves me with a flea behind the ear is that the problem suggested a special complexity also logarithmic, so maybe there is some data structure that can optimize the temporal complexity of the solution.
One solution I thought of with logarithmic time complexity is to split the bit string into several strings and then compare to see if they have the same content. However, I suspect that this solution would be much less efficient, since it would execute several substring
s seems to me something unnecessary, besides which the comparison of Strings would also occur character to character.
Another idea is to recover the binary section of each period and compare the equivalent in its numerical format. Perhaps still with binary operations the algorithm could be very efficient. But I haven’t analyzed the complexity of converting each period from binary to decimal format.
Yet another idea to optimize this further would be to make all this comparison using binary operators to check the equality of periods, for example, using bit chains like this: 1000010000
, 1000100010
, 1001001001
, 1010101010
. It’s an idea.
Finally, I do not know if the solution I proposed meets the requirement of the question, but I hope to have helped at least with some ideas.
And, before concluding, just one warning: beware of language performance when participating in an online challenge. I have already solved some of these problems using Java and Python. Unfortunately, I often got one timeout even with a good algorithm. Some sites add speed multipliers, for example Java 1,5
and Python 2,5
to try to match the efficiency of the algorithm with faster languages like C++. However, this is not always fair and just look at the ranking to see which languages are always at the top.
Updating
Based on gist from Anthony Accioly, I made an attempt to further optimize code using bit-only manipulation. Follow the code:
public class BitPeriod5 {
public int solution(int n) {
// validação inicial: precisa ser positivo e com pelo menos duas casas binárias
if (n < 2) {
return -1;
}
// inicialização de variáveis
final int t = 32 - Integer.numberOfLeadingZeros(n);
// descarta bits não significantes
final int s = n >>> Integer.numberOfTrailingZeros(n);
// período inicial
int p = 1;
int maxp = t / 2;
int vp = 0;
// verifica se o período se repete pelo menos uma vez
boolean diferente;
do {
diferente = false;
int qp = t / p + 1;
if (t != p * qp) {
qp++; //somar 1 para comparar os caracteres de resto no final
}
vp = vp << 1 | 1;
int vi = n & vp;
for (int i = 1; i < qp; i++) {
if (((n >> (i*p)) & vp) != (vi & vp)) {
diferente = true;
}
}
//se não encontrou repetição, tenta um período menor
if (diferente) {
p++;
}
} while (diferente && p <= maxp);
return p > maxp ? -1 : p;
}
public static void main(String[] args) {
final BitPeriod5 bitPeriod = new BitPeriod5();
for (int i = 0; i < 100; i++) {
long nano = System.nanoTime();
for (int j = 0; j < 100000; j++) {
bitPeriod.solution(10);
bitPeriod.solution(187);
bitPeriod.solution(955);
bitPeriod.solution(1000000000);
bitPeriod.solution(0b10101000001010100000);
}
System.out.printf("%.2f\n", ((double) (System.nanoTime()) - nano) * 1e-6);
}
}
}
However, I noticed that this solution is slower than the previous one, even though it has a for
unless.
Lesson 1
Manipulating bits is more "expensive" than accessing vector elements and comparing their value.
I also noticed that removing the module operation, the performance of the code greatly improves.
Changing the section:
if (t % qp > 0) ...
That’s why:
if (t != p * qp) ...
The performance gain was in the 25% range. It is very interesting to see how each operation most drastically affects the performance.
Lesson 2
The division module operation is very "expensive" from the performance point of view.
I call shenaningans! 110100 is the 110100 sequence repeated exactly once!
– Oralista de Sistemas
This problem is vaguely familiar... I’m sure there’s a solution involving finite automata, and it’s very similar to the problem of marrying a string with a regular expression. I’ll try to remember better, but I’ve got the hint.
– mgibsonbr
Yeah, I also thought about Kleene’s closure, but I couldn’t think of any algorithm to infer the alphabet given to the word, especially since it allows cases of partial composition like the
905 = 1110111011
. Later I also ran into the algorithms of cycle detection but I could not formulate the problem exactly according to the proposed algorithms.– Anthony Accioly
Disregard what I said about automata, I thought of them because I’ve recently been rereading this. But the problem I remembered was another (string search), has nothing to do with repeated patterns no... As for your solution proposal, I believe it is not far from the optimal solution no, although maybe can to reduce a little the space of solutions (I have some ideas here, but as I am lousy in math I don’t know it’s in my head or not... if I have something concrete, then I come back here).
– mgibsonbr
P.S. I agree that the repetitions have to be whole, otherwise it opens up a lot of room for "gambiarra"... Ex.:
101110010
is1011100
repeated 1 time and 2/7.– mgibsonbr
@Anthonyaccioly But one thing has to be clear, in the first case there are 2.5 periods of 4? And so there would be a multitude of periods and it seems to me that it was very difficult to resolve. You could even have a decimal with several periods. .
– Jorge B.
@Anthonyaccioly Although I don’t really have an answer (I gave up!), here’s an additional question - to help you clear up the ideas: you want the greater repeating pattern, or the minor? Because if your jail is
0000000...
then there is a pattern of size 1, one of size 2, one of size 3... and although I can’t prove it, I strongly suspect that if there are two patterns with sizesA
andB
then there is also one with sizemdc(A,B)
- even if they are cousins to each other (will have a size pattern1
). That is, when you find the smallest, stop - because all will be multiples of it.– mgibsonbr
@Jorge B. for 2.5 periods I don’t really mean a fraction, but there may be an incomplete "rest" of the binary string (See the example of 905 = 1110 1110 11), that is, the last repetition may be partial and not complete (this case was foreseen in the exercise).
– Anthony Accioly
@mgibsonbr I think I have found a solution http://stackoverflow.com/a/21171365/664577. (Page 340 of this PDF here http://www-igm.univ-mlv.fr/~mac/REC/text-Algorithms.pdf), let me try to remember a little bit of Pascal as soon as I can understand the solution put a response hehe.
– Anthony Accioly
Everyone, I received today the "results" of the exam, which showed the definition of the problem, as well as the worst cases expected. Unfortunately the solution I found above is also not great, so the problem remains open. I cleaned up the comments and includes official definitions. @utluiz, see the settings the exercise is looking for the lowest standard even (good feeling).
– Anthony Accioly
@Jorgeb., also see the official definitions, really partial patterns are acceptable, since they
P < Q / 2
, that is, that the pattern is repeated at least twice.– Anthony Accioly
This expected complexity is expressed as a function of what? Remember that for an integer
n
, the size of your bit string is [the ceiling of]log2(n)
. That is, a "quadratic solution in relation to Q" corresponds exactly tolog2(n)^2
...– mgibsonbr
Hmm... it makes sense @mgibsonbr. In my results I exceeded the time limit of some unit tests (but my algorithm had a stupid problem... I only got 38 out of 100 points). Maybe you’re right and I want to optimize a solution already "good enough".
– Anthony Accioly
This paper here seems to be a good way...
– carlosrafaelgn