Here is the complete solution:
To query the following was the fruit of a number of considerations, and meets the following requirements:
Returns results sorted by field codigo
does not group repeated results from different sequences, i.e.: if a sequence is interrupted, even if the same coordinate appears again in the future, a new sequence will be considered;
returns only sequences that contain 8 or more repeated coordinates and in sequence ;
returns all rows containing the coordinates that meet the requirement, not just the grouped summary.
This is the query ready-made:
SELECT * FROM coordenada AS r3 INNER JOIN (
SELECT
codigo_a,
MAX(codigo) AS codigo_b
FROM (
SELECT
codigo,
@combinado:=CONCAT(latitude,'x',longitude),
@reinicio := ( @combinado != @anterior) AS reinicio,
@anterior := @combinado AS anterior,
@contagem := IF(@reinicio, 1, @contagem+1 ) AS contagem,
@codigo_a := IF(@reinicio, codigo, @codigo_a) AS codigo_a,
`data`
FROM coordenada
ORDER BY codigo
) AS r1
WHERE contagem >= 8
GROUP BY codigo_a
) AS r2
ON r3.codigo >= codigo_a AND r3.codigo <= codigo_b
ORDER BY codigo
Detailed explanation:
Part 1
Let’s start with the select innermost (R3):
SELECT
codigo,
@combinado:=CONCAT(latitude,'x',longitude),
@reinicio := ( @combinado != @anterior) AS reinicio,
@anterior := @combinado AS anterior,
@contagem := IF(@reinicio, 1, @contagem+1 ) AS contagem,
@codigo_a := IF(@reinicio, codigo, @codigo_a) AS codigo_a,
`data`
FROM coordenada
ORDER BY codigo
The logic is as follows:
This select makes an initial step in the data, and we start by creating a user variable in the "lat x long" format, to facilitate testing of the following conditions.
Then we create the flag @reinicio
, which is set if the current row coordinate is different from the previous row coordinate (i.e., beginning of a new sequence).
If they’re different, we’ll reset the @contagem
from number 1. If they are equal, increment the count value.
If so, a new sequence, we store the codigo
initial of that sequence in @codigo_a
. If we are in the same sequence, we carry forward the @codigo_a
, maintaining its value. In other words, the @codigo_a
will always keep the code initial of the sequence in which we are, in all lines of this sequence.
The result will look like this:
table 1
Codigo, Combinado, Reinicio, Contagem, Codigo A
1, '-12x-63', 1, 1, 1
2, '-29x-63', 1, 1, 2
3, '-29x-63', 0, 2, 2
4, '-29x-63', 0, 3, 2
5, '-29x-63', 0, 4, 2
6, '-29x-63', 0, 5, 2
7, '-29x-63', 0, 6, 2
8, '-29x-63', 0, 7, 2
9, '-29x-63', 0, 8, 2
10, '-29x-63', 0, 9, 2
11, '-29x-63', 0, 10, 2
12, '-29x-63', 0, 11, 2
13, '-29x-63', 0, 12, 2
14, '-40x32' , 1, 1, 14
15, '-29x-63', 1, 1, 15
16, '-29x-63', 0, 2, 15
17, '-29x-63', 0, 3, 15
18, '-29x-63', 0, 4, 15
19, '-29x-63', 0, 5, 15
20, '-29x-63', 0, 6, 15
21, '-29x-63', 0, 7, 15
22, '-29x-63', 0, 8, 15
23, '-29x-63', 0, 9, 15
24, '-29x-63', 0, 10, 15
25 .....
Note the following: every time the coordinate changes, it triggers our flag @reinicio
(occurred in codes 1, 2, 14 and 15 of the example). Consequently, the @contagem
on these lines restarted from 1, and was increasing, until the next restart. The @codigo_a
, in turn, was keeping the initial code of that sequence, in all lines. In the next step, we will see the reasons for the @codigo_a
exist.
The source of this data is not the same as the question, I used a larger set with more variation, but no decimal numbers, to make it easier to read.
A possible alternative to this technique would be to turn the restart flag into a counter, which would increment with each new sequence. Making a basic comparison between the two possibilities, both seem to me comparable in complexity.
Part 2
This is our select intermediary (R2):
SELECT
codigo_a,
MAX(codigo) AS codigo_b
FROM (
... query 1 ...
) AS r1
WHERE contagem >= 8
GROUP BY codigo_a
The function of this select is to organize our previous result into something usable for our most external select, doing the following:
1. We start by making a "select of select", for the purpose of grouping previous results into only one row per sequence of equal coordinates.
2. We use the WHERE clause to select only the sequences whose grouping @contagem
of the previous step is equal to or greater than 8, which reduces our previous data to the following:
table 2
Codigo B, Combinado, Reinicio, Contagem, Codigo A
9, '-29x-63', 0, 8, 2
10, '-29x-63', 0, 9, 2
11, '-29x-63', 0, 10, 2
12, '-29x-63', 0, 11, 2
13, '-29x-63', 0, 12, 2
22, '-29x-63', 0, 8, 15
23, '-29x-63', 0, 9, 15
24, '-29x-63', 0, 10, 15
25...
3. However, as we group with GROUP BY codigo_a
, will be returned only ONE row of each sequence (and thanks to step 2, this sequence will be sure to 8 or more coordinates).
4. But which line do we want? Like @codigo_a
will always contain the beginning of each sequence, we need the line that contains the greatest code of said sequence, which is what defines where it ends. That’s exactly what makes the MAX(codigo)
, and to facilitate reading, we call it codigo_b
. The final result of this select is:
table 3
Codigo B, Combinado, Reinicio, Contagem, Codigo A
13, '-29x-63', 0, 12, 2
24, '-29x-63', 0, 10, 15
... sequencias seguintes ...
Conclusion of this step: The first sequence runs from code 2 (extracted from codigo_a
) code 13 (taken from codigo_b
), the second goes from code 15 to code 24 and so on. So we get a line for each sequence that interests us. Each line of these defines the starting and ending code of the sequence.
Step 3 and final
As we want not only the groupings, but all the data of each row, comes the third select (R3) that crosses all coordinates with the species of "remissive Intel" that we created in step 2.
SELECT * FROM coordenada AS r3 INNER JOIN (
... query 2 ...
) AS r2
ON r3.codigo >= codigo_a AND r3.codigo <= codigo_b
ORDER BY codigo
This select is simply the following:
We start with a select traditional, taking all data from each coordinate with *
.
To cross-reference this data with what we obtained in table 3 (select R2), we will use a Inner Join, whose condition is simply this: for each sequence listed in the previous step, we want all results whose code is between codigo_a
and codigo_b
(inclusive).
Finally, we order to codigo
, to maintain consistency with the rest we have done so far.
IMPORTANT: For this query to work more than once, it is essential to execute a SET @anterior = '';
to restart our more internal select logic, which uses user variables.
I hope I helped with the contents. When I read the question, I had the impression that I could not solve only with Mysql, but I made sure to learn a little more about DB to be able to answer (and it was a nice series of frustrated attempts to get to that).
This way you do not guarantee that the same lines are followed as requested in the question.
– user4552
@Osvaldo I believe it is a logic error, since the records are exactly the same and if you need to know how many records there are, just add a Count.
– Filipe Moraes
Note that there is a code field (auto-increment) that differentiates each line and, according to the author of the question, the lines are ordered by this code. Its SQL expression returns all that have more than seven repetitions, regardless of whether the repetitions are in sequence. If it were in Postgresql the use of Window Function would solve the problem.
– user4552