How do a select pick up 8 lines in a row or more where there are common values?

Question

How do a select pick up 8 lines in a row or more where there are common values?

Asked 11 years, 4 months ago

Viewed 4,703 times

6

I have a table coordenada that has the columns codigo, latitude, longitude, data and would like to give a select which selected only if it had 8 lines in a row or more where latitude and longitude are equal to the previous line.

The column codigo is a auto_increment and the column data is in order of insertion, as well as codigo.

Example:

Linha 1: latitude = -30, longitude = -60
Linha 2: latitude = -29, longitude = -61
Linha 3: latitude = -29, longitude = -61
Linha 4: latitude = -29, longitude = -61
Linha 5: latitude = -29, longitude = -61
Linha 6: latitude = -29, longitude = -61
Linha 7: latitude = -29, longitude = -61
Linha 8: latitude = -29, longitude = -61
Linha 9: latitude = -29, longitude = -61
Linha 10:, latitude = -30, longitude = -60

Between 2 and 9 there are 8 lines equal or more, so take the lines.

Does anyone know this? Mysql and Phpmyadmin usage in XAMPP.

3 answers

9

Here is the complete solution:

To query the following was the fruit of a number of considerations, and meets the following requirements:

Returns results sorted by field codigo
does not group repeated results from different sequences, i.e.: if a sequence is interrupted, even if the same coordinate appears again in the future, a new sequence will be considered;
returns only sequences that contain 8 or more repeated coordinates and in sequence ;
returns all rows containing the coordinates that meet the requirement, not just the grouped summary.

This is the query ready-made:

SELECT * FROM coordenada AS r3 INNER JOIN (
   SELECT
      codigo_a,
      MAX(codigo) AS codigo_b
   FROM (
      SELECT
         codigo,
         @combinado:=CONCAT(latitude,'x',longitude),
         @reinicio := ( @combinado != @anterior) AS reinicio,
         @anterior := @combinado AS anterior,
         @contagem := IF(@reinicio, 1, @contagem+1 ) AS contagem,
         @codigo_a := IF(@reinicio, codigo, @codigo_a) AS codigo_a,
         `data`
      FROM coordenada
      ORDER BY codigo
   ) AS r1
   WHERE contagem >= 8
   GROUP BY codigo_a
) AS r2
ON r3.codigo >= codigo_a AND r3.codigo <= codigo_b
ORDER BY codigo

Detailed explanation:

Part 1

Let’s start with the select innermost (R3):

SELECT
   codigo,
   @combinado:=CONCAT(latitude,'x',longitude),
   @reinicio := ( @combinado != @anterior) AS reinicio,
   @anterior := @combinado AS anterior,
   @contagem := IF(@reinicio, 1, @contagem+1 ) AS contagem,
   @codigo_a := IF(@reinicio, codigo, @codigo_a) AS codigo_a,
   `data`
FROM coordenada
ORDER BY codigo

The logic is as follows:

This select makes an initial step in the data, and we start by creating a user variable in the "lat x long" format, to facilitate testing of the following conditions.
Then we create the flag @reinicio, which is set if the current row coordinate is different from the previous row coordinate (i.e., beginning of a new sequence).
If they’re different, we’ll reset the @contagem from number 1. If they are equal, increment the count value.
If so, a new sequence, we store the codigo initial of that sequence in @codigo_a. If we are in the same sequence, we carry forward the @codigo_a, maintaining its value. In other words, the @codigo_a will always keep the code initial of the sequence in which we are, in all lines of this sequence.

The result will look like this:

_{table 1}

Codigo, Combinado, Reinicio, Contagem, Codigo A
     1, '-12x-63',        1,        1,        1
     2, '-29x-63',        1,        1,        2
     3, '-29x-63',        0,        2,        2
     4, '-29x-63',        0,        3,        2
     5, '-29x-63',        0,        4,        2
     6, '-29x-63',        0,        5,        2
     7, '-29x-63',        0,        6,        2
     8, '-29x-63',        0,        7,        2
     9, '-29x-63',        0,        8,        2
    10, '-29x-63',        0,        9,        2
    11, '-29x-63',        0,       10,        2
    12, '-29x-63',        0,       11,        2
    13, '-29x-63',        0,       12,        2
    14, '-40x32' ,        1,        1,       14
    15, '-29x-63',        1,        1,       15
    16, '-29x-63',        0,        2,       15
    17, '-29x-63',        0,        3,       15                        
    18, '-29x-63',        0,        4,       15                        
    19, '-29x-63',        0,        5,       15                        
    20, '-29x-63',        0,        6,       15                        
    21, '-29x-63',        0,        7,       15                        
    22, '-29x-63',        0,        8,       15                        
    23, '-29x-63',        0,        9,       15                        
    24, '-29x-63',        0,       10,       15                        
    25 .....

Note the following: every time the coordinate changes, it triggers our flag @reinicio (occurred in codes 1, 2, 14 and 15 of the example). Consequently, the @contagem on these lines restarted from 1, and was increasing, until the next restart. The @codigo_a, in turn, was keeping the initial code of that sequence, in all lines. In the next step, we will see the reasons for the @codigo_a exist.

The source of this data is not the same as the question, I used a larger set with more variation, but no decimal numbers, to make it easier to read.

_{A possible alternative to this technique would be to turn the restart flag into a counter, which would increment with each new sequence. Making a basic comparison between the two possibilities, both seem to me comparable in complexity.}

Part 2

This is our select intermediary (R2):

SELECT
   codigo_a,
   MAX(codigo) AS codigo_b
FROM (
   ... query 1 ...
) AS r1
   WHERE contagem >= 8
   GROUP BY codigo_a

The function of this select is to organize our previous result into something usable for our most external select, doing the following:

1. We start by making a "select of select", for the purpose of grouping previous results into only one row per sequence of equal coordinates.

2. We use the WHERE clause to select only the sequences whose grouping @contagem of the previous step is equal to or greater than 8, which reduces our previous data to the following:

_{table 2}

Codigo B, Combinado, Reinicio, Contagem, Codigo A
       9, '-29x-63',        0,        8,        2
      10, '-29x-63',        0,        9,        2
      11, '-29x-63',        0,       10,        2
      12, '-29x-63',        0,       11,        2
      13, '-29x-63',        0,       12,        2
      22, '-29x-63',        0,        8,       15                        
      23, '-29x-63',        0,        9,       15                        
      24, '-29x-63',        0,       10,       15                        
      25...

3. However, as we group with GROUP BY codigo_a, will be returned only ONE row of each sequence (and thanks to step 2, this sequence will be sure to 8 or more coordinates).

4. But which line do we want? Like @codigo_a will always contain the beginning of each sequence, we need the line that contains the greatest code of said sequence, which is what defines where it ends. That’s exactly what makes the MAX(codigo), and to facilitate reading, we call it codigo_b. The final result of this select is:

_{table 3}

Codigo B, Combinado, Reinicio, Contagem, Codigo A
      13, '-29x-63',        0,       12,        2
      24, '-29x-63',        0,       10,       15                        
      ... sequencias seguintes ...

Conclusion of this step: The first sequence runs from code 2 (extracted from codigo_a) code 13 (taken from codigo_b), the second goes from code 15 to code 24 and so on. So we get a line for each sequence that interests us. Each line of these defines the starting and ending code of the sequence.

Step 3 and final

As we want not only the groupings, but all the data of each row, comes the third select (R3) that crosses all coordinates with the species of "remissive Intel" that we created in step 2.

SELECT * FROM coordenada AS r3 INNER JOIN (
   ... query 2 ...
) AS r2
ON r3.codigo >= codigo_a AND r3.codigo <= codigo_b
ORDER BY codigo

This select is simply the following:

We start with a select traditional, taking all data from each coordinate with *.
To cross-reference this data with what we obtained in table 3 (select R2), we will use a Inner Join, whose condition is simply this: for each sequence listed in the previous step, we want all results whose code is between codigo_a and codigo_b (inclusive).
Finally, we order to codigo, to maintain consistency with the rest we have done so far.

IMPORTANT: For this query to work more than once, it is essential to execute a SET @anterior = ''; to restart our more internal select logic, which uses user variables.

_{I hope I helped with the contents. When I read the question, I had the impression that I could not solve only with Mysql, but I made sure to learn a little more about DB to be able to answer (and it was a nice series of frustrated attempts to get to that).}

Browser other questions tagged mysql sql

You are not signed in. Login or sign up in order to post.

by Filipe Moraes • **8,737** points · Answer 1 · 2014-03-27T14:24:29+00:00

0

If the lines are equal, make a GROUP BY as follows:

SELECT latitude, longitude
FROM tabela
GROUP BY CONCAT(latitude,longitude)
HAVING COUNT(CONCAT(latitude,longitude)) > 7

This way you do not guarantee that the same lines are followed as requested in the question.

– user4552

2014/03/27 at 17:52
@Osvaldo I believe it is a logic error, since the records are exactly the same and if you need to know how many records there are, just add a Count.

– Filipe Moraes

2014/03/27 at 18:06
Note that there is a code field (auto-increment) that differentiates each line and, according to the author of the question, the lines are ordered by this code. Its SQL expression returns all that have more than seven repetitions, regardless of whether the repetitions are in sequence. If it were in Postgresql the use of Window Function would solve the problem.

– user4552

2014/03/27 at 18:19

by ceinmart • **860** points · Answer 2 · 2014-03-27T19:31:18+00:00

This code is ugly but simple and functional under the data passed as example.
Uses variables to control repeat record Qtde.

I didn’t try to vary the data, but I believe I should meet your need, perhaps with some adaptations... (maybe first generate the subquery in temporaria and then do the group by / having on the temporaria to facilitate the recovery of the lines with all the details, based on the column "Indice")

To see this example on SQL Fiddle, click here

drop table if exists tst1; 

CREATE temporary TABLE tst1
    (`linha` int, `lat` int, `lng` int)
;

INSERT INTO tst1
    (`linha`, `lat`, `lng`)
VALUES
    (1, -30, -60),
    (2, -29, -61),
    (3, -29, -61),
    (4, -29, -61),
    (5, -29, -61),
    (6, -29, -61),
    (7, -29, -61),
    (8, -29, -61),
    (9, -29, -61),
    (10, -30, -60)
;

set @ant = 0;
set @x=0;

select lat, lng, max(indice), count(indice)
from (
select linha, lat, lng
     , @ant as ant
     , (case when lat != @ant then @x:=@x+1 else @x end ) as indice
     , (@ant := lat) as ignorar
from tst1 
) as x
group by 1,2
having count(indice) >= 8
;

The final result is:

lat lng max(indice) count(indice)
-29 -61 2   8