How to find "holes" in SQL Server tables?

I have a table with an id column (primary key, auto-increment with a step of 1).

My application does not allow rows to be deleted, so the expected result of the query

SELECT id FROM tbl ORDER BY id ASC

would be:

id
---
1
2
3
4
5
6
7
...

However, some rows were deleted for some reason, either because a user manipulated the table outside my application or because the database is corrupted. Why those rows are missing does not interest me at the moment.

Running the same query as above, the result is not what I expected:

id
---
1
2
3
6
10
12
8870
...

How can I write a query to find these gaps? I need to find where the gaps start. For the result above, I need to extract something like:

id
---
3

Because id 3 is where the gaps begin.

  • Which database are you using?

  • SQL Server. I changed the title.

  • Just out of curiosity, what would be the reason for that?

  • Can you attach the DDL of this table?

  • In my application's logic, the user cannot delete rows, only deactivate them. Some users have deleted rows manually. Another reason is to identify databases that are corrupted and have these gaps for some reason.

  • Just don't delete rows from the table.

  • Is this about the ordering of a SELECT? Is it in a query? Or does it show up like this in some management tool? Can you explain exactly where it appears like this?

  • It is not about ordering. The missing rows have been deleted, or the database has been corrupted.

  • @vnbrs it is hard to understand. In the question you added "...", so I assumed there was more to it; if the database is getting corrupted, then the problem is quite different. This happens when you run a SELECT, right? Then just try this: SELECT id FROM minha_tabela ORDER BY id; and say whether the gaps still appear or not.

  • I gave an example in the question. I have tables with more than 100 thousand records, so it is hard to spot by eye :P

  • So what you mean is that the "holes" really do exist, and you just want to identify which ids are missing? Or do you want to know why these ids supposedly disappeared?

  • I want to know at which id the gaps start. The question is not why these ids have disappeared.

  • @vnbrs yes, now it is starting to become clear. It is just that you said "should not", which leads to a totally different understanding. From your last comment, what you want to know is the "first" id at which these "holes" begin; at least that is what I understood.

  • Just call the table "asphalt".

3 answers

2


Finding the gaps

Below are two efficient solutions, taken from the article Solving Gaps and Islands with Enhanced Window Functions.

The first solution works on many versions of SQL Server.

-- code #1
SELECT col1 + 1 AS rangestart, 
       (SELECT MIN(B.col1)   
          FROM dbo.T1 AS B
          WHERE B.col1 > A.col1) - 1 AS rangeend 
  FROM dbo.T1 AS A
  WHERE NOT EXISTS (SELECT * 
                      FROM dbo.T1 AS B
                      WHERE B.col1 = A.col1 + 1)
        AND col1 < (SELECT MAX(col1) FROM dbo.T1);

In code #1, replace col1 with the name of the column containing the numbering and T1 with the table name.
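As an illustration, here is a minimal sketch that runs code #1 against the ids from the question. The table variable @T1 is a hypothetical stand-in for the real table, used only so the batch can be run on its own; the ORDER BY is an addition to make the output order predictable.

```sql
-- Hypothetical demo data: the ids shown in the question.
DECLARE @T1 TABLE (col1 INT NOT NULL PRIMARY KEY);
INSERT INTO @T1 (col1) VALUES (1), (2), (3), (6), (10), (12), (8870);

-- Code #1 applied to the demo data: one row per gap.
SELECT A.col1 + 1 AS rangestart,
       (SELECT MIN(B.col1)
          FROM @T1 AS B
          WHERE B.col1 > A.col1) - 1 AS rangeend
  FROM @T1 AS A
  WHERE NOT EXISTS (SELECT *
                      FROM @T1 AS B
                      WHERE B.col1 = A.col1 + 1)
        AND A.col1 < (SELECT MAX(col1) FROM @T1)
  ORDER BY rangestart;

-- Expected rows (rangestart, rangeend):
--   4, 5
--   7, 9
--   11, 11
--   13, 8869
```

Each row is a range of missing ids, so the first row says ids 4 and 5 are absent, which means the gaps begin right after id 3.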


There is another suggestion, even more efficient, that works from SQL Server 2012 onward. It uses the window function LEAD().

-- code #2
WITH C AS (
SELECT col1 AS cur, LEAD(col1) OVER(ORDER BY col1) AS nxt
  FROM dbo.T1
)
SELECT cur + 1 AS rangestart, nxt - 1 AS rangeend
  FROM C
  WHERE nxt - cur > 1;

In code #2, replace col1 with the name of the column containing the numbering and T1 with the table name.
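The question asks only for the id where the gaps begin (3 in the example). A small variation of code #2 can return just that value; this is a sketch, assuming SQL Server 2012 or later, with the hypothetical table variable @T1 standing in for the real table.

```sql
-- Hypothetical demo data: the ids shown in the question.
DECLARE @T1 TABLE (col1 INT NOT NULL PRIMARY KEY);
INSERT INTO @T1 (col1) VALUES (1), (2), (3), (6), (10), (12), (8870);

-- Pair each id with the next existing id, then keep the smallest id
-- whose successor is more than 1 away.
WITH C AS (
SELECT col1 AS cur, LEAD(col1) OVER(ORDER BY col1) AS nxt
  FROM @T1
)
SELECT MIN(cur) AS last_id_before_first_gap
  FROM C
  WHERE nxt - cur > 1;

-- Returns 3 for the demo data, and NULL when there are no gaps.
```

The last row has a NULL nxt, so the comparison is not true for it and the highest id is never reported as the start of a gap.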


What causes the gaps?

There can be several causes. For small gaps, look for the cause in the application and in direct access to the table by users. For larger gaps (usually multiples of 1000), one possibility is that the cause is directly linked to how IDENTITY is implemented in SQL Server. The documentation itself reads: "SQL Server might cache identity values for performance reasons and some of the assigned values can be lost during a database failure or server restart. This can result in gaps in the identity value upon insert."
Note the excerpt "This can result in gaps"!

As a solution, the same documentation states: "If gaps are not acceptable then the application should use its own mechanism to generate key values." In other words, IDENTITY cannot be relied on to generate consecutive numeric sequences without gaps.
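One possible shape for such an "own mechanism" is a counter table that is updated in the same transaction as the insert. The sketch below is only an illustration under stated assumptions: dbo.KeyCounter and dbo.DemoRows are hypothetical names, the id column is a plain INT rather than IDENTITY, and rows are never deleted afterwards.

```sql
-- Hypothetical one-row-per-table counter; all names are illustrative.
CREATE TABLE dbo.KeyCounter (
    name       SYSNAME NOT NULL PRIMARY KEY,
    last_value INT     NOT NULL
);
INSERT INTO dbo.KeyCounter (name, last_value) VALUES (N'DemoRows', 0);

-- Hypothetical target table whose id is NOT an IDENTITY column.
CREATE TABLE dbo.DemoRows (id INT NOT NULL PRIMARY KEY);

-- Reserve the next value and use it inside one transaction, so a rollback
-- also undoes the counter increment instead of leaving a gap.
BEGIN TRANSACTION;
    DECLARE @next INT;

    -- Increment the counter and capture the new value in a single statement.
    UPDATE dbo.KeyCounter
       SET @next = last_value = last_value + 1
     WHERE name = N'DemoRows';

    INSERT INTO dbo.DemoRows (id) VALUES (@next);
COMMIT TRANSACTION;
```

The trade-off is that concurrent inserts queue up on the counter row, which is presumably why IDENTITY does not work this way.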


Going deeper into gaps and islands

For those interested in learning more about the classic gaps and islands problem, here are some selected articles:

  • With code #1, I got a 00907. 00000 - "Missing right parenthesis" on an Oracle instance that I used for testing...

  • @Dudaskank: The topic is about SQL Server and the suggestions are written in T-SQL. // For PL/SQL you should replace dbo.T1 with the table name. // About the test of code #1 in Oracle Database, were you able to determine at which point in the code the error occurs?

  • Yes, Jose, it's just that I found it interesting and wanted to see the output of the query on a table here, so much so that I did not even risk code #2, haha. The error seems to point to the rangeend line, and the complete message is: ORA-00907: right parenthesis not found 00907. 00000 - "Missing right parenthesis" *Cause: *Action: Error at line: 5, column: 6

  • But I looked at the query and the parentheses are all there.

  • @Dudaskank: The ORA-00937 error refers to using an aggregate function without a GROUP BY clause. See https://techonthenet.com/oracle/errors/ora00937.php // I found forum topics that describe a similar situation (a subquery with an aggregate function but no GROUP BY clause) and how to work around it.

0

Another method, which works on any database:

SELECT
  min(id)
FROM tabela
WHERE id+1 NOT IN
  (SELECT id FROM tabela
  )
;

If you need to know every id that is not followed by the next one, just do it like this:

SELECT
  id
FROM tabela
WHERE id+1 NOT IN
  (SELECT id FROM tabela
  )
ORDER BY id;

Note that these queries only look from the smallest existing id upward. If, for example, the table contains ids 5, 6 and 8, the first query returns 6, not 0, even though id 1 is also missing.
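Two edge cases are worth seeing on that same example: the highest id always passes the NOT IN test (its successor never exists), and a gap before the smallest id is not reported. The sketch below handles both, assuming ids are expected to start at 1 and that id is NOT NULL (NOT IN returns no rows if the subquery yields a NULL). The table variable @tabela is a hypothetical stand-in for the real table.

```sql
-- Hypothetical demo data: the ids 5, 6 and 8 from the note above.
DECLARE @tabela TABLE (id INT NOT NULL PRIMARY KEY);
INSERT INTO @tabela (id) VALUES (5), (6), (8);

-- Candidates are every existing id plus 0, so a gap starting at id 1 is seen.
-- The MAX(id) filter drops the last id, which is not the start of a gap.
SELECT c.id
  FROM (SELECT id FROM @tabela
        UNION ALL
        SELECT 0) AS c
  WHERE c.id + 1 NOT IN (SELECT id FROM @tabela)
    AND c.id < (SELECT MAX(id) FROM @tabela)
  ORDER BY c.id;

-- Returns 0 and 6: ids are missing after 0 (1 to 4) and after 6 (7).
```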

0

You can use a NOT EXISTS clause:

SELECT t1.*
  FROM tabela t1
 WHERE NOT EXISTS(SELECT 1
                    FROM tabela t2
                   WHERE t2.id -1 = t1.id)
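That query returns every row whose next id is missing, including the row with the highest id. To get only the id where the gaps begin, as asked in the question, it can be narrowed down as in the sketch below. The table variable @tabela is a hypothetical stand-in holding the ids from the question.

```sql
-- Hypothetical demo data: the ids shown in the question.
DECLARE @tabela TABLE (id INT NOT NULL PRIMARY KEY);
INSERT INTO @tabela (id) VALUES (1), (2), (3), (6), (10), (12), (8870);

-- Smallest id with no successor, ignoring the highest id in the table.
SELECT TOP (1) t1.id
  FROM @tabela AS t1
 WHERE NOT EXISTS (SELECT 1
                     FROM @tabela AS t2
                    WHERE t2.id = t1.id + 1)
   AND t1.id < (SELECT MAX(id) FROM @tabela)
 ORDER BY t1.id;

-- Returns 3 for the demo data, and no rows when there are no gaps.
```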
