What is the best way to query SQL Server database using a list as input?

I have a question about how best to address the following problem: I need to query records in a SQL Server table using a list of serial numbers. I know the IN clause, but it does not suit my case, because SQL Server limits a query to a maximum of 2100 parameters and my list has far more than that.

I wonder if it’s better:

  • subdivide my list before the call and make several queries
  • pass the list only once and query record by record (using a for loop in Java to iterate over the list)

Which approach would perform best? I will run my own tests on both options, but I imagine someone here has experience with this and can keep me from making a mistake.

  • Do you really need 2000 parameters?

  • Yes, because it will depend on the filters provided on the request screen, which can vary greatly.

  • But will there be more than 2,000 options on the request screen? If by request screen you mean a query form, it does not seem very logical to offer so many parameters to choose from, at least in my view.

  • No, it will be a few combo boxes that look these records up in another database and populate my list, so it depends on the values chosen in the combos. Sorry if it was confusing before; is it clearer now?

2 answers



Let’s look at some possibilities, including the ones you mentioned.

Block Reading

Split the items into blocks of equal size and execute queries using IN until all the items have been read.

  • Upside: reads only the required data.
  • Downside: needs multiple queries (number of items / block size).
  • Conclusion: the best generic solution when you have no idea how many items will be searched nor how much data the table holds.
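A minimal sketch of block reading in Java (the table and column names `tab1` and `serial` are placeholders, not from the question): partition the list into blocks below the 2100-parameter limit, then build one parameterized IN query per block.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockQueryExample {
    // SQL Server caps a request at 2100 parameters; stay safely below that.
    static final int BLOCK_SIZE = 2000;

    // Split the serial list into blocks of at most `size` items.
    static <T> List<List<T>> partition(List<T> items, int size) {
        List<List<T>> blocks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            blocks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return blocks;
    }

    // Build a parameterized IN clause ("?, ?, ?") for one block,
    // meant to be used with a PreparedStatement.
    static String inClause(int count) {
        StringBuilder sb = new StringBuilder("select * from tab1 where serial in (");
        for (int i = 0; i < count; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        List<Integer> serials = new ArrayList<>();
        for (int i = 0; i < 4500; i++) serials.add(i);

        // 4500 items with blocks of 2000 yield 3 queries (2000 + 2000 + 500).
        for (List<Integer> block : partition(serials, BLOCK_SIZE)) {
            String sql = inClause(block.size());
            // Here you would bind block.get(k) to parameter k+1 and execute.
            System.out.println(block.size());
        }
    }
}
```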

Individual Reading

Read record by record.

  • Upside: I don't see any.
  • Downside: greatly increases the number of queries and the processing overhead. Although the amount of data returned may seem the same, each executed query adds some processing of its own; if we distinguish between gross and net processing and data transmitted, the gross cost will be much higher than with block reading.
  • Conclusion: feasible only if the number of items queried is small.

Single Reading with Auxiliary Table

If the data somehow already comes from the database, it would be easier to JOIN the tables, or even use IN followed by a SELECT. Example:

select * from tab1 where tab1.id in (select tab1_id from tab2)

If the data is not in the database, it would still be possible to load it into, for example, a temporary table.

  • Upside: reads the required data all at once.
  • Downside: may need to insert the data into the database before querying.
  • Conclusion: more appropriate if the queried items are somehow already available in the database. Otherwise, it would be worth benchmarking whether the extra time to insert the items into a temporary table is greater or less than the extra time to query the records in blocks using IN with many parameters.
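The temporary-table route could look like the sketch below. The statements are assembled as strings here so the shape is easy to see; `#serials`, `serial`, and `tab1` are hypothetical names, and in real code the insert would run via a PreparedStatement with addBatch()/executeBatch(), one row per serial, so the 2100-parameter limit never applies.

```java
import java.util.List;

public class TempTableExample {
    // DDL for a SQL Server local temporary table (lives for the session).
    static String createSql() {
        return "create table #serials (serial varchar(30) primary key)";
    }

    // Parameterized single-row insert, executed once per serial in a batch.
    static String insertSql() {
        return "insert into #serials (serial) values (?)";
    }

    // One final query that joins the target table against the temp table.
    static String joinSql() {
        return "select t.* from tab1 t join #serials s on s.serial = t.serial";
    }

    public static void main(String[] args) {
        for (String sql : List.of(createSql(), insertSql(), joinSql())) {
            System.out.println(sql);
        }
    }
}
```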

Single Read from Entire Table

Read the entire table and filter the records in Java.

  • Upside: a single query against the database.
  • Downside: may read far more data than needed.
  • Conclusion: interesting if the total number of records in the table is not too large, and if little data is read from each record; for example, reading only the id of 2000 records out of a total of 4000.
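Filtering in Java is cheap if the wanted serials sit in a HashSet, since each membership test is O(1); a small sketch (the ids are made up for illustration):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class FullScanFilterExample {
    // Keep only the rows whose id is in the wanted set. With a HashSet,
    // filtering all n rows costs O(n) instead of O(n * m) with a list.
    static List<Integer> filter(List<Integer> allIds, Set<Integer> wanted) {
        return allIds.stream().filter(wanted::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical result of "select id from tab1" (4000 rows).
        List<Integer> allIds = java.util.stream.IntStream.range(0, 4000)
                .boxed().collect(Collectors.toList());

        // The 2000 serials we actually need (here: the even ids).
        Set<Integer> wanted = new HashSet<>();
        for (int i = 0; i < 4000; i += 2) wanted.add(i);

        System.out.println(filter(allIds, wanted).size()); // 2000
    }
}
```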

Bonus

If performance is critical, an in-memory cache with the records indexed in an easy-to-look-up data structure (map, set, list) can be something to consider.
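For instance, after one bulk load you could index the rows by serial in a map; the column names in the comment are illustrative, not from the question:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheExample {
    // Build the in-memory index once, after a single bulk load from the
    // database; every later lookup is O(1) and touches no database at all.
    static Map<String, String> indexBySerial(List<String[]> rows) {
        Map<String, String> cache = new HashMap<>();
        for (String[] row : rows) {
            cache.put(row[0], row[1]); // serial -> description
        }
        return cache;
    }

    public static void main(String[] args) {
        // Hypothetical rows loaded once, e.g. "select serial, descr from tab1".
        List<String[]> rows = List.of(
                new String[] {"A1", "first"},
                new String[] {"B2", "second"});

        Map<String, String> cache = indexBySerial(rows);
        System.out.println(cache.get("B2")); // second
    }
}
```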

  • I am going with reading in blocks, as I had imagined it would be the most viable option in this case; here it would not be appropriate to use an auxiliary table. Thank you very much.

  • @Wyllianneo If you are going to generate multiple IN clauses to use in the WHERE of each query (the "block" method), you can also put all of those IN clauses in the WHERE of a single query, separating them with OR: SELECT...WHERE CAMPO IN (VALOR 1, VALOR 2,...VALOR 2100) OR CAMPO IN (VALOR 2101, VALOR 2102,...VALOR 4200) OR CAMPO IN (....

  • @Caffé It’s a good idea.

  • @Caffé Really a good suggestion, but I will not use it, because I have already implemented the other approach and changing the structure of my code would take time that I do not have at the moment; in a future situation I will try it to see how it works. Thank you.

  • @Wyllianneo You're welcome. Before migrating to this suggestion, take some performance measurements to observe the gain. Using OR..IN instead of multiple queries will take at most the same time and will most likely be much faster.
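The OR..IN combination from the comments can be sketched as a small WHERE-clause builder (column name `campo` as in the comment; block size and counts are illustrative). Note that a PreparedStatement is still capped at 2100 bound parameters in total, so with very large lists the values would have to be inlined as literals, as the comment's example does:

```java
public class OrInExample {
    // Combine several IN blocks with OR so everything runs in one query,
    // e.g. WHERE campo IN (?,?) OR campo IN (?,?) OR campo IN (?).
    static String buildWhere(String column, int totalItems, int blockSize) {
        StringBuilder sb = new StringBuilder();
        for (int start = 0; start < totalItems; start += blockSize) {
            int count = Math.min(blockSize, totalItems - start);
            if (start > 0) sb.append(" OR ");
            sb.append(column).append(" IN (");
            for (int i = 0; i < count; i++) sb.append(i == 0 ? "?" : ",?");
            sb.append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildWhere("campo", 5, 2));
        // campo IN (?,?) OR campo IN (?,?) OR campo IN (?)
    }
}
```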


Sounds like block execution to me. We often have to process many records in a routine, and many developers reuse the business layer (Java or C#) for this. Experience shows, however, that you will end up either compensating for the implementation with hardware, or struggling to optimize code and database to gain milliseconds by changing small snippets, when in fact much of the routine's time is spent on network traffic, opening connections, and managing batch execution.

There is no ultimate solution to this; it will depend a lot on your long-term vision of the system, on the size of the records and of the processing you perform, and on the architecture of your system.

If you do not want to push work down through the layers and really want to ensure that the business rules always live in code (C#, Java, C++, etc.), then you will struggle a bit to gain performance.

A simple alternative is to re-evaluate whether you really need to pass the list of parameters that results from the filter, when you could simply pass the filter itself and run the select in the database.

One advantage here is that only a few parameters travel over the network (faster), and since you will not have to break up the list, you go to the database only once.

Another alternative is to serialize the parameters you need into a string and pass that. In my view, if you are selecting this data with a filter, passing the filter values to a stored procedure is faster and easier.

Also, if you run batch processing on the data passed in (the many records), you could do it in the procedure that receives the parameters, or create some form of scheduling based on the filters to process the records in a job, avoiding locking the user's screen while it waits for transaction responses.
