Best approach is to filter data in the database or application?

Asked

Viewed 829 times

6

Suppose we have a table 'Vendas' with the following columns 'estado', 'cidade', 'vendedor', 'valor'.

'estado' and the 'cidade' would be where the sale was held and 'vendedor' would be what made this sale. 'valor' would be the value of the sale.

What’s the best, cheapest, smartest approach to accumulating these values by 'estado' or 'cidade' or 'vendedor'?

Make 3 selects:

sql = "SELECT estado, sum(valor) 
       FROM vendas
       GROUP BY estado";

sql = "SELECT cidade, sum(valor) 
       FROM vendas
       GROUP BY cidade";

sql = "SELECT vendedor, sum(valor) 
       FROM vendas
       GROUP BY vendedor";

or

Make 1 select only and do the calculations with the programming language

sql = "SELECT * FROM vendas";

Leave it up to the programming language (in this case I work with Java) to do these calculations, and the programmer’s manual work would be heavier?

  • 1

    If you think that the sales table can contain millions of records, it is clear that making the 3 selects is "cheaper".

  • Thanks for the reply, it was as I imagined. Now! Please. I changed the scenario a little and I made a comment in the reply below. If possible, could you comment there as well?

  • @Henriquesantiago: Contact the database whenever possible. Generally sgbd generates optimized execution plan to filter, especially if there is index that meets the query. // In addition, it decreases network traffic. //

  • @Henriquesantiago: In some database managers it is possible to make the 3 queries in a single reading of the data, using GROUPING SETS. . The result returns as a single data set.

  • sorry but did not understand, it is a question with answer "predefined" and chosen, only to give the reward?

  • @Augustovasques doesn’t make sense to open a reward for it when you see... the goal of a reward is to encourage a better response, so you get an extra reputation... open a Bunty and write "look it’s not for pariticpar pq this bouty is for so-and-so’s answer" I don’t think it makes sense from Bounty’s point of view. If you want to reward someone, leave it there and choose the person’s answer (in fact it was already chosen), but leaving the text like this does not seem right "with the OS community". A question is a question, anyone can answer, it’s not a private site

  • @Augustovasques if you put your mouse on the icon that identifies an answer as an accepted answer, you will see that I accepted the answer in July 2017. This feature is Stackoverflow, even a question already with accepted answer is likely to receive more answers.

  • @Henriquesantiago It is not about your acceptance, we are discussing the reward. Rest assured.

Show 3 more comments

4 answers

16


Let the database work for you.

Not that this is an absolute truth. But bringing a lot of data to the filtering application is almost always a mistake. I say it almost because there can always be a scenario that I don’t know can be useful or the only viable way. But I can remember one like that in my head.

Bringing in too much data costs the database, infrastructure and application, not to mention that you’ll have an extra code to deal with.

This case is clearly best to use full SQL and bring only what you need. See if you can make the three queries in one query only. Probably can’t, because the most used technologies hinder this, but it depends a little on how everything is organized.

There may be some very specific and odd case that it would be better to bring everything, even then it would be little gain and I find something so out of the curve that I doubt it would actually happen.

Of course you need to architecture the application to work well, you need to see if the database is properly configured to support this query adequately.

There are controversies as to which limit should be passed to the database. Many scalability problems occur because the developer overloads the database with things that would look better in the application. There will always be the discussion on the use of Stored Procedures or not.

I say and I repeat, many of the techniques and tools that exist today were created to solve a problem created by another technique that was unnecessary or worse than simple. Often people ignore the Razor of Occam. Of course there will always be the argument that the more complex is doing something more. There are cases that are true, but there are cases that are just complicated by not evaluating the whole, or by liking it, or just knowing how to do it.

The "good" part is that it created a huge industry because of so much new problem. We look like hamsters on the wheel.

Hamster na rodinha

Related: Why using "SELECT * FROM table" is bad?.

  • Thanks for the reply. It was as I imagined.

  • Changing the scenery a bit. "SELECT * FROM vendas" because I will feed a table with these records and also generate three charts powered by those other 3 selects I’ve mentioned. In this case, it is still better to leave it to SQL?

  • 1

    So I think you can start to compensate by bringing this in and then generating the others in the application. But there is one thing. If it’s small, it’ll make little difference, if it’s big, you might have a problem bringing it all in. You’d need to see if it compensates for the extra code effort to manipulate this. But you need to know how to do it, because not having an index at your service can make this filter be much slower than asking again for the database. That’s what I said at the beginning of the question, if the scenario changes a little bit, you can’t answer it so clearly. The basic rule is to do all scenarios and measure.

  • What would be 'small' for you? Today, the largest select in this scenario where I am working is bringing a maximum of 500 records.

  • 1

    You can’t talk about numbers, you have to measure.

-4

I think that using both SELECT * FROM sales and the 3 SELECT’s very strange, it might be better to make a table that organizes at least one column, like the seller’s name, for example. Then I’d be like this:

sql = "SELECT * FROM vendas
       GROUP BY vendedor";

the "seller" up there can be replaced by city or state

  • 2

    Note an aggregate in the query sum(valor). Aware of this see that the question is about whether it is more advantageous to consult all the data and then add them within the programming language or else add within the database and the language consult only the results.

-5

I’m sorry I disagree with your friend Maniero as to "Let the database work for you...".

In my view, decoupling the application of the technology adopted in the database is always the best strategy, as it delivers portability and facilitates migration between database versions. In addition, a programmer who "lets the bank work" must also be a bank AD or DBA, otherwise he or she will be misusing the BD and the "spell can turn against the wizard". If the application is Java, whose JPA standards are already much more than mature, it is much better to hand over the responsibility for building the 'selects' to Querybuilder and let it take into account all the details such as CHARSET and Dialect.

  • 7

    Dear Rogério, I honestly think I do not understand your answer, and I think no one will understand, you say something in your answer q seems to be the same as you are disagreeing with the other answer, it seems a paradox. Personally I think two things about this 1: you missed the context of the first sentence of Cool’s answer and maybe you didn’t understand, making a mess of your own answer - 2: your answer seems q was based more on a specific environment/tools that you know and not in fact on the problem addressed in Henrique’s question.

  • excuse me if I was unclear. I will try to improve next time.

  • The summary of what I tried to say is: "always put the business rule in the application and use the database only for storage".

  • 2

    Now it seems that this speaking exactly contrary to the answer, business rules only in the application, is like saying that Sgdbs would not need Ifs, a lot of things nor is business rule, using a ORM for example would be in the application but would be passed to query, and anyway the query is written in the application, but is passed on as a command to the bank through the connection, I think better read Maniero’s response and try to rethink, because it still seems to me that you are saying things your way, environment and tools and not technical things actually.

-5

Each case is a case.

I believe you are saying that, in the interface, the user will be able to select the view of a sales report, with option by city, state or seller.

The past example uses SQL ANSI, which runs on which database you want. It makes no sense to use the programming language (java) with a loop to sum it up. You can use Hibernate to abstract SQL, or not.

A solution and use a selection structure (if), and according to the option selected in the interface runs the corresponding SQL, presenting the list as the option chosen.

  • 1

    It seems to me that a well designed bank and by someone experienced will always lead to the solve side in the bank itself, only (or usually) the other way around, when nay is well done that will be the case to use "each case as a case", then say to gambiarras so on banks from which you nay has "control" or is bequest and generally botched, your answer would be valid, now if we speak of what is well done, planned and by experienced people (on both sides, who designed the bank and who will use it) then this "case by case" would be wrong.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.