What’s wrong with the N+1?

Question

What’s wrong with the N+1?

Asked 7 years, 1 month ago

Viewed 5,722 times

23

Whenever we work with some ORM, it is common to fall into the problem of darlings N+1. It’s something about performance, called even antipattern.

But what is really this problem, why it happens, what its main causes and how, in theory, to solve them?

I’ve also heard that to solve, just practice Eager loading. But to what extent it is beneficial and able to solve this problem?

4

You make a query to get a list. Then for each element of this list, you make 1 query. That is, N+1 queries. When you release me from a few merge revisions

– Jefferson Quesado

2018/06/15 at 13:10
4

Gavin King, creator of Hibernate once said on the blog that when he is hired to consult/consult on ORM, etc often has to say "In this case we do not use ORM, use a query". ORM is great hand does not serve p/solve 100% of cases. Once upon a time there was a report that was to show only the total number of appointments in the month made by a network of clinics. The ORM had to carry the clinics, the doctors, the patients, the appointments... ended up trying to load all the records of all the tables and after 20 min gave timeout. A query did the same thing in a split second

– jean

2018/06/18 at 19:47

2 answers

23

The problem is not unique to Orms, although many think it is because it is a common problem, but not inherent to them. And they don’t believe in people’s ability to do wrong manually :).

In the ORM appears more because a naive implementation will force the problem to occur always.

Alias is not only a problem of the ORM itself, but of modeling objects with related data. Either you use a database with a non-relational model, which has its problems there, or you adopt the relational model in the application.

Still you can do something when you mix the two models.

The problem is common when it has an object and other related N, hence the name N + 1 which is the "father" of these N. The problem becomes clear when the query picks up the main data, we will say it is a fiscal note, and then it will pick up the lines of items that make up the note. Picking up data individually in the database can become a very high cost, even more so in the poorly thought out architecture that many people do (some out of necessity).

Generally at least 1 + 1 is required, which is a failure of communication of relational models (not of the model, but of the way with them communicate in the current implementations, what I consider a mistake, and instead of fixing this, they created another worse model, is the history of our area, solve a problem with another problem, but nothing other problem can not solve this too).

By the way, that’s why even some people like to use a Nosql DB like proxy relational. Again, the complexity of the solution increases because the tool has problems easy to solve, but nobody does it.

But in this case if the size is big it won’t be a big problem.

The problem of Eager loading is that you can bring information that you won’t even use. But it depends a lot on the problem, there are cases that this is rare to happen, there are others that even if it happens does not even tickle, and in many cases the fact of coming more than it needs generates a overhead so small that a simple extra consultation will already be worse, ie a 1 + 2 can be worse. Imagine reading a single invoice and it bring all the lines of all tax bills to avoid the N+1, total waste.

This is the problem of automated solutions or programmers who do not understand what they are doing and adopt solutions automatically. The real solution is to understand what will happen in that case and decide what is most interesting. Even manually it is complicated to meet all cases, depends on the query. The ORM may have a mechanism that tries to "guess" which is the best strategy.

In many cases there is a lot of repetition of information by the way it was agreed to work with tabular data, usually it is used the JOIN.

Most of the time bringing it all at once is usually more interesting than bringing it one by one.

For lack of a better solution would be something like:

SELECT * FROM Nf
SELECT * FROM NfItem

If you have 1000 notes and on average 10 items per note, there will be 11,000 lines in all, with 2 queries, a large and a huge.

As opposed to N+1:

SELECT * FROM Nf
SELECT * FROM NfItem WHERE NfNumero == Nf.Numero
SELECT * FROM NfItem WHERE NfNumero == Nf.Numero
SELECT * FROM NfItem WHERE NfNumero == Nf.Numero
SELECT * FROM NfItem WHERE NfNumero == Nf.Numero
.
.
.
Tantos quantos forem a quantidade de notas fiscais existentes.

I put in the Github for future reference.

Here you will also have 11 thousand lines, but with 1000 small queries and 1 large.

The code is very abstract, just to illustrate.

Try frying one potato fillet at a time and a lot of fillet at once. The first ends quickly individually, but the whole is tragic, the second takes longer, but when it’s over it’s all ready. It’s only a problem if you find out you’ve only sold three fillets, and fried the package.

6

I know a guy who sells individually fried potato fillets in packs of 50. He runs around in a Fiat 147.

– bfavaretto

2018/06/15 at 16:14
1

@bfavaretto and that’s right? Or just works?

– Maniero

2018/06/15 at 17:44
2

It only works. And look there. Most potatoes already come cold at the time of eating.

– bfavaretto

2018/06/15 at 18:41
1

I’ve seen frameworks solve like this: SELECT * FROM table, SELECT * FROM related_table WHERE table_id IN(?, ?, ?, ?)

– Wallace Maxters

2018/07/06 at 12:47

Browser other questions tagged sql database terminology performance orm

You are not signed in. Login or sign up in order to post.

by Dherik • **10,372** points · Answer 1 · 2018-06-15T16:33:53+00:00

But what’s really this problem

The best way to explain this problem is with an example.

Imagine you have a table Pessoa and a table Endereco. Each person has several addresses, consolidating a list of one for many (1-N).

And now you want to get the addresses of several people. Normally, we see the following query using the ORM of your preference (I will use JPQL notation from JPA):

public List<Pessoa> consultarPessoas() {
    String jpql = "select * from Pessoa";
    return em.createQuery(jpql).getResultList();
}

And then you iterate for each Person to pick up their addresses:

List<Pessoa> pessoas = consultarPessoas():
for (Pessoa pessoa : pessoas) {
    List<Endereco> enderecos = pessoa.getEnderecos();
}

Imagining a LAZY between person and addresses, we will have the following SQL for each person when calling the method pessoa.getEnderecos():

SELECT * from Endereco where pessoa_id = :id;

, why does it happen,

The problem is because to get people’s addresses you first pick up the person and then search the addresses of each. Imagining that the previous query returned us 5 people, the amount of Sqls generated will be something like this:

 SELECT * from pessoa
 SELECT * from endereco where pessoa_id = 1;
 SELECT * from endereco where pessoa_id = 2;
 SELECT * from endereco where pessoa_id = 3;
 SELECT * from endereco where pessoa_id = 4;
 SELECT * from endereco where pessoa_id = 5;

That is to say, 1 select of person with N select for addresses, the famous N + 1 .

what its main causes

It is usually caused by improper use of Orms. You have to understand what the ORM does behind the scenes. Although they are there to make our lives easier, they need to be used wisely. Because they are very permissive in general, unexpected results can be caused in the misuse of the tool.

and how, in theory, to solve them?

In the example I gave earlier, his goal was to get the addresses of several people. If they are the addresses of all the people in the database, you just need to do:

 SELECT * from Endereco

But if you want to apply a filter to bring those 5 people, this can be done by avoiding those various queries with a different JPQL, starting from the Address table as well:

 SELECT * from Endereco where pessoa_id IN (1,2,3,4,5);

I have also heard that to solve, just practice the Ager loading. But to what extent it is beneficial and able to solve this problem?

The EAGER loading is an alternative, because the generated SQL would be something like this:

select p.id, p.nome, end.id, end.rua, end.pessoa_id from pessoa p JOIN endereco end ON end.pessoa_id = p.id

However, the EAGER is a problem if added between the Person and Address relationship in your ORM, as every time that you seek a person, the addresses will also come together. Believe me, you do not want this as standard behavior of your system. The main performance problems I saw in applications involving the use of some ORM were caused by this.

Some Orms have the alternative of using the FETCH optionally in a query, so you can "turn on" the EAGER whenever you want. In JPQL, it would look like this:

SELECT * from pessoa JOIN FETCH pessoa.enderecos

Resulting in the same SQL I quoted earlier.

But there is a serious limiting with FETCH and EAGER if you try to apply some kind of paging in the query. Using EAGER or FETCH this is not possible to be applied in the generated SQL itself, and to have a query with this same effect (bring people and addresses in the same query) you will need to call a native query and/or use native functions (such as DENSE_RANK) of the database.