Even without specific details, it is possible to understand the problem: one or more tables in your database are spending most of the time locked.
The next question is: what causes a table to lock?
Too many inserts
A first possibility is that there are so many inserts that no one else can access the table.
In this case, the solution could be to use a table type (storage engine) that supports concurrent inserts.
However, this is unlikely to be the case, because it would take a massive and constant volume of inserts to cause timeouts.
50,000 records isn’t much: it works out to only about 35 records a minute, or 70 every two minutes. On a reasonable server, such an operation should take no more than half a second.
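If the database is MySQL (as the mention of SHOW PROCESSLIST below suggests), it is worth checking which storage engine the table uses: MyISAM locks the entire table on writes, while InnoDB locks only the affected rows. A minimal sketch, with a hypothetical table name:

SHOW TABLE STATUS LIKE 'minha_tabela';  -- the Engine column shows the storage engine

ALTER TABLE minha_tabela ENGINE = InnoDB;  -- switch to row-level locking; test on a copy first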
Transactions
The second possibility is that the code that inserts the data opens a transaction that blocks other access to the database.
Transactions bring safety by ensuring consistency, but they are a common cause of performance problems, especially when poorly implemented.
Imagine the following pseudo-code:
rotina_mal_feita_que_insere_dados() {
    iniciaTransacao()                       // the transaction opens too early...
    dados = carregaDadosDoWebService()      // ...and stays open during a slow network call
    validaEProcessaDadosDoWebService(dados)
    foreach (dados -> dado)
        insereNoBanco(dado)
    terminaTransacao()                      // locks were held for the entire routine
}
Let’s think about it. The code above makes everyone wait for the web service to respond and for the data to be processed. It is like a restaurant manager who only allows himself to write down the next order after the previous one has been served to the customer; in other words, he does not let the staff work in parallel.
Something a little better would look like this:
rotina_melhorada_que_insere_dados() {
    dados = carregaDadosDoWebService()       // the slow work happens before the transaction
    validaEProcessaDadosDoWebService(dados)
    iniciaTransacao()                        // the transaction now covers only the inserts
    foreach (dados -> dado)
        insereNoBanco(dado)
    terminaTransacao()
}
In the example above, we still ensure database consistency, but we greatly narrow the scope of the transaction, so it lasts much less time and does not block the rest of the system while the web service is being accessed, which is usually the slow part.
However, in cases where many records are returned at once, it may be better still to process the insertion in blocks. For example:
rotina_otimizada_que_insere_dados() {
    dados = carregaDadosDoWebService()
    validaEProcessaDadosDoWebService(dados)
    iniciaTransacao()
    inseridos = 0
    foreach (dados -> dado) {
        insereNoBanco(dado)
        inseridos++
        if (inseridos >= 100) {     // commit every 100 records
            inseridos = 0
            terminaTransacao()      // releases the locks held so far
            iniciaTransacao()       // opens a fresh transaction for the next block
        }
    }
    terminaTransacao()              // commits whatever remains
}
In the example above, the routine commits the transaction after every 100 inserted records and starts a new one. This gives the other routines that are blocked, waiting their turn, a chance to run.
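At the SQL level, this pattern is nothing more than periodic commits; the batch counting lives in the application. A minimal illustration in MySQL syntax, with a hypothetical pedidos table:

-- one block of up to 100 inserts
START TRANSACTION;
INSERT INTO pedidos (cliente_id, total) VALUES (1, 10.00);
INSERT INTO pedidos (cliente_id, total) VALUES (2, 25.50);
COMMIT;  -- releases the locks taken by this block

-- the next block runs in a fresh, short-lived transaction
START TRANSACTION;
INSERT INTO pedidos (cliente_id, total) VALUES (3, 7.90);
COMMIT;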
Monitor the database
Anyway, there is no great secret here: you must find out who is locking your tables.
Following the suggestions above, you can check whether a large amount of data is coming in at once, or whether the routine that reads from the web service and writes to the database is poorly implemented.
If the problem is not so obvious or easy to test, the only reliable way is to use a tool to monitor your database and determine what is running when the timeouts occur.
One possibility is to use the command SHOW PROCESSLIST, or some more specific monitoring tool, such as this.
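For reference, a minimal sketch of that approach in MySQL (the 5-second threshold is an arbitrary example):

SHOW FULL PROCESSLIST;  -- everything the server is doing right now

-- or, on MySQL 5.1+, filter for long-running statements:
SELECT id, user, db, command, time, state, info
FROM information_schema.PROCESSLIST
WHERE command <> 'Sleep' AND time > 5
ORDER BY time DESC;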
Maybe that’s not it at all
Maybe our assumptions are all wrong. There is still the possibility that the timeouts occur simply because the queries that the AJAX requests make to the database are very inefficient.
If the table holds a lot of data, such queries necessarily need to filter it with WHERE clauses that use indexes, and to limit large results to at most a few dozen records.
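A minimal sketch of what that looks like, with a hypothetical table and columns:

CREATE INDEX idx_pedidos_status ON pedidos (status);  -- index on the filtered column

SELECT id, cliente_id, total
FROM pedidos
WHERE status = 'pendente'   -- the filter uses the index above
ORDER BY id DESC
LIMIT 50;                   -- never return an unbounded result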
On the other hand, if by chance your table already has a few million records and the code is:

- doing a WHERE on a field that has no index;
- using a LIKE with a wildcard at the beginning of the text (campo LIKE '%texto%'; see the sketch after this list);
- processing a result with many records in memory;
- performing any other operation that causes a table scan, that is, a read of the entire table;
- accessing the database multiple times instead of using joins;
- or doing anything else that makes the query slow regardless of the inserts;

then the problem is obviously quite different.
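Using the same hypothetical pedidos table, and assuming an index exists on the observacao column, here is why the leading wildcard hurts:

-- Leading wildcard: the index cannot be used and every row is scanned.
SELECT id FROM pedidos WHERE observacao LIKE '%urgente%';

-- Trailing wildcard only: the index on observacao can be used.
SELECT id FROM pedidos WHERE observacao LIKE 'urgente%';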
To determine whether this may be happening, pause the scheduled job for a short period and check how long these routines take to run.
You can also take the queries the application executes, run them directly against your database, and check their performance.
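In MySQL, EXPLAIN is the usual way to do that check; a sketch using the same hypothetical query as above:

EXPLAIN SELECT id, cliente_id, total
FROM pedidos
WHERE status = 'pendente'
LIMIT 50;

-- In the output, type = ALL means a full table scan, while ref or range
-- means an index is being used; the rows column estimates how many rows
-- the server will examine.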
Finally, it is important to understand that even a query or a write that takes only a few seconds can cause the whole system to lock up or go down.
Imagine a database operation that takes 10 seconds. It’s bad, but it does not seem at all absurd if you consider it in isolation. However, imagine that you have 10 users accessing the system at a given moment and all of them end up running that same operation in some way. Now you have a queue in which the last user will be served after 100 seconds, and any subsequent access will have to wait in the queue as well. Combine this with the user who hits reload every 10 seconds, and your system may take a long time to recover, or simply go down.
Note
If you identify the cause of the problem, please leave a comment here. It will help a lot of people in the future.
There is a world of things you could improve; it is impossible to know how to help without more data... Improve the queries in the cron routines, improve the pool, improve the PHP queries, and so on... You need to provide more information.
– Ricardo
The problems that may be causing this are diverse, and detecting them while knowing only the basic structure of your application is unlikely. Run some tests on the app, find possible bottlenecks, and share that information so we can try to help.
– Pedro Erick
Thanks. I did not go into detail about the complexity, but I can add more information to the description of the problem.
– Rolland