I need to read two files and save them in the database. The first (the smaller of the two) has about 49,000 lines, and each line becomes a record. I parse this file by creating an object instance for each line and then saving that object to the database using Doctrine. I can save about 13,000 records, after which the script dies due to memory consumption.
Although the file is large (more than 100 MB), I don't hold the whole file in memory; I read it one line at a time. The complete process of reading the file, creating the instance, parsing it, and loading all the other data needed to process the file (with the exception of Doctrine) at no time consumes more than 30 MB (less than 2.5% of RAM). Only when Doctrine runs does memory consumption grow absurdly.
I know it is possible to increase the memory available by changing php.ini directives or using ini_set, but I would rather not take that route. The objects only need to be written to the database; they don't need to remain in memory.
I tried to fetch the elements back and remove them from memory using detach:
private function clearBatch() {
    $id = $this->last->getId() - 1000;
    $em = $this->getDoctrine()->getManager();
    // Fetch the last batch back so each entity can be detached explicitly
    $query = $em->createQuery('SELECT c FROM {MyObjet} c WHERE c.id >= :id');
    $query->setParameter('id', $id > 0 ? $id : 0);
    $cache = $query->getResult();
    foreach ($cache as $c) {
        $em->detach($c);
    }
    $query->free();   // release the query's internal resources
    $em->clear();
    $this->getDoctrine()->resetManager();
    gc_collect_cycles();
}
but the script always dies. I believe Doctrine is keeping every insert in memory, when all I need is to feed the database with that information; nothing needs to be remembered.
Every 1,000 records I persist, I run clearBatch:
public function onProcess($c) {
    $this->getDoctrine()->getManager()->persist($c);
    $this->count++;
    $this->last = $c;
    if ($this->count % 1000 == 0) {
        // Push the pending inserts to the database, then try to release them
        $this->getDoctrine()->getManager()->flush();
        $this->clearBatch();
    }
}
I even resorted to calling the garbage collector explicitly.
If I don't persist anything, I can read all the records, and my memory consumption is constant: it reaches its peak right at the start of the script and stays there until the end, at 2.4% (this is a VMware VM used only for testing, with 1 GB of RAM). With persistence, the script dies after reaching ~14%.
The onProcess function is called from the file-reading loop.
Isn't there a way to release the instances Doctrine has already written, some kind of free on the recorded instances, so the garbage collector can reclaim that memory later? The file really is large, more than 100 MB, but if I generate the instances and call onProcess without Doctrine, the whole process uses less than 30 MB (2.4%); that is, even though it creates the instances, the space is freed. Memory is not being overloaded by reading the file, since I load it line by line rather than all at once.
– Sileno Brito
I was already using clearBatch as you suggested, but I added the full example because the other research I did suggested several things; I posted everything I had tried based on suggestions from that earlier research. None of it worked.
– Sileno Brito
So "clear()" is responsible for making the "detach" of objects
– Adir Kuhn
Try forcing GC after clear() with the gc_collect_cycles() function. You can also try adjusting the batch size.
– Adir Kuhn
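A minimal sketch of that suggestion, assuming the same Symfony2 controller context as the question (the BATCH_SIZE constant is illustrative, not from the original code):

    // Illustrative constant; tune the batch size to taste
    const BATCH_SIZE = 1000;

    public function onProcess($c) {
        $em = $this->getDoctrine()->getManager();
        $em->persist($c);
        $this->count++;
        if (($this->count % self::BATCH_SIZE) === 0) {
            $em->flush();          // push the pending INSERTs to the database
            $em->clear();          // detach every managed entity in one go
            gc_collect_cycles();   // then force a GC pass, per the suggestion
        }
    }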
I made the modifications, but the consumption remains the same.
– Sileno Brito
So I believe there isn't much you can do. I strongly recommend doing this import without Doctrine, as its own documentation says: "An ORM tool is not primarily well-suited for mass inserts, updates or deletions. Every RDBMS has its own, most effective way of dealing with such operations and if the options outlined below are not sufficient for your purposes we recommend you use the tools for your particular RDBMS for these bulk operations."
– Adir Kuhn
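To illustrate what the documentation means by using the RDBMS directly, here is a minimal sketch of a hand-rolled import with PDO; the DSN, table, column names, and file format are made up for the example:

    // Hypothetical DSN, table, and columns; only the pattern matters
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
    $stmt = $pdo->prepare('INSERT INTO my_table (col_a, col_b) VALUES (?, ?)');
    $fh = fopen('/path/to/import.txt', 'r');
    $pdo->beginTransaction();
    $i = 0;
    while (($line = fgets($fh)) !== false) {
        list($a, $b) = explode(';', trim($line));
        $stmt->execute(array($a, $b));
        if ((++$i % 1000) === 0) {   // commit in chunks so the transaction stays small
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
    $pdo->commit();
    fclose($fh);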
The problem is that the GC will only release memory that is actually eligible for release, and I believe that isn't happening here, because Doctrine must be holding references to the objects, so they are still in use. I had stopped at that documentation too; for portability reasons I didn't want to leave Doctrine.
– Sileno Brito
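One common source of exactly this kind of unbounded growth in Symfony2's dev environment, worth ruling out (though not confirmed as the cause here), is the DBAL SQL logger, which the profiler uses to record every executed query. It can be disabled before the import; a minimal sketch, assuming the controller context from the question:

    $em = $this->getDoctrine()->getManager();
    // In the dev environment the profiler's SQL logger records every query;
    // on an import of hundreds of thousands of rows, that alone grows without bound
    $em->getConnection()->getConfiguration()->setSQLLogger(null);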
Are you using Doctrine on its own, or with a framework like Symfony2?
– Adir Kuhn
I’m using Doctrine with Symfony2
– Sileno Brito
I'll change the answer; see if it helps.
– Adir Kuhn
I'm running the test, but this should certainly be the solution: peak memory usage is fixed at 3.9% (~40 MB) from the beginning. However, I gave you some wrong information; I went back to check, and it's not a little over 49,000 records, it's a little over 490,000. As soon as I'm done I'll let you know.
– Sileno Brito
Change the text of your answer, because the code is a solution, but your answer's wording suggests there is no solution or that the solution is something else.
– Sileno Brito
In any case, if it doesn't work and you don't want to write raw SQL, try using just the DBAL; it will be faster than the ORM.
– Adir Kuhn
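A minimal sketch of that DBAL route, assuming the same Symfony2 context; the table, column names, and file format are made up:

    /** @var \Doctrine\DBAL\Connection $conn */
    $conn = $this->getDoctrine()->getConnection();
    $fh = fopen('/path/to/import.txt', 'r');
    $conn->beginTransaction();
    $i = 0;
    while (($line = fgets($fh)) !== false) {
        list($a, $b) = explode(';', trim($line));
        // insert() issues a single INSERT; no entity is hydrated or tracked
        $conn->insert('my_table', array('col_a' => $a, 'col_b' => $b));
        if ((++$i % 1000) === 0) {
            $conn->commit();
            $conn->beginTransaction();
        }
    }
    $conn->commit();
    fclose($fh);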
It's going to work: consumption is fixed at 3.9%, and more than 350,000 records have been inserted already.
– Sileno Brito
It worked: maximum consumption 3.9% (~40 MB), 498,825 records inserted. If you want to edit the answer, let me know, because I'm going to close the question.
– Sileno Brito