How to optimize memory consumption using Doctrine

I need to read two files and save their contents to the database. The first (the smaller one) has about 49,000 lines, and each line becomes a record. I parse the file by creating an instance of the object for each line and then saving that object to the database through Doctrine. I can save about 13,000 records before the script dies from memory exhaustion.

Although the file is large (more than 100 MB), I don't load the whole file into memory; I read it one line at a time. The complete process of reading the file, creating the instance, parsing it and loading all the other data the script needs (everything except Doctrine) never consumes more than 30 MB (less than 2.5% of RAM). Only when Doctrine runs does memory consumption grow absurdly.
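For reference, the line-by-line reading described above can be sketched as follows (the file path and the per-line callback are placeholders, not the asker's actual code):

```php
<?php
// Read a large file one line at a time so that only the current line
// is held in memory, regardless of the total file size.
function processLines(string $path, callable $onLine): int
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $count = 0;
    while (($line = fgets($handle)) !== false) {
        $onLine(rtrim($line, "\r\n"));
        $count++;
    }
    fclose($handle);
    return $count;
}
```

Each iteration overwrites the same `$line` variable, which is why the reading step alone keeps memory flat.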

I know it is possible to raise the memory limit by changing php.ini directives or calling ini_set, but I would rather avoid that solution. The object only needs to be written to the database; it does not need to stay in memory.

I tried to fetch the already-saved entities and remove them from memory using detach:

private function clearBatch() {
    $id = $this->last->getId() - 1000;

    // Fetch the last batch of saved entities and detach each one
    $em = $this->getDoctrine()->getManager();
    $query = $em->createQuery('SELECT c FROM {MyObjet} c WHERE c.id >= :id')
                ->setParameter('id', $id > 0 ? $id : 0);
    foreach ($query->getResult() as $c) {
        $em->detach($c);
    }
    $query->free();
    $em->clear();
    $this->getDoctrine()->resetManager();
    gc_collect_cycles();
}

but the script always dies. I believe Doctrine is keeping every inserted entity in memory, while all I need is to feed the database with that information; nothing needs to be remembered.

Every 1,000 records I persist, I run clearBatch:

public function onProcess($c) {
    $this->getDoctrine()->getManager()->persist($c);
    $this->count++;
    $this->last = $c;
    if ($this->count % 1000 == 0) {
        $this->getDoctrine()->getManager()->flush();
        $this->clearBatch();
    }
}

I even tried forcing the garbage collector.

If I don't persist the records, I can read them all, and my memory consumption stays constant: it reaches its peak of 2.4% right at the start of the script and remains there until the end (this is a VMware test machine with 1 GB of RAM). With persistence, the script dies after reaching about 14%.
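To pin down where the growth happens, memory usage can be sampled inside the loop. This is a diagnostic sketch, not part of the original code; the function name is made up for illustration:

```php
<?php
// Report current and peak memory usage in MB; call this every N
// records to see whether consumption is really constant over time.
function memoryReport(): string
{
    $current = memory_get_usage(true) / 1048576;
    $peak = memory_get_peak_usage(true) / 1048576;
    return sprintf('current: %.1f MB, peak: %.1f MB', $current, $peak);
}
```

Logging this every batch makes it obvious whether memory is reclaimed after each flush or keeps climbing.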

The onProcess function is called inside the file-reading loop.

1 answer

The problem really is the file size: judging only by the number of lines, the file must be more than 50 MB. So there is not much to be done; your memory consumption will be high either way.

I suggest two things. Either you write a script that does this processing in pure PHP; consumption will still be high, but you will be able to optimize it better.

Or, if you want to keep using Doctrine, first store the entity manager in a property with class-wide access ($this->em), initialized in the constructor for example, instead of calling $this->getDoctrine()->getManager() every time.

Also, try using only clear in the onProcess function:

public function __construct() {
    // Enable the garbage collector
    gc_enable();

    // Get the entity manager once and reuse it
    $this->em = $this->getDoctrine()->getManager();

    // Disable SQL logging (saves some memory by keeping everything out of the log)
    $this->em->getConnection()->getConfiguration()->setSQLLogger(null);
}

public function onProcess($c) {

    $this->em->persist($c);
    $this->count++;

    unset($c);

    // Reduce the batch size a bit to avoid too many statements at once
    if ($this->count % 100 == 0) {
        $this->em->flush();
        $this->em->clear(); // detaches all objects

        gc_collect_cycles();
    }
}

Your clearBatch method will only slow the process down.

  • Isn't there a way to remove from memory the instances Doctrine uses, a kind of free on the instances already saved, so that the garbage collector can later release that memory? The file really is large (more than 100 MB), but if I create the instances and call onProcess without Doctrine, the whole process uses less than 30 MB (2.4%); that is, even though it creates the instances, the space is freed. Memory is not being overloaded by reading the file, since I load it line by line, never the complete file.

  • I was already using clear as you suggested, but I posted the full example because my earlier research suggested several other things, so I included everything I had already tried based on those suggestions. None of them worked.

  • So clear() is what performs the detach of the objects.

  • Try forcing the GC after clear() using gc_collect_cycles(). You can also try adjusting the batch size.

  • I made the modifications, but consumption remains the same.

  • Then I believe there is not much you can do. I strongly recommend doing this import without Doctrine; as its own documentation says: "An ORM tool is not primarily well-suited for mass inserts, updates or deletions. Every RDBMS has its own, most effective way of dealing with such operations and if the options outlined below are not sufficient for your purposes we recommend you use the tools for your particular RDBMS for these bulk operations."

  • The problem is that the GC will only release memory that is scheduled to be released, and I believe that is not happening, because Doctrine must be caching the objects, so they are still in use. I had come across that documentation too; for portability reasons I didn't want to drop Doctrine.

  • Are you using Doctrine on its own, or with some framework like Symfony2?

  • I'm using Doctrine with Symfony2.

  • I'll update the answer; see if it helps.

  • I'm running the test now, and this looks like the solution: peak memory usage is fixed at 3.9% (~40 MB) from the start. However, I gave you wrong information earlier; it is not a little over 49,000 records, it is a little over 490,000. I'll let you know as soon as it finishes.

  • Please change the text of your answer: the code contains the solution, but the text suggests there is no solution, or that the solution is something else.

  • In any case, if it doesn't work and you don't want to write pure SQL, try using only DBAL; it will be faster than the full Doctrine ORM.

  • It's working: consumption is fixed at 3.9%, and more than 350,000 records have been inserted.

  • It worked: maximum consumption 3.9% (~40 MB), with 498,825 records inserted. If you want to edit the answer, let me know, because I will close the question.
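If you do drop down to DBAL or plain SQL, as the answer suggests, the bulk insert reduces to one prepared statement reused per row inside a transaction. A minimal sketch using PDO with an in-memory SQLite database standing in for the real RDBMS (table and column names are placeholders):

```php
<?php
// Bulk insert with a reused prepared statement inside a transaction:
// no ORM bookkeeping, so memory stays constant however many rows we insert.
function bulkInsert(PDO $pdo, array $rows): void
{
    $stmt = $pdo->prepare('INSERT INTO records (name) VALUES (:name)');
    $pdo->beginTransaction();
    foreach ($rows as $row) {
        $stmt->execute([':name' => $row]);
    }
    $pdo->commit();
}

$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)');
bulkInsert($pdo, ['a', 'b', 'c']);
```

Wrapping the loop in a single transaction avoids one commit per row, which is usually the dominant cost in mass inserts.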

