What is the best way to read an XML in PHP and insert it into the Database?

Asked

Viewed 761 times

5

I am reading XML files of about 50M of information (a hundred thousand records, could be more), and I have two classes implemented XML reading one DOM and another SAX.

I’m having a problem in terms of running times, it takes me more than 10 minutes, the maximum running time of PHP, changed by me on php.ini.

I’ve read a lot about this and I know that SAX is faster than DOM for very large files, which I think is the case.

One of my mandatory requirements is that either the entire file is read and placed in the Database, or it inserts NOTHING.

The question that remains is which is the most effective way in terms of RUNNING TIME, to read XML and save to BD?

  • Go reading TAG to TAG and inserting in BD?
  • Read all to a data structure and insert all at the end?
  • Another way, which?
  • 1

    I think your question is very valid, but she is almost which is mainly based on opinions. I say this in light of the options we have. For example, if we read everything first and enter at the end of the request, the fluidity will be greater - this is a technique widely used by Microsoft in the latest versions of its software; on the other hand, if we insert item to item, can the problem of losing something along the way if we don’t have one stash of data to store and later commit the information to be entered in the database.

  • @Guilhermeoderdenge the question is clear, which is the most effective way in execution time.

  • 2

    But I liked the first part of the reply about microsoft :)

  • 1

    In addition, you mentioned "[...] in terms of running time [...]": to have security when entering tag to tag, you will need a stash. He demands a little hardware, which naturally minimizes the performance of the technique in question; on the other hand, read the whole structure and save at the end, for me, is the best option because there is no need for stashes and it is all or nothing: if succeeded, saved; otherwise, forget what was done. In addition, this methodology requires a - and only one - request with the database, which does not waste any byte used.

  • You just gave an answer anything but based on opinions. Thank you.

  • Really. I’m already formulating an answer, Jorge. =)

Show 1 more comment

1 answer

1


Let’s look at the possibilities you’ve given yourself:

Go reading TAG to TAG and inserting in BD

In this format, we will have a redundant structure - with each tag read, a request is opened with the database and saves it there.

The good:

If you have not been able to read everything and save everything, at least you do what is possible; The problem of this "pro" is that it goes against the question yourself raised:

"[...]

One of my mandatory requirements is that either the entire file is read and placed in the Database, or it inserts NOTHING.

[...]"

The bad:

If you don’t keep a database connection alive during the application lifecycle, you will have what I just called redundant structure. Now, each time you go to save some information, you will be required to request a connection to your base - which, rest assured, is a very relevant waste of processing; naturally, this will directly affect the running time of your service - which is precisely what nay we want.

Consider this technique a kind of streaming. Its processing is fragmented - if the experience is not social, I see more than unnecessary, I see it as something wasteful.

Besides everything else, the concept of "all or nothing" doesn’t apply without a stash to a later commit - even if you apply this, keeping the connection alive, the stash and the commit require an "extra" of hardware - which further depresses the performance you both seek.


On the other side of tatami, we have:

Read all to a data structure and insert all at the end

Understand:

This is a technique much used in the latest versions of Microsoft software - that’s where the fallacy that Windows Phone is flowing comes from, for example: all the loading is done before the final execution of an application in question, which makes the experience smooth and without crashes. Understand this as a video being 100% uploaded before it runs: it takes longer for you to watch, though, you watch smoothly.

Technical terms:

This method requires one request with the database: when to save. This only request does not make its structure redundant and guarantees its "all or nothing". If you were able to save it, that means it was all - and that’s a point that you raised.

Overall, the response time to save all will be minor than the previous technique. This is because the processing is now smaller; the hardware responds faster and manages to run everything smoothly, fluidly and consequently more agility.

Once you no longer want the "all or nothing," consider re-examining the previous streaming technique.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.