I would like to propose an alternative that drives the screw with a screwdriver rather than a hammer, that is, one that parses a hierarchical structure with a real parser.
This is one of the few situations where the excessive verbosity of the DOM does not get in the way of solving the problem. However, for this solution to work properly, the HTML must be semantically structured. For this reason I will assume an HTML that contains <ul> tags:
<h4>Jogo: Area 51</h4>
<ul>
<li>Região: 2 - </li>
<li>Sistema: 8 - Sony PlayStation</li>
<li>Ano: 2003</li>
<li>Publicadoras: 1190 - Midway, 730 - GT Interactive</li>
<li>Desenvolvedora: 1165 - Mesa Logic</li>
</ul>
The solution:
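// Assumes $html declares its charset; otherwise accented labels may come out garbled
$dom = new DOMDocument;
$dom -> loadHTML( $html );

$data = array();

foreach( $dom -> getElementsByTagName( 'ul' ) as $node ) {
    if( $node -> hasChildNodes() ) {
        foreach( $node -> childNodes as $children ) {
            // Whitespace-only text nodes between the <li> items become empty strings here
            $nodeValue = trim( $children -> nodeValue );

            if( ! empty( $nodeValue ) ) {
                // Split "Label: value" into array( 'Label', 'value' )
                $structure = preg_split(
                    '/(.*?):\s+(.*?)/', $nodeValue, -1,
                    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
                );

                // Skip items with nothing after the label (avoids an undefined offset notice)
                if( isset( $structure[ 1 ] ) ) {
                    // First key: one entry per <ul>; second key: the label
                    $data[ spl_object_hash( $node ) ][ $structure[ 0 ] ] = $structure[ 1 ];
                }
            }
        }
    }
}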
We iterate over all <ul> elements obtained through DOMDocument::getElementsByTagName(). Of everything that each $node offers, we will work only with the value of the property DOMNode::nodeValue.
This is where the first opportunity for a hack shows up. We could explode() this value on its line breaks and build the array indexes directly. But we are parsing, so that would be wrong, and therefore we need to iterate over the children of the list.
Instead of getting all the lists, we could fetch the children (<li>) directly, but this would require additional code and, personally, I find it makes less logical sense.
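For comparison, fetching the <li> elements directly might look like the sketch below. It is only an illustration: the sample markup, the $groups storage and the $index numbering are my own assumptions, not part of the solution above. Because every <li> has to be traced back to its parent list, the extra bookkeeping becomes visible.

```php
<?php
// Hypothetical sample markup with two lists (ASCII only, to sidestep charset issues).
$html = '<ul><li>Sistema: 8 - Sony PlayStation</li><li>Ano: 2003</li></ul>'
      . '<ul><li>Ano: 1999</li></ul>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$groups = new SplObjectStorage; // maps each parent <ul> to a sequential index
$data   = array();

foreach ($dom->getElementsByTagName('li') as $li) {
    $parent = $li->parentNode;

    // Extra bookkeeping: work out which list this item belongs to.
    if (!isset($groups[$parent])) {
        $groups[$parent] = count($groups);
    }
    $index = $groups[$parent];

    // Plain split on the first colon: label on the left, value on the right.
    $parts = explode(':', trim($li->nodeValue), 2);
    if (count($parts) === 2) {
        $data[$index][trim($parts[0])] = trim($parts[1]);
    }
}

print_r($data);
```

This sketch swaps the regular expression for explode() with a limit of 2, purely to keep the example short.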
To avoid errors, notices and the like, we check whether there are child nodes, even though we can see that they exist. For this we use DOMNode::hasChildNodes() and, if they exist, we work with the value of the property DOMNode::childNodes.
From that point on we are no longer parsing but manipulating the text of the nodes. We split every string, already cleaned with trim(), separating each label from its value.
Explaining the regular expression is outside the scope of this topic but, as you can see, it is quite simple.
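Even so, a quick look at what preg_split() returns for a single line may help. The sketch below applies the same call to one of the sample lines; the $line variable is just an illustrative value.

```php
<?php
// Illustrative input: one trimmed <li> value from the sample markup.
$line = 'Sistema: 8 - Sony PlayStation';

$structure = preg_split(
    '/(.*?):\s+(.*?)/', $line, -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// The label captured before the first ": " comes first, the rest of the line second.
var_dump($structure);
```

The pattern requires at least one whitespace character after the colon, so an item with nothing after its label would come back as a single element.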
When adding to the array $data we need a way to keep each set of information separate. We could do a little trick with a manually incremented counter but, since we have many objects in play, I opted for spl_object_hash(), which returns a unique identifier string for each object at runtime; that is, each time you reload the page the values will be different.
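The manually incremented counter is worth a quick sketch, because the PHP manual notes that an spl_object_hash() value may be reused once its object has been destroyed, so it is only guaranteed to be unique among objects alive at the same time. The version below is a minimal illustration, assuming a plain sequential key is acceptable; $listIndex and the sample markup are hypothetical.

```php
<?php
// Hypothetical sample markup with two lists.
$html = '<ul><li>Ano: 2003</li></ul><ul><li>Ano: 1999</li></ul>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$data      = array();
$listIndex = 0; // manual counter, one value per <ul>

foreach ($dom->getElementsByTagName('ul') as $node) {
    foreach ($node->childNodes as $child) {
        $value = trim($child->nodeValue);
        if ($value === '') {
            continue; // skip whitespace-only text nodes
        }

        $structure = preg_split(
            '/(.*?):\s+(.*?)/', $value, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
        );

        if (isset($structure[1])) {
            $data[$listIndex][$structure[0]] = $structure[1];
        }
    }
    $listIndex++; // move on to the next list
}

print_r($data);
```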
This is purely structural: when iterating over this array to insert into the database, just ignore the value of the first key. Simple as that!
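As a rough sketch of that last step, the loop below walks over an array shaped like $data and never looks at the first-level key. The hash-like keys and the buildRow() helper are made up for illustration, and the actual database call is deliberately left out.

```php
<?php
// Hypothetical contents of $data; the first-level keys stand in for spl_object_hash() values.
$data = array(
    '000000000000000a0000000000000000' => array('Sistema' => '8 - Sony PlayStation', 'Ano' => '2003'),
    '000000000000000b0000000000000000' => array('Sistema' => '8 - Sony PlayStation', 'Ano' => '1999'),
);

// Hypothetical helper: turns one game's fields into a row ready for insertion.
function buildRow(array $fields)
{
    return array(
        'sistema' => isset($fields['Sistema']) ? $fields['Sistema'] : null,
        'ano'     => isset($fields['Ano']) ? (int) $fields['Ano'] : null,
    );
}

$rows = array();
foreach ($data as $fields) { // the first-level key is simply not requested
    $rows[] = buildRow($fields);
}

print_r($rows);
```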
It would be nice to have one of those definitive guides around here.
– brasofilo
Indeed, I can only agree... hahaha
– Gabriel Tadra Mainginski
I think only the DOM can solve your case. And you could keep trying until you can answer it yourself ;)
– brasofilo
Now it’s time to sleep, I’ve been racking my brain for a long time. Getting the HTML this tidy already took a lot of work. hahaha
– Gabriel Tadra Mainginski
@Gabrieltadramainginski sleep? What is this? Programmers do not sleep :)
– gmsantos
Your list looks like this: http://pastebin.com/muEge8cw?
– Marcos Vinicius
There are several ways to solve it. A simple and fast one: just use explode(), strpos(), substr() and so on. In less than 5 minutes I can finish this.
– Daniel Omine
Yes @Marcosv, just like that.
– Gabriel Tadra Mainginski
@gmsantos students sleep when it’s not exam season HSUAHS
– Gabriel Tadra Mainginski
@Danielomine, I thought about doing this, but I imagine it is an extremely slow operation, because it would produce several arrays with thousands of positions. But if you think it is possible, shoot! :)
– Gabriel Tadra Mainginski
Can you make this list (file) available so that I have a better view and can give you an answer on your case? Maybe modify it to make these fields encapsulated by a <div> to make data capture easier.
– Marcos Vinicius
List - Note: I will probably use only the Ids, and there are some data that have 2 developers or 2 publishers or 2 regions.
– Gabriel Tadra Mainginski
Gabriel, performance makes no difference in this case, whether you use arrays or a regular expression. By the way, a regular expression, depending on its complexity, is much slower. But I didn’t understand the "shoot"... I mean, you expect me to write a script? lol, the job is yours, right? lol
– Daniel Omine
Well, the whole point here is to suggest solutions and, well, you did say, "In less than five minutes I can finish this."
– Gabriel Tadra Mainginski