How to obtain a limited number of occurrences with DOM?

I'm scraping a website and want to extract some data from it. The data is structured as follows:

<div class="interesses">
<span class="tipo" >Tipo 1</span>
<span class="tipo" >Tipo 1</span>
<span class="tipo" >Tipo 2</span>
<span class="tipo" >Tipo 2</span>
<span class="tipo" >Tipo 3</span>
<span class="tipo" >Tipo 3</span>
</div>

I want to get the text inside each span with the class tipo, so I used DOMDocument with XPath:

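$html = file_get_contents("http://exemplo.com");
$DOM = new DOMDocument();
// Prepending the meta tag makes loadHTML treat the markup as UTF-8
$DOM->loadHTML('<meta charset="utf-8">'.$html);
$xpath = new DOMXPath($DOM);
// Match every element whose class list contains the token "tipo"
$tipo = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " tipo ")]');
// array_map(null, ...) with a single array returns that array unchanged
$arrValues = array_map(null, iterator_to_array($tipo));
foreach ($arrValues as $value) {
    echo $value->nodeValue."<br />";
}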

It works.

The problem is that the source page, as you can see, contains two "Tipo 1", two "Tipo 2", and so on. The site always generates duplicate entries, but I only want to show one of each: one "Tipo 1", one "Tipo 2", and so on. Right now everything is returned, and I don't know how to prevent the duplication.

Update:

The array_unique approach that @Miguel Angelo suggested worked! But now imagine the following scenario: an online bakery sells two kinds of sweet bread (pão doce), with coconut (com coco) and without coconut (sem coco). A buyer picks two buns, one with coconut and one without, and the HTML looks something like this:

<div class="interesses">
<span class="tipo" >Pão Doce</span>
<span class="tipo" >Com coco</span>
<span class="tipo" >Pão Doce</span>
<span class="tipo" >Com coco</span>
<span class="tipo" >Pão Doce</span>
<span class="tipo" >Sem coco</span>
<span class="tipo" >Pão Doce</span>
<span class="tipo" >Sem coco</span>
</div>

Now I want to show the user only the 2 kinds of bread they ordered:

Item 1: Pão Doce Com coco, Item 2: Pão Doce Sem coco.

The DOM query alone returns something like:

Item 1: Pão Doce Com coco, Item 2: Pão Doce Com coco, Item 3: Pão Doce Sem coco, Item 4: Pão Doce Sem coco

With array_unique from @Miguel Angelo's tip, each individual value appears only once, so "Pão Doce" is not repeated for the second item:

Item 1: Pão Doce Com coco, Item 2: Sem coco.

In other words, when two items share the same type of bread, I get either every repetition or a single "Pão Doce" for all of them. What I want is one entry per name-and-variant group: show "Pão Doce Com coco" once and drop its repeat, then show "Pão Doce Sem coco" once and drop its duplicate as well.

Is there any way to do that?

1 answer


Can't you use array_unique?

Example:

$html = file_get_contents("http://exemplo.com");
$DOM = new DOMDocument();
$DOM->loadHTML('<meta charset="utf-8">'.$html);
$xpath = new DOMXPath($DOM);
// The spaces around " tipo " make the test match the whole class token only
$tipo = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " tipo ")]');
// Convert each node to its text first, so array_unique compares strings
$arrValues = array_unique(array_map(
    function ($el) { return $el->nodeValue; },
    iterator_to_array($tipo)));
foreach ($arrValues as $value) {
    echo $value."<br />";
}
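
To try the approach above without hitting a remote site, here is a minimal sketch that runs the same steps against an inline copy of the markup from the question. The sample string and the variable names are assumptions made for illustration. It also wraps the result in array_values, because array_unique keeps the original keys and leaves gaps in the numbering.

```php
<?php
// Inline sample standing in for the remote page (assumed to match the question's structure).
$html = '<div class="interesses">'
      . '<span class="tipo">Tipo 1</span><span class="tipo">Tipo 1</span>'
      . '<span class="tipo">Tipo 2</span><span class="tipo">Tipo 2</span>'
      . '<span class="tipo">Tipo 3</span><span class="tipo">Tipo 3</span>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML('<meta charset="utf-8">' . $html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//span[contains(concat(" ", normalize-space(@class), " "), " tipo ")]');

// Reduce each node to its trimmed text so the values can be compared as strings.
$texts = array_map(
    function ($el) { return trim($el->nodeValue); },
    iterator_to_array($nodes)
);

// Drop repeated values, then renumber the keys from 0.
$unique = array_values(array_unique($texts));

print_r($unique);
```

For this sample, $unique holds "Tipo 1", "Tipo 2" and "Tipo 3", in that order.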

Edit to address the additional problem:

The second problem seems to be a matter of concatenating the elements of an array two at a time. That is, an array like this:

[ "a", "b", "c", "d" ]

would have to become:

[ "a b", "c d" ]

before being passed to array_unique.

For that, I wrote the following function:

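// Joins every $num consecutive elements of $array with a space.
// A trailing group with fewer than $num elements is discarded.
function func_concat_N_a_N($num, $array) {
  $length = count($array);
  $item = "";
  $result = array();
  for ($i = 0; $i < $length; $i++) {
    // Start a new group at every multiple of $num; otherwise append to the current one
    $item = ($i % $num) == 0 ? $array[$i] : $item." ".$array[$i];
    // The group is complete once it holds $num elements
    if ((($i + 1) % $num) == 0) {
      $result[] = $item;
    }
  }
  return $result;
}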
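
As an alternative sketch, the same grouping can be expressed with the built-in array_chunk and implode instead of a manual loop. The helper name join_in_groups is hypothetical, and the type declarations assume PHP 7 or later. One difference from the function above: array_chunk keeps a trailing group that has fewer than $size elements, whereas the loop version discards it.

```php
<?php
// Hypothetical helper: joins every $size consecutive values with a space.
function join_in_groups(int $size, array $values): array {
    return array_map(
        function (array $chunk) { return implode(' ', $chunk); },
        array_chunk($values, $size) // splits into consecutive groups of $size
    );
}

$pairs = join_in_groups(2, ['a', 'b', 'c', 'd', 'e']);
print_r($pairs);
```

Here $pairs contains "a b", "c d" and the leftover "e". If incomplete groups should be dropped, they can be filtered out before the implode step.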

Which will be used in the original code like this:

$arrValues = array_unique(
               // the function is used here
               func_concat_N_a_N(
                   // group the elements two at a time
                   2,
                   // the array whose elements we want to join in pairs
                   array_map(
                       function ($el) { return $el->nodeValue; },
                       iterator_to_array($tipo))));
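
To see the whole pipeline in one place, here is a self-contained sketch that runs against an inline copy of the bakery markup from the question. It is only an illustration: the sample string and variable names are assumptions, and the pairing step uses array_chunk with implode rather than func_concat_N_a_N so that the block runs on its own.

```php
<?php
// Inline sample standing in for the remote page (assumed to match the question's structure).
$html = '<div class="interesses">'
      . '<span class="tipo">Pão Doce</span><span class="tipo">Com coco</span>'
      . '<span class="tipo">Pão Doce</span><span class="tipo">Com coco</span>'
      . '<span class="tipo">Pão Doce</span><span class="tipo">Sem coco</span>'
      . '<span class="tipo">Pão Doce</span><span class="tipo">Sem coco</span>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML('<meta charset="utf-8">' . $html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//span[contains(concat(" ", normalize-space(@class), " "), " tipo ")]');

// Step 1: reduce each node to its trimmed text.
$texts = array_map(
    function ($el) { return trim($el->nodeValue); },
    iterator_to_array($nodes)
);

// Step 2: join name and variant in pairs. Step 3: drop repeated pairs and renumber.
$items = array_values(array_unique(array_map(
    function (array $pair) { return implode(' ', $pair); },
    array_chunk($texts, 2)
)));

foreach ($items as $i => $item) {
    echo 'Item ' . ($i + 1) . ': ' . $item . "\n";
}
```

For this sample the loop prints two lines, one for the "Com coco" item and one for the "Sem coco" item. The order of the steps matters: calling array_unique before pairing would collapse the repeated "Pão Doce" values and misalign the pairs.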

Online demonstration of the above code

  • I didn't know about array_unique. Even so, it didn't work: it displayed only 1 of the 7 occurrences currently on the site.

  • Since array_map was being called with a single array, I changed it to pass a function that extracts only the element's content as a string before handing the result to array_unique. See if it works now.

  • It worked! Now please read the update and see if you can help me with that question too.
