Catching value created at runtime

Viewed 165 times

5

I’m trying to get a value contained in a tag (a span) on a page in another domain, using PHP and JavaScript. Sometimes I get the required value, but most of the time the returned value is null. I believe that when I call file_get_contents to fetch the page and the value does not appear in the span, it is because the value is only filled in by the page’s scripts at runtime. Has anyone been through a similar situation and managed to solve it?

Here is the code I’m using:

        $regex = '/\<span class="special-price-value"(.*?)?\>(.|\\n)*?\<\/span\>/i';
        $url = file_get_contents("http://link.com.br");                                                         
        preg_match_all($regex, $url, $scripts);
        print_r($scripts);
  • How are you searching with PHP? Using DOMDocument or just file_get_contents?

  • file_get_contents only, passing the link I want to search.

  • 1

    Could you post the part of your code you are unsure about? That would make it easier to help you.

  • I added the block to my post!

  • Ah, you’re trying to fetch it with PHP!

  • Yes, for now, but I had already tried with JavaScript too, haha.

3 answers

3

Why complicate things by using regex to parse HTML? Use DOM and XPath:

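<?php

// Fetch the page and parse it into a DOM tree.
$doc = new DOMDocument();
$doc->loadHTML(file_get_contents("http://link.com.br"));
$xpath = new DOMXPath($doc);

// Select every <span> whose class attribute is exactly "special-price-value".
$spans = $xpath->query('//span[@class="special-price-value"]');

foreach ($spans as $span)
{
    echo $span->nodeValue;
}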
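
A caveat on the query above: `@class="special-price-value"` only matches when the class attribute is exactly that string, so a span with several classes would be skipped. The following is a minimal sketch of a token-based match, not part of the original answer. The helper name `findSpanTextsByClass` and the sample markup are made up for illustration, and the class name is assumed to contain no quote characters.

```php
<?php

// Hypothetical helper: returns the trimmed text of every <span> whose class
// attribute contains $class as a whole, space-separated token.
function findSpanTextsByClass(string $html, string $class): array
{
    // Buffer libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad the class list with spaces so only whole tokens match.
    $query = sprintf(
        '//span[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $span) {
        $texts[] = trim($span->textContent);
    }
    return $texts;
}

// Inline sample markup (invented) so the sketch runs without a network request.
$sample = '<div><span class="price special-price-value">R$ 10,00</span>'
        . '<span class="special-price-value-old">R$ 15,00</span></div>';

print_r(findSpanTextsByClass($sample, 'special-price-value'));
```

Only the first span should be returned here, since `special-price-value-old` is a different token.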
  • 1

    +1. I was already preparing this answer in my localhost/teste/index.php. At least I remembered how to use it!

  • But in my case, the site I am searching may contain, for example, an &. Wouldn’t that cause a parse error?

  • Something like this? http://link.com.br?teste=foo&bar=teste

  • That too, but in that case I could escape it. What about when the characters are inside the page itself? file_get_contents would return the HTML and loadHTML would load it, so when loadHTML reached that character the error would appear. I could run a str_replace, but since the link I scan can change and is not always on the same domain, I could not do that.

  • @Henriquecosta I did not understand your concern... why would & cause an error? If possible, open another question.

  • Sorry, I think str_replace will work after all, because I will have the HTML loaded in my variable...

  • From what I understand, that’s a separate question. If the answer solved your initial problem, accept it and ask another question if necessary. Read more at [tour]

  • @gmsantos The & makes the parser think an entity reference follows. In some cases, this returns ; and then the document is declared to have ended.
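
On the & concern discussed in these comments: as far as I know, `loadHTML` goes through libxml's lenient HTML parser, which at most reports a warning for an unescaped & and still builds the document (strict XML parsing via `loadXML` is a different story). Here is a minimal sketch to check that locally. The helper name `firstSpecialPrice` and the sample markup are assumptions for illustration only.

```php
<?php

// Hypothetical helper: parse HTML leniently and return the text of the first
// <span class="special-price-value">, or null when no such span exists.
function firstSpecialPrice(string $html): ?string
{
    // Buffer libxml warnings (e.g. about a bare &) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//span[@class="special-price-value"]')->item(0);
    return $node === null ? null : $node->textContent;
}

// Invented markup with an unescaped & both in a link and in the text.
$sample = '<a href="http://link.com.br?teste=foo&bar=teste">link</a>'
        . '<span class="special-price-value">Tom & Jerry</span>';

var_dump(firstSpecialPrice($sample));
```

If this holds on your PHP build, no `str_replace` pre-processing is needed before calling `loadHTML`.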

1

Instead of using regular expressions, you could use the Simple HTML DOM library. It makes scanning HTML very simple; here is an example from its own manual:

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/'); // your site here

// Find all spans
foreach($html->find('span') as $element)
   echo $element . '<br>';
// To narrow the search: $html->find('span[id=especial]')
  • You can use any kind of selector (same as CSS) needed to find the desired elements.

  • I think an example using the structure he wants to use is more useful.

  • My example finds and prints every span in the page passed to file_get_html. To get a more filtered result, you can add a more specific selector such as id, class, or name.

  • I had already tried it this way, with file_get_html. I was able to locate the tag I wanted, but the value inside it was null.

  • Could some spans have their values loaded via Ajax?

  • I believe so. That is why the value comes back null when I try to load them.
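
One way to confirm the Ajax suspicion is to check what the static HTML actually contains: a span that exists but is empty suggests the value is written later by a script, which PHP never runs. This is a rough sketch under that assumption. `classifySpan` is a hypothetical helper and the markup is invented.

```php
<?php

// Hypothetical helper: reports what the *static* HTML says about the span.
// Returns 'missing' (no such span), 'empty' (span without text) or 'filled'.
function classifySpan(string $html, string $class): string
{
    // Buffer libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//span[@class="%s"]', $class));
    if ($nodes->length === 0) {
        return 'missing';
    }
    return trim($nodes->item(0)->textContent) === '' ? 'empty' : 'filled';
}

// Invented page: the span is present, but a script would fill it in the browser.
$static = '<p><span class="special-price-value"></span></p>'
        . '<script>/* the price is fetched and inserted here */</script>';

echo classifySpan($static, 'special-price-value'), PHP_EOL;
```

An `empty` result means neither file_get_contents nor any HTML parser will see the value. The options are then to read it in the browser with JavaScript or to find the request the page's script makes and call that directly.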

1


I could not get the value with DOMDocument because it does not see values that JavaScript writes into the tags. My solution was to use JavaScript itself to find the required values.
