Simple HTML DOM: how do I grab the link inside a "text/javascript" script?

Question

How do I get the URL inside a script like this:

<script type="text/javascript">
    var src = "https:www.site.com";
</script>

I’ve tried searching, but I can’t adapt the examples I find to what I need.

My code looks like this:

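include('simple_html_dom.php');
$page = 'http://www.site.com'; // load_file() needs the full URL, including the scheme
$html = new simple_html_dom();
$html->load_file($page);

$links = array();
foreach ($html->find('script') as $element) { // the selector must be a string: 'script'
    $links[] = $element;
    echo $element; // prints the whole <script> tag, not just the URL
}

reset($links);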

What I want is to get only the link inside the script:

<script type="text/javascript">
  var src = "https:www.site.com";
</script>

That is, the result should be only: https:www.site.com

Please explain more clearly what you’re trying to do.

– RFL

2017/05/14 at 01:54
Explain in more detail so we can understand your problem.

– Matheus Miranda

2017/05/14 at 02:04

2 answers

by Guilherme Nascimento • **98,651** points · Answer 1 · 2017-05-14T03:04:46+00:00

You can use PHP's native DOMDocument API, combined with cURL or file_get_contents to fetch the page, and then preg_match to extract the value. Here is a simple example to illustrate:

<?php
$meuhtml = '
<script type="text/javascript">
    var src = "https:www.site.com";
</script>
<script type="text/javascript">
    var    src    = \'https:www.site2.com\';
</script>
';

$doc = new DOMDocument;
$doc->loadHTML($meuhtml);

$tags = $doc->getElementsByTagName('script');

$urls = array();

foreach ($tags as $tag) {
    if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
        $result = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        $urls[] = $result; // Add to the array
    }
}

// Show all URLs
print_r($urls);

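The regex #var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)# is what extracts the URL from the text returned by $tag->nodeValue: its third group holds the quoted value, and preg_replace then strips the surrounding quotes and the semicolon. See it working at https://repl.it/Hwt4 (click the Run button when the page loads).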
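The same pattern can be written a little more compactly: (\s+|) matches exactly what \s* matches, and capturing the opening quote lets a backreference close the value, so the URL comes straight out of the match without the extra preg_replace. This is a minimal sketch, assuming the value sits on one line and contains no escaped quotes; extract_src is a hypothetical helper name, not something from the answer.

```php
<?php
// extract_src() is a hypothetical helper name used only for this sketch.
// Returns the value assigned to `var src` in a script body, or null if there is none.
function extract_src(string $script): ?string
{
    // \s* is equivalent to (\s+|); (["\']) captures the opening quote
    // and \1 requires the same quote to close the value.
    if (preg_match('#var\s+src\s*=\s*(["\'])(.*?)\1#', $script, $match)) {
        return $match[2]; // group 2 is the text between the quotes
    }
    return null;
}

echo extract_src('var src = "https:www.site.com";'), "\n";
echo extract_src("var    src    = 'https:www.site2.com';"), "\n";
```

The lazy (.*?) stops at the first matching closing quote, so a second string later on the same line is not swallowed.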

Of course, this was just an example to explain the code. To download the HTML from another site you can use cURL, or file_get_contents if allow_url_fopen is set to On in your php.ini. Example with cURL:

<?php
$url = 'http://site.com';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string instead of printing it
$data = curl_exec($ch);

if (!$data) {
    die('Error');
}

$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);

if ($httpcode !== 200) {
    die('Request error');
}

curl_close($ch);

$doc = new DOMDocument;
$doc->loadHTML($data);

$tags = $doc->getElementsByTagName('script');

$urls = array();

foreach ($tags as $tag) {
    if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
        $result = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        $urls[] = $result; // Add to the array
    }
}

// Show all URLs
print_r($urls);
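If allow_url_fopen is enabled, file_get_contents can replace the cURL request. The sketch below wraps the DOMDocument and preg_match steps in a function so the parsing can be checked on a plain string; find_script_srcs is a hypothetical name, and libxml_use_internal_errors(true) is an addition of this sketch, since real pages often make DOMDocument::loadHTML emit HTML parsing warnings.

```php
<?php
// find_script_srcs() is a hypothetical helper name used only for this sketch.
// It collects every value assigned to `var src` inside <script> tags.
function find_script_srcs(string $html): array
{
    if ($html === '') {
        return array(); // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument;
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('script') as $tag) {
        // Quote captured in group 1, value in group 2
        if (preg_match('#var\s+src\s*=\s*(["\'])(.*?)\1#', $tag->nodeValue, $match)) {
            $urls[] = $match[2];
        }
    }
    return $urls;
}

// With allow_url_fopen = On, the page could be fetched like this instead of with cURL:
// $html = file_get_contents('http://site.com');
$html = '<script type="text/javascript">var src = "https:www.site.com";</script>';
print_r(find_script_srcs($html));
```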

Or, if you only want the first URL, change the loop to:

$url = '';

foreach ($tags as $tag) {
    if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
        $result = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        $url = $result;

        break; // Stop the foreach as soon as the URL is found
    }
}

echo $url;

by Inkeliz • **20,671** points · Answer 2 · 2017-05-14T03:10:31+00:00

Just use PHP's XPath support (DOMXPath), basically like this:

$html = "your HTML, obtained via file_get_contents or cURL...";

$DOM = new DOMDocument;
$DOM->loadHTML($html);

$XPath = new DomXPath($DOM);

$TagScriptJavascript = $XPath->query('//script[@type="text/javascript"]');

foreach($TagScriptJavascript as $item){

    if(preg_match('/var src = "(.*)";/', $item->nodeValue, $url)){

        echo $url[1];

    }

}

Explanations:

1. First, load your HTML, however you obtained it, into DOMDocument.
2. $TagScriptJavascript receives every script element whose type attribute has the value text/javascript, as selected by the query //script[@type="text/javascript"].
3. The foreach runs step 4 for each element in $TagScriptJavascript.
4. preg_match looks for var src = "(anything)";, and if it finds it, the URL is printed by echo $url[1].
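The filtering can also be pushed into the XPath expression itself: appending [contains(., "var src")] keeps only the scripts whose text mentions var src, so the regex runs only on likely candidates. This is a sketch, assuming exactly one space between var and src for the contains() test; the regex is a slightly more tolerant variant (any spacing, single or double quotes), not the one from the answer.

```php
<?php
$html = '<script type="text/javascript">var other = 1;</script>'
      . '<script type="text/javascript">var src = "https:www.site.com";</script>';

$DOM = new DOMDocument;
$DOM->loadHTML($html);

$XPath = new DOMXPath($DOM);

// Only text/javascript scripts whose text content contains "var src"
$scripts = $XPath->query('//script[@type="text/javascript"][contains(., "var src")]');

$urls = array();
foreach ($scripts as $item) {
    // Tolerates extra spaces and either quote style; \1 closes with the opening quote
    if (preg_match('#var\s+src\s*=\s*(["\'])(.*?)\1#', $item->nodeValue, $url)) {
        $urls[] = $url[2];
    }
}

echo implode("\n", $urls), "\n";
```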

Test it out here.