How can I capture a favicon from a site via PHP?

5

I load external content from a website and then import it into a DOMDocument.

Currently, I can easily capture the contents of the title tag.

I do it like this:

$dom = new DOMDocument();

@$dom->loadHTML('<?xml encoding="UTF-8" ?>' . $conteudo_html);

$title = $dom->getElementsByTagName('title')->item(0)->nodeValue;

However, I would also like to capture the favicon of that content through DOMDocument.

How could I do that?

Note: If there’s a way to do this with DOMXPath, that would be even better.

  • I’m not sure you can do this, but try $fav = $dom->querySelector('link[rel*="shortcut icon"]');

  • In that case, it would return the <link> tag.

  • 1

    Another problem: relative URLs!

5 answers

3

A semi-manual approach is to look for the attribute rel="shortcut icon": I fetch every link tag ($dom->getElementsByTagName('link')), check each one's rel attribute ($itens->item($i)->getAttribute('rel') === 'shortcut icon'), and push the matches into an array. For a site that declares several icons, the code only needs a small adaptation following the same logic.

<?php

    // site address
    $site = '';

    $conteudo_html = file_get_contents($site);

    $dom = new DOMDocument();

    @$dom->loadHtml('<?xml encoding="UTF-8" ?>' . $conteudo_html);

    $itens = $dom->getElementsByTagName('link');
    $count = $itens->length;


    $finds = array();

    for($i = 0; $i < $count; $i++)
    {

        if ($itens->item($i)->getAttribute('rel') === 'shortcut icon')
        {

            array_push($finds, [
                'tag' => 'link', 
                'href' => $itens->item($i)->getAttribute('href'),
                'id' => 'shortcut icon',                
                'type' => $itens->item($i)->getAttribute('type'),
                ]
            );

        }

    }

    // items found
    var_dump($finds);
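
The same filtering can also be sketched with DOMXPath, which the question asked about. The helper name findIconLinks and the sample HTML below are assumptions made for illustration. The XPath expression matches any link whose rel contains the token icon, so it also picks up rel="icon" and pages that declare several icons:

```php
<?php
// Hypothetical helper: collect every <link> whose rel attribute contains the token "icon".
function findIconLinks(string $html): array
{
    $dom = new DOMDocument();
    // Suppress warnings caused by malformed real-world HTML
    @$dom->loadHTML('<?xml encoding="UTF-8" ?>' . $html);

    $xpath = new DOMXPath($dom);
    // Lower-case rel and pad it with spaces so that "icon" is matched as a whole token
    $query = '//link[contains(concat(" ", normalize-space(translate(@rel, '
           . '"ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")), " "), " icon ")]';

    $finds = [];
    foreach ($xpath->query($query) as $link) {
        $finds[] = [
            'rel'  => $link->getAttribute('rel'),
            'href' => $link->getAttribute('href'),
            'type' => $link->getAttribute('type'),
        ];
    }
    return $finds;
}

// Sample markup (made up for this example)
$sample = '<html><head><title>t</title>'
        . '<link rel="stylesheet" href="/style.css">'
        . '<link rel="shortcut icon" href="/favicon.ico">'
        . '<link rel="icon" type="image/png" href="/icon.png">'
        . '</head><body></body></html>';

$icons = findIconLinks($sample);
```

Each entry in $icons keeps the raw href, so relative addresses still need to be resolved against the page URL before downloading.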

1

Here is a very simple example based on one available on PHP.net, modified to handle errors and wrapped in a function for portability.

function getUrl($url){
    $doc = new DOMDocument;
    // Errors are deliberately suppressed here
    if(!@$doc->loadHTMLFile($url)){
        $err="";
        $erros = libxml_get_errors();
        foreach($erros as $erro){
            $err .= $erro->message;
        }
        return $err;
    } else {
        $xml = simplexml_import_dom($doc);
        $arr = $xml->xpath('//link[@rel="shortcut icon"]');
        // Return the href as a string, or null when no matching link exists
        return $arr ? (string) $arr[0]['href'] : null;
    }
}

// Enable libxml's internal error handling
libxml_use_internal_errors(true);

echo getUrl("http://answall.com");

References:

XML - PHP.net

Favicon Class - Controlstyle

How to get favicon from websites using PHP - Soen
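
When a page declares no shortcut icon link at all, the XPath query above has nothing to return. Browsers conventionally fall back to requesting /favicon.ico from the site root, so a caller could try that address next. A minimal sketch, assuming a hypothetical helper named defaultFaviconUrl; it only builds the URL and does not check that the file actually exists:

```php
<?php
// Hypothetical helper: build the conventional root-level favicon URL for a page.
function defaultFaviconUrl(string $pageUrl): ?string
{
    $parts = parse_url($pageUrl);
    if (empty($parts['scheme']) || empty($parts['host'])) {
        return null; // not an absolute URL, so no site root can be derived
    }
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $parts['scheme'] . '://' . $parts['host'] . $port . '/favicon.ico';
}

$fallback = defaultFaviconUrl('http://answall.com/questions/1');
```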

1

Using Stack Overflow as an example:

<link rel="shortcut icon" href="//cdn.sstatic.net/br/img/favicon.ico?v=c6678b633455">


$html = file_get_contents('http://answall.com');

$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$favicon = $xpath->evaluate("//link[@rel='shortcut icon']");

print_r($favicon->item(0)->getAttribute('href'));

Will return:

//cdn.sstatic.net/br/img/favicon.ico?v=c6678b633455
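
Because the href comes back protocol-relative here (and may be fully relative on other sites, as noted in the comments on the question), it usually has to be resolved against the page URL before the icon can be downloaded. A minimal sketch, assuming a hypothetical helper named resolveFaviconUrl; it covers only the common cases and does not handle ../ segments, data: URIs, or a <base> tag:

```php
<?php
// Hypothetical helper: turn a favicon href into an absolute URL relative to $pageUrl.
function resolveFaviconUrl(string $href, string $pageUrl): string
{
    $base   = parse_url($pageUrl);
    $scheme = $base['scheme'] ?? 'http';
    $host   = $base['host'] ?? '';
    $port   = isset($base['port']) ? ':' . $base['port'] : '';

    // Already absolute, e.g. "https://example.com/favicon.ico"
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;
    }
    // Protocol-relative, e.g. "//cdn.example.com/favicon.ico"
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;
    }
    // Root-relative, e.g. "/img/favicon.ico"
    if (strpos($href, '/') === 0) {
        return $scheme . '://' . $host . $port . $href;
    }
    // Otherwise treat it as relative to the directory of the current page
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $scheme . '://' . $host . $port . $dir . $href;
}

$absolute = resolveFaviconUrl(
    '//cdn.sstatic.net/br/img/favicon.ico?v=c6678b633455',
    'http://answall.com'
);
```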

1

I suggest using the Simple HTML DOM library.

Example:

<?php
include("simple_html_dom.php");
$html = file_get_html('/');

echo $html->find('link[rel="shortcut icon"]', 0)->href;

Output: //cdn.sstatic.net/br/img/favicon.ico?v=c6678b633455

0

One idea is to first fetch the site's content using cURL or file_get_contents():

<?php

if (isset($_GET['img'])) {
    $favicon = $_GET['img'];
    print_r(array('favicon' => $favicon));
    die();
}

function capturarFaviconSite($url_metodo) {

    // JavaScript injected before </head>; it runs in the browser and
    // redirects to ?img=<href> when an icon link is found
    $script = "\n" . '<script>' .
        'function captureFavicon() {
            var favicon = "";
            var all = document.querySelectorAll("link[rel][href]");
            for (var i = 0; i < all.length; i++) {
                var rels = all[i].getAttribute("rel").toLowerCase().split(" ");
                if (rels.indexOf("icon") !== -1 || rels.indexOf("apple-touch-icon") !== -1) {
                    favicon = all[i].getAttribute("href");
                    break;
                }
            }
            if (favicon != "") {
                location.href = "?img=" + encodeURIComponent(favicon);
            }
        }';

    $html = file_get_contents($url_metodo);
    return preg_replace('/<\/head>/', $script . 'captureFavicon();' . "\n" . '</script></head>', $html);
}

echo capturarFaviconSite('http://www.uol.com.br');

In the case above, the favicon is obtained by this JavaScript function:

function captureFavicon() {

  var favicon = "";
  var all = document.querySelectorAll("link[rel][href]");
  for (var i = 0; i < all.length; i++) {
       // rel may hold several space-separated tokens, e.g. "shortcut icon"
       var rels = all[i].getAttribute("rel").toLowerCase().split(" ");
       if (rels.indexOf("icon") !== -1 || rels.indexOf("apple-touch-icon") !== -1) {
           favicon = all[i].getAttribute("href");
           break;
       }
  }
  return favicon;
}

  • How would I load this via JavaScript? Ajax requests are usually blocked. That’s why I proxied it through PHP.

  • You can use cURL or simplexml_load_file.

  • And put javascript down.
