Improve file_get_contents performance in loops

Is there a way to make file_get_contents run faster inside a loop?

Here is the code:

<?php foreach ($links->result() as $value) :
    $url = $value->lnkUrl;
    $domain = parse_url($url, PHP_URL_HOST);
    $web_link = "http://".$domain;

    $str = file_get_contents($web_link);

    if (strlen($str) > 0) {
        preg_match("/\<title\>(.*)\<\/title\>/", $str, $title);
        if (isset($title[1])) {
            echo "<span class='directi_web' title='".$title[1]."'>".$title[1]."</span>";
        } else {
            echo "<span class='directi_web'>...</span>";
        }
    }
endforeach; ?>
  • Uellington, the delay is not related to file_get_contents itself, but to the fact that your script downloads an entire page just to extract the title. Perhaps cURL would be the most suitable tool for what you want to do.

  • And welcome to Stack Overflow. If you describe the purpose of your code more clearly, it can be better understood and more useful to other people. Take the [tour] and read the [Ask] guide to better understand how the community works.

  • Explaining the goal helps people propose more efficient solutions to your problem.

  • I have a website that gathers content from various websites on the internet and displays it in blocks on the home page, and I would like to show the name of each link's source site.

1 answer

Using cURL to fetch the page and DOMDocument to extract the title simplifies the work considerably:

Function to fetch the HTML:

function file_get_contents_curl($url) {

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, 0);         // don't include response headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

Your code using DOMDocument:

foreach ($links->result() as $value) {

    // fetch the page
    $domain = parse_url($value->lnkUrl, PHP_URL_HOST);
    $html = file_get_contents_curl("http://".$domain);

    // parse it and extract the title (guard against pages with no <title> tag)
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $nodes = $doc->getElementsByTagName('title');
    $title = ($nodes->length > 0) ? $nodes->item(0)->nodeValue : '';

    // output
    if (!empty($title)) {
      echo '<span class="directi_web" title="'.$title.'">'.$title.'</span>';
    }
    else {
      echo '<span class="directi_web">...</span>';
    }
}
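
The loop above still downloads one page at a time, so the total time grows with the number of links. If the delay matters, cURL's multi interface can fetch several pages in parallel. A minimal sketch, assuming $urls is an array of "http://domain" strings built from $links->result(); the function name and the 10-second timeout are illustrative choices, not part of the original answer:

function fetch_all_curl(array $urls) {
    $mh = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);       // give up on very slow hosts
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // drive all transfers until none remain active
    do {
        curl_multi_exec($mh, $active);
        curl_multi_select($mh);                      // wait for activity instead of spinning
    } while ($active > 0);

    $results = array();
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch); // body of each response
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}

With this, all pages download concurrently and the slowest host, rather than the sum of all hosts, dictates the wait.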

Note: this kind of work should be carried out in the background, with the information stored in a file or database. When you serve a page to a visitor, the data should already be ready to use. If you process all the information while rendering the page, the visitor inevitably has to wait, and everything takes much longer than it should.
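
As one possible shape of that background step, a sketch assuming a cron job runs the script every few minutes and that titles.json is an arbitrary cache file (both are illustrative, not from the original answer):

// cache-titles.php — run periodically (e.g. via cron), not on every visit
$cache = array();

foreach ($links->result() as $value) {
    $domain = parse_url($value->lnkUrl, PHP_URL_HOST);
    $html = file_get_contents_curl("http://".$domain);

    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $nodes = $doc->getElementsByTagName('title');

    $cache[$domain] = ($nodes->length > 0) ? $nodes->item(0)->nodeValue : '...';
}

file_put_contents('titles.json', json_encode($cache));

// On the page itself, only the ready-made data is read:
$titles = json_decode(file_get_contents('titles.json'), true);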

  • Thank you very much, it really worked, but there is still a small delay. The background processing you mention, would that be a cache?

  • Yes, essentially a script that collects and stores the titles and URLs of the pages in a table or file, and that runs every X minutes. That way, on the front end, the query just hits the cache, making the whole process very fast!

  • Good observation, it helped a lot. Cheers.
