Read Feed (rss) description with PHP

0

I'm trying to read the description of a feed item to display it on a site I'm building, but it comes back empty when I fetch it. When I view the source of the feed URL, the description is there:

<description><![CDATA[
    <div>
    <a href="http://eissomesmo.com.br/blog/e-dicas/"><img title="Eisso4" src="http://eissomesmo.com.br/blog/wp-content/uploads/2016/02/Eisso4.jpg" alt="É Dicas!" width="230"  height="230" /></a>
    </div>
    Para um conteúdo cumprir a sua função, deve ser feito adaptado para ser exibido em várias plataformas de mídia e pensado estrategicamente para atrair a atenção do público-alvo e mantê-lo. Este conteúdo pode assumir diversas formas como notícias, videos instrutivos, e-books, posts de blog, guias, artigos, perguntas e respostas, imagens, entre outros. Empresas que constroem]]></description>

But PHP gives me an empty value when I read the description tag. I'm using this code:

$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, 'http://eissomesmo.com.br/blog/feed/');
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'rss');
$query = curl_exec($curl_handle);
curl_close($curl_handle);

$rss = new SimpleXMLElement($query);

echo '<pre>';
var_dump($rss->channel->item->description);
echo '</pre>';

The feed is generated by WordPress. The feed URL is http://eissomesmo.com.br/blog/feed/

1 answer

1

This happens because some elements in the XML wrap their content in CDATA sections. A CDATA section marks text that the XML parser should treat as literal character data rather than markup. Here it keeps the embedded HTML, with its <, > and & characters, from being confused with the XML itself.

<![CDATA[ ... ]]>

The catch is how PHP's XML libraries expose CDATA. SimpleXML does read the section, but var_dump() and print_r() do not display CDATA nodes, so the element looks empty; casting it to a string, as in (string) $rss->channel->item->description, returns the text. Some forum threads, including English-language ones on Stack Overflow, blame outdated XML libraries and suggest updating them, but that should not be necessary.

A way to make the content show up directly is to have PHP merge the CDATA into ordinary text by passing the LIBXML_NOCDATA option to simplexml_load_string when reading the feed, for example:

<?php
// Fetch the raw feed XML with cURL.
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, "http://eissomesmo.com.br/blog/feed/");
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, "rss");
$query = curl_exec($curl_handle);
curl_close($curl_handle);

if ($query === false) {
    die("Could not download the feed.");
}

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$rss = simplexml_load_string($query, "SimpleXMLElement", LIBXML_NOCDATA);

// Cast to string so preg_match receives the description text, not an object.
$desc = (string) $rss->channel->item->description;

// Capture the link URL (group 1) and the image URL (group 2).
preg_match('/<a href="([^"]*)"><[^>]*?src="([^"]+)"[^>]*><\/a>/isU', $desc, $valores);

echo "<pre>";
print_r($valores);
echo "</pre>";
?>
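
A related point that may be worth checking on your side: as far as I know, a plain string cast returns the CDATA text even without LIBXML_NOCDATA, and the flag mainly changes what var_dump() and print_r() show. The sketch below compares the two approaches on a small made-up feed string (the example.com URLs and the $xml sample are assumptions for illustration, not the real feed), so it runs without network access.

```php
<?php
// Hypothetical sample feed, used so the example does not depend on the network.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>Sample</title>
      <description><![CDATA[<div><a href="http://example.com/post/"><img src="http://example.com/img.jpg" alt="x" /></a></div>Some text]]></description>
    </item>
  </channel>
</rss>
XML;

// Default parsing: the CDATA section is kept as a CDATA node.
$plain = new SimpleXMLElement($xml);
// A string cast returns the element's text content, CDATA included.
$viaCast = (string) $plain->channel->item->description;

// LIBXML_NOCDATA converts CDATA sections into text nodes at parse time.
$merged = simplexml_load_string($xml, "SimpleXMLElement", LIBXML_NOCDATA);
$viaFlag = (string) $merged->channel->item->description;

// Compare the two results and print the description obtained by casting.
var_dump($viaCast === $viaFlag);
echo $viaCast, PHP_EOL;
```

If both values match on your PHP version as well, the cast alone is enough whenever you only need the description as a string.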

EDIT: As the asker requested in a comment, I added a regular expression that extracts the link and image URL from inside the div using the preg_match function, which returns an array like this:

 Array (
  [0] => <a href="http://eissomesmo.com.br/blog/e-pascoa-2/"><img title="post_relacionamento_3" src="http://eissomesmo.com.br/blog/wp-content/uploads/2016/03/post_relacionamento_3.jpg" alt="É Páscoa!" width="230"  height="220" /></a>
  [1] => http://eissomesmo.com.br/blog/e-pascoa-2/
  [2] => http://eissomesmo.com.br/blog/wp-content/uploads/2016/03/post_relacionamento_3.jpg
)

I tested this solution and it worked, but since I don't know which PHP version you are running, you will need to test it yourself.
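
If the regular expression turns out to be fragile (for instance, when attribute order or spacing in the feed changes), one alternative is to parse the description HTML with DOMDocument. This is only a sketch under assumptions: the $html value is a made-up sample shaped like the feed item, and I have not run it against the live feed.

```php
<?php
// Hypothetical description HTML, shaped like the feed item in the question.
$html = '<div><a href="http://example.com/post/"><img title="t" '
      . 'src="http://example.com/img.jpg" alt="É Dicas!" width="230" height="230" /></a></div>Some text';

$doc = new DOMDocument();
// Prefix an XML encoding hint so libxml treats the fragment as UTF-8;
// the @ silences warnings libxml may raise for loosely formed HTML.
@$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

// Take the first <a> and the first <img> found in the fragment.
$link  = $doc->getElementsByTagName('a')->item(0);
$image = $doc->getElementsByTagName('img')->item(0);

$valores = [
    'href' => $link  ? $link->getAttribute('href') : null,
    'src'  => $image ? $image->getAttribute('src') : null,
];

print_r($valores);
```

The result is keyed by name instead of by capture group, which may be easier to read where the values are used.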

  • It worked, but it also displayed something else: the content of <div></div>, which I don't need because I already get the image another way. Do you know how to remove the content between the <div> tags?

  • @Alisson Acioli well, that wasn't in your original question; you only said the value was empty :P. Anyway, with a regular expression I read the data inside the div and return it in an array. I hope that's what you need.
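
For the follow-up about dropping the <div> content, one possible approach is to remove the first <div>...</div> block from the description and keep the remaining text. This is a minimal sketch, assuming the div is not nested and appears once at the start, as in the feed excerpt; the $description value is a made-up sample.

```php
<?php
// Hypothetical description value, shaped like the feed item in the question.
$description = "<div>\n<a href=\"http://example.com/post/\">"
             . "<img src=\"http://example.com/img.jpg\" alt=\"x\" /></a>\n</div>\n"
             . "Some text about the post.";

// Drop the first <div>...</div> block. The s flag lets . match newlines and
// the lazy .*? stops at the first closing tag. Assumes the div is not nested.
$withoutDiv = preg_replace('/<div\b[^>]*>.*?<\/div>/is', '', $description, 1);

// Remove any remaining tags and surrounding whitespace to leave plain text.
$text = trim(strip_tags($withoutDiv));

echo $text, PHP_EOL;
```

With the real feed, $description would be the string cast of $rss->channel->item->description.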
