UTF-8 encoding in an XML file to generate RSS


On a news site, I want to generate an RSS feed with the day's news. The feed is generated correctly, but if any news item contains accents or special characters, the generation fails. My code is:

<?php

    header("Content-Type: application/rss+xml; charset=ISO-8859-1");

    $rssfeed = '<?xml version="1.0" encoding="ISO-8859-1"?>';
    $rssfeed .= '<rss version="2.0">';
    $rssfeed .= '<channel>';
    $rssfeed .= '<title>RSS feed</title>';
    $rssfeed .= '<link>http://www.xxxxxxxxxxxxx.pt</link>';
    $rssfeed .= '<description>RSS feed</description>';
    $rssfeed .= '<language>en-us</language>';
    $rssfeed .= '<copyright>Copyright (C) 2014 xxxxxxxxxxxxx.pt</copyright>';

    $data = date("Y-m-d");
    $query = "SELECT * FROM tbl_noticias WHERE data = '$data' ORDER BY id DESC";
    $result = mysql_query($query) or die ("Could not execute query");

    while($row = mysql_fetch_array($result)) {
        $title = $row['titulo'];
        $title = $title;
        $description = $row['intro'];
        $date = $row['data']." - ".$row['hora'];
        $link = "http://www.xxxxxxxxxxxxx.pt/detalhe.php?id=".$row['id'];
        $image = "http://www.xxxxxxxxxxxxx.pt/images/resize_listagem/".$row['foto'];

        $rssfeed .= '<item>';
        $rssfeed .= '<title>' . $title . '</title>';
        $rssfeed .= '<description>' . $description . '<![CDATA[<br><img src="' . $image . '" />]]></description>';
        $rssfeed .= '<link>' . $link . '</link>';
        $rssfeed .= '<pubDate>' .$date. '</pubDate>';
        $rssfeed .= '</item>';
    }

    $rssfeed .= '</channel>';
    $rssfeed .= '</rss>';

    echo $rssfeed;
?>

I have tried changing the header to UTF-8, without success. I have also tried utf8_encode, which also failed. The <title> tag is what causes the error; another tag containing special characters does not fail. The fields in the database are stored as utf8_general_ci.

  • Have you tried using PHP's DOM to generate the XML? That's what I use, and I have no problems because it already handles these encoding issues. Since you're writing the XML by hand, you have to take care of the encoding yourself.

  • @Jorgeb. I haven't tried it yet. If I can't solve it this way, I'll go with your suggestion. Thank you.

  • This depends on how the data is stored in your table, i.e., which encoding are you using? Is it Latin-1? If so, you can use utf8_encode.

  • @Harrypotter, the fields in the database are stored as utf8_general_ci.

  • @pc_oc I edited the answer with another approach! Please try it.

  • Although DOM covers a lot more ground, I think using it to generate a simple XML document would be like killing ants with cannonballs. In such cases, the native XMLWriter can be a good choice.
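A minimal sketch of the XMLWriter approach suggested in the comment above, assuming rows with the same column names as tbl_noticias (titulo, intro, id). The function name buildRssFeed and the sample row are hypothetical. writeElement() escapes &, < and > in text content, but it does not convert encodings, so the strings passed in must already be UTF-8.

```php
<?php
// Hypothetical helper: build the RSS feed with XMLWriter instead of string concatenation.
// $items stands in for the rows read from tbl_noticias (sample data below is made up).
function buildRssFeed(array $items): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->startDocument('1.0', 'UTF-8'); // writes <?xml version="1.0" encoding="UTF-8"?>

    $w->startElement('rss');
    $w->writeAttribute('version', '2.0');
    $w->startElement('channel');
    $w->writeElement('title', 'RSS feed');
    $w->writeElement('link', 'http://www.xxxxxxxxxxxxx.pt');
    $w->writeElement('description', 'RSS feed');

    foreach ($items as $item) {
        $w->startElement('item');
        // writeElement() escapes &, < and > in the text for us
        $w->writeElement('title', $item['titulo']);
        $w->writeElement('description', $item['intro']);
        $w->writeElement('link', 'http://www.xxxxxxxxxxxxx.pt/detalhe.php?id=' . $item['id']);
        $w->endElement(); // item
    }

    $w->endElement(); // channel
    $w->endElement(); // rss
    $w->endDocument();

    return $w->outputMemory();
}

$xml = buildRssFeed([
    ['id' => 1, 'titulo' => 'Notícia & opinião', 'intro' => 'Introdução'],
]);
echo $xml;
```

When serving this from a web page, the Content-Type header should still declare charset=UTF-8 so that it matches the XML declaration.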


2 answers



Add this line before $query: mysql_query("SET NAMES 'utf8'"); and use UTF-8 in both the header and the XML declaration:

<?php
// Send the feed as UTF-8 and declare the same encoding in the XML prolog
header("Content-Type: application/rss+xml; charset=utf-8");

$rssfeed = '<?xml version="1.0" encoding="utf-8"?>';
$rssfeed .= '<rss version="2.0">';
$rssfeed .= '<channel>';
$rssfeed .= '<title>RSS feed</title>';
$rssfeed .= '<link>http://www.xxxxxxxxxxxxx.pt</link>';
$rssfeed .= '<description>RSS feed</description>';
$rssfeed .= '<language>en-us</language>';
$rssfeed .= '<copyright>Copyright (C) 2014 xxxxxxxxxxxxx.pt</copyright>';

$data = date("Y-m-d");
// Make MySQL return the rows as UTF-8 so they match the declared encoding
mysql_query("SET NAMES 'utf8'");
$query = "SELECT * FROM tbl_noticias WHERE data = '$data' ORDER BY id DESC";
$result = mysql_query($query) or die ("Could not execute query");

while($row = mysql_fetch_array($result)) {
    $title = $row['titulo'];
    $description = $row['intro'];
    $date = $row['data']." - ".$row['hora'];
    $link = "http://www.xxxxxxxxxxxxx.pt/detalhe.php?id=".$row['id'];
    $image = "http://www.xxxxxxxxxxxxx.pt/images/resize_listagem/".$row['foto'];

    $rssfeed .= '<item>';
    $rssfeed .= '<title>' . $title . '</title>';
    $rssfeed .= '<description>' . $description . '<![CDATA[<br><img src="' . $image . '" />]]></description>';
    $rssfeed .= '<link>' . $link . '</link>';
    $rssfeed .= '<pubDate>' .$date. '</pubDate>';
    $rssfeed .= '</item>';
}

$rssfeed .= '</channel>';
$rssfeed .= '</rss>';

echo $rssfeed;
?>
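Why SET NAMES matters: if the connection charset is Latin-1, MySQL sends "é" as the single byte 0xE9, which is not valid UTF-8, so a parser told encoding="utf-8" rejects the whole document. A small check like the one below (isValidUtf8 is a hypothetical helper name) can confirm which bytes a query actually returns. It relies on preg_match() with the u modifier returning false when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// preg_match() with the /u modifier fails (returns false) on invalid UTF-8 input.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$utf8Bytes   = "\xC3\xA9"; // "é" as sent over a UTF-8 connection
$latin1Bytes = "\xE9";     // "é" as sent over a Latin-1 connection

var_dump(isValidUtf8($utf8Bytes));   // bool(true)
var_dump(isValidUtf8($latin1Bytes)); // bool(false)
```

Calling it on $row['titulo'] inside the loop would point to the rows that break the feed. With the mysqli extension, mysqli_set_charset($link, 'utf8') plays the same role as SET NAMES.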


  • No, my table uses utf8_general_ci for all fields. What's strange is that only the title field gives this error. Thanks.

  • Before that query, add a command like: mysql_query("SET NAMES 'utf8'");

  • Maybe it's the cache; take a look at this! Because it's the default!

  • As soon as I change the header to UTF-8, the feed no longer appears. Only in Safari is it fine.

  • @pc_oc Still didn't work? Sorry, I didn't understand.

  • Yes, I got it working! Thanks for the help!


An initial idea would be to use htmlentities to avoid trouble with the encoding:

while($row = mysql_fetch_array($result)) {
    $title = htmlentities( $row['titulo'], ENT_COMPAT, 'utf-8' );
    // ... rest of the loop unchanged
}
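One caveat worth checking: htmlentities() turns accented letters into HTML named entities such as &ccedil;, and XML only predefines &amp;, &lt;, &gt;, &quot; and &apos;, so a strict feed reader may reject them. The sketch below compares it with htmlspecialchars() using the ENT_XML1 flag, which escapes only what XML requires and keeps the UTF-8 accents as they are. The sample title is made up.

```php
<?php
$title = "Ação & reação <especial>"; // sample title, UTF-8 source file assumed

// htmlentities() converts accented letters into HTML named entities,
// which XML does not predefine.
$withEntities = htmlentities($title, ENT_COMPAT, 'UTF-8');

// htmlspecialchars() with ENT_XML1 escapes only the XML special characters
// and leaves the UTF-8 accents untouched.
$xmlSafe = htmlspecialchars($title, ENT_XML1 | ENT_QUOTES, 'UTF-8');

echo $withEntities, "\n"; // A&ccedil;&atilde;o &amp; rea&ccedil;&atilde;o &lt;especial&gt;
echo $xmlSafe, "\n";      // Ação &amp; reação &lt;especial&gt;
```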

If that is not enough, you can force the output to UTF-8:

    $title = utf8_encode( htmlentities( $row['titulo'], ENT_COMPAT, 'utf-8' ) );

In that case, you also need to change these two lines to UTF-8:

header("Content-Type: application/rss+xml; charset=UTF-8");
$rssfeed = '<?xml version="1.0" encoding="UTF-8"?>';

If you still have problems, you can wrap the title in a CDATA section:

    $title = '<![CDATA['.utf8_encode( htmlentities( $row['titulo'], ENT_COMPAT, 'utf-8' ) ).']]>';

This solution can be extended to other fields as required.
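When extending the CDATA approach to other fields, note that a CDATA section ends at the first "]]>", so text containing that sequence has to be split across two sections. Entities are also not decoded inside CDATA, so the raw UTF-8 text can go in directly. A minimal sketch, with cdata as a hypothetical helper name:

```php
<?php
// Hypothetical helper: wrap text in a CDATA section for an RSS element.
// A literal "]]>" would close the section early, so it is split into two sections.
function cdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo '<title>' . cdata('Notícia com acentos') . '</title>', "\n";
// <title><![CDATA[Notícia com acentos]]></title>
echo cdata('a]]>b'), "\n";
// <![CDATA[a]]]]><![CDATA[>b]]>
```

CDATA only protects against markup characters; the bytes inside it must still match the encoding declared in the XML prolog.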
