XML returning strange characters


I maintain PHP code that serves a game server's "changelog" as XML. But the XML comes back with strange characters. For example, words such as "criação" and "vocês" come back with the accented letters replaced by garbage.

The full XML it returns:

<?xml version="1.0" encoding="Windows-1250" ?>
<SauronGamer>
<Count>2</Count>
    <Content>
        <News>
            <title>[22/04] Preparando tudo para o OpenBETA</title>
            <description>Galera com todos os nosso projetos estamos felizes em anunciar que depois de tanto tempo sim! O OpenBETA será aberto ao público! Como muitos minigames e muita diversăo.</description>
        </News>
        <News>
            <title>[NA] Criamos o servidor SauronServer.</title>
            <description>Foi iniciado o projeto de criação do servidor sauron server de mingames para vocês!</description>
        </News>
    </Content>
</SauronGamer>

The main goal is for the "changelog" to be stored in the database and returned as JSON, but I couldn't get that to work when using fetch_row to add each item to an array().

PHP code:

<?php
error_reporting(0);
header("content-type: text/xml");

$comando1 = "SELECT * FROM  `saurongamer_news` ORDER BY  `ID` DESC LIMIT 0,10 ";

mysql_connect('localhost', 'root', 'usbw');
mysql_select_db('test');

$consulta1 = mysql_query($comando1) or die(mysql_error());

if($consulta1 == TRUE){
    line('<?xml version="1.0" encoding="Windows-1250" ?>');
    line('<SauronGamer>');
        line('<Count>' . mysql_num_rows($consulta1) . '</Count>');
            line('<Content>');

                while($row = mysql_fetch_array($consulta1)){
                    line('<News>');
                        line('<title>' . $row[1] . '</title>');
                        line('<description>' . $row[2] . '</description>');
                    line('</News>');
                }

            line('</Content>');
    line('</SauronGamer>');
}

else{

}

function line($text){
    echo $text . "\n";
}
?>

I’ve tried every possible combination like ISO-5859-1, ISO-5859-2, ISO-5859-15, ANSI, UTF-8, Windows-1250 and the characters are still weird.

And in C#, a simple piece of code that reads it:

    private const string UpdateNovidadesServer = "http://localhost:8080/minecraft/novidades.php";

    public Dictionary<string, string> UpdateNovidades()
    {
        var dicionario = new Dictionary<string, string>();


        XmlTextReader xtr = new XmlTextReader(UpdateNovidadesServer);
        xtr.ReadStartElement("SauronGamer");

        xtr.ReadStartElement("Count");
        int size = xtr.ReadContentAsInt();
        xtr.ReadEndElement();

        xtr.ReadStartElement("Content");

        for (int i = 0; i < size; i++)
        {
            xtr.ReadStartElement("News");

            xtr.ReadStartElement("title");
            string k = xtr.ReadContentAsString();
            xtr.ReadEndElement();

            xtr.ReadStartElement("description");
            string v = xtr.ReadContentAsString();
            xtr.ReadEndElement();

            dicionario.Add(k, v);

            xtr.ReadEndElement();

        }

        xtr.ReadEndElement();

        xtr.ReadEndElement();


        return dicionario;
    }
  • A minor detail: I think that where you wrote ISO-5859-1 you meant ISO-8859-1; the same goes for the others.

  • Fixed! The strange characters are still there, though!

  • @sysWOW32-- This comment was only about a typo in your text: ISO 5859 corresponds to "Aerospace series -- Graphic Symbols for Schematic Drawings of Hydraulic and Pneumatic systems and Components". Look at my answer: as long as the sources are corrupted, there's nothing to be done.

  • I've already written a script that solves it!

1 answer

This is not really an answer; it's more of an unwieldy comment:

Your XML file is corrupted: it shows two different encodings mixed together:

  • the first description paragraph is in UTF8 (probably coming from an original in Latin1=iso_8859-1=CP1252)
convert( CP1252->UTF8, "text in CP1252")
  • the second description paragraph was in CP1250 format and was converted to UTF8 as if it were latin1=CP1252, or it was corrupted some other way:
convert( CP1252->UTF8, "text in CP1250")
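The mechanism behind the two cases above can be reproduced in a few lines. This is only a minimal sketch in Python (picked for illustration, not the asker's PHP), using two words from the question's XML; the helper name `misread` is hypothetical, and the particular encoding pairs are assumptions chosen to show the effect, not a diagnosis of the actual database.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)

# Byte 0xE3 is "ã" in CP1252 but "ă" in CP1250, so only that letter changes.
latin_as_1250 = misread("diversão", "cp1252", "cp1250")

# UTF-8 uses two bytes per accented letter; a single-byte code page shows both.
utf8_as_1252 = misread("criação", "utf-8", "cp1252")

print(latin_as_1250)  # diversăo
print(utf8_as_1252)   # criaÃ§Ã£o
```

The first result matches the "diversăo" visible in the XML from the question, which suggests that at least that row holds single-byte Latin text being presented under a different code page.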

So your problem is that you have "sources" in different encodings that are all being processed with the same conversion.
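One possible way to deal with mixed sources is to decide the encoding per value instead of applying a single conversion to everything. The sketch below is in Python purely for illustration and rests on two assumptions that may not hold for the real data: that each row is either valid UTF-8 or legacy CP1252, and that trying UTF-8 first is an acceptable heuristic. The function name `to_text` and the sample rows are hypothetical.

```python
def to_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, else use the fallback code page."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is legacy single-byte text.
        return raw.decode(fallback, errors="replace")

# Two sample rows stored in different encodings, as described in the answer.
rows = ["diversão".encode("cp1252"), "criação".encode("utf-8")]
clean = [to_text(r) for r in rows]
print(clean)
```

Once every value is a proper string, it can be written out in one encoding that matches the XML declaration. The heuristic can misfire on short legacy strings that happen to be valid UTF-8, so it is worth checking the results by eye.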

  • For me, this was an answer, and an excellent one at that: it pointed out the problems that prevent the file from being handled properly. :) +1

  • @Diegof, thank you! (Handling encodings that come from legacy and heterogeneous sources can be a headache.)

  • According to PHP, CP1252 does not exist as a valid encoding here!

  • As indicated in the text (in the relevant part), CP1252 = latin1 = iso_8859-1. It can also be designated as "Windows-1252". The problem is that you have two different encodings mixed together.

  • There aren't two encodings!
