Special characters (ï»¿) appear in front of the XML tag

I am reading two XML files that were created on different computers. The first was created on my computer:

<?xml version="1.0" encoding="UTF-8"?> ...

The second one, with the same content, shows a sequence of special characters (ï»¿) before the XML tag:

ï»¿<?xml version="1.0" encoding="UTF-8"?> ...

Even after copying the contents of one file and pasting them into the other, the result is the same when I call new SimpleXMLElement($contentFile).

As these strange characters appear, an error occurs:

Warning: SimpleXMLElement::__construct(): Entity: line 1: parser error : Start tag expected, '<' not found in C:\wamp64\www...

At first I thought of removing these characters with a regular expression, but maybe there is already something built in for this (that I don't know about yet).

How can I fix this problem?

  • Can you post the two XML files?

  • Related on SOen: https://stackoverflow.com/q/3255993/1452488

  • @Virgilionovic the problem is only at the beginning of the file, so I didn't think it was relevant to include everything. That's why I used an ellipsis...

  • Who created these two files?

  • @Virgilionovic Do you really think that is relevant to giving an answer?! One by me and one by my cousin. Notepad on my PC and Notepad on her PC. xD

  • Of course I think it's relevant, @acklay. It may be that the editors use different encodings; we have to take the whole process into account. That's my view (I don't see anything wrong with that), just trying to help.

  • See the answer, @acklay: that is how useful it is to say how the files were created!

  • @Virgilionovic The thing is, "my cousin" created a file with some geolocation software, blah blah... Then she tried to load it into the system, which gave an error. I asked her to send it to me, and it didn't work for me either. When I debugged it, these special characters appeared at the front, which does not happen with the files generated on my PC.

  • @acklay yes of course...

1 answer


The characters ï»¿ indicate that the document was saved as "UTF-8 with BOM", when it should have been saved "without BOM".
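The UTF-8 BOM is the three bytes EF BB BF at the very start of the file; a viewer that decodes them as Windows-1252 or ISO-8859-1 shows them as ï»¿. A minimal sketch of checking for it, where hasUtf8Bom and the sample strings are hypothetical names used only for illustration:

```php
<?php
// Hypothetical helper: true when $data begins with the UTF-8 BOM bytes EF BB BF.
function hasUtf8Bom(string $data): bool
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0;
}

// Made-up samples: the same XML with and without a leading BOM.
$withBom = "\xEF\xBB\xBF" . '<?xml version="1.0" encoding="UTF-8"?><root/>';
$withoutBom = '<?xml version="1.0" encoding="UTF-8"?><root/>';

// Hex dump of the first three bytes of the BOM sample.
echo bin2hex(substr($withBom, 0, 3)), PHP_EOL;

var_dump(hasUtf8Bom($withBom), hasUtf8Bom($withoutBom));
```

Dumping the first bytes with bin2hex is also a quick way to see what is really in front of the tag when the characters are invisible in an editor.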

How to resolve in the application

Note: in the examples I used simplexml_load_string, but both it and simplexml_load_file return a SimpleXMLElement:

SimpleXMLElement simplexml_load_string ( string $data [, string $class_name = "SimpleXMLElement" [, int $options = 0 [, string $ns = "" [, bool $is_prefix = false ]]]] )

I can't tell exactly how the document was encoded, but you can try decoding the XML content before parsing it:

$data = file_get_contents($url);
// Converts from UTF-8 to ISO-8859-1 (note: utf8_decode is deprecated as of PHP 8.2)
$data = utf8_decode($data);

$xml = simplexml_load_string($data);

...

$xml->asXML();

Or decode and encode again (in case the XML declaring UTF-8 in its header causes a problem):

$data = file_get_contents($url);
$data = utf8_decode($data);
$data = utf8_encode($data);

$xml = simplexml_load_string($data);

...

$xml->asXML();

You can also try trim (which, by the way, was the only option that worked for me):

$data = file_get_contents('A.xml');

$data = trim($data); // Removes the surrounding whitespace, including the "BOM"

$xml = simplexml_load_string($data);

...

$xml->asXML();
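One caveat, going by the PHP manual: without a second argument, trim() strips only ordinary whitespace (space, tab, newline, carriage return, NUL and vertical tab), so the three BOM bytes are not in its default list. A minimal sketch that passes the BOM bytes explicitly; the sample string and the variable names are made up for illustration:

```php
<?php
// Made-up sample: BOM, some whitespace, then the XML.
$data = "\xEF\xBB\xBF \n" . '<?xml version="1.0" encoding="UTF-8"?><root/>';

// Characters to strip from the start: the BOM bytes plus trim()'s default whitespace set.
$strip = "\xEF\xBB\xBF \t\n\r\0\x0B";

$clean = ltrim($data, $strip);

var_dump(trim($data) === $data); // default trim() leaves the leading BOM in place
var_dump($clean[0] === '<');     // the explicit list removes it
```

ltrim is used here so that only the start of the string is touched.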

If none of these works, you can try substr with strpos, like this:

$data = file_get_contents($url);

$data = substr($data, strpos($data, '<'));

$xml = simplexml_load_string($data);

...

$xml->asXML();

If that still fails, you can combine it with utf8_decode and utf8_encode:

$data = file_get_contents($url);

$data = substr($data, strpos($data, '<'));

$data = utf8_decode($data);
$data = utf8_encode($data);

$xml = simplexml_load_string($data);

...

$xml->asXML();
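For a system that has to accept the content whether or not it carries a BOM, the check can also be made explicit instead of relying on trim or strpos. A minimal sketch, assuming the raw XML string arrives in $contentFile as in the question; stripUtf8Bom and the sample XML are hypothetical and not part of SimpleXML:

```php
<?php
// Hypothetical helper (not part of SimpleXML): removes a leading UTF-8 BOM, if any.
function stripUtf8Bom(string $data): string
{
    $bom = "\xEF\xBB\xBF"; // U+FEFF encoded as UTF-8
    if (strncmp($data, $bom, strlen($bom)) === 0) {
        return substr($data, strlen($bom));
    }
    return $data; // no BOM: leave the content untouched
}

// Sample content standing in for the received file (assumption: UTF-8 with BOM).
$contentFile = "\xEF\xBB\xBF" . '<?xml version="1.0" encoding="UTF-8"?><root><item>1</item></root>';

$xml = new SimpleXMLElement(stripUtf8Bom($contentFile));
echo $xml->item, PHP_EOL;
```

Unlike the substr/strpos variant, this removes only the BOM, so any other unexpected bytes before the first tag still surface as a parser error instead of being silently dropped.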

How to resolve with text editors/processors

If you have access to these .xml files, you can open them in Notepad++ and save them again without the BOM:

[image: Notepad++]

Or in Sublime Text:

[image: Sublime Text]

  • I'm sorry, but I won't have access to the XML. The system managers will actually create the XML with geolocation software. I don't even want to know how they generate these files; they just have to be in a format the system can interpret (which is what I'm trying to achieve). Nor will I instruct anyone to install or open Notepad++ or Sublime. The system should read the file, check whether it has a BOM or not, and handle it properly (which is what I want to do). Maybe I haven't managed to describe the problem fully, but as it stands your answer won't help me.

  • Thank you so much for the answer, bro, and for trying to help. =) I will do some more research and perhaps adapt the question so that your answer fits. If I don't find a solution, I will create a new question describing the actual problem. =D

  • @acklay but didn't the trim tip solve the BOM problem? It worked in the test I did: I read a remote XML with file_get_contents, applied trim, passed the string to simplexml_load_string, and it was recognized.

  • @acklay I edited the answer with another suggestion, using substr and strpos

  • It didn't work for me. You saw that I mentioned SimpleXMLElement; have you ever worked with it? Instead of receiving a file URL, this "class" receives the contents of the file, as shown in the question (which I just edited), and then I use $xml->asXML() to treat that content as XML. The thing is, when I receive the content via POST and call SimpleXMLElement($contentFile), it already fails because of the special characters. I tried using trim on $contentFile, but it didn't work.

  • @acklay simplexml_load_string is a shortcut to SimpleXMLElement, which is why I used it in all the examples. See the return type in the docs: http://php.net/manual/en/function.simplexml-load-string.php

  • BOM hahaha, it worked, right?! I used utf8_decode and it worked beautifully! Sorry for the petulance, and thanks for the patience! (+1) An admirer (in a good sense, hehe) here from SO. Cheers.
