I’m trying to request a URL that returns XML using the function simplexml_load_file('url'), and all I get is false when I debug with var_dump:
$xml = simplexml_load_file('http://www.cinemark.com.br/programacao.xml');
var_dump($xml);
Output:
bool(false)
What other method can I use to handle XML besides this function, and what would be the "correct" way to use this function?
After enabling PHP error display, I get the following warnings:
Warning: simplexml_load_file(): http://www.cinemark.com.br/programacao.xml:1: parser error : Start tag expected, '<' not found
Warning: simplexml_load_file(): ...
I believe it’s an encoding problem. I already tried calling header() with charset=utf-8, but the error is still the one shown above.
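A minimal sketch of a more defensive way to call the loader, so that a bare false comes with an explanation: libxml can buffer its errors instead of emitting warnings, and they can then be inspected in code. load_xml_with_errors() is a hypothetical helper name; the same libxml_use_internal_errors() / libxml_get_errors() pattern applies to simplexml_load_file() with the URL.

```php
<?php
// Parse an XML string and return [document-or-false, list of libxml error messages].
// load_xml_with_errors() is a hypothetical helper used only for illustration.
function load_xml_with_errors(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors instead of emitting warnings
    libxml_clear_errors();

    $doc = simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries line, column, code and message
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting

    return [$doc, $messages];
}

[$doc, $errors] = load_xml_with_errors('not xml at all');
var_dump($doc); // bool(false)
print_r($errors);
```

Checking the returned value with `=== false` (rather than a loose comparison) matters, because an empty but valid SimpleXMLElement is also falsy.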
After using mb_detect_encoding, I get "UTF-8". I set the header to iso-8859-1, made the request with file_get_contents and then called simplexml_load_string on the result of file_get_contents, and the error is the same, only with a different character: "�". – RFL
Could you write that up as an answer?
– RFL
I suggested it only as a test; for definitive use another solution is probably better. Someone who knows DOMDocument or XPath could try those instead.
– Bacco
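As a sketch of the DOMDocument/XPath route mentioned above: load the string into a DOMDocument and query it with DOMXPath. The element names `programacao`, `filme` and `titulo` are hypothetical, since the real structure of programacao.xml is not shown in the thread.

```php
<?php
// Alternative to SimpleXML: DOMDocument + DOMXPath.
// The element names used in the query are hypothetical placeholders.
function titles_via_xpath(string $xml): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parse errors out of the output
    $ok = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return []; // not well-formed XML
    }

    $xpath = new DOMXPath($dom);
    $titles = [];
    foreach ($xpath->query('//filme/titulo') as $node) {
        $titles[] = $node->textContent; // DOM returns text as UTF-8
    }
    return $titles;
}

$sample = '<?xml version="1.0" encoding="ISO-8859-1"?>'
    . '<programacao><filme><titulo>A Origem</titulo></filme></programacao>';
print_r(titles_via_xpath($sample));
```

Note that DOMDocument uses the same libxml parser as SimpleXML, so it will reject the same malformed input; its advantage is the richer query API, not a more lenient parser.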
I tried utf8_decode and encode on the result of file_get_contents, but without success; the characters remain "scrambled". – RFL
Have you tried changing the first line of the XML to:
<?xml version="1.0" encoding="UTF-8" ?>
– jlHertel
@jlHertel He would actually have to change the first line and also convert the encoding (with utf8_encode). I deleted my earlier comments so as not to clutter things here, but I had already verified that the original encoding is ISO-8859-1 (not only from the declaration at the top; I also checked the returned bytes to make sure they didn't conflict with it). In fact, I had no problem reading this XML in other languages; the problem is specific to the author's situation.
– Bacco
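A minimal sketch of the approach described above (convert the bytes, then rewrite the declaration), assuming the source really is ISO-8859-1. It uses mb_convert_encoding() instead of utf8_encode(), which is deprecated as of PHP 8.2; this assumes the mbstring extension is available, as it is wherever mb_detect_encoding works. iso_xml_to_utf8() is a hypothetical helper name.

```php
<?php
// Convert ISO-8859-1 XML bytes to UTF-8 and update the encoding declaration to match.
// Assumes the input is ISO-8859-1; iso_xml_to_utf8() is a hypothetical helper.
function iso_xml_to_utf8(string $raw): string
{
    $utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

    // Replace only the value of the encoding attribute in the leading <?xml ... ?> declaration.
    return preg_replace(
        '/^(\s*<\?xml[^>]*encoding=["\'])[^"\']*(["\'])/i',
        '${1}UTF-8${2}',
        $utf8,
        1
    );
}

// "\xE9" is 'é' in ISO-8859-1
$raw = '<?xml version="1.0" encoding="ISO-8859-1"?>' . "<titulo>Pel\xE9</titulo>";
$xml = simplexml_load_string(iso_xml_to_utf8($raw));
echo (string) $xml, PHP_EOL; // Pelé
```

Both steps are needed together: converting the bytes without changing the declaration would make the parser decode the UTF-8 bytes as ISO-8859-1 again.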
@Bacco, but when the URL is opened in the browser, even with the ISO-8859-1 declaration, there are invalid characters; that is, the browser can read it but sees that the encoding is wrong. What I'm trying to say is that the function you used in another language probably ignores this character problem, while the PHP function returns false when it encounters it. Can you confirm whether the characters appear correctly in the other language?
– jlHertel
What you see on the browser screen depends on the encoding the browser assumes. Since this page has no HTTP header specifying the encoding (not to be confused with the XML declaration), your browser tries to display it as UTF-8 (after all, there is no HTTP header saying it is ISO-8859-1). An XML parser, however, should not rely on the HTTP header when importing the file, but on the declaration. Reading the original file and displaying it as ISO-8859-1, the characters are perfect. The problem is not the encoding, but the way the diagnosis is being made.
– Bacco
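The point about the declaration can be checked with a small sketch: libxml reads the encoding from the `<?xml ... encoding="ISO-8859-1"?>` line, with no HTTP header involved, and SimpleXML hands the text back as UTF-8. The sample string is made up for illustration and is not taken from programacao.xml.

```php
<?php
// ISO-8859-1 bytes ("\xE9" = 'é') with a matching declaration, parsed without any conversion.
$isoBytes = '<?xml version="1.0" encoding="ISO-8859-1"?>' . "<titulo>Pel\xE9</titulo>";
$xml = simplexml_load_string($isoBytes);

// SimpleXML returns strings as UTF-8, so 'é' comes back as the two bytes C3 A9.
var_dump((string) $xml === "Pel\xC3\xA9"); // bool(true)
```

In other words, a correctly declared ISO-8859-1 document parses as-is; if simplexml still fails with "Start tag expected", the bytes actually received are worth checking directly.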
@jlHertel see the page displayed as ANSI: https://i.stack.Imgur.com/2m1hX.png. Displaying this XML in a desktop application, everything is normal too. The best encoding test is to download the data and view it in hexadecimal, for example in an editor like HxD, so the diagnosis does not depend on display errors. Looking at the binary data, there is no way to get the diagnosis wrong; screen output is misleading, since the error may be in the display only. (PS: I tested all this when the question was asked, too, with no problem.)
– Bacco
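The hex inspection suggested above can also be done from PHP itself, as a sketch: print the first bytes of the payload in hexadecimal, so that the encoding (or any unexpected bytes before the first `<`, which is what "Start tag expected" complains about) is visible without relying on screen rendering. hex_preview() is a hypothetical helper; in practice `$body` would be the result of file_get_contents() on the URL.

```php
<?php
// Return the first $length bytes of $body as space-separated hex pairs.
// hex_preview() is a hypothetical helper used only for illustration.
function hex_preview(string $body, int $length = 16): string
{
    $bytes = substr($body, 0, $length);
    return implode(' ', str_split(bin2hex($bytes), 2));
}

// A normal XML start, then a UTF-8 BOM (EF BB BF) followed by an element.
echo hex_preview('<?xml version'), PHP_EOL;     // 3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e
echo hex_preview("\xEF\xBB\xBF<a/>"), PHP_EOL;  // ef bb bf 3c 61 2f 3e
```

A well-formed response should start with `3c` (`<`), optionally preceded by a BOM; anything else at the start points to the body not being the XML that was expected.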
@Bacco, in that case, removing the line that declares the charset should make it possible to read the XML. Or, if you still get an error, the charset line could be set to ANSI itself.
– jlHertel
@jlHertel I get the impression that the loader he is using (SimpleXML) only understands UTF-8, so the conversion plus your first-line swap should solve it. Out of curiosity, here is a hex dump of the top of the page, including all headers: https://i.stack.Imgur.com/Qowu2.png. You can see the encoding more clearly. (This was downloaded directly over HTTP and saved to disk just for convenience, without any filter, using a raw socket, i.e. with no client-side conversion.)
– Bacco
@jlHertel I believe "changing" the first line would not help, because the request data already arrives with the scrambled characters. I'm open to suggestions of other classes/functions as well; let's not get stuck on this one function in particular.
– RFL
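Since other classes are welcome, one more option is XMLReader, a pull parser that walks the document node by node instead of building the whole tree. This is a sketch only: the element name `titulo` is hypothetical, and for a URL `$reader->open($url)` can be used in place of `$reader->XML($xml)`. It uses the same libxml parser underneath, so it will not accept input that SimpleXML rejects as malformed.

```php
<?php
// Collect the text of every <titulo> element with XMLReader (streaming, low memory).
// titles_via_xmlreader() and the element name are hypothetical placeholders.
function titles_via_xmlreader(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // use $reader->open($url) to read from a URL instead
    $titles = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'titulo') {
            $titles[] = $reader->readString(); // text content of this element, as UTF-8
        }
    }
    $reader->close();
    return $titles;
}

$sample = '<?xml version="1.0" encoding="ISO-8859-1"?>'
    . "<programacao><filme><titulo>Pel\xE9</titulo></filme>"
    . '<filme><titulo>A Origem</titulo></filme></programacao>';
print_r(titles_via_xmlreader($sample));
```

XMLReader pays off mainly for large files; for a small listing, SimpleXML or DOM is usually simpler once the input parses.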