I am creating a program that needs to save a dictionary containing strings with Unicode characters to a JSON file. See the example below:
import json
data = {"face": "( ͡° ͜ʖ ͡°)"}
with open("file.txt", "w", encoding="utf-8") as file:
    file.write(json.dumps(data, indent=4))
The problem is that whenever I save the file, all Unicode characters are converted to their \uXXXX escape sequences, and I need the file to contain the original text.
For the example above, the file created by the program looks like this:
{
    "face": "( \u0361\u00b0 \u035c\u0296 \u0361\u00b0)"
}
I need the characters to stay as they are so that the content is readable for the user. How can I keep the original text?
What version of Python are you using? Are you using
#coding: utf-8
at the beginning of the file?
– Danizavtz
I'm using Python 3, and both the .py and .json files are encoded in UTF-8. But no, I don't have that comment in my code.
– JeanExtreme002
I tested it by setting ensure_ascii=False in json.dumps and it worked here. Take a look at this link: https://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence
– Erick Kokubum
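A minimal sketch of that tip applied to the question's code. The string is the question's own, written with \u escapes only so the source stays ASCII (it is the same string), and the temporary output path is an assumption made for the demo. Passing ensure_ascii=False makes json.dumps keep the characters as-is instead of escaping them, and encoding="utf-8" on the file takes care of storing them.

```python
import json
import os
import tempfile

# Same string as in the question, spelled with escapes to keep this source ASCII.
data = {"face": "( \u0361\u00b0 \u035c\u0296 \u0361\u00b0)"}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(data, indent=4)

# ensure_ascii=False keeps the original characters in the output string.
readable = json.dumps(data, indent=4, ensure_ascii=False)

# Hypothetical output path in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="utf-8") as file:
    file.write(readable)

# Reading the file back shows the face characters, not escape sequences.
with open(path, encoding="utf-8") as file:
    print(file.read())
```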
Dude, I don't know if you noticed, but the file you're reading isn't the same one you're writing. I tested your code and it worked, but I had to change the file name to the correct one.
– Danizavtz
@Danizavtz I know it's not the same one I'm writing to. They really are two different files.
– JeanExtreme002
Thanks @Erickkokubum, your tip solved the file-writing part.
– JeanExtreme002
The UnicodeDecodeError probably happens because other_file.txt was not saved in UTF-8 (I could only reproduce the error by creating a file in UTF-16 and trying to read it as UTF-8).
– hkotsubo
@hkotsubo That's not possible, because I'm saving the file in UTF-8. I'm using Windows Notepad; could that be the problem?
– JeanExtreme002
@hkotsubo I managed to read the file using the utf-8-sig encoding. I have no idea how it differs from UTF-8, but it was the only encoding that could read the file.
– JeanExtreme002
So the file is in UTF-8 but was saved with a BOM (Byte Order Mark); the utf-8-sig encoding skips the BOM: https://docs.python.org/3/howto/unicode.html#Reading-and-writing-Unicode-data
– hkotsubo
@hkotsubo If I save my file in another editor that doesn't write this BOM, will reading it with utf-8-sig give an error, or can I use this encoding both with and without the BOM?
– JeanExtreme002
I don't remember (but I think it works either way); only testing will tell :-)
– hkotsubo
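One way to run that test, sketched with in-memory bytes instead of real files (the sample JSON text is made up for the demo). When decoding, utf-8-sig drops a leading BOM if there is one and otherwise behaves like plain UTF-8, whereas plain utf-8 keeps the BOM as the character U+FEFF, which json.loads then rejects.

```python
import codecs
import json

text = '{"face": "\u0361\u00b0"}'  # arbitrary sample JSON for the demo

with_bom = codecs.BOM_UTF8 + text.encode("utf-8")  # BOM + UTF-8, like the file in the question
without_bom = text.encode("utf-8")                 # plain UTF-8, no BOM

# utf-8-sig decodes both: the BOM is skipped when present, nothing changes when absent.
print(with_bom.decode("utf-8-sig") == text)     # True
print(without_bom.decode("utf-8-sig") == text)  # True

# Plain utf-8 keeps the BOM as U+FEFF at the start of the string...
leftover = with_bom.decode("utf-8")
print(repr(leftover[0]))  # '\ufeff'

# ...and json.loads refuses a string that starts with it.
try:
    json.loads(leftover)
except json.JSONDecodeError as exc:
    print("json.loads failed:", exc.msg)
```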
@hkotsubo OK, thanks. I edited the question so that it covers only the file-creation problem. I think it's a good question now. Could you take another look at it?
– JeanExtreme002
Yes, it's better now. Before, it had two not necessarily related problems in the same question.
– hkotsubo
To complement, here is an excerpt from the documentation: "Microsoft invented a variant of UTF-8 (that Python 2.5 calls "utf-8-sig") for its Notepad program: Before any of the Unicode characters is written to the file, a UTF-8 encoded BOM (which looks like this as a byte sequence: 0xef, 0xbb, 0xbf) is written. ... On decoding utf-8-sig will skip those three bytes if they appear as the first three bytes in the file. In UTF-8, the use of the BOM is discouraged and should generally be avoided."
– hkotsubo
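To see those three bytes directly, here is a small sketch that writes a throwaway file with utf-8-sig and inspects its raw bytes. The path and the "{}" content are arbitrary choices for the demo.

```python
import os
import tempfile

# Hypothetical throwaway file, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.json")

# Writing with utf-8-sig puts the BOM in front of the text.
with open(path, "w", encoding="utf-8-sig") as file:
    file.write("{}")

# Open in binary mode to see exactly what ended up on disk.
with open(path, "rb") as file:
    raw = file.read()

print(raw[:3].hex())  # efbbbf, i.e. 0xef, 0xbb, 0xbf
print(raw[3:])        # b'{}'

# Reading it back with utf-8-sig skips those three bytes again.
with open(path, encoding="utf-8-sig") as file:
    print(repr(file.read()))  # '{}'
```

Writing the same file with plain encoding="utf-8" would store only b'{}', with no BOM, which is what the quoted documentation recommends for UTF-8.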
And finally, if you're reading/writing files, you don't need to call read and write yourself; just use data = json.load(file) and json.dump(data, file, indent=4, ensure_ascii=False). (The functions are load and dump, without the "s" at the end; the versions with "s", loads and dumps, work directly with strings. That said, it should make no difference to the final result...)
– hkotsubo
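A sketch of that suggestion applied to the question's code, again assuming a temporary path chosen only for the demo. json.dump writes to the file object and json.load parses from it, so the separate dumps/write and read/loads steps go away.

```python
import json
import os
import tempfile

# Same string as in the question, spelled with escapes to keep this source ASCII.
data = {"face": "( \u0361\u00b0 \u035c\u0296 \u0361\u00b0)"}

# Hypothetical output path in a temporary directory, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "file.json")

# json.dump serializes straight into the file object.
with open(path, "w", encoding="utf-8") as file:
    json.dump(data, file, indent=4, ensure_ascii=False)

# json.load parses straight from the file object.
with open(path, encoding="utf-8") as file:
    loaded = json.load(file)

print(loaded == data)  # True
```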