How to use accented characters in Python

Viewed 1,296 times

3

I’m writing a program and it keeps raising an ASCII encoding error.
I’ve put this on the first line, but it made no difference:

# -*- coding: utf-8 -*-  

As requested, here is an edit explaining what now is.

now is a datetime variable, and I believe it is unrelated to the problem.

Accents display normally on the console (terminal). What fails is writing to a .txt file: when I write text containing an accent, this error appears.

I did what mgibsonbr asked and everything came out OK. I use OS X.

So, to summarize:
- now is a datetime variable that has nothing to do with accents (I believe);
- The accent error only appears when I write to a .txt file.

  • Could you post the part of the code that gives the error?

  • The code where I want the accent is this: f.write('- %s/%s/%s às %s:%s -\n' % (now.day, now.month, now.year, now.hour, now.minute)) And the error I get is this:
    Traceback (most recent call last):
      File "/Users/gabrielazevedo/Meus Trabalhos/Python/Calculadora de Imposto e Gorjeta.py", line 32, in <module>
        f.write('- %s/%s/%s às %s:%s -\n' % (now.day, now.month, now.year, now.hour, now.minute))
    UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 12: ordinal not in range(128)

  • Edit your question to include the code and, if possible, explain what now is. (:

  • Dumb question, but are you sure your source file is encoded as UTF-8? To check, try putting x = "á" on the second line of the file and print("ok") on the third. If it does not print "ok", your source file probably has the wrong encoding (if you are on Windows, it is probably Cp1252).

  • I have edited the question so you can read it.

1 answer

4


When you open a file for writing in Python using the built-in open:

with open("arquivo.txt", mode="w") as f:
    f.write("blá")

Python assumes that this file uses the system’s default encoding (in Python 3, the locale’s preferred encoding; in Python 2, the built-in open does no encoding at all, and unicode strings written to the file are implicitly encoded as ASCII, which is bad). Or at least that’s what the documentation says, because your error suggests that the stream is rejecting characters above 0x7F, so the encoding actually in use seems to be ASCII.
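To see which encoding is in play, here is a minimal sketch for Python 3. It assumes the fallback is the one reported by locale.getpreferredencoding(False), as the documentation describes, and it forces encoding="ascii" only to reproduce the error from the question on any machine. The scratch file path is hypothetical.

```python
import locale
import os
import tempfile

# Encoding that open() is documented to fall back to when no
# encoding= argument is given (an assumption about your setup).
default_encoding = locale.getpreferredencoding(False)
print("Default text encoding:", default_encoding)

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "arquivo.txt")

# Force ASCII so the failure is reproduced regardless of the local settings.
caught = None
with open(path, mode="w", encoding="ascii") as f:
    try:
        f.write("às")
    except UnicodeEncodeError as exc:
        caught = exc

print(caught)
```

If the first print shows an ASCII-like name (for example when the program is launched from an environment without locale variables set), that would explain the traceback in the question.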

Python 3 has several ways of handling character encodings, but the recommended way to work with text files when you have full control over them is to specify the encoding explicitly. Then the write should go through without problems:

with open("arquivo.txt", mode="w", encoding="utf-8") as f:
    f.write("blá")
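Applied to the line from the question, this could look like the sketch below (Python 3). The helper name write_timestamp, the file name registro.txt, and the fixed date are hypothetical, chosen only to make the example self-contained.

```python
import os
import tempfile
from datetime import datetime


def write_timestamp(path, now):
    """Write the accented timestamp line using an explicit UTF-8 encoding."""
    with open(path, mode="w", encoding="utf-8") as f:
        f.write('- %s/%s/%s às %s:%s -\n'
                % (now.day, now.month, now.year, now.hour, now.minute))


# Hypothetical output file and a fixed date so the result is reproducible.
path = os.path.join(tempfile.mkdtemp(), "registro.txt")
write_timestamp(path, datetime(2015, 3, 7, 14, 5))

# Read the file back with the same encoding to check the round trip.
with open(path, encoding="utf-8") as f:
    content = f.read()
print(content)
```

Reading the file back with the same encoding returns the accented text unchanged, whatever the system default happens to be.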

For reference, I will also show how to do it in Python 2:

import codecs
with codecs.open("arquivo.txt", "w", "utf-8") as f:
    f.write(u"blá")
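If the same script has to run under both Python 2.6+ and Python 3, one alternative (not part of the answer above) is io.open, which accepts an encoding argument in both and is the same function as the built-in open in Python 3. A minimal sketch, with a hypothetical scratch file:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "arquivo.txt")

# io.open takes encoding= in Python 2.6+ and in Python 3 alike.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"blá")  # the u prefix keeps this a text string in Python 2

# Read the raw bytes back to confirm what was actually stored.
with io.open(path, "rb") as f:
    raw = f.read()
print(repr(raw))
```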

Of course, you can choose another character encoding for your output file if you want (it doesn’t have to be the same as the source file).
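To illustrate how the choice of encoding changes what lands on disk, the sketch below (Python 3) writes the same text as UTF-8 and as latin-1 and compares the raw bytes. The helper name bytes_written and the file names are hypothetical.

```python
import os
import tempfile


def bytes_written(text, encoding):
    """Write text to a scratch file with the given encoding; return the raw bytes."""
    path = os.path.join(tempfile.mkdtemp(), "arquivo.txt")
    with open(path, mode="w", encoding=encoding) as f:
        f.write(text)
    with open(path, mode="rb") as f:
        return f.read()


utf8_bytes = bytes_written("blá", "utf-8")
latin1_bytes = bytes_written("blá", "latin-1")

print(utf8_bytes)    # á is stored as two bytes
print(latin1_bytes)  # á is stored as one byte
```

Whoever reads the file later has to use the same encoding that was used to write it, otherwise the accented character will be decoded incorrectly or rejected.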

  • It worked perfectly! Thank you :)

  • @mgibsonbr Please, regarding "in some circumstances" and the ASCII encoding, have a read of this article: http://local.joelonsoftware.com/wiki/O_M%C3%Adnimo_absoluto_que_todos_programmeres_de_software_they need,_Absolutely,Positivamente_de_Saber_Sobre_Unicode_e_Conjuntos_de_Caracteres%28Sem_Desculpas! %29 (do not feel put off by the title). Although your answer works, it contains a lot of guesswork for something that is deterministic: text encoding. You will certainly benefit from the information there.

  • @jsbueno I know this article. Where exactly did I go wrong in my answer? Python 3, when you do not specify an encoding, lets you read and write ASCII characters (bytes between 0x00 and 0x7F) freely, and also lets you, for example, read a byte above 0x80 and write that same byte, unmodified, to another file. Personally, it seems to me an unsafe default (not all encodings are compatible with ASCII, those used in Japan for example), but that’s what Python 3 does, so I thought it was important to mention. The reference link explains this behavior better.

  • 1

    @jsbueno I reread the documentation, and it seems to me that Python 3 has several modes of operation. One is "best effort is acceptable", where the programmer knows that the encoding is ASCII-compatible but doesn’t know exactly which one it is. In this case, they specify latin-1 as the encoding and things work as I described above. Another is "minimize risk of data corruption", a similar scenario but with different error handling. Others are "always use the default system encoding", "use an explicit encoding", and "use a marker to auto-identify encoding".

  • 1

    @mgibsonbr I saw some of your answers on the subject in Java: you understand more about strings than I had concluded from this answer. But then: even if Python can "guess" some encoding (it uses sys.getdefaultencoding() to create an automatic codec for files opened in text mode), it is bad practice to let guessing happen. And, as you yourself commented in the Java answers, it’s not recommended to use legacy encodings with limited character representation, so I don’t see the point of using legacy encodings like latin-1 in a new program that does not have to exchange data with legacy

  • @jsbueno You’re absolutely right! I think I’ll edit my answer to make it more explicit.
