I’m trying to list a folder (files and subfolders) in Python [2.7 on Windows XP], and I’m having problems with files that have accented names. I know that os.listdir
behaves differently depending on whether its argument is a byte string or a Unicode string. My problem is that I have file names encoded in different ways:
>>> import os
>>> os.listdir('teste')
['a\xb4rvore.jpg']
>>> os.listdir(u'teste')
[u'a\u0301rvore.jpg']
>>> os.listdir('teste2')
['\xe1rvore.txt']
>>> os.listdir(u'teste2')
[u'\xe1rvore.txt']
In Windows Explorer, both files look normal: árvore.jpg and árvore.txt. But while the second is listed normally, the first gives an error no matter how I access it:
def imprimir(pasta):
    print pasta
    for x in os.listdir(pasta):
        sub = os.path.join(pasta, x)
        if os.path.isfile(sub):
            print sub
        else:
            imprimir(sub)
>>> imprimir('teste2')
teste2
teste2\ßrvore.txt
>>> imprimir(u'teste2')
teste2
teste2\árvore.txt
>>> imprimir('teste')
teste
teste\a┤rvore.jpg
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "teste.py", line 11, in imprimir
    imprimir(sub)
  File "teste.py", line 6, in imprimir
    for x in os.listdir(pasta):
WindowsError: [Error 3] O sistema nÒo pode encontrar o caminho especificado: 'teste\\a\xb4rvore.jpg/*.*'
>>> imprimir(u'teste')
teste
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "teste.py", line 9, in imprimir
    print sub
  File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0301' in position 7: character maps to <undefined>
How do I access this other file? I don’t think its name is corrupted, since a\u0301 (the letter a followed by a combining acute accent) is a valid way to represent á. However, I don’t know how to access it, and I have a volume with many files in this format. I can avoid producing similar files in the future, but I still need to process the existing ones, and converting them by hand is impractical.
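To make the difference between the two names concrete, here is a minimal sketch using the standard unicodedata module. It assumes the two spellings differ only in Unicode normalization form (decomposed versus precomposed), which is what the listings above suggest. The helpers same_name and fits_in are hypothetical names introduced only for this illustration.

```python
import unicodedata

# The two spellings seen in the listings above (extension unified for comparison).
decomposed = u'a\u0301rvore.jpg'   # 'a' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = u'\xe1rvore.jpg'     # U+00E1 LATIN SMALL LETTER A WITH ACUTE


def same_name(a, b):
    """Hypothetical helper: compare two names after composing both to NFC."""
    return unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)


def fits_in(text, encoding):
    """Hypothetical helper: True if text can be encoded without errors."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


composed = unicodedata.normalize('NFC', decomposed)
print(repr(decomposed == precomposed))      # raw code points differ
print(repr(same_name(decomposed, precomposed)))
print(repr(fits_in(decomposed, 'cp850')))   # the console code page from the traceback
print(repr(fits_in(composed, 'cp850')))
```

If this assumption holds, the UnicodeEncodeError comes from the cp850 console being unable to represent the combining accent on its own, not from a damaged file name.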
Apparently this is a bug in Python 2.7, since the same code works perfectly in version 3.3.2 (which I have installed) after changing only the use of the print function. – Zignd
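For Python 2.7 itself, a possible workaround can be sketched as follows. It rests on two assumptions drawn from the tracebacks above: that byte-string paths lose information when converted on Windows (so 'a\xb4rvore.jpg' no longer matches the real file), and that printing fails only because the console code page cannot encode U+0301. The names walk and for_console are hypothetical. The sketch avoids the print statement so that it runs on both 2.7 and 3.x, and it tests for directories with os.path.isdir instead of treating every non-file as a folder.

```python
import os
import sys
import unicodedata


def for_console(text, encoding=None):
    """Hypothetical helper: return text in a form the console can print."""
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Compose 'a' + U+0301 into U+00E1 so legacy code pages can represent it.
    composed = unicodedata.normalize('NFC', text)
    # Anything still unencodable becomes a \uXXXX escape instead of raising.
    return composed.encode(encoding, 'backslashreplace').decode(encoding)


def walk(pasta):
    """Yield pasta and everything below it, always as Unicode text."""
    if isinstance(pasta, bytes):
        # Decode byte-string paths up front so os.listdir returns Unicode names.
        pasta = pasta.decode(sys.getfilesystemencoding())
    yield pasta
    for x in sorted(os.listdir(pasta)):
        sub = os.path.join(pasta, x)
        if os.path.isdir(sub):
            for found in walk(sub):
                yield found
        else:
            yield sub


def imprimir(pasta):
    for found in walk(pasta):
        print(for_console(found))
```

The paths yielded by walk keep their original, uncomposed names, so they can still be passed to open or os.rename. Only the text sent to the console is normalized.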