How to solve this encoding error in Pandas

I'm having trouble reading an xlsx file with pandas in Python. The script fails when it runs des_pt = (f_pt.head()[pt][0]).encode('utf-8').strip() with the variable pt. It looks like an encoding problem, because some of the characters are non-ASCII and end up UTF-8 encoded.

import pandas as pd

create_result = open('resultado.json', 'w')
i = 0

file_name_pt = pd.ExcelFile('pt.xlsx', encoding='utf-8')
file_name_en = pd.ExcelFile('en.xlsx')

f_pt = pd.read_excel(file_name_pt, sheet_name='Sheet1')
title_pt = f_pt.columns[1:]

f_en = pd.read_excel(file_name_en, sheet_name='Sheet1')
title_en = f_en.columns[1:]

create_result.write('{\n"resultados": [\n')
while i <= 25:
    for pt,en in zip(title_pt, title_en):
        print pt
        pt = pt.encode('utf-8').strip()
        en = en.encode('utf-8').strip()
        print pt

        des_pt = (f_pt.head()[pt][0]).encode('utf-8').strip()
        des_en = (f_en.head()[en][0]).encode('utf-8').strip()

        print des_pt     
        create_result.write('{\n"id":%s,\n"nome":"%s",\n"name":"%s",\n"descricao":"%s",\n"description":"%s",\n"combinacoes":[]},\n'%(i, pt, en, '', des_en))
        i+=1
create_result.write(']\n}')
create_result.close()
print 'Done'

The error message

Traceback (most recent call last):
  File "/Users/atila/Desktop/PyAutomate/firjan_result_generator/firjangenerator.py", line 23, in <module>
    des_pt = (f_pt.head()[pt][0]).encode('utf-8').strip()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 2486, in _get_item_cache
    values = self._data.get(item)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 3066, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'T\xc3\xa9cnico em Energias Renov\xc3\xa1veis'
  • Atila, consider writing the question in Portuguese so that the SO community here can answer it; if you prefer English, post it on the English site directly. Also, try not to title the post with the error log, since it says little about your problem.

  • Oops! Fair enough. I didn't pay attention to that detail.

  • The problem is that you modify the key and then try to access the column with it later. If you comment out the line pt = pt.encode('utf-8').strip(), does it still raise the error?

  • Nope! The code works fine without that line.

1 answer

Could you provide the format and some samples of the content (spreadsheets) you are trying to read?

Based on the description in your question, the only thing I can contribute is the following:

The error KeyError: 'T\xc3\xa9cnico em Energias Renov\xc3\xa1veis' happens because the code looks up a key in a key-value structure and does not find it. In this case the missing key is the string 'T\xc3\xa9cnico em Energias Renov\xc3\xa1veis'.
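As a minimal illustration of that lookup failure, here is a sketch in Python 3 syntax. It uses a plain dict as a stand-in for the DataFrame's columns, and the cell value is made up. A label stored as text is no longer found once it has been encoded to UTF-8 bytes:

```python
# Stand-in for the DataFrame: column labels are text (unicode) strings.
# The label is 'Técnico em Energias Renováveis', written with escapes.
# The cell value is a hypothetical placeholder, not data from the spreadsheet.
label = u"T\u00e9cnico em Energias Renov\u00e1veis"
columns = {label: [u"descri\u00e7\u00e3o de exemplo"]}

# Encoding produces bytes: b'T\xc3\xa9cnico em Energias Renov\xc3\xa1veis'
encoded = label.encode("utf-8")


def lookup(key):
    """Return the first cell of a column, or None if the label is missing."""
    try:
        return columns[key][0]
    except KeyError:
        return None


found_with_text = lookup(label)     # original label: the column is found
found_with_bytes = lookup(encoded)  # encoded label: KeyError, so None
```

The encoded bytes match the key shown in the traceback, which suggests the script is asking for a label that no longer equals the one pandas stored.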

To get more help, consider making the content of the files you are reading (at least the first few lines) available.
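Going by the comment thread, where dropping the line pt = pt.encode('utf-8').strip() made the lookup work, one possible rework is to keep the labels as text for every lookup and let the json module handle quoting and escaping on output. The sketch below targets Python 3 and rests on several assumptions: plain dicts stand in for the two DataFrames (with a real DataFrame and its default index, frame[label][0] is the same kind of lookup), build_records and dump_results are hypothetical helper names, the sample values are invented, and "descricao" is filled with the Portuguese description even though the original script writes an empty string there.

```python
import io
import json


def build_records(frame_pt, frame_en, titles_pt, titles_en):
    """Pair up PT/EN column labels and read the first cell of each column."""
    records = []
    for i, (pt, en) in enumerate(zip(titles_pt, titles_en)):
        records.append({
            "id": i,
            # Look up with the unmodified label; strip only the output copy.
            "nome": pt.strip(),
            "name": en.strip(),
            "descricao": frame_pt[pt][0].strip(),
            "description": frame_en[en][0].strip(),
            "combinacoes": [],
        })
    return records


def dump_results(records, out):
    """Write the records as JSON text; json.dump takes care of commas and quotes."""
    json.dump({"resultados": records}, out, ensure_ascii=False, indent=2)


# Hypothetical sample data standing in for pt.xlsx and en.xlsx.
frame_pt = {u"T\u00e9cnico em Energias Renov\u00e1veis ": [u" Curso t\u00e9cnico "]}
frame_en = {"Renewable Energy Technician": ["Technical course"]}

records = build_records(frame_pt, frame_en, list(frame_pt), list(frame_en))

# An in-memory buffer keeps the example self-contained; for a real file,
# open('resultado.json', 'w', encoding='utf-8') could be passed instead.
out = io.StringIO()
dump_results(records, out)
text = out.getvalue()
```

Building the records first and serializing them in one call also avoids the trailing comma that the hand-written format string leaves after the last entry, which would make the output invalid JSON.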
