How to keep zero left on import in Python?

Asked

Viewed 1,120 times

2

Hello I have several TXT files with CPF numbers. CPF has 11 numbers, so you can have zeros left to complete this size I opened these files in Excel and gathered them in one. In the CPF column, before opening each file, I turned the Excel import command to as Text to keep the zeros. In xlsx format it worked

But when I recorded it as CSV I saw the zeroes on the left were gone

In Python 3 opened, to use pandas later, and I saw that really lost the zero on the left and turned int64:

candidatos = pd.read_csv("candidatos_excel_cpfinteiro_csv.csv",sep=';',encoding = 'latin_1')

candidatos.info()

    <class 'pandas.core.frame.DataFrame'>
RangeIndex: 26245 entries, 0 to 26244
Data columns (total 9 columns):
UF                  26245 non-null object
Estado              26245 non-null object
Cargo               26245 non-null object
Nome_completo       26245 non-null object
CPF                 26245 non-null int64
Nome_urna           26245 non-null object
Partido             26245 non-null object
Partido_completo    26245 non-null object
Situacao            26245 non-null object
dtypes: int64(1), object(8)
memory usage: 1.0+ MB

I tried to open the Excel file directly with the xlrd, but it got as number in the same way:

xlsx = pd.ExcelFile('candidatos_excel_cpfinteiro.xlsx')

candidatos = pd.read_excel(xlsx, 'Planilha1')

candidatos.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26245 entries, 0 to 26244
Data columns (total 9 columns):
UF                  26245 non-null object
Estado              26245 non-null object
Cargo               26245 non-null object
Nome_completo       26245 non-null object
CPF                 26245 non-null int64
Nome_urna           26245 non-null object
Partido             26245 non-null object
Partido_completo    26245 non-null object
Situacao            26245 non-null object
dtypes: int64(1), object(8)
memory usage: 1.0+ MB

Would someone please have a suggestion on how to keep zeros on the left? Or keep that column as a string?

  • 1

    how the cpfs are inside the . csv file? From what I understand of the question, excel already exports them without zeros.

  • Hello, thank you. You are always with 11 digites ("99493713415", "02376606446", "16436105468", "02108363262"...). Yes, when I record as CSV he loses the zero on the left

  • Hello, I think I explained wrong. In the original TXT is 11 digits ("99493713415", "02376606446", "16436105468", "02108363262"...). But when I turn into CSV in Excel then lose the zero on the left

2 answers

1

You can define the dtype at the time of import. For example:

candidatos = pd.read_csv("candidatos_excel_cpfinteiro_csv.csv",sep=';',
                         encoding = 'latin_1', dtype=str)

It is also possible to define the dtype individually, for example, {‘a’: np.float64, ‘b’: np.int32}.

For more details, I recommend documentation.

0

Try to import directly from excel, use this line of code you will not lose any value:

import pandas as pd

seu_dataframe = pd.read_excel('nome do seu arquivo .xlsx')
  • Thanks, but it didn’t work. Also cut the left zero when I care as you indicated above: the column looks like this - CPF 26245 non-null int64

  • But in excel file is with zero on the left and column formatted as text

Browser other questions tagged

You are not signed in. Login or sign up in order to post.