Error when transporting excel data to python - date format

Asked

Viewed 483 times

1

My code consists of pulling information from an Excel file and transporting it to a database through Python. But in data type fields (formatted in Excel itself) when importing the data they are formats for text type where the value is the difference between the current date and 01/01/1900 (Excel’s date.value function). Does anyone know a way to export the date correctly without having to format the table in hand?

*NOTE: I already tried to change the type of the field in the spreadsheet, but when performing this it goes back to the kind of running days.

The first image shows how the data comes:

Como a data é mostrada numa planilha sem tratamento

In the second picture, in the first cell is the date with "date" formatting the second is the same value with 'general' formatting"

inserir a descrição da imagem aqui

  • Suellen, you’re not informing how you’re getting the Excel data, are you using any library? It’s important to add the code of what you tried in your question so it’s easier to help you.

  • You have to use the function =Eval() python to catch exactly what is being sent, in this case the date. However, as Fernando said, without the code it is impossible to help you beyond that

  • If you’re not sure what you’re doing NEVER use eval() no language. Your problem is another, Excel saves dates as integers representing the number of days since 01/01/1900.

1 answer

1


In Excel, dates are saved as integers representing how many days have passed since 01/01/1900.

If it were that simple it would be enough to create the initial date and add the days with a timedelta:

from datetime import date, timedelta

# código ERRADO    
def converte_data_excel(qtde_dias):
    data_inicial = date(1900, 1, 1)
    return data_inicial + timedelta(days=qtde_dias)

dias = [0, 1, 60, 42_005, 43_663]

for dia in dias:
    data = converte_data_excel(dia)
    print(f"{dia:>5}: {data:%d/%m/%Y}")

Code "working" on Repl.it


But not everything is pink, the exit from the previous code is:

    0: 01/01/1900
    1: 02/01/1900
   60: 02/03/1900
42005: 03/01/2015
43663: 19/07/2019

Using Excel online I put the integer values in one column and in the other the integer is converted to date by Excel itself:

Dados do Excel


We have two problems here:

  1. The first is that the whole is not exactly the number days since 01/01/1900, because if it were, the number 0 would represent the day 01/01/1900 and not the day 02/01/1900, because no day went from 01/01/1900 until 01/01/1900. And strangely when converting the number zero to date in Excel the date is shown as 00/01/1900 (you will understand).

    So the first correction to be made is to decrease 1 of the number of days to compensate for this bizarreness, or decrease a day of data_inicial for the same reason.

    data_inicial = date(1899, 12, 31)
    
  2. The second problem is that Excel considers that 1900 was a leap year, but this is not true. Years ending in "00" are only leap if they are divisible by 400, and this is not the case.

    So it is necessary to compensate this day by decreasing in 1 the number of days of the value if it is greater than or equal to 60.

    if qtde_dias >= 60:
        qtd_dias -= 1
    

With these details fixed, the code is now able to correctly convert the entire values received from the spreadsheet:

from datetime import date, timedelta

def converte_data_excel(qtde_dias):
    data_inicial = date(1899, 12, 31)

    if qtde_dias >= 60:
        qtde_dias -= 1

    return data_inicial + timedelta(days=qtde_dias)

dias = [0, 1, 60, 42_005, 43_663]

for dia in dias:
    data = converte_data_excel(dia)
    print(f"{dia:>5}: {data:%d/%m/%Y}")

Code working on Repl.it

Now the way out is:

    0: 31/12/1899
    1: 01/01/1900
   60: 28/02/1900
42005: 01/01/2015
43663: 17/07/2019

Remembering that the day 29/02/1900 did not exist, if you really want to replicate this value in python will need to treat in some other way, be returning None, making an exception or whatever you think is best.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.