How does "parse" work for handling dates in Python?


-2

I'm using the following code as part of a forecasting task:

'''
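     import pandas as pd
     from datetime import datetime

     # pd.datetime was removed in pandas 2.0, so strptime comes from the standard library here
     dateparse = lambda dates: datetime.strptime(dates, '%d/%m/%Y')

     df = pd.read_csv('BBSE3.csv', encoding='utf8', sep=';', parse_dates=['Data'],
                      index_col='Data', date_parser=dateparse)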
'''

But it returns the following error:

~\Anaconda3\lib\_strptime.py in _strptime(data_string, format)
    360     if not found:
    361         raise ValueError("time data %r does not match format %r" %
--> 362                          (data_string, format))
    363     if len(data_string) != found.end():
    364         raise ValueError("unconverted data remains: %s" %

ValueError: time data '02/01/2018' does not match format 'dd/mm/yyyy'

df.head(5) 

   Data        Valor
0  2018-02-01  28.7
1  2018-03-01  28.72
2  2018-04-01  28.78
3  2018-05-01  28.97
4  2018-08-01  29.14

I changed the date format to "%Y/%m/%d", but the error persists:

ValueError: time data '02/01/2018' does not match format '%Y/%m/%d'

  • 2

    There is some other problem, in parts of your code or data that are not shown in the question. If we try to apply the example you have there, parse works, only with atda.. Try adding some lines of the CSV file to your question.

  • Would converting the Data column to datetime after loading the file be a valid answer for you? Or is the question specific to parsing inside read_csv?

  • @Terry, it could be either during loading or afterwards. I just thought of doing it during the load to speed things up, so that the data is already in the right format for the statistical treatment I will perform. Is there a best practice here?

1 answer

0


I prefer (personal opinion) to parse the dates after loading the data, for two reasons: (1) it makes the code more readable, and (2) pd.to_datetime lets you handle errors that may occur during the conversion. Load the data without parsing, then call pd.to_datetime with errors='coerce', which sets invalid dates to NaT:

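import pandas as pd

# Load the file without parsing dates
df = pd.read_csv('BBSE3.csv', encoding='utf8', sep=';')

# Convert the day-first strings; values that do not match the format become NaT
df['Data'] = pd.to_datetime(df['Data'], format='%d/%m/%Y', errors='coerce')
df.set_index('Data', inplace=True)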
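
To see what the coerce option does without needing the real file, here is a small sketch of the same approach. The inline sample data is hypothetical (BBSE3.csv is not shown in the question), and the third row deliberately contains an impossible date so the NaT handling is visible. The name bad_rows is only for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for BBSE3.csv; the real file is not shown in the question.
csv_text = "Data;Valor\n02/01/2018;28.7\n03/01/2018;28.72\n31/02/2018;28.78\n"

df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Day-first format; the impossible date 31/02/2018 becomes NaT instead of raising.
df["Data"] = pd.to_datetime(df["Data"], format="%d/%m/%Y", errors="coerce")

# Rows whose date failed to parse can be inspected before they are dropped or fixed.
bad_rows = df[df["Data"].isna()]

# Keep only the rows with a valid date and use the date as the index.
df = df.dropna(subset=["Data"]).set_index("Data")
```

Note that the format string must use strptime directives such as %d, %m and %Y. A literal pattern like 'dd/mm/yyyy' contains no directives, so no date string can match it, which is consistent with the first error message in the question.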
