Split a date (day, month, year) into new columns - pandas DataFrame

Viewed 2,524 times

2

I have a DataFrame and need to split its date field so that I can add separate year and month columns.

The problem is that the data column of the DataFrame is not of type str, so I can't use the split method.

Structure of the Dataframe

data        usuarios
2018-01-01  215.0

Goal

data          usuarios      ano      mes
2018-01-01    215.0         2018     01

Attempts

# df.data.apply(str)
# df["data"].apply(str)
# df["data"].astype(basestring)
df.data.str.split("-")

I tried these approaches, but none of them solved my problem.

I also consulted the official documentation, but I couldn't solve it that way either.

Error Displayed

AttributeError: Can only use .str accessor with string values!

  • You could mark one of the answers as accepted. This link explains why that matters: https://pt.meta.stackoverflow.com/questions/1078/como-e-por-que-aceitar-uma-reply

3 answers

3

This error happens because your data column is not of type str but of type datetime64. To see the column types of your DataFrame, run:

>>> df.dtypes
data        datetime64[ns]
usuarios           float64
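
A small sketch that reproduces the situation, assuming a DataFrame with the same two columns as in the question (the sample values are made up for illustration). It checks the dtype programmatically and shows that the .str accessor is rejected on a datetime column:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame({
    "data": pd.to_datetime(["2018-01-01", "2018-02-01"]),
    "usuarios": [215.0, 230.0],
})

# The column holds datetime64 values, not strings
is_datetime = pd.api.types.is_datetime64_any_dtype(df["data"])
print(is_datetime)

# Using .str on a datetime column raises AttributeError
try:
    df["data"].str.split("-")
    raised = False
except AttributeError as exc:
    raised = True
    print(exc)
```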

To extract the year and month from a datetime column, use the .dt accessor:

df['ano'] = df['data'].dt.year
df['mes'] = df['data'].dt.month
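
Note that .dt.year and .dt.month return integers, so January comes out as 1 rather than the 01 shown in the goal. A minimal sketch, assuming made-up sample data and a hypothetical extra column named mes_texto, of one way to get the zero-padded text as well:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame({
    "data": pd.to_datetime(["2018-01-01", "2018-11-15"]),
    "usuarios": [215.0, 198.0],
})

# Integer year and month
df["ano"] = df["data"].dt.year
df["mes"] = df["data"].dt.month

# Zero-padded month as text (mes_texto is an illustrative name)
df["mes_texto"] = df["data"].dt.strftime("%m")

print(df)
```

Keeping the integer columns is usually more convenient for sorting and filtering; the text column is mainly useful for display.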

1

Hello, I believe the simplest way is to convert the column to datetime in pandas. The date format isn't clear from your code, but I believe it is year-month-day; if not, just rearrange the format string in the code below:

df['data'] = pd.to_datetime(df['data'], format='%Y-%m-%d')

After that, put the parts you want into new columns:

df['ano'] = df['data'].dt.strftime('%Y')
df['mes'] = df['data'].dt.strftime('%m')

If you later want to plot something by month and year, you can combine more than one format code:

df['mes_ano'] = df['data'].dt.strftime('%m-%Y')

I believe this makes the data more logical and easier to manipulate, since you can extract the specific information you need and reorder it as you like, for example dd/mm/yyyy or mm/dd/yyyy.
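
One thing to keep in mind is that strftime returns text, so a '%m-%Y' column sorts by month first when sorted as strings. A rough sketch, assuming made-up sample data and a hypothetical column named ano_mes, of grouping per month with a year-first key that sorts chronologically:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame({
    "data": pd.to_datetime(
        ["2018-01-01", "2018-01-15", "2018-02-01", "2017-12-31"]
    ),
    "usuarios": [215.0, 100.0, 230.0, 50.0],
})

# Month-year label, as in the answer above
df["mes_ano"] = df["data"].dt.strftime("%m-%Y")

# Year-first key (illustrative name); its text order matches date order
df["ano_mes"] = df["data"].dt.strftime("%Y-%m")

# Total users per month, with groups sorted by the key
por_mes = df.groupby("ano_mes")["usuarios"].sum()
print(por_mes)
```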

1


You can use a lambda to convert each date to a string and then use slicing to select the part of the date you want. In your case:

df['ano'] = df['data'].apply(lambda x: str(x)[:4])
df['mes'] = df['data'].apply(lambda x: str(x)[5:7])

A reproducible example:

import pandas as pd

# Create an example dataset: 60 rows alternating between two dates
datas = pd.to_datetime(pd.DataFrame({'year': [2015, 2016]*30,
                                     'month': [2, 3]*30,
                                     'day': [4, 5]*30}))
df = pd.DataFrame({'data': datas, 'dados': range(60)})


# Create the year and month columns
df['ano'] = df['data'].apply(lambda x: str(x)[:4])
df['mes'] = df['data'].apply(lambda x: str(x)[5:7])

print(df.head())

Output:

        data  dados   ano mes
0 2015-02-04      0  2015  02
1 2016-03-05      1  2016  03
2 2015-02-04      2  2015  02
3 2016-03-05      3  2016  03
4 2015-02-04      4  2015  02
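
The same string-based result can also be obtained without apply, which is close to what the question originally attempted with split. This is only a sketch under the assumption that the column is already datetime64; the sample rows and the extra dia column are illustrative:

```python
import pandas as pd

# Hypothetical sample data shaped like the example above
df = pd.DataFrame({
    "data": pd.to_datetime(["2015-02-04", "2016-03-05"]),
    "dados": [0, 1],
})

# Format the dates as text first, then split on "-" into three columns
partes = df["data"].dt.strftime("%Y-%m-%d").str.split("-", expand=True)
df["ano"] = partes[0]
df["mes"] = partes[1]
df["dia"] = partes[2]

print(df)
```

Formatting with strftime before splitting makes the text layout explicit instead of relying on how str() happens to render a timestamp.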
