How to unify dates with Python dataframe

Asked

Viewed 30 times

0

Good afternoon guys, I have the following situation. I have a spreadsheet with the following columns: name, surname, dates(from 2011 until 2021)

Follow the table to view:

inserir a descrição da imagem aqui

As you can see, these dates are out of order. How can I sort them without changing the columns name and surname?

I came to execute the code in the following way:

df_json_meses = df_json1.iloc[:, 58:64] # Coluna referente aos meses de ago até dez/2020

df_json_meses2 = df_json1.iloc[:, 52:55] # Coluna que se refere aos meses jan a maio/2021

df_json_meses3 = df_json1.iloc[:, 99] # coluna referente ao mês de jun/2021

df_json_meses4 = df_json1.iloc[:, 55:57] # coluna referente aos meses Jul e Ago/2021

df_json_soma_meses = pd.concat([df_json_meses, df_json_meses2 , df_json_meses3, 

df_json_meses4], Axis=1)

df_json_nome = df_json1.iloc[:, :2]


df_json_unico = pd.concat([df_json_nome, df_json_soma_meses], axis=1)

I managed to organize as I wanted, only that I answer me for this report, now if I generate a new report, and generate new columns or fewer columns and the amount is smaller or greater than I have set in the code, I have to change in hand? If so, I don’t want it. How can I automate this code in the best way possible?

Follow the final result of the report:

inserir a descrição da imagem aqui

I really appreciate anyone who can help.

1 answer

1

Starting from the following dataset as an example:

df = pd.DataFrame({
    'Nome': ['João', 'José'],
    'Sobrenome': ['Da Silva', 'Soares'],
    '2011-02': [10, 20],
    '2009-08': [90, 200],
    '2011-12': [1, 5],
    })
print(df)

# output:
#    Nome Sobrenome  2011-02  2009-08  2011-12
# 0  João  Da Silva       10       90        1
# 1  José    Soares       20      200        5

First, reorder all columns with df.sort_index (how dates are in format YYYY-MM, the default sorting algorithm is able to sort them in ascending order):

df = df.sort_index(axis=1)
print(df)

# output:
#    2009-08  2011-02  2011-12  Nome Sobrenome
# 0       90       10        1  João  Da Silva
# 1      200       20        5  José    Soares

The columns of dates were ordered - all that remains is to bring the columns Nome and Sobrenome forward. We can do this using df.pop and df.insert to remove the column and reinsert it at the beginning of the dataframe:

df.insert(0, "Sobrenome", df.pop('Sobrenome'))
df.insert(0, "Nome", df.pop('Nome'))
print(df)

# output:
#    Nome Sobrenome  2009-08  2011-02  2011-12
# 0  João  Da Silva       90       10        1
# 1  José    Soares      200       20        5

Finally, if you want the date columns to be in descending order, just pass the argument ascending=False when calling the method df.sort_index.

  • Good morning jfaccioni. Thanks for the help.. I did it. I didn’t need to use df.pop and df.Insert to remove and insert the column at the beginning. I did so: df.sort_index (Axis = 1, inplace = True, ascending = False) and it worked.

  • Another question jfaccioni, sorry to bother you. I want to pick up a column-specific Qtde from 2021-08 until 2020-09. I made my code like this below: df_names = df.iloc[:, :2] df_months = df.iloc((:, 2:14) - 2:14 Qtde of date columns I want. df_unico = pd.Concat([df_names, df_months], Axis = 1)&#Xa

  • Another question jfaccioni, sorry to bother you. I want to pick up a column-specific Qtde from 2021-08 until 2020-09. I made my code like this below: df_names = df.iloc[:, :2] df_months = df.iloc((:, 2:14) - 2:14 Qtde of date columns I want. df_unico = pd.Concat([df_names, df_months], Axis = 1) df_unico.to_excel('test.xlsx', index = False) My question is whether, for example, I generate a new report in which I increase or decrease the number of date columns, this code I made will work smoothly or I will have to change in hand?

  • Hi @Fabriciorezende, the simplest way I can think is to use the column name (method .loc) instead of its position (method .iloc which you are using). For example, start from cols = ['2021-08', '2021-07', ... , '2020-10', 2020-09'] and then write down df.loc[:, cols]. Your problem becomes "how to create a list of all months between a final date '2021-08' and an initial date '2020-09'". I suggest searching for the answer on this and other websites, or opening a new question if you don’t find it.

  • About your call df.sort_index(axis=1, inplace=True, ascending=False), should work yes, because the ascending=False sorts the columns in reverse order that I demonstrated in my reply. Just note that this causes the column 'Sobrenome' appear before 'Nome', but if that’s not a problem for you, then you really don’t need to use df.pop and df.insert.

  • Gratitude @jfaccioni. I managed to make it work. I gave a Sort.index on my dataframe df_months. It was the way I need it. Vlw same... Hugs.

Show 1 more comment

Browser other questions tagged

You are not signed in. Login or sign up in order to post.