Sorting data in Pandas

I have a .csv table and I need to:

Print the employee with the highest amount billed each month, including employee name and total (example: "August 2020 - João - Total Billed: 150");

With the code below I managed to get close to it.

Image: https://prnt.sc/xidwq9

import pandas as pd

# Read file
df = pd.read_csv('https://..../employee_billing.csv', sep=';')
# Make Month an ordered categorical so it sorts in calendar order
df['Month'] = pd.Categorical(df['Month'], categories=["Apr", "May", "Jun", "Jul", "Aug", "Sep", "Nov", "Dec"], ordered=True)
df = df.sort_values(by=['Month', 'Day', 'Year'])
# Sum per month, year and employee, then keep the largest total in each month
group = df.groupby(["Month", "Year", "Nome"]).sum()
billed = group["Billed"].groupby(level=0, group_keys=False)
print(billed.nlargest(1))

But I can't get the total billed to show up in the last column.

  • Housekore, good afternoon! Please provide test data so that people can help you!

1 answer

I don’t know if there’s a single step to this, but with two it would be:

Group by Month and Nome and sum Billed

>>> df1 = df.groupby(["Month","Nome"])["Billed"].sum()

>>> df1

Month  Nome
Apr    Billy     18
       John      80
       Laura    112
       Mike     215
       Paul     250
       Sandy     60
Aug    Craig     20
       John     120
       Kate      62
       Laura    166
       Mike      70
Dec    Craig     49
       John     345
(...)

Take the largest of each group

>>> df2 = df1.loc[df1.groupby(level=0).idxmax()]

>>> df2

Month  Nome
Apr    Paul     250
Aug    Laura    166
Dec    John     345
Jul    Billy    205
Jun    Tom      210
May    Sandy    319
Nov    Craig    280
Sep    Mike     338
Name: Billed, dtype: int64
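For anyone who wants to try the two steps without the original file, here is a minimal sketch. The sample rows are made up for illustration (the real CSV is not shown in the question); only the column names Month, Nome and Billed come from the question.

```python
import pandas as pd

# Hypothetical sample data, not the asker's real file
df = pd.DataFrame({
    "Month":  ["Apr", "Apr", "Apr", "Aug", "Aug", "Aug"],
    "Nome":   ["Paul", "Mike", "Paul", "Laura", "John", "Laura"],
    "Billed": [100, 215, 150, 66, 120, 100],
})

# Step 1: total billed per (Month, Nome)
totals = df.groupby(["Month", "Nome"])["Billed"].sum()

# Step 2: for each month, idxmax gives the (Month, Nome) label of the largest total
top_labels = totals.groupby(level=0).idxmax()

# Select those labels from the totals
top = totals.loc[top_labels]
print(top)
```

Note that idxmax keeps only the first label when two employees tie for the highest total in a month.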

Update

The result is a Series.

To name the data column, convert it to a DataFrame:

>>> df3 = pd.DataFrame(df2, columns=["Billed"])

>>> df3

             Billed
Month Nome
Apr   Paul      250
Aug   Laura     166
Dec   John      345
Jul   Billy     205
Jun   Tom       210
May   Sandy     319
Nov   Craig     280
Sep   Mike      338

Note that Month and Nome are index levels, not columns.

To turn them into columns use

>>> df3.reset_index(inplace=True)

>>> df3
  Month   Nome  Billed
0   Apr   Paul     250
1   Aug  Laura     166
2   Dec   John     345
3   Jul  Billy     205
4   Jun    Tom     210
5   May  Sandy     319
6   Nov  Craig     280
7   Sep   Mike     338

If you want to rename the column to another name use:

>>> df3.rename(columns={"Billed": "Total Billed"}, inplace=True)

>>> df3
  Month   Nome  Total Billed
0   Apr   Paul           250
1   Aug  Laura           166
2   Dec   John           345
3   Jul  Billy           205
4   Jun    Tom           210
5   May  Sandy           319
6   Nov  Craig           280
7   Sep   Mike           338
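The conversion, the index reset and the rename can probably be collapsed into one call, since Series.reset_index accepts a name argument for the value column. A small sketch, where df2 is rebuilt by hand from the first three rows of the output above so the snippet runs on its own:

```python
import pandas as pd

# Stand-in for df2: a Series with a (Month, Nome) MultiIndex,
# values copied from the first rows of the output shown earlier
index = pd.MultiIndex.from_tuples(
    [("Apr", "Paul"), ("Aug", "Laura"), ("Dec", "John")],
    names=["Month", "Nome"],
)
df2 = pd.Series([250, 166, 345], index=index, name="Billed")

# reset_index(name=...) moves the index levels into columns
# and names the value column at the same time
df3 = df2.reset_index(name="Total Billed")
print(df3)
```

This returns a new DataFrame instead of modifying in place, so there is no need for inplace=True.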

By the way: there may be a more direct way of doing all this...
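One possibly more direct variant, which also prints the lines in the format the question asks for, is sketched below. The rows are hypothetical; the Year column is assumed to exist because the question's code sorts by it. If Month is an ordered categorical as in the question, the groupby calls may need observed=True to skip unused month/year combinations.

```python
import pandas as pd

# Hypothetical rows; column names follow the question (Month, Year, Nome, Billed)
df = pd.DataFrame({
    "Month":  ["Apr", "Apr", "Aug", "Aug", "Aug"],
    "Year":   [2020, 2020, 2020, 2020, 2020],
    "Nome":   ["Paul", "Mike", "Laura", "John", "Laura"],
    "Billed": [250, 215, 66, 120, 100],
})

# Total per employee per month, kept as ordinary columns
totals = df.groupby(["Year", "Month", "Nome"], as_index=False)["Billed"].sum()

# For each (Year, Month), keep the row with the largest total
top = totals.loc[totals.groupby(["Year", "Month"])["Billed"].idxmax()]

# Build one line per month in the requested format
lines = [
    f"{row.Month} {row.Year} - {row.Nome} - Total Billed: {row.Billed}"
    for row in top.itertuples(index=False)
]
print("\n".join(lines))
```

The month appears here as it is stored in the data ("Apr", "Aug"); printing the full month name would need an extra mapping.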

I hope it helps

  • Thank you very much, Paulo. I also got this output. Would it be possible to name the last column "Total Billed", for example?

  • I updated the answer
