How to group data from another grouping?

Asked

Viewed 82 times

1

I have the following dataframe:

x= {"Nome": ['Carla','Rafael','Juliana','Carla','Carla','Juliana','Rafael','Juliana','Rafael'],
    "Salario": [1200,1500,2000,1230,1250,2050,1700,2500,1750]}
tabela_salarios=pd.DataFrame(x)

I need to calculate the percentage of variation between the three salaries so that it appears as follows:

Nome  Variação Salário
Carla 1
Carla 1.25
Carla 1.016

And then calculate that cumulative percentage, which would be the product of all the variations:

Nome Porcentagem acumulada
Carla 1.27

Does anyone know how to do this using groupby?

  • User123, good afternoon! Is this variation of 1.25 correct? Is the 1.016 correct? Hug!

1 answer

1


A possible solution

df['Variação_Salario'] = df.sort_values('Nome').groupby(['Nome'])['Salario'].pct_change().fillna(0).add(1)
df['Porcentagem acumulada'] = df.sort_values('Nome').groupby(['Nome'])['Variaco_Salario'].cumprod()
df = df.sort_values('Nome')
  1. First line we group by name and use pct_change of pandas
  2. Second line we do something similar to the first but we use the cumprod of pandas to calculate the product of the variation
  3. We sort the values to display the data frame

Exit:

    Nome    Salario  Variaco_Salario    Porcentagem acumulada
0   Carla   1200        1.000000               1.000000
3   Carla   1230        1.025000               1.025000
4   Carla   1250        1.016260               1.041667
2   Juliana 2000        1.000000               1.000000
5   Juliana 2050        1.025000               1.025000
7   Juliana 2500        1.219512               1.250000
1   Rafael  1500        1.000000               1.000000
6   Rafael  1700        1.133333               1.133333
8   Rafael  1750        1.029412               1.166667

Another possible solution:

I separated the code into 3 blocks to summarize the explanation

  1. First block sorts the data frame and stretches(pivot) the data frame, turning row into column;
  2. The second block creates the variation of the salary and the accumulated percentage using pct_change() and cumprod() of pandas
  3. Create new data frames with results
df['i'] = df.groupby('Nome')['Nome'].cumcount()
df1 = df.set_index(['i','Nome']).stack().unstack(1).reset_index(drop = True)

pct_change = df1.pct_change().fillna(0).add(1)

vsalario = pct_change.unstack().reset_index().drop(columns = 'level_1')
pa = pct_change.cumprod().unstack().reset_index().drop(columns = 'level_1')

Exit:

vsalarium

    Nome    0
0   Carla   1.000000
1   Carla   1.025000
2   Carla   1.016260
3   Juliana 1.000000
4   Juliana 1.025000
5   Juliana 1.219512
6   Rafael  1.000000
7   Rafael  1.133333
8   Rafael  1.029412

pa

    Nome    0
0   Carla   1.000000
1   Carla   1.025000
2   Carla   1.041667
3   Juliana 1.000000
4   Juliana 1.025000
5   Juliana 1.250000
6   Rafael  1.000000
7   Rafael  1.133333
8   Rafael  1.166667

Browser other questions tagged

You are not signed in. Login or sign up in order to post.