How to apply Transform to two variables in pandas?

Asked

Viewed 51 times

1

I would like to create a new variable which is the scalar product of two variables in a sub-group of my dataframe. I know that when I want to create a group-level variable, I can use the method transform. For example, in the example dataset auto (download) I can calculate the average price of brands as follows:

import pandas as pd
import numpy as np

df=pd.read_csv("auto.csv")
#cria a variável de marca
df['brand']=[k[0] for k in df.make.str.split(" ")]

#média dos preços
df['price_mean']=df.groupby("brand")['price'].transform(np.mean)
df.head()

Upshot:

| make          |   price |   mpg |   rep78 |   headroom |   trunk |   weight |   length |   turn |   displacement |   gear_ratio | foreign   | brand   |   price_mean |
|:--------------|--------:|------:|--------:|-----------:|--------:|---------:|---------:|-------:|---------------:|-------------:|:----------|:--------|-------------:|
| AMC Concord   |    4099 |    22 |       3 |        2.5 |      11 |     2930 |      186 |     40 |            121 |         3.58 | Domestic  | AMC     |      4215.67 |
| AMC Pacer     |    4749 |    17 |       3 |        3   |      11 |     3350 |      173 |     40 |            258 |         2.53 | Domestic  | AMC     |      4215.67 |
| AMC Spirit    |    3799 |    22 |     nan |        3   |      12 |     2640 |      168 |     35 |            121 |         3.08 | Domestic  | AMC     |      4215.67 |
| Buick Century |    4816 |    20 |       3 |        4.5 |      16 |     3250 |      196 |     40 |            196 |         2.93 | Domestic  | Buick   |      6075.29 |
| Buick Electra |    7827 |    15 |       4 |        4   |      20 |     4080 |      222 |     43 |            350 |         2.41 | Domestic  | Buick   |      6075.29 |

I don’t know how to do a similar procedure using two variables. Concretely, the variable I want to create is called result and is the result of the scalar product of a dummy variable called size and the variable price. For a brand would look like this:

df['size']=np.where(df.length>200, 1,0)

tab=df[df.brand=='Olds']
tab['result']=np.dot(tab['price'], tab['size'])
tab

Upshot:

| make           |   price |   mpg |   rep78 |   headroom |   trunk |   weight |   length |   turn |   displacement |   gear_ratio | foreign   | brand   |   price_mean |   size |   result |
|:---------------|--------:|------:|--------:|-----------:|--------:|---------:|---------:|-------:|---------------:|-------------:|:----------|:--------|-------------:|-------:|---------:|
| Olds 98        |    8814 |    21 |       4 |        4   |      20 |     4060 |      220 |     43 |            350 |         2.41 | Domestic  | Olds    |      6050.86 |      1 |    24075 |
| Olds Cutl Supr |    5172 |    19 |       3 |        2   |      16 |     3310 |      198 |     42 |            231 |         2.93 | Domestic  | Olds    |      6050.86 |      0 |    24075 |
| Olds Cutlass   |    4733 |    19 |       3 |        4.5 |      16 |     3300 |      198 |     42 |            231 |         2.93 | Domestic  | Olds    |      6050.86 |      0 |    24075 |
| Olds Delta 88  |    4890 |    18 |       4 |        4   |      20 |     3690 |      218 |     42 |            231 |         2.73 | Domestic  | Olds    |      6050.86 |      1 |    24075 |
| Olds Omega     |    4181 |    19 |       3 |        4.5 |      14 |     3370 |      200 |     43 |            231 |         3.08 | Domestic  | Olds    |      6050.86 |      0 |    24075 |
| Olds Starfire  |    4195 |    24 |       1 |        2   |      10 |     2730 |      180 |     40 |            151 |         2.73 | Domestic  | Olds    |      6050.86 |      0 |    24075 |
| Olds Toronado  |   10371 |    16 |       3 |        3.5 |      17 |     4030 |      206 |     43 |            350 |         2.41 | Domestic  | Olds    |      6050.86 |      1 |    24075 |

What I tried to do was:

df.groupby('brand')[['price','size']].transform(lambda x,y: np.dot(x,y))

But it didn’t work. How can I do an operation on two variables using transform?

  • I believe it depends on the function you are using, in this case you may have to use the apply: df.groupby('brand')[['price','size']].apply(lambda x: np.dot(x['price'],x['size'])). Hug!

No answers

Browser other questions tagged

You are not signed in. Login or sign up in order to post.