Calculate the average of a variable for each flower species in a column

The species column of my dataset contains these flower species:

df['species'].unique()
output: array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

I need the average of the sepal_width variable for each flower species:

df.sepal_width.head()
0    3.5
1    3.0
2    3.2
3    3.1
4    3.6

I only know how to compute it one species at a time, for example:

especie_iris_setosa = df[df['species'] == 'Iris-setosa']  # selects all rows whose species is Iris-setosa
especie_iris_setosa['sepal_length'].mean()
output: 5.005999999999999

How can I write a loop that computes the average sepal_width for each species in the species column?

I think it would start with something like this:

for i in df.species:

but I don't know how to write the rest.
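For reference, the loop idea does work if it iterates over the distinct species (from unique()) rather than over every row. The sketch below assumes a small made-up DataFrame in place of the real dataset; the sample rows and the means dictionary are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real iris dataset
df = pd.DataFrame({
    "species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
    "sepal_width": [3.5, 3.0, 3.2, 2.8, 3.0, 3.4],
})

# Loop over each distinct species (not each row), filter its rows,
# and store the mean sepal_width under the species name
means = {}
for species in df["species"].unique():
    rows = df[df["species"] == species]
    means[species] = rows["sepal_width"].mean()

print(means)
```

This filters the DataFrame once per species; the groupby approach does the same job in a single call and is usually the more idiomatic choice in pandas.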

1 answer

Group the dataset by species, select the sepal_width column, and apply mean() to each group:

df_medias = df.groupby(['species'], as_index=False)['sepal_width'].mean()

df_medias is a new DataFrame with two columns:

  • species, contains the name of each species; and
  • sepal_width, contains the average sepal width of that species.
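A minimal, self-contained sketch of the groupby line above, assuming a small made-up DataFrame in place of the real dataset (the sample rows and the series_medias name are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows standing in for the real iris dataset
df = pd.DataFrame({
    "species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
    "sepal_width": [3.5, 3.0, 3.2, 2.8, 3.0, 3.4],
})

# One row per species; as_index=False keeps species as a regular column
df_medias = df.groupby(["species"], as_index=False)["sepal_width"].mean()
print(df_medias)

# Without as_index=False the result is a Series indexed by species
series_medias = df.groupby("species")["sepal_width"].mean()
print(series_medias["Iris-setosa"])
```

groupby sorts the group keys by default, so the species come out in alphabetical order. Use the DataFrame form when you want to merge or export the result, and the Series form when you just want to look up one species' mean by name.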
