How to get the average group size per year with Pandas?

Given a Pandas DataFrame with data structured like this:

import pandas as pd

raw_data = {
    'tipo': ['a', 'a', 'b', 'c', 'c', 'c', 'd'],
    'ano': [2000, 2000, 2000, 2001, 2001, 2001, 2001],
}

df = pd.DataFrame.from_dict(raw_data)

I want the average number of items per type in each year.

Grouping with df.groupby(['tipo', 'ano']).size() gives me the number of items of each type in each year, as a Pandas Series:

tipo  ano 
a     2000    2
b     2000    1
c     2001    3
d     2001    1
dtype: int64
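A side note, as a minimal sketch: the grouping keys become the levels of a MultiIndex on the resulting Series, in the order they were passed, so tipo is level 0 and ano is level 1. The snippet below only inspects that structure; the variable name counts is illustrative.

```python
import pandas as pd

raw_data = {
    'tipo': ['a', 'a', 'b', 'c', 'c', 'c', 'd'],
    'ano': [2000, 2000, 2000, 2001, 2001, 2001, 2001],
}
df = pd.DataFrame.from_dict(raw_data)

# Number of rows for each (tipo, ano) pair
counts = df.groupby(['tipo', 'ano']).size()

# The grouping keys become the MultiIndex levels, in the order given
print(list(counts.index.names))  # ['tipo', 'ano']

# Level 1 holds the year of each group
print(counts.index.get_level_values(1).tolist())  # [2000, 2000, 2001, 2001]
```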

I want the average of those counts for each year, like this:

ano     media
2000    1.5
2001    2.0

The goal is to plot these averages with Pandas.

I tried to do this with Pandas, but after struggling with the API for a while I gave up and did it in plain Python, using a dictionary and computing the averages by hand.
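The dictionary-based code was not posted, so the following is only a hypothetical reconstruction of that kind of manual approach, assuming the two columns are available as plain lists. It can serve as a cross-check for whatever Pandas expression is used instead.

```python
from collections import defaultdict

tipo = ['a', 'a', 'b', 'c', 'c', 'c', 'd']
ano = [2000, 2000, 2000, 2001, 2001, 2001, 2001]

# Count items for each (year, type) pair
counts = defaultdict(int)
for t, y in zip(tipo, ano):
    counts[(y, t)] += 1

# Gather the per-type counts belonging to each year
per_year = defaultdict(list)
for (y, t), n in counts.items():
    per_year[y].append(n)

# Average the per-type counts within each year
media = {y: sum(ns) / len(ns) for y, ns in per_year.items()}
print(media)  # {2000: 1.5, 2001: 2.0}
```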

Is there a simple way to do this using Pandas' own abstractions and API?

1 answer

According to an answer I just got on Stack Overflow in English, the solution is to apply a second groupby, specifying the index level:

df.groupby(['tipo', 'ano']).size().groupby(level=1).mean()

ano
2000    1.5
2001    2.0
dtype: float64
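A minimal sketch building on the same idea, assuming a reasonably recent pandas: the level can also be selected by name (level='ano'), which is less fragile than the position 1, and renaming the Series before reset_index() yields the two-column ano/media table from the question. The last lines show an equivalent formulation, since the mean of the per-type counts in a year equals the total items in that year divided by the number of distinct types. The names result and alt are illustrative.

```python
import pandas as pd

raw_data = {
    'tipo': ['a', 'a', 'b', 'c', 'c', 'c', 'd'],
    'ano': [2000, 2000, 2000, 2001, 2001, 2001, 2001],
}
df = pd.DataFrame.from_dict(raw_data)

# Same two-step groupby, selecting the index level by name instead of position
media = df.groupby(['tipo', 'ano']).size().groupby(level='ano').mean()

# Name the Series and move the index into a column: columns 'ano' and 'media'
result = media.rename('media').reset_index()
print(result)

# Equivalent: items per year divided by the number of distinct types per year
alt = df.groupby('ano').size() / df.groupby('ano')['tipo'].nunique()
print(alt)
```

For the plotting goal, calling media.plot(kind='bar') on the Series should be enough, provided matplotlib is installed.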
