Python - Select 2 columns from a DF and classify them

Question

Asked 6 years, 11 months ago

Viewed 5,205 times

0

I am new to programming and am studying to build knowledge in Data Science.

Here goes... I have a DataFrame with a lot of information, including gender and age. I want to get the number of rows for each gender (male and female), classified as children (under 12 years), youth (12 to under 18 years) and adults (18+ years).

I'm so lost that I don't even know whether I've got this right so far...

Input: df.groupby("Sex").Age.unique()
Output: 
Sex
female    [38.0, 26.0, 35.0, 27.0, 14.0, 4.0, 58.0, 55.0...
male      [22.0, 35.0, 29.0, 54.0, 2.0, 20.0, 39.0, 34.0...
Name: Age, dtype: object

Variable:
classification = df.groupby("Sex").Age.unique()

Now I imagine I need a for loop, is that right? But how do I label each case?

To get the quantity, just use len(classification[i]), where i is 0 for female and 1 for male. To classify, see if this link helps you

– AlexCiuffa

2018/08/21 at 14:58
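
One caveat on the snippet in the question: unique() keeps only the distinct ages in each group, so taking len() of its result counts distinct values rather than rows. A minimal sketch of the difference, using made-up sample data (the DataFrame below is hypothetical, not the asker's):

```python
import pandas as pd

# Hypothetical sample data; note the repeated age 26.0 among the females.
df = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male"],
    "Age": [38.0, 26.0, 26.0, 22.0, 35.0],
})

# unique() keeps one copy of each distinct age per group.
distinct_ages = df.groupby("Sex")["Age"].unique()
n_distinct_female = len(distinct_ages["female"])

# size() counts the rows in each group, duplicates included.
rows_per_sex = df.groupby("Sex").size()
n_rows_female = int(rows_per_sex["female"])

print(n_distinct_female, n_rows_female)
```

Indexing by the label ("female") rather than by position also avoids relying on the order of the groups.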

3 answers

2

From this Dataframe:

# -*- coding: utf-8 -*-
import pandas as pd

d = {'Sex':['female','female', 'female', 'female', 'male', 'male','male','male'],
     'Age':[38.0,26.0,4.0,14.0,33.0,24.0,7.0,16.0]}

df = pd.DataFrame(data=d)

>>> print(df)
    Age     Sex
0  38.0  female
1  26.0  female
2   4.0  female
3  14.0  female
4  33.0    male
5  24.0    male
6   7.0    male
7  16.0    male

We classify each row by age:

def define_classe(idade):
    if idade >= 18:
        return 'Adulto'
    elif idade >= 12:
        return 'Jovem'
    return 'Criança'

df['Classification'] = df['Age'].map(define_classe)
>>> print(df)
    Age     Sex Classification
0  38.0  female         Adulto
1  26.0  female         Adulto
2   4.0  female        Criança
3  14.0  female          Jovem
4  33.0    male         Adulto
5  24.0    male         Adulto
6   7.0    male        Criança
7  16.0    male          Jovem
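
As an alternative to writing the function by hand, the same labels can probably be produced with pd.cut, which assigns each value to a bin. This is only a sketch: the bin edges below are an assumption taken from the age ranges in the question, and the sample data is the same as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Sex": ["female", "female", "female", "female",
            "male", "male", "male", "male"],
    "Age": [38.0, 26.0, 4.0, 14.0, 33.0, 24.0, 7.0, 16.0],
})

# Assumed bins: [0, 12) -> Criança, [12, 18) -> Jovem, [18, inf) -> Adulto.
# right=False makes each bin include its left edge and exclude its right edge,
# so an age of exactly 12 is 'Jovem' and exactly 18 is 'Adulto'.
df["Classification"] = pd.cut(
    df["Age"],
    bins=[0, 12, 18, np.inf],
    labels=["Criança", "Jovem", "Adulto"],
    right=False,
)

labels = df["Classification"].astype(str).tolist()
print(df)
```

Unlike map with a Python function, pd.cut returns a categorical column, which keeps the three groups in a fixed order when counting later.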

Now just filter the rows. For example, adult males:

>>> print (len(df.loc[df['Classification'] == 'Adulto'].loc[df['Sex'] == 'male']))
2

Another way is to filter on the values directly, without classifying first:

>>> df.loc[df['Age'] >= 18].loc[df['Sex'] == 'male']
    Age   Sex Classification
4  33.0  male         Adulto
5  24.0  male         Adulto

>>> print(len(df.loc[df['Age'] >= 18].loc[df['Sex'] == 'male']))
2

>>> print(df.loc[df['Age'] >= 12].loc[df['Age'] < 18].loc[df['Sex'] == 'male'])
    Age   Sex Classification
7  16.0  male          Jovem

>>> print(len(df.loc[df['Age'] >= 12].loc[df['Age'] < 18].loc[df['Sex'] == 'male']))
1
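
The chained .loc calls above can also be written as a single filter by combining the boolean masks with &. A minimal sketch on the same sample data (the variable names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Sex": ["female", "female", "female", "female",
            "male", "male", "male", "male"],
    "Age": [38.0, 26.0, 4.0, 14.0, 33.0, 24.0, 7.0, 16.0],
})

# Each comparison yields a boolean Series aligned with df's index.
is_male = df["Sex"] == "male"
is_adult = df["Age"] >= 18
is_youth = (df["Age"] >= 12) & (df["Age"] < 18)

# Combine the masks with & and filter once.
adult_males = df[is_male & is_adult]

# Summing a boolean mask counts its True values, i.e. the matching rows.
n_adult_males = int((is_male & is_adult).sum())
n_young_males = int((is_male & is_youth).sum())

print(n_adult_males, n_young_males)
```

The parentheses around each comparison matter, because & binds more tightly than >= and <.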

It worked, Alex. Thank you very much!

– Leandro Baruch

2018/08/21 at 17:43

by Júlio Cesar Pereira Rocha • 161 points · Answer 1 · 2018-08-21T15:34:06+00:00

If you just want to replace the values in a column with child, youth and adult labels, you can use the .apply method on that column:

First, create a function:

def classifica_idade(x):
    # Map an age in years to a label; 18 counts as adult (18+ in the question)
    if x < 12:
        return 'criança'
    elif x < 18:
        return 'jovem'
    return 'adulto'

Once that is done, apply it to the column you want, passing the function itself without parentheses: Classification['coluna_desired'] = Classification['coluna_desired'].apply(classifica_idade)

also works with Dual functions.
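
For illustration, here is a runnable sketch of that apply call. The small DataFrame and the choice to write the labels into a new column (rather than overwriting Age) are assumptions made for the example:

```python
import pandas as pd

def classifica_idade(x):
    # Same thresholds as in the question: under 12, 12 to under 18, 18+.
    if x < 12:
        return "criança"
    elif x < 18:
        return "jovem"
    return "adulto"

# Hypothetical ages, including the boundary value 18.
df = pd.DataFrame({"Age": [4.0, 14.0, 18.0, 38.0]})

# Pass the function object itself; apply calls it once per value.
df["Classification"] = df["Age"].apply(classifica_idade)

result = df["Classification"].tolist()
print(df)
```

Writing classifica_idade() with parentheses would call the function immediately with no argument and raise a TypeError before apply ever runs.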

by Leandro Baruch • 3 points · Answer 2 · 2018-08-21T16:10:56+00:00

Maybe I haven’t been clear... my goal is to have a result similar to this:

Children Female: x amount

Young Female: y amount

Adult Female: z amount

Children Male: n amount

Young Male: k amount

Adult Male: j amount

I have made progress by creating a new DataFrame with only the Sex and Age columns. I think it will be easier to continue from here...
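
One possible way to get a result in that shape is to label each row and then group by both Sex and the label. This is a sketch under assumptions: the sample data comes from the first answer, and the English labels and the helper name classify are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Sex": ["female", "female", "female", "female",
            "male", "male", "male", "male"],
    "Age": [38.0, 26.0, 4.0, 14.0, 33.0, 24.0, 7.0, 16.0],
})

def classify(age):
    # Hypothetical helper: thresholds follow the question's age ranges.
    if age >= 18:
        return "Adult"
    if age >= 12:
        return "Young"
    return "Children"

# Add the label as a column, then count the rows in each (Sex, Group) pair.
counts = (
    df.assign(Group=df["Age"].map(classify))
      .groupby(["Sex", "Group"])
      .size()
)

# Format each pair like "Adult Female: 2".
lines = [
    f"{group} {sex.capitalize()}: {n}"
    for (sex, group), n in counts.items()
]
print("\n".join(lines))
```

Note that a pair with no rows at all simply does not appear in the result; pd.crosstab(df["Sex"], df["Age"].map(classify)) is an alternative that shows such combinations as 0 in a table.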