I see this has already been solved, but here is another approach.
When a variable is categorical, the pandas "category" dtype is often a good fit, depending on the case.
Creating a test DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({"Sex": ["male", "male", "female", "male", "female", "female", "male", "male"]})
>>> df
Sex
0 male
1 male
2 female
3 male
4 female
5 female
6 male
7 male
Applying category
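>>> df["Category"] = df["Sex"].astype("category")
>>> # rename_categories is used because recent pandas versions reject direct assignment to .cat.categories
>>> df["Category"] = df["Category"].cat.rename_categories([0, 1])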
>>> df
Sex Category
0 male 1
1 male 1
2 female 0
3 male 1
4 female 0
5 female 0
6 male 1
7 male 1
Edited on 25/3/2021
Based on the @Woss question: "How would you define which is 0 and which is 1?"
Answer: The categories are defined in alphabetical order. That is why 0 is associated with female and 1 with male.
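To check which number goes with which label, the categories can be inspected directly. This is a minimal sketch; the names `sex_cat`, `labels`, `mapping` and `codes` are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Sex": ["male", "male", "female", "male",
                           "female", "female", "male", "male"]})
sex_cat = df["Sex"].astype("category")

# The categories are stored sorted, so each label's integer code
# is simply its position in this list.
labels = list(sex_cat.cat.categories)
mapping = {label: code for code, label in enumerate(labels)}

# .cat.codes gives the integer code of every row.
codes = sex_cat.cat.codes.tolist()

print(mapping)
print(codes)
```

Here `female` sorts before `male`, so it receives code 0.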
See another example:
New test DataFrame
>>> df = pd.DataFrame({"Sex": ["outro", "male", "male", "female", "male", "outro", "female", "female", "outro", "male", "male", "outro"]})
>>> df
Sex
0 outro
1 male
2 male
3 female
4 male
5 outro
6 female
7 female
8 outro
9 male
10 male
11 outro
Applying categories
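>>> df["Category"] = df["Sex"].astype("category")
>>> # rename_categories is used because recent pandas versions reject direct assignment to .cat.categories
>>> df["Category"] = df["Category"].cat.rename_categories([0, 1, 2])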
>>> df
Sex Category
0 outro 2
1 male 1
2 male 1
3 female 0
4 male 1
5 outro 2
6 female 0
7 female 0
8 outro 2
9 male 1
10 male 1
11 outro 2
Note that even though "outro" is the first item, it gets category 2, because it comes last alphabetically. If you want a different association, something like [0,2,1] would lead to 0=female, 2=male, 1=outro.
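A rough sketch of two ways to get that 0=female, 2=male, 1=outro association follows. The variable names `renamed` and `explicit` are only illustrative, and the second option (listing the category order explicitly with `pd.Categorical`) is an alternative to the renaming shown above, not something from the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Sex": ["outro", "male", "male", "female", "male", "outro",
                           "female", "female", "outro", "male", "male", "outro"]})

# Option 1: keep the alphabetical categories (female, male, outro)
# and rename them positionally to 0, 2, 1.
renamed = df["Sex"].astype("category").cat.rename_categories([0, 2, 1])

# Option 2: state the category order explicitly; the code of each
# label is then its position in the `categories` list.
explicit = pd.Categorical(df["Sex"], categories=["female", "outro", "male"])
df["Category"] = explicit.codes

print(df)
```

Both options give the same numbers here. The second makes the intended order visible in the code instead of relying on alphabetical sorting.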
If Sex can be female and male, all lowercase, why did you put Female and Male in the map?
– Woss
That’s really it, thank you
– Lucas Gonçalves e Silva