I need to do a calculation in RStudio that would be simple to do in Excel. We have a table with two columns: one with occupation codes (Occupation column) and another with a sex indicator, 0 for men and 1 for women (Women column). I need to count the number of women per occupation and, with this value, calculate the percentage of women for each occupation. Our original table has more than 360 occupation codes and more than 250 thousand observations, which makes Excel impractical.
Data:
df_1 <- structure(list(Ocup = c(11, 12, 13, 11, 11, 12, 12, 13, 13,
13, 11, 12, 12, 13, 13, 11, 12, 11, 13, 13, 11, 12, 13, 12, 12, 12, 13,
11, 12, 11), Mulher = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0,
1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0)), row.names = c(NA, 30L),
class = "data.frame")
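One way to read the request is "for each occupation, how many rows are women, and what share of that occupation's rows do they represent". The base R sketch below follows that reading. It uses a small hypothetical data frame named `toy` (not the real data) with the same column names as `df_1`, and the output names `mulheres`, `total`, and `pct_mulheres` are illustrative choices.

```r
# Hypothetical toy data with the same layout as df_1 (assumption, not the real table)
toy <- data.frame(
  Ocup   = c(11, 11, 12, 12, 12, 13),
  Mulher = c(1, 0, 1, 1, 0, 0)
)

# Because Mulher is coded 0/1, summing it per occupation counts the women
mulheres <- tapply(toy$Mulher, toy$Ocup, sum)

# Number of observations (men and women) per occupation
total <- tapply(toy$Mulher, toy$Ocup, length)

# Assemble one row per occupation code
resultado <- data.frame(
  Ocup     = as.numeric(names(mulheres)),
  mulheres = as.vector(mulheres),
  total    = as.vector(total)
)

# Percentage of women within each occupation
resultado$pct_mulheres <- 100 * resultado$mulheres / resultado$total

print(resultado)
```

Replacing `toy` with `df_1` should apply the same steps to the sample above. Since the work is done per group, the number of distinct occupation codes should not change the code.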
Welcome to Stack Overflow! Your question has some problems, and your experience here may not be the best because of them. We want you to do well here and get what you need, but for that we need you to do your part. Here are some guidelines that will help you: Stack Overflow Survival Guide in English (short version). If the help you need is very simple, it may still be possible to give it in the comments.
– Maniero
Hello, @Mario. Avoid posting your data table as an image. We cannot reproduce your example that way.
– neves
Run this in R: dput(head(dados, 30)). Then copy and paste the output here.
– neves
Hello @neves, but there is a lot of data, about 212 thousand observations.
– Mario Filizzola
It’s only two columns, right? Run the function I told you (it will return only 30 rows). Then copy and paste here. Or keep my edit.
– neves
structure(list(Ocup = c(11, 12, 13, 11, 11, 12, 12, 13, 13, 13, 11, 12, 12, 13, 13, 11, 11, 12, 13, 11, 12, 12, 12, 13, 11, 12, 11, 11), Mulher = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0)), row.names = c(NA, 30L), class = "data.frame")
– Mario Filizzola
@neves, it is worth noting that this sample has only 3 occupations, with codes 11, 12 and 13. In the women column, 1 indicates a woman and 0 a man. The original table has more than 433 occupations, and we need to calculate the percentage of women for each occupation.
– Mario Filizzola
Consider the data frame name to be df_1. See if this solves it:
df_1 %>%
  filter(Mulher == 1) %>%
  group_by(Ocup) %>%
  summarise(contagem_por_grupo = n()) %>%
  mutate(percentagem = contagem_por_grupo / sum(contagem_por_grupo) * 100)
– neves
All right @neves, thank you so much for your help.
– Mario Filizzola