Sums and pivot table in R

I need to do a calculation in RStudio that at first glance would be simple in Excel. We have a table with two columns: one with occupation codes (occupation column, Ocup) and one coding sex, with 0 for men and 1 for women (women column, Mulher). I need to count the women in each occupation and, from that count, calculate the percentage of women in each occupation. Our original table has more than 360 occupation codes and more than 250 thousand observations, which makes Excel impractical.

Data:

df_1 <- structure(list(Ocup = c(11, 12, 13, 11, 11, 12, 12, 13, 13, 
13, 11, 12, 12, 13, 13, 11, 12, 11, 13, 13, 11, 12, 13, 12, 12, 12, 13, 
11, 12, 11), Mulher = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 
1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0)), row.names = c(NA, 30L), 
class = "data.frame") 
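
The calculation described above can be sketched in base R. This is a minimal sketch, not a definitive solution: it assumes Mulher contains only 0 and 1 (so its sum per group is the number of women), and the names women, total, pct_women, and result are illustrative, not from the original post. The sample data is rebuilt with data.frame() so the block runs on its own.

```r
# Sample data from the question (30 rows)
df_1 <- data.frame(
  Ocup = c(11, 12, 13, 11, 11, 12, 12, 13, 13, 13, 11, 12, 12, 13, 13,
           11, 12, 11, 13, 13, 11, 12, 13, 12, 12, 12, 13, 11, 12, 11),
  Mulher = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
             0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0)
)

# Women per occupation: Mulher is 0/1, so the group sum counts the women
women <- tapply(df_1$Mulher, df_1$Ocup, sum)

# Total observations per occupation
total <- tapply(df_1$Mulher, df_1$Ocup, length)

# One row per occupation code, with the share of women inside that occupation
result <- data.frame(
  Ocup      = as.numeric(names(women)),
  women     = as.vector(women),
  total     = as.vector(total),
  pct_women = as.vector(100 * women / total)
)

print(result)
```

On the 30-row sample, occupation 13 has 6 women out of 10 observations, so its pct_women is 60. The same code should work unchanged on the full table, since tapply groups by whatever codes appear in Ocup.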
  • Welcome to Stack Overflow! Your question has some problems, and your experience on Stack Overflow may suffer because of them. We want you to do well here and get the help you need, but for that we need you to do your part. Here are some guidelines that will help: Stack Overflow Survival Guide in English (short version). If the fix is very simple, it can still be given in the comments.

  • Hello, @Mario. Avoid posting a data table as an image. Your example cannot be reproduced that way.

  • Run this in R: dput(head(dados, 30)). Then copy and paste the output here.

  • Hello @neves, but there is a lot of data, about 212 thousand observations.

  • It’s only two columns, right? Run the function I mentioned (it will return only 30 rows). Then copy and paste the output here. Or keep my edit.

  • structure(list(Ocup = c(11, 12, 13, 11, 11, 12, 12, 13, 13, 13, 11, 12, 12, 13, 13, 11, 11, 12, 13, 11, 12, 12, 12, 13, 11, 12, 11, 11), Woman = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0)), row.names = c(NA, 30L), class = "data.frame")

  • @neves, note that this sample has only 3 occupations, coded 11, 12 and 13, and that in the women column 1 indicates a woman and 0 a man. The original table has more than 433 occupations, and we need to calculate the percentage of women in each occupation.

  • 2

    Assume the data frame is named df_1. See if this solves it: df_1 %>% filter(Mulher == 1) %>% group_by(Ocup) %>% summarise(contagem_por_grupo = n()) %>% mutate(percentagem = contagem_por_grupo / sum(contagem_por_grupo) * 100)

  • 1

    All right @neves, thank you so much for your help.
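
One point about the pipeline suggested in the comments may be worth checking: because it filters to Mulher == 1 before grouping, its percentage is each occupation's share of all women in the table, not the share of women inside each occupation. If the second quantity is the one wanted, a possible dplyr sketch is shown below. It assumes the dplyr package is installed, and the column names women, total, pct_within_occupation, and pct_of_all_women are illustrative, not from the thread.

```r
library(dplyr)

# Sample data from the question (30 rows)
df_1 <- data.frame(
  Ocup = c(11, 12, 13, 11, 11, 12, 12, 13, 13, 13, 11, 12, 12, 13, 13,
           11, 12, 11, 13, 13, 11, 12, 13, 12, 12, 12, 13, 11, 12, 11),
  Mulher = c(1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
             0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0)
)

by_occupation <- df_1 %>%
  group_by(Ocup) %>%
  summarise(
    women = sum(Mulher == 1),                     # women in this occupation
    total = n(),                                  # all observations in this occupation
    pct_within_occupation = 100 * women / total   # share of women inside the occupation
  ) %>%
  # Quantity computed by the commented pipeline: this occupation's share of all women
  mutate(pct_of_all_women = 100 * women / sum(women))

print(by_occupation)
```

Keeping both columns side by side makes the difference visible: pct_of_all_women sums to 100 across occupations, while pct_within_occupation does not need to.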

No answers
