A: Column average over the range of values in R

2

I computed the quantiles of the HDI of all municipalities in Brazil. The result was as follows:

 0%   25%   50%   75%  100% 
0.418 0.599 0.665 0.718 0.862 

In my data frame, there is a column with the percentage of votes each municipality gave to a particular candidate in the last presidential election. I’m trying to average this percentage within certain HDI ranges, for example, between 0.418 and 0.599. I tried to do it this way:

mean(votos_idhm$PERC[votos_idhm$IDHM.2010 >= 0.418 & < 0.599], na.rm=TRUE)

However, the following error message appears:

Error: unexpected '<' in "mean(votos_idhm$PERC[votos_idhm$IDHM.2010 >= 0.418 & <"

Does anyone have any idea how to operationalize this? Thanks in advance!

3 answers

5


I believe R does not know which variable to use in the second comparison, because each side of & has to be a complete expression. That is why the variable needs to be repeated. Using the which function here, we would have:

mean(votos_idhm$PERC[which(votos_idhm$IDHM.2010 >=0.418 & votos_idhm$IDHM.2010 < 0.599)], na.rm=TRUE) 
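
To illustrate why the variable is repeated and what which changes, here is a minimal sketch on toy data. The values below are invented for illustration and only the column names come from the question. It suggests that which mostly matters when the HDI column contains NA and na.rm is not set.

```r
# Toy stand-in for votos_idhm; the values are made up for illustration
votos_idhm <- data.frame(
  IDHM.2010 = c(0.418, 0.450, 0.598, 0.599, NA, 0.700),
  PERC      = c(10, 20, 30, 40, 50, 60)
)

# Each side of & is a complete comparison on the same column
in_range <- votos_idhm$IDHM.2010 >= 0.418 & votos_idhm$IDHM.2010 < 0.599

# Logical indexing keeps an NA element for the row whose HDI is NA
mean_logical <- mean(votos_idhm$PERC[in_range])

# which() returns only the positions that are TRUE, so the NA row is dropped
mean_which <- mean(votos_idhm$PERC[which(in_range)])

print(mean_logical)  # NA, because na.rm was not set
print(mean_which)    # 20
```

With na.rm=TRUE both forms give the same average here, so which is optional in the call above.
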
  • It didn’t work exactly the way you wrote it, but I made some changes and it worked well: mean(votos_idhm$PERC[which(votos_idhm$IDHM.2010 >=0.418 & votos_idhm$IDHM.2010 < 0.599)], na.rm=TRUE) Thank you very much!

  • OK. I will edit my answer so that there is no confusion for future users who might have the same question.

  • So you don’t need which, just: mean(votos_idhm[votos_idhm[["IDHM.2010"]] >= .418 & votos_idhm[["IDHM.2010"]] < .599, "PERC"], na.rm=TRUE)

4

Suppose the data frame is df and the variable is IDH:

set.seed(123)

df <- data.frame(
  IDH = runif(n = 100, min = .250, max = 1)
)

With tidyverse the solution is simple:

library(dplyr)

# between() includes both endpoints: .418 <= IDH <= .599
df %>% 
  filter(between(x = IDH, .418, .599)) %>% 
  summarise(var = mean(IDH))

        var
1 0.5158508
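
The example above averages IDH itself, and between() includes both endpoints. As a hedged sketch of how the same idea might be applied to the columns from the question, the code below filters on IDHM.2010 with a half-open range and averages PERC. The data values and the names result and mean_perc are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for votos_idhm; the values are invented
votos_idhm <- data.frame(
  IDHM.2010 = c(0.418, 0.450, 0.598, 0.599, NA, 0.700),
  PERC      = c(10, 20, 30, 40, 50, 60)
)

result <- votos_idhm %>%
  # Lower bound included, upper bound excluded, as in the question;
  # filter() also drops rows where the condition is NA
  filter(IDHM.2010 >= 0.418, IDHM.2010 < 0.599) %>%
  summarise(mean_perc = mean(PERC, na.rm = TRUE))

print(result)
```

Writing the two comparisons explicitly keeps the upper limit out of the range, which between() would not do.
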

2

Good afternoon. Whenever you want to filter a data frame, you can use the function subset(dataframe, conditions). The first argument is the object you want to filter; it can be the whole data frame or just the columns you want returned. The second argument holds the conditions that must be met.

mean(subset(votos_idhm$PERC, votos_idhm$IDHM.2010 >=0.418 & votos_idhm$IDHM.2010 < 0.599))

In this case it returns the average of the column "PERC" for the rows where IDHM.2010 is at least 0.418 and below 0.599. The subset function makes the code much cleaner and clearer than using "which".
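
Since the quantiles in the question define four ranges, the average for every range could also be computed in one pass. The following is a sketch in base R using cut and tapply; the data values and the names breaks, hdi_range and means are assumptions made for illustration, and only the break points come from the question.

```r
# Toy stand-in for votos_idhm; the values are made up for illustration
votos_idhm <- data.frame(
  IDHM.2010 = c(0.418, 0.500, 0.600, 0.650, 0.700, 0.720, 0.862, NA),
  PERC      = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Break points taken from the quantiles shown in the question
breaks <- c(0.418, 0.599, 0.665, 0.718, 0.862)

# right = FALSE makes the intervals closed on the left: [0.418, 0.599), ...
# include.lowest = TRUE closes the last interval so that 0.862 is kept
hdi_range <- cut(votos_idhm$IDHM.2010, breaks = breaks,
                 right = FALSE, include.lowest = TRUE)

# Average PERC within each range; rows with NA in the HDI are left out
means <- tapply(votos_idhm$PERC, hdi_range, mean, na.rm = TRUE)
print(means)  # 15, 35, 50 and 65 for the four ranges
```

The break points could also come straight from quantile(votos_idhm$IDHM.2010, na.rm = TRUE) instead of being typed by hand.
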
