R - Create a binary (dummy) variable equal to 1 for the species that make up 50% of the total

I'd like some help. I have the data below, in which each numeric column adds up to 100%. First I want to sort each column from largest to smallest, and then create a dummy variable equal to 1 for the species that together add up to 50%; all other species should get the value 0.

           Nome      Dens      Freq       Dom
1       Abarema  9.090909 46.153846 29.411765
2         Abuta 13.636364 11.538462 23.529412
3     Agonandra 18.181818  7.692308 11.764706
4        Aiouea 22.727273 15.384615 29.411765
5 Alchorneopsis 36.363636 19.230769  5.882353

Expected result:

           Nome      Dens      Freq       Dom v1 v2 v3
1       Abarema  9.090909 46.153846 29.411765  0  1  1
2         Abuta 13.636364 11.538462 23.529412  0  0  0
3     Agonandra 18.181818  7.692308 11.764706  0  0  0
4        Aiouea 22.727273 15.384615 29.411765  1  0  1
5 Alchorneopsis 36.363636 19.230769  5.882353  1  1  0

If you could help me, I’d be grateful.

  • Hello André, welcome to Stack Overflow PT. First, could you explain your problem in more detail? As written it is a little confusing and hard to understand. Second, it would help if you showed what you have been able to do so far, so the community can see that you are working on the problem and not asking without at least trying.

  • Run dput on your data and post the output; it is easier to work on the code with the same data you have.

  • How do you calculate "the species totalling 50%"? By adding up the values, starting from the highest, until 50% is reached? That is what you seem to have done in your example.

  • Thank you very much for the guidance; this was my first post. The answer below worked perfectly.

1 answer


I believe the following code solves the problem described in the question.

First I define a function that processes the columns of class numeric and creates a dummy for each one. It does this by adding up the values, starting from the highest, until the running total reaches or exceeds 50%. The values included in that total are coded as 1L (class integer) and the others as 0L.

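dummyFun <- function(x){
  n <- NROW(x)
  # positions of the values, sorted from largest to smallest
  inx <- order(x, decreasing = TRUE)
  # number of largest values needed to reach or exceed 50%
  d <- which(cumsum(x[inx]) >= 50)[1]
  # 1L for those d values, 0L for the rest, put back in the original row order
  d <- c(rep(1L, d), rep(0L, n - d))[order(inx)]
  d
}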
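
To see what each step of dummyFun does, the same operations can be traced by hand on the Dens column from the question. This is only an illustrative sketch: the vector dens is typed in from the question's table, and the intermediate names (running, k, flag_sorted, flag) are made up for the trace and are not part of the answer's code.

```r
# Dens column, typed in from the table in the question (assumed values)
dens <- c(9.090909, 13.636364, 18.181818, 22.727273, 36.363636)

# 1. Positions of the values, from largest to smallest
inx <- order(dens, decreasing = TRUE)    # 5 4 3 2 1

# 2. Running total of the sorted values
running <- cumsum(dens[inx])             # approx. 36.4 59.1 77.3 90.9 100.0

# 3. How many of the largest values are needed to reach or exceed 50
k <- which(running >= 50)[1]             # 2

# 4. Flag the first k sorted positions, then map back to the original order;
#    order(inx) is the inverse of the sorting permutation
flag_sorted <- c(rep(1L, k), rep(0L, length(dens) - k))
flag <- flag_sorted[order(inx)]          # 0 0 0 1 1
```

The final vector matches column v1 of the expected result: only Aiouea and Alchorneopsis are needed to pass 50%. Note that the fixed threshold of 50 relies on each column summing to 100, as stated in the question.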

num <- sapply(dados, is.numeric)

dum <- sapply(dados[num], dummyFun)
colnames(dum) <- paste0("v", seq_len(ncol(dum)))
Resultado <- cbind(dados, dum)
rm(dum, num)    # Final cleanup

Resultado
#           Nome      Dens      Freq       Dom v1 v2 v3
#1       Abarema  9.090909 46.153846 29.411765  0  1  1
#2         Abuta 13.636364 11.538462 23.529412  0  0  0
#3     Agonandra 18.181818  7.692308 11.764706  0  0  0
#4        Aiouea 22.727273 15.384615 29.411765  1  0  1
#5 Alchorneopsis 36.363636 19.230769  5.882353  1  1  0
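
The code above assumes that a data frame named dados already exists. Since no dput output was posted, one way to reproduce the result is to rebuild dados by hand from the table in the question. The sketch below does that and repeats the answer's steps so it can be run on its own; the typed-in values are an assumption and may differ in the last decimals from the asker's real data.

```r
# Data frame rebuilt by hand from the question's table (assumed values)
dados <- data.frame(
  Nome = c("Abarema", "Abuta", "Agonandra", "Aiouea", "Alchorneopsis"),
  Dens = c(9.090909, 13.636364, 18.181818, 22.727273, 36.363636),
  Freq = c(46.153846, 11.538462, 7.692308, 15.384615, 19.230769),
  Dom  = c(29.411765, 23.529412, 11.764706, 29.411765, 5.882353)
)

# Same function as in the answer
dummyFun <- function(x){
  n <- NROW(x)
  inx <- order(x, decreasing = TRUE)
  d <- which(cumsum(x[inx]) >= 50)[1]
  c(rep(1L, d), rep(0L, n - d))[order(inx)]
}

# Apply it to the numeric columns only and bind the dummies to the data
num <- sapply(dados, is.numeric)
dum <- sapply(dados[num], dummyFun)
colnames(dum) <- paste0("v", seq_len(ncol(dum)))
Resultado <- cbind(dados, dum)

# Quick check against the expected result in the question
all(Resultado$v1 == c(0, 0, 0, 1, 1))
all(Resultado$v2 == c(1, 0, 0, 0, 1))
all(Resultado$v3 == c(1, 0, 0, 1, 0))
```

Nome is skipped automatically because is.numeric returns FALSE for character (or factor) columns, so only Dens, Freq and Dom receive a dummy.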
  • That’s just what I needed. Thanks for the help and sorry for the confusion in the question data.
