Turn absolute numbers into relative frequencies

Hello, I have the following table of absolute counts:

structure(
  list(
    X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo" ), 
    Bolsonaro = c(59L, 299L, 988L, 653L), 
    Ciro = c(188L, 242L,  128L, 212L), 
    Manuela = c(59L, 66L, 1024L, 629L), 
    Marina = c(87L,  135L, 741L, 28L)
  ), 
  class = "data.frame", row.names = c(NA, -4L )
)

which I would like to convert into proportions using the formula n / sum(n). The expected result would be: [image: final table]

What would be the best way?

Thanks in advance.

  • How about sharing the data with dput(dados)? See more here on how to improve the question.

  • Thank you, I'll do that.

2 answers

4

Reproducing the data

dados <- structure(
  list(
    X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo" ), 
    Bolsonaro = c(59L, 299L, 988L, 653L), 
    Ciro = c(188L, 242L,  128L, 212L), 
    Manuela = c(59L, 66L, 1024L, 629L), 
    Marina = c(87L,  135L, 741L, 28L)
  ), 
  class = "data.frame", row.names = c(NA, -4L )
)

Since the latest version of dplyr, it is possible to use formula notation within mutate. So we have:

library(dplyr)

# Divide every column except the first by its own column sum
dados %>% 
  mutate_at(-1, ~.x/sum(.x))
#             X  Bolsonaro      Ciro    Manuela     Marina
# 1     Ver_suj 0.02951476 0.2441558 0.03318335 0.08779011
# 2     Ver_obj 0.14957479 0.3142857 0.03712036 0.13622603
# 3 Substantivo 0.49424712 0.1662338 0.57592801 0.74772957
# 4    Adjetivo 0.32666333 0.2753247 0.35376828 0.02825429

What this "sentence" means is: "Mutate all columns except the first one. The mutation divides each number by the sum of the numbers in its column."

The first part (all columns except the first) is determined by the -1 passed to the function, and the second part is determined by the formula ~.x/sum(.x). In this formula, .x is a generic placeholder for each column (vector) in turn.
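For readers on dplyr 1.0.0 or later, where mutate_at() is superseded by across(), the same idea can be sketched as below. This is only an illustration under that version assumption, not the answer's original code; the name proporcoes is made up for the example, and the final line is just a sanity check that every column of proportions sums to 1.

```r
library(dplyr)

# Same data as in the question
dados <- data.frame(
  X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo"),
  Bolsonaro = c(59L, 299L, 988L, 653L),
  Ciro = c(188L, 242L, 128L, 212L),
  Manuela = c(59L, 66L, 1024L, 629L),
  Marina = c(87L, 135L, 741L, 28L)
)

# across(-X, ...) selects every column except X (assumes dplyr >= 1.0.0);
# the formula divides each selected column by its own sum
proporcoes <- dados %>%
  mutate(across(-X, ~ .x / sum(.x)))

# Sanity check: each column of proportions should sum to 1
colSums(proporcoes[, -1])
```

Selecting by name (-X) instead of by position (-1) is a matter of taste; it simply keeps working if the column order changes.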

Alternative

In earlier versions of dplyr, the usual approach would be to define a function that returns the proportions and use it in mutate_at() or mutate_if(). Something like this:

percentual <- function(n) {
  n / sum(n)
}

dados %>% 
  mutate_if(is.integer, percentual)

  • Thank you so much! I am a beginner and I still struggle a lot.

  • Just one question: why the ~ before the formula?

  • It is the ~ that turns the code that follows into a formula. This lets purrr know how to handle it (for example, treating .x as each value passed in the first argument).

  • This material explains some of the uses of formulas in purrr.
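To make the role of ~ a bit more concrete, here is a small sketch using purrr::as_mapper(), which converts a one-sided formula into an ordinary function. It is meant only as an illustration: the name proporcao is made up, and dplyr may perform the equivalent conversion internally through a different helper.

```r
library(purrr)

# as_mapper() turns a one-sided formula into an ordinary function;
# inside the formula, .x refers to the first argument
proporcao <- as_mapper(~ .x / sum(.x))

# Behaves like function(x) x / sum(x), here on the Bolsonaro counts
proporcao(c(59, 299, 988, 653))
```

Without the ~, R would try to evaluate .x / sum(.x) immediately and fail, because no object called .x exists at that point.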

0


dat <- structure(
  list(
    X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo" ), 
    Bolsonaro = c(59L, 299L, 988L, 653L), 
    Ciro = c(188L, 242L,  128L, 212L), 
    Manuela = c(59L, 66L, 1024L, 629L), 
    Marina = c(87L,  135L, 741L, 28L)
  ), 
  class = "data.frame", row.names = c(NA, -4L )
)

Solving through apply:

dat[, -1] <- apply(dat[, -1], 2, function(x) {x/sum(x)})
dat
            X  Bolsonaro      Ciro    Manuela     Marina
1     Ver_suj 0.02951476 0.2441558 0.03318335 0.08779011
2     Ver_obj 0.14957479 0.3142857 0.03712036 0.13622603
3 Substantivo 0.49424712 0.1662338 0.57592801 0.74772957
4    Adjetivo 0.32666333 0.2753247 0.35376828 0.02825429

apply calls the function once per column (the second argument, 2, selects columns rather than rows). Each column is passed in as x and is divided by its own sum.
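As a cross-check, base R also offers prop.table(), which with margin = 2 divides each entry by its column total. The sketch below compares it with the apply version on the same data; the names contagens, via_apply and via_prop are made up for the example.

```r
# Same data as above
dat <- data.frame(
  X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo"),
  Bolsonaro = c(59L, 299L, 988L, 653L),
  Ciro = c(188L, 242L, 128L, 212L),
  Manuela = c(59L, 66L, 1024L, 629L),
  Marina = c(87L, 135L, 741L, 28L)
)

# Work on the numeric part only, as a matrix
contagens <- as.matrix(dat[, -1])

# Column-wise proportions via apply (MARGIN = 2 means "by column")
via_apply <- apply(contagens, 2, function(x) x / sum(x))

# The same computation via prop.table
via_prop <- prop.table(contagens, margin = 2)

# Both approaches should agree up to floating-point tolerance
all.equal(via_apply, via_prop, check.attributes = FALSE)
```

Note that both calls return a matrix, not a data frame; assigning the result back into dat[, -1], as in the answer, keeps the X column alongside the proportions.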
