turn numbers into relative frequency

Question

Asked 6 years, 6 months ago

Hello, I have the following table of absolute counts:

structure(
  list(
    X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo" ), 
    Bolsonaro = c(59L, 299L, 988L, 653L), 
    Ciro = c(188L, 242L,  128L, 212L), 
    Manuela = c(59L, 66L, 1024L, 629L), 
    Marina = c(87L,  135L, 741L, 28L)
  ), 
  class = "data.frame", row.names = c(NA, -4L )
)

I would like to turn these counts into proportions, following the formula n / sum(n).

What would be the best way?

Thanks in advance.

How about sharing the data with dput(dados)? See more here on how to improve the question.

– Tomás Barcellos

2019/02/11 at 22:47
Thank you, I'll do that.

– user135517

2019/02/11 at 23:09

2 answers

0

dat <- structure(
  list(
    X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo" ), 
    Bolsonaro = c(59L, 299L, 988L, 653L), 
    Ciro = c(188L, 242L,  128L, 212L), 
    Manuela = c(59L, 66L, 1024L, 629L), 
    Marina = c(87L,  135L, 741L, 28L)
  ), 
  class = "data.frame", row.names = c(NA, -4L )
)

A solution using apply:

dat[, -1] <- apply(dat[, -1], 2, function(x) {x/sum(x)})
dat
            X  Bolsonaro      Ciro    Manuela     Marina
1     Ver_suj 0.02951476 0.2441558 0.03318335 0.08779011
2     Ver_obj 0.14957479 0.3142857 0.03712036 0.13622603
3 Substantivo 0.49424712 0.1662338 0.57592801 0.74772957
4    Adjetivo 0.32666333 0.2753247 0.35376828 0.02825429

With 2 as its second argument, apply calls the function once per column: each column is passed in as x and divided by its own sum, so every numeric column of the result adds up to 1.
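
As a cross-check, the same column-wise proportions can be obtained without writing an anonymous function. The sketch below is not part of the original answer. It assumes base R only and uses prop.table() on the numeric columns; counts and props are illustrative names.

```r
# Same data as in the question
dat <- data.frame(
  X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo"),
  Bolsonaro = c(59L, 299L, 988L, 653L),
  Ciro = c(188L, 242L, 128L, 212L),
  Manuela = c(59L, 66L, 1024L, 629L),
  Marina = c(87L, 135L, 741L, 28L)
)

# Keep only the count columns, as a matrix
counts <- as.matrix(dat[, -1])

# margin = 2 divides every entry by the total of its column
props <- prop.table(counts, margin = 2)

# Sanity check: each column of proportions should add up to 1
colSums(props)
```

If the result is needed as a data frame again, something like cbind(dat["X"], as.data.frame(props)) should put the label column back.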

by Tomás Barcellos • **5,562** points · Answer 1 · 2019-02-11T23:50:14+00:00

Reproducing the data

dados <- structure(
  list(
    X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo" ), 
    Bolsonaro = c(59L, 299L, 988L, 653L), 
    Ciro = c(188L, 242L,  128L, 212L), 
    Manuela = c(59L, 66L, 1024L, 629L), 
    Marina = c(87L,  135L, 741L, 28L)
  ), 
  class = "data.frame", row.names = c(NA, -4L )
)

Recent versions of dplyr accept formula notation (~) inside mutate_at(), so we can write:

library(dplyr)  # provides %>% and mutate_at()
dados %>% 
  mutate_at(-1, ~.x/sum(.x))
#             X  Bolsonaro      Ciro    Manuela     Marina
# 1     Ver_suj 0.02951476 0.2441558 0.03318335 0.08779011
# 2     Ver_obj 0.14957479 0.3142857 0.03712036 0.13622603
# 3 Substantivo 0.49424712 0.1662338 0.57592801 0.74772957
# 4    Adjetivo 0.32666333 0.2753247 0.35376828 0.02825429

This "sentence" reads: "Mutate every column except the first. The mutation divides each number by the sum of the numbers in its column."

The first part (which columns to change) is determined by the -1 argument, and the second part by the formula ~.x/sum(.x). In this formula, .x stands for each selected column in turn, passed as a vector.
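
For readers on newer dplyr releases, the same step might be written with across(), which supersedes scoped verbs such as mutate_at(). This is a sketch that assumes dplyr 1.0.0 or later is installed; resultado is an illustrative name that does not appear in the original answer.

```r
library(dplyr)

# Same data as above
dados <- data.frame(
  X = c("Ver_suj", "Ver_obj", "Substantivo", "Adjetivo"),
  Bolsonaro = c(59L, 299L, 988L, 653L),
  Ciro = c(188L, 242L, 128L, 212L),
  Manuela = c(59L, 66L, 1024L, 629L),
  Marina = c(87L, 135L, 741L, 28L)
)

# across(-X, ...) selects every column except X;
# the formula divides each selected column by its own sum
resultado <- dados %>%
  mutate(across(-X, ~ .x / sum(.x)))

resultado
```

Selecting by name (-X) rather than by position (-1) is a small design choice: it keeps working if the columns are reordered.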

Alternative

In older versions of dplyr, the usual approach is to define a function that returns the proportions and pass it to mutate_at() or mutate_if(). Something like this:

percentual <- function(n) {
  n / sum(n)
}

dados %>% 
  mutate_if(is.integer, percentual)
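
One caveat with the is.integer predicate: it only matches columns stored as integers (the L suffix in the data above). The sketch below uses a small hypothetical data frame, dados_dbl, whose counts are stored as doubles, to show how is.numeric might be a safer predicate in that case. The names dados_dbl, unchanged and converted are illustrative only.

```r
library(dplyr)

percentual <- function(n) {
  n / sum(n)
}

# Hypothetical data: counts stored as doubles (no L suffix)
dados_dbl <- data.frame(X = c("a", "b"), n = c(1, 3))

# is.integer() is FALSE for a double column, so nothing is transformed
unchanged <- dados_dbl %>%
  mutate_if(is.integer, percentual)

# is.numeric() is TRUE for both integer and double columns
converted <- dados_dbl %>%
  mutate_if(is.numeric, percentual)

unchanged$n  # still 1 3
converted$n  # 0.25 0.75
```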