Is it possible to multiply a variable of type "factor"?

I’m working with the following database: Qualis.

I import this dataset with rio::import() and store it in the object "df". I also load dplyr with library(dplyr).

Result:

library(dplyr)
df<-rio::import("EXEMPLO.xlsx")
head(df)

  ordem  ano qualis.ref
1     1 2017         B1
2     2 2017         B4
3     3 2017         NP
4     4 2017         A3
5     5 2017         B4
6     6 2017         B1

It turns out that the values of the variable "qualis.ref" correspond to weights, according to the following equivalence:

A1 = 1.0
A2 = 0.85
A3 = 0.7
A4 = 0.6
B1 = 0.5
B2 = 0.35
B3 = 0.2
B4 = 0.1
C  = 0
NP = 0

What I’m trying to do is get the score, per year, of each "qualis.ref" category.

To do so, I first convert the variable "qualis.ref" into a factor using the function as.factor():

df$qualis.ref<-as.factor(df$qualis.ref)

Then I create a new variable called "peso" (weight), which is a copy of "qualis.ref":

peso<-df$qualis.ref

Next, I assign the weight values according to the equivalence above:

levels(peso)<-c(1, 0.85, 0.7, 0.6, 0.5, 0.35, 0.2, 0.1, 0, 0)

Then bundle everything into a new data.frame called "df2" using the function cbind():

df2<-cbind(df, peso)
head(df2)

  ordem  ano qualis.ref peso
1     1 2017         B1  0.5
2     2 2017         B4  0.1
3     3 2017         NP    0
4     4 2017         A3  0.7
5     5 2017         B4  0.1
6     6 2017         B1  0.5

Finally, I group with the function group_by() and count the occurrences of each "qualis.ref" with the function count().

That is where my problem arises: I used the function mutate() to create a new column called "pontuacao" (score), expecting to multiply the count of each "qualis.ref" by its respective weight.

The result was:

df2 %>% 
  group_by(ano, qualis.ref, peso) %>%
  count(qualis.ref) %>% 
  mutate(pontuacao=peso*n)

# A tibble: 40 x 5
# Groups:   ano, qualis.ref, peso [40]
     ano qualis.ref peso      n pontuacao
   <dbl> <fct>      <fct> <int> <lgl>    
 1  2017 A1         1         4 NA       
 2  2017 A2         0.85      8 NA       
 3  2017 A3         0.7      26 NA       
 4  2017 A4         0.6       4 NA       
 5  2017 B1         0.5      39 NA       
 6  2017 B2         0.35     10 NA       
 7  2017 B3         0.2       3 NA       
 8  2017 B4         0.1       9 NA       
 9  2017 C          0        14 NA       
10  2017 NP         0        10 NA       
# ... with 30 more rows
There were 40 warnings (use warnings() to see them)

However, the whole "pontuacao" column comes out as NA.

This makes me think that the problem is with the variable of type "factor".

I tested with variables of type "double" from "mtcars" and the multiplication worked. I multiplied the variables "gear" and "carb":

mtcars %>% 
  mutate(teste=gear*carb) %>% 
  head()

   mpg cyl disp  hp drat    wt  qsec vs am gear carb teste
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    16
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    16
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1     4
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1     3
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2     6
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1     3
  • 1

    No, it is not possible to do arithmetic on variables of type factor. I recommend this reading.

  • 1

    How is it possible for a question to have 2 votes against and an answer with 5 votes in favour? The answer may be useful to others but the question is not?

1 answer

A variable of class factor stores a set of integer codes together with their labels, the levels. The class is meant for categorical data, so arithmetic operations such as multiplication make no sense on it.

With levels(peso)<-c(1, 0.85, 0.7, 0.6, 0.5, 0.35, 0.2, 0.1, 0, 0) you changed the data labels; you did not transform the data type. The data remained categories, only with different names.

peso
[1] 0.85 0.7  0.6  1    0.7  0.85
Levels: 1 0.85 0.7 0.6 0.5 0.35 0.2 0.1 0
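
To see why the multiplication comes back as NA, it can help to look at how a factor is stored. Below is a minimal sketch in base R; the vector x is a small made-up sample, not the real Qualis data, and the variable names are only illustrative.

```r
# Hypothetical sample, not the real data
x <- factor(c("B1", "B4", "NP", "A3"))

# A factor is a vector of integer codes plus a character vector of labels
codes  <- unclass(x)   # integer codes carrying a "levels" attribute
labels <- levels(x)    # labels, sorted alphabetically by default

# Renaming the levels changes only the labels; x is still a factor
levels(x) <- c(0.7, 0.5, 0.1, 0)
still_factor <- is.factor(x)
is_number    <- is.numeric(x)

# Arithmetic on a factor is not defined: R warns and returns NA
result <- suppressWarnings(x * 2)
```

Here still_factor is TRUE and is_number is FALSE, and result holds only NA values. That appears to be the same thing happening inside mutate(), which would also explain why the new column shows up with type <lgl>.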

You can fix your code by converting the factor first to character, to access the labels, and then to numeric, to turn them into numbers.

df2 %>% 
  group_by(ano, qualis.ref, peso) %>%
  count(qualis.ref) %>% 
  mutate(pontuacao=as.numeric(as.character(peso))*n)
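
The intermediate as.character() step matters. A small base R sketch of the difference, using a made-up factor (the names wrong, right and also_right are only illustrative):

```r
# Hypothetical factor whose labels look like numbers
peso <- factor(c("0.5", "0.1", "0", "0.7"))

# as.numeric() directly on a factor returns the internal integer codes
wrong <- as.numeric(peso)

# Going through character first recovers the numbers written in the labels
right <- as.numeric(as.character(peso))

# Equivalent alternative: convert the levels once, then index by the factor
also_right <- as.numeric(levels(peso))[peso]
```

In this sketch wrong holds the position of each label among the sorted levels, while right and also_right hold the values 0.5, 0.1, 0 and 0.7.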

However, what I suggest is doing the transformation differently, without creating the factor at all. I used the data you gave to build a dictionary of values (a named vector), with which you can make the transformation directly.

valores <- c("A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4", "C", "NP")
dicionario <- setNames(c(1, 0.85, 0.7, 0.6, 0.5, 0.35, 0.2, 0.1, 0, 0), valores)

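df %>% 
  # as.character() keeps the lookup by name even if qualis.ref is already a factor
  mutate(peso = unname(dicionario[as.character(qualis.ref)])) %>% 
  group_by(ano, qualis.ref, peso) %>% 
  count(qualis.ref) %>% 
  mutate(pontuacao = peso*n)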
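
For anyone who wants to try the dictionary idea without the spreadsheet, here is a rough self-contained sketch in base R only. The data frame df_demo and the object contagem are made up for illustration; the real EXEMPLO.xlsx is not reproduced, and aggregate() stands in for the group_by() and count() steps.

```r
# Hypothetical sample data, standing in for the imported spreadsheet
df_demo <- data.frame(
  ano = c(2017, 2017, 2017, 2018, 2018),
  qualis.ref = c("B1", "B1", "A3", "NP", "A1"),
  stringsAsFactors = FALSE
)

# Named vector used as a dictionary: names are categories, values are weights
dicionario <- c(A1 = 1, A2 = 0.85, A3 = 0.7, A4 = 0.6, B1 = 0.5,
                B2 = 0.35, B3 = 0.2, B4 = 0.1, C = 0, NP = 0)

# Look up each weight by name; the result is a plain numeric column
df_demo$peso <- unname(dicionario[as.character(df_demo$qualis.ref)])

# Count rows per year, category and weight, then multiply count by weight
contagem <- aggregate(n ~ ano + qualis.ref + peso,
                      data = transform(df_demo, n = 1), FUN = sum)
contagem$pontuacao <- contagem$peso * contagem$n
```

Because peso is numeric from the start, the multiplication works without any conversion. A category missing from the dictionary would give NA in peso, which makes typos in the data easy to spot.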
