How to transform data frame variables into the indexes of a matrix with R?

I have a data frame with a lot of information and need to rearrange it into a matrix.

The data frame is of the type:

  userID  itemID  rating  
     169      242     3  
     186      302     3   
     22       377     1  

The data has 10,000 entries, with userID ranging up to 943, itemID ranging up to 1682, and ratings from 1 to 5. What I need to do is build a matrix with 943 rows and 1682 columns, initially filled with NA only, and then place each "rating" at the position given by the data frame. In the example above, the matrix would need [169,242] <- 3 and [22,377] <- 1.

But I can't manage to write code that does this. I need help!

3 answers

1

The base R function xtabs solves the problem in one line of code, but first you have to convert the columns userID and itemID to factors with the full set of levels.

dados$userID <- factor(dados$userID, levels = 1:max(dados$userID))
dados$itemID <- factor(dados$itemID, levels = 1:max(dados$itemID))
xt <- xtabs(rating ~ userID + itemID, dados)
xt
#      itemID
#userID 1 2 3
#     1 0 0 0
#     2 0 0 5
#     3 0 0 0
#     4 0 1 0
#     5 0 0 0
#     6 0 0 3

To put NA where the zeros are:

is.na(xt) <- xt == 0
xt
#      itemID
#userID 1  2  3
#     1        
#     2       5
#     3        
#     4    1   
#     5        
#     6       3

The print method for class 'xtabs' has na.print = "" by default. If you want to see the NAs (which are there), this has to be changed explicitly.

print(xt, na.print = "NA")
#      itemID
#userID  1  2  3
#     1 NA NA NA
#     2 NA NA  5
#     3 NA NA NA
#     4 NA  1 NA
#     5 NA NA NA
#     6 NA NA  3

Data

The data are from Carlos Eduardo Lagosta's answer.
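
Since the data live in another answer, here is a self-contained sketch of the same xtabs approach. The names n_users, n_items and m are my own, and the dimensions are small placeholders; with the question's data they would presumably be 943 and 1682. Fixing the levels from known dimensions, rather than from max(), keeps the shape right even if the highest IDs never appear in the data.

```r
# Example data, same shape as the question's data frame
dados <- data.frame(
  userID = c(4, 6, 2),
  itemID = c(2, 3, 3),
  rating = c(1, 3, 5)
)

# Assumed known dimensions (hypothetical; 943 and 1682 in the question)
n_users <- 6
n_items <- 4

# Factors with the full set of levels, so empty rows/columns are kept
dados$userID <- factor(dados$userID, levels = seq_len(n_users))
dados$itemID <- factor(dados$itemID, levels = seq_len(n_items))

xt <- xtabs(rating ~ userID + itemID, data = dados)
is.na(xt) <- xt == 0

# Drop the xtabs/table classes to get a plain numeric matrix
m <- unclass(xt)
attr(m, "call") <- NULL

dim(m)
m[4, 2]
```

Two things to keep in mind: xtabs sums the ratings if the same userID/itemID pair appears more than once, and replacing zeros with NA is only safe because valid ratings here are 1 to 5.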

1

You can create an empty matrix with as many rows as the maximum value of userID and as many columns as the maximum value of itemID, then use a loop to fill in the values according to each row of your data.frame:

# Example data
dados <- data.frame(
  userID = c(4, 6, 2),
  itemID = c(2, 3, 3),
  rating = c(1, 3, 5))

# Empty matrix (filled with NA by default)
matriz <- matrix(nrow = max(dados$userID), ncol = max(dados$itemID))

for (i in seq_len(nrow(dados))) {
  matriz[dados$userID[i], dados$itemID[i]] <- dados$rating[i]
}

matriz
#>      [,1] [,2] [,3]
#> [1,]   NA   NA   NA
#> [2,]   NA   NA    5
#> [3,]   NA   NA   NA
#> [4,]   NA    1   NA
#> [5,]   NA   NA   NA
#> [6,]   NA   NA    3
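
As a possible alternative to the loop, R also accepts a two-column matrix of (row, column) pairs as an index, so all cells can be assigned in one vectorized step. This is a minimal sketch on the same example data, not part of the original answer; with 10,000 rows it should behave the same way, just without the explicit loop.

```r
# Example data
dados <- data.frame(
  userID = c(4, 6, 2),
  itemID = c(2, 3, 3),
  rating = c(1, 3, 5)
)

# Empty numeric matrix filled with NA
matriz <- matrix(NA_real_,
                 nrow = max(dados$userID),
                 ncol = max(dados$itemID))

# Each row of the index matrix is one (row, column) position
idx <- cbind(dados$userID, dados$itemID)
matriz[idx] <- dados$rating

matriz
```

If the same userID/itemID pair appears more than once, the last rating wins, which matches what the loop does.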

0

I’m going to use an adaptation of the data you cited as an example:

df <- read.table(
   header=TRUE, 
   text = 
 "userID  itemID  rating  
     2      3     4  
     3      2     1   
     1      1     3")

Then I use some functions from the tidyverse packages to transform this data. The details of each step are in the comments below:

library(tidyverse)

df2 <- df %>% 
# First, sort the data by itemID so the columns come out in order
    arrange(itemID) %>% 
# Second, spread itemID across the columns and fill the cells with the ratings
    pivot_wider(
    names_from = itemID, 
    values_from = rating) %>%
# Third, sort the data by userID
    arrange(userID) %>%
# Fourth, turn userID into the row names of the data frame
    column_to_rownames(., var = "userID")

Finally, I convert the data frame into a matrix:

m <- as.matrix(df2)

Which generates the following result:

print(m)

   1  2  3
1  3 NA NA
2 NA NA  4
3 NA  1 NA
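
Note that this only creates rows and columns for the IDs that actually occur in the data, so on the real data the result may be smaller than 943 x 1682. One way to get the full shape, assuming tidyr 1.2.0 or later, is to turn the IDs into factors with all levels and ask pivot_wider to expand them. This is a rough sketch, not the method from the answer above; n_users, n_items and wide are hypothetical names, and the dimensions are small placeholders.

```r
library(tidyr)

df <- data.frame(
  userID = c(2, 3, 1),
  itemID = c(3, 2, 1),
  rating = c(4, 1, 3)
)

# Assumed known dimensions (hypothetical; 943 and 1682 in the question)
n_users <- 4
n_items <- 5

# Factors carrying every possible ID, including those not in the data
df$userID <- factor(df$userID, levels = seq_len(n_users))
df$itemID <- factor(df$itemID, levels = seq_len(n_items))

# id_expand adds the missing user rows, names_expand the missing item columns
wide <- pivot_wider(
  df,
  names_from = itemID,
  values_from = rating,
  names_expand = TRUE,
  id_expand = TRUE
)

# Drop the userID column and use it as row names instead
m <- as.matrix(wide[, -1])
rownames(m) <- as.character(wide$userID)

dim(m)
```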
