How to transform data frame variables into the indexes of a matrix with R?


I have a data frame with a lot of information and need to rearrange it into a matrix.

The data frame looks like this:

  userID  itemID  rating  
     169      242     3  
     186      302     3   
     22       377     1  

The data set has 10,000 entries, with userID going up to 943, itemID going up to 1682, and ratings from 1 to 5. What I need is a matrix with 943 rows and 1682 columns, initially filled with NA, in which each "rating" is placed at the position given by the data frame. For the rows above, the matrix would need 3 at position [169,242] and 1 at position [22,377].

But I can't manage to write code that does this. I need help!

The base R function xtabs solves the problem in one line of code, but first you have to convert the columns userID and itemID to factors with the full set of levels, so that IDs with no ratings still get a row or column.

dados$userID <- factor(dados$userID, levels = 1:max(dados$userID))
dados$itemID <- factor(dados$itemID, levels = 1:max(dados$itemID))
xt <- xtabs(rating ~ userID + itemID, dados)
#      itemID
#userID 1 2 3
#     1 0 0 0
#     2 0 0 5
#     3 0 0 0
#     4 0 1 0
#     5 0 0 0
#     6 0 0 3

To have NA where the zeros are:

is.na(xt) <- xt == 0
xt
#      itemID
#userID 1  2  3
#     1        
#     2       5
#     3        
#     4    1   
#     5        
#     6       3

The print method for class 'xtabs' has na.print = "" by default, so the NAs are there but are shown as blanks. To see them, change this argument explicitly.

print(xt, na.print = "NA")
#      itemID
#userID  1  2  3
#     1 NA NA NA
#     2 NA NA  5
#     3 NA NA NA
#     4 NA  1 NA
#     5 NA NA NA
#     6 NA NA  3


The data are from Carlos Eduardo Lagosta's answer.
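
One thing worth checking before relying on xtabs: it sums the left-hand side over each combination of factors, so a repeated (userID, itemID) pair would end up as the total of its ratings rather than a single rating. The sketch below uses hypothetical data (dados_dup, with one pair repeated on purpose) to show the check and the effect; the names are illustrative and not part of the answer above.

```r
# Hypothetical data with one repeated (userID, itemID) pair
dados_dup <- data.frame(
  userID = c(4, 6, 2, 2),
  itemID = c(2, 3, 3, 3),
  rating = c(1, 3, 5, 4))

# anyDuplicated() returns 0 when no row is repeated,
# otherwise the index of the first repeated row
pares <- dados_dup[c("userID", "itemID")]
anyDuplicated(pares) > 0

# xtabs adds up the ratings of the repeated pair:
# the cell for user 2, item 3 holds 5 + 4
xt_dup <- xtabs(rating ~ userID + itemID, dados_dup)
xt_dup["2", "3"]
```

If the check reports duplicates, the data would need to be deduplicated or aggregated in some deliberate way before building the matrix.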


You can create an empty matrix with as many rows as the maximum value of userID and as many columns as the maximum value of itemID, then use a loop to fill in the values from each row of your data.frame:

# Example data
dados <- data.frame(
  userID = c(4, 6, 2),
  itemID = c(2, 3, 3),
  rating = c(1, 3, 5))

# Empty matrix (filled with NA by default)
matriz <- matrix(nrow = max(dados$userID), ncol = max(dados$itemID))

for (i in seq_len(nrow(dados))) {
  matriz[dados$userID[i], dados$itemID[i]] <- dados$rating[i]
}

matriz
#>      [,1] [,2] [,3]
#> [1,]   NA   NA   NA
#> [2,]   NA   NA    5
#> [3,]   NA   NA   NA
#> [4,]   NA    1   NA
#> [5,]   NA   NA   NA
#> [6,]   NA   NA    3
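
The same fill can also be done without a loop, since base R accepts a two-column matrix as an index, where each row names one (row, column) cell. This is a minimal sketch of that alternative, not the method used above; matriz2 and idx are names introduced here for illustration.

```r
# Example data, same as in the loop version
dados <- data.frame(
  userID = c(4, 6, 2),
  itemID = c(2, 3, 3),
  rating = c(1, 3, 5))

# For the data in the question the dimensions would be 943 x 1682;
# max() is used here so the toy example stays small
matriz2 <- matrix(NA_real_, nrow = max(dados$userID), ncol = max(dados$itemID))

# Each row of idx is one (row, column) position in the matrix
idx <- cbind(dados$userID, dados$itemID)
matriz2[idx] <- dados$rating
matriz2
```

With many thousands of ratings this assigns all cells in a single vectorized step, and the result should match what the loop produces.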


I’m going to use an adaptation of the data you cited as an example:

df <- read.table(
   header = TRUE,
   text = 
 "userID  itemID  rating  
     2      3     4  
     3      2     1   
     1      1     3")

Then I use some functions from the tidyverse package to transform this data. The details of each step are in the comments below:

library(tidyverse)

df2 <- df %>% 
# First, sort the data by itemID so that the columns come out in order
    arrange(itemID) %>% 
# Second, spread itemID across the columns and fill the cells with the ratings
    pivot_wider(
        names_from = itemID, 
        values_from = rating) %>%
# Third, sort the data by userID
    arrange(userID) %>%
# Fourth, turn userID into the row names of the data frame
    column_to_rownames(., var = "userID")

Finally, I convert the data frame into a matrix:

m <- as.matrix(df2)

Which generates the following result:


   1  2  3
1  3 NA NA
2 NA NA  4
3 NA  1 NA
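
Note that this approach only produces rows and columns for IDs that actually occur in the data, so the result may be smaller than the 943 x 1682 matrix asked for. One way to pad it, sketched here in base R with hypothetical names (m_partial, full, n_users, n_items) and a small hand-built matrix standing in for the pivoted result, is to copy it into a full NA matrix by row and column names:

```r
# Hypothetical pivoted result in which only users 2 and 4
# and items 3 and 5 appear
m_partial <- matrix(c(4, NA,
                      NA, 1),
                    nrow = 2,
                    dimnames = list(c("2", "4"), c("3", "5")))

# Target size; the data in the question would use 943 and 1682
n_users <- 5
n_items <- 6

# Full NA matrix whose row and column names are the IDs themselves
full <- matrix(NA_real_, nrow = n_users, ncol = n_items,
               dimnames = list(as.character(seq_len(n_users)),
                               as.character(seq_len(n_items))))

# Character indexing matches by name, so each value lands at its own ID
full[rownames(m_partial), colnames(m_partial)] <- m_partial
full
```

Here the rating for user 2, item 3 ends up at full[2, 3], and every ID that never appeared stays NA.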

