Transforming data frames into binary columns in R

I’m doing a class exercise, but I believe I’m doing it the hard way and I’d like to know whether there is an easier way. This is the matrix I need to work on. In the final matrix, each distinct value of the original matrix must become a column: if that value is present anywhere in a given row, the entry for that row gets 1; otherwise it gets 0.

my.matrix <- matrix(c("A","A","G","G","I","B","B","A","A","J","C","C","D","D","A","E","E","H","H","D","F","F","B","B","F"), ncol = 5, nrow = 5)
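Because R fills a matrix column by column, it can help to print my.matrix and check the rule on a single cell of the target. A minimal sketch; the `%in%` check is only an illustration of the rule, not part of the original question:

```r
# The matrix from the question; R fills it column by column
my.matrix <- matrix(c("A","A","G","G","I","B","B","A","A","J","C","C","D","D","A",
                      "E","E","H","H","D","F","F","B","B","F"), ncol = 5, nrow = 5)
print(my.matrix)

# One cell of the target: is "D" present anywhere in a given row?
as.integer("D" %in% my.matrix[3, ])  # row 3 is G A D H B, so 1
as.integer("D" %in% my.matrix[1, ])  # row 1 is A B C E F, so 0
```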

To illustrate, this is the data frame I need to create:

my.new.matrix <- matrix(c(1,1,1,1,1,  1,1,1,1,0,  1,1,0,0,0,  0,0,1,1,1,  1,1,0,0,0,
                          1,1,0,0,1,  0,0,1,1,0,  0,0,1,1,0,  0,0,0,0,1,  0,0,0,0,1),
                        ncol = 10, nrow = 5)
my.new.matrix <- data.frame(my.new.matrix)
names(my.new.matrix) <- c("A","B","C","D","E","F","G","H","I","J")
my.new.matrix

I don’t like the way I created it: it is repetitive, done almost entirely by hand, and would be impractical for a larger data frame. Is there an easier way to do this?

1 answer

First, let’s find out which unique elements are present in the matrix:

elementos <- sort(unique(as.vector(my.matrix)))
elementos
#> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"

Next, the function which() with the argument arr.ind = TRUE returns the row and column indices at which a given element occurs. Below I show this for the element A:

which(my.matrix == "A", arr.ind = TRUE)
#>      row col
#> [1,]   1   1
#> [2,]   2   1
#> [3,]   3   2
#> [4,]   4   2
#> [5,]   5   3
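Only the row column matters for this problem, and the same row information can also be obtained without which(): comparing the whole matrix with a letter gives a logical matrix, and rowSums() counts the TRUE cells in each row. A minimal sketch using B, which is missing from row 5 (this comparison is an illustration, not a step of the answer):

```r
my.matrix <- matrix(c("A","A","G","G","I","B","B","A","A","J","C","C","D","D","A",
                      "E","E","H","H","D","F","F","B","B","F"), ncol = 5, nrow = 5)

# Logical matrix: TRUE wherever the cell equals "B"
is.B <- my.matrix == "B"

# Rows reported by which(), without duplicates
sort(unique(which(is.B, arr.ind = TRUE)[, "row"]))
#> [1] 1 2 3 4

# Same information as a 0/1 vector: a row contains "B" if any of its cells is TRUE
as.integer(rowSums(is.B) > 0)
#> [1] 1 1 1 1 0
```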

Then I create a matrix of zeros with as many rows as the original matrix and as many columns as there are unique elements:

my.new.matrix <- matrix(0, nrow = dim(my.matrix)[1], ncol = length(elementos))
colnames(my.new.matrix) <- elementos
my.new.matrix
#>      A B C D E F G H I J
#> [1,] 0 0 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 0 0 0 0
#> [5,] 0 0 0 0 0 0 0 0 0 0

Finally, I combine the output of which() with my.new.matrix, filling in a 1 only in the rows where each letter occurs:

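# For each unique letter, mark with 1 the rows where it occurs
for (j in seq_along(elementos)) {
  # Column 1 of which(..., arr.ind = TRUE) holds the row indices
  nao.nulos <- which(my.matrix == elementos[j], arr.ind = TRUE)[, 1]
  my.new.matrix[nao.nulos, j] <- 1
}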

my.new.matrix
#>      A B C D E F G H I J
#> [1,] 1 1 1 0 1 1 0 0 0 0
#> [2,] 1 1 1 0 1 1 0 0 0 0
#> [3,] 1 1 0 1 0 0 1 1 0 0
#> [4,] 1 1 0 1 0 0 1 1 0 0
#> [5,] 1 0 0 1 0 1 0 0 1 1
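Since the question asks for an easier way, the loop can also be replaced by sapply(), applying a rowSums() presence test to every letter at once. This is a swapped-in alternative, not the method used in this answer; it assumes the same my.matrix and returns an integer matrix whose column names are the letters:

```r
my.matrix <- matrix(c("A","A","G","G","I","B","B","A","A","J","C","C","D","D","A",
                      "E","E","H","H","D","F","F","B","B","F"), ncol = 5, nrow = 5)
elementos <- sort(unique(as.vector(my.matrix)))

# For each letter, a 0/1 column marking the rows that contain it;
# sapply() binds the columns and uses the letters as column names
my.new.matrix <- sapply(elementos, function(x) as.integer(rowSums(my.matrix == x) > 0))
my.new.matrix

# Convert to a data frame, as in the question
my.new.df <- as.data.frame(my.new.matrix)
```

The trade-off is that each letter triggers a full comparison over the matrix, which is the same amount of work as the loop; the gain is only in brevity.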

It is possible to turn this series of commands into a function, but I’ll leave that as an exercise for the reader.
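For readers who want to check their solution to that exercise, here is one possible sketch that wraps the steps above. The name to_binary is hypothetical, and returning a data frame rather than a matrix is an assumption made to match the format requested in the question:

```r
# Hypothetical helper: wraps the steps of this answer in one function
to_binary <- function(m) {
  # Sorted unique values become the output columns
  elementos <- sort(unique(as.vector(m)))
  # All-zero matrix: one row per input row, one column per unique value
  out <- matrix(0, nrow = nrow(m), ncol = length(elementos),
                dimnames = list(NULL, elementos))
  for (j in seq_along(elementos)) {
    # Row indices where value j occurs; a repeated index just sets 1 again
    linhas <- which(m == elementos[j], arr.ind = TRUE)[, 1]
    out[linhas, j] <- 1
  }
  # Return a data frame, matching the format asked for in the question
  as.data.frame(out)
}

my.matrix <- matrix(c("A","A","G","G","I","B","B","A","A","J","C","C","D","D","A",
                      "E","E","H","H","D","F","F","B","B","F"), ncol = 5, nrow = 5)
to_binary(my.matrix)
```

Because the number of rows and the set of values are read from the input, the same call should work for a larger matrix or different labels without changes.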
