How to identify and delete character and factor columns in R

0

I have a data.frame with several columns of different types: integer, numeric, character, and factor.

I need to compute a correlation matrix from this data, but R can only compute correlations on numeric data (int and dbl).

I would like to keep only the numeric columns (int and dbl) to compute the correlation. How could I do this?

Example of my data:

j<-c(1,2,3,4,5,6,7,8,9,10)
k<-c(50,2,042,3658,14,3586,324,24,352,217)
y<-c('aaa','bbb','ccc','ccc','ddd','eee','eee','bbb','aaa','aaa')
x<-c(TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE)
z<-c("segunda","terça","quarta","segunda","sexta","quinta","quinta","sexta","quarta","terça")# factor

# g was never defined, so only the vectors created above go into the data frame
df<-data.frame(j,k,y,x,z)

View(df)

2 answers

2


Use the select_if() function from dplyr, which is loaded as part of the tidyverse:

library(tidyverse)

j<-c(1,2,3,4,5,6,7,8,9,10)
k<-c(50,2,042,3658,14,3586,324,24,352,217)
y<-c('aaa','bbb','ccc','ccc','ddd','eee','eee','bbb','aaa','aaa')
x<-c(TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE)
z<-c("segunda","terça","quarta","segunda","sexta","quinta","quinta","sexta","quarta","terça")# factor

df<-data.frame(j,k,y,x,z)

df %>%
  select_if(is.numeric) %>%
  cor()
#>             j           k
#> j  1.00000000 -0.03185042
#> k -0.03185042  1.00000000

Created on 2020-12-29 by the reprex package (v0.3.0)
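
For newer dplyr versions, here is a minimal sketch of the same idea, assuming dplyr 1.0.0 or later, where select_if() is marked as superseded in favour of select(where(...)). The name cor_mat is only illustrative, and the data is a reduced copy of the example from the question.

```r
library(dplyr)

# Reduced copy of the example data: two numeric columns and one character column
df <- data.frame(
  j = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  k = c(50, 2, 42, 3658, 14, 3586, 324, 24, 352, 217),
  y = c("aaa", "bbb", "ccc", "ccc", "ddd", "eee", "eee", "bbb", "aaa", "aaa")
)

# where(is.numeric) keeps only the columns for which is.numeric() returns TRUE
cor_mat <- df %>%
  select(where(is.numeric)) %>%
  cor()

cor_mat
```

The result should be the same 2 x 2 matrix as with select_if(is.numeric).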

  • Hello @Marcus Nunes, I tested the command here and it worked for the whole example. But I had a problem applying it to my real df. I know I didn't specify it in the question, but I have a column of class "list", which is the sf() geometry of a state polygon. The result contained only the "numeric" and "list" columns. I tried applying the select(- ) command to the list column and other tips from the web, but that did not solve it. In the end I still cannot compute the correlation because of this list. Should I edit the question or open a new one?

  • I recommend opening a new question as the original question has been answered.

1

You can apply is.numeric to each column and use the resulting logical vector to index the data frame:

df[sapply(df, is.numeric)]
#>     j    k
#> 1   1   50
#> 2   2    2
#> 3   3   42
#> 4   4 3658
#> 5   5   14
#> 6   6 3586
#> 7   7  324
#> 8   8   24
#> 9   9  352
#> 10 10  217
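
To get from this selection to the correlation matrix asked for in the question, the subset can be passed straight to cor(). This is a minimal base R sketch on a reduced copy of the example data; num_cols and cor_mat are illustrative names, and vapply() is used here instead of sapply() only so that the result is guaranteed to be a logical vector.

```r
# Reduced copy of the example data: two numeric columns and one character column
j <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
k <- c(50, 2, 42, 3658, 14, 3586, 324, 24, 352, 217)
y <- c("aaa", "bbb", "ccc", "ccc", "ddd", "eee", "eee", "bbb", "aaa", "aaa")
df <- data.frame(j, k, y)

# One TRUE/FALSE per column: TRUE where the column is numeric
num_cols <- vapply(df, is.numeric, logical(1))

# Index the data frame with the logical vector and compute the correlations
cor_mat <- cor(df[num_cols])
cor_mat
```

This should give the same values as the tidyverse answer (about -0.0319 between j and k).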

This is not your case, but for the record: if you want to drop some specific classes, you can use the negation operator (!) together with several is.* checks:

df[!sapply(df, function(x) is.factor(x) | is.character(x))]
#>     j    k     x
#> 1   1   50  TRUE
#> 2   2    2 FALSE
#> 3   3   42  TRUE
#> 4   4 3658 FALSE
#> 5   5   14  TRUE
#> 6   6 3586 FALSE
#> 7   7  324  TRUE
#> 8   8   24 FALSE
#> 9   9  352  TRUE
#> 10 10  217 FALSE
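
The same kind of exclusion can also be written with Filter(), which keeps the columns for which a predicate returns TRUE. This is only a sketch on a reduced copy of the example data; the helper keep_col is a hypothetical name, and z is built explicitly with factor() here so that both classes are present.

```r
# Reduced copy of the example data with an explicit factor column
df <- data.frame(
  j = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  y = c("aaa", "bbb", "ccc", "ccc", "ddd", "eee", "eee", "bbb", "aaa", "aaa"),
  x = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  z = factor(c("segunda", "terça", "quarta", "segunda", "sexta",
               "quinta", "quinta", "sexta", "quarta", "terça"))
)

# Hypothetical helper: TRUE for columns that are neither factor nor character
keep_col <- function(col) !(is.factor(col) || is.character(col))

# Filter() applies the predicate to each column and returns a data frame
kept <- Filter(keep_col, df)
names(kept)
```

Note that the logical column x is kept, as in the output above, because only factor and character columns are excluded.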
