Data preparation in R for cluster analysis

I have a dataset in which the first column holds the names of some basketball teams and the remaining columns hold observed variables. I would like to run a cluster analysis using the following packages:

library("cluster")
library("factoextra")
library("magrittr")

Data:

[Image: sample of the dataset]

I read my CSV file into a data.frame, but when I try to scale the variables with the code below, I get an error saying that my "Time" column (the team names) must be numeric. As a consequence, I also cannot plot the correlation matrix properly: the labels show arbitrary numbers instead of the team names.

ERROR

my_data <- na.omit(my_data)
my_data <- scale(my_data)


Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

ERROR IN THE CORRELATION PLOT

res.dist <- get_dist(my_data, stand = TRUE, method = "pearson")
fviz_dist(res.dist, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))

[Image: correlation plot]

Does anybody know how to fix this?

  • Images of data are not useful because we cannot test code with them. It is better to edit the question and include the output of dput(dados) or, if the dataset is large, of dput(head(dados, 30)).
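
As an illustration of that suggestion, here is a minimal sketch of what dput produces. The data frame dados below is a made-up stand-in with hypothetical values, not the asker's data:

```r
# Hypothetical stand-in for the real data frame (values are invented).
dados <- data.frame(
  Time = c("Team A", "Team B", "Team C"),
  points = c(102.5, 98.1, 110.3),
  stringsAsFactors = FALSE
)

# Prints R code that recreates (at most) the first 30 rows;
# this printed text is what gets pasted into the question.
dput(head(dados, 30))

# Anyone can rebuild the object from that text.
txt <- paste(capture.output(dput(head(dados, 30))), collapse = "\n")
rebuilt <- eval(parse(text = txt))
str(rebuilt)
```

Because the pasted text is plain R code, whoever answers can assign it to a variable and run the question's code on it directly.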

1 answer

As Rui said in the comment, an image of your data does not help us help you. As for your question, the scale function needs its input to be numeric, and your data frame still contains the text column with the team names. One solution is to turn the column with the team names into the row names:

row.names(my_data) <- my_data[,1]
my_data <- my_data[,-1]

and then continue with your code. The second problem will probably go away as well, because the plot labels are taken from the row names.
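
To make the steps concrete, here is a minimal sketch on a toy data frame. The team names, the column names points, rebounds and assists, and all values are hypothetical; only the "Time" column name and the two row-name lines come from the question and the answer:

```r
# Toy data with invented values; "Time" holds the team names.
my_data <- data.frame(
  Time = c("Team A", "Team B", "Team C", "Team D"),
  points = c(102.5, 98.1, 110.3, 95.7),
  rebounds = c(44.2, 40.8, 47.5, 39.9),
  assists = c(24.1, 21.7, 26.3, 20.4),
  stringsAsFactors = FALSE
)

# Calling scale(my_data) at this point fails with
# "'x' must be numeric", because "Time" is a character column.

# Move the team names into the row names, then drop that column.
row.names(my_data) <- my_data[, 1]
my_data <- my_data[, -1]

# Now every column is numeric, so the original code runs.
my_data <- na.omit(my_data)
scaled <- scale(my_data)

# The team names survive as the row names of the scaled matrix.
rownames(scaled)
```

Two caveats: row names must be unique, so a team that appears twice makes the row.names assignment fail, and if only one column remained after dropping the first, my_data[, -1, drop = FALSE] would be needed to keep a data frame rather than a plain vector.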

  • Your suggestion worked, thank you.

  • I’m glad it worked out. Consider upvoting and accepting my answer if it solved your problem.
