As already mentioned, this is more of a statistics question, but since there is no statistics Stack Exchange in Portuguese, I will try to help here.
The correlation method you are trying only works for numerical variables. If you want to relate a categorical variable to a continuous one, what I would recommend most are boxplots or histograms/density plots.
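To see the limitation concretely, here is a minimal sketch. It assumes the built-in iris dataset, chosen only for illustration: cor() accepts two numeric columns, but stops with an error when one of them is a factor.

```r
# Two numeric columns: cor() returns a Pearson coefficient
r_numeric <- cor(iris$Sepal.Length, iris$Petal.Length)
print(r_numeric)

# A factor column: cor() raises an error, captured here with tryCatch()
r_factor <- tryCatch(
  cor(iris$Sepal.Length, iris$Species),
  error = function(e) conditionMessage(e)
)
print(r_factor)  # the error message text, not a coefficient
```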
I will demonstrate some examples in R of these analyses. For this I am using the dataset iris, which is in the standard R package datasets, and the package ggplot2 to plot the graphs. Within the dataset we will compare the sepal lengths (iris$Sepal.Length) of the different species (iris$Species).
BOXPLOT
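library(datasets)  # provides iris (attached by default in R)
library(ggplot2)
# One box per species, showing the distribution of sepal length
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()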
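If you also want the numbers behind the boxes, a rough sketch in base R is to compute the mean and median per species. The variable names group_means and group_medians are just illustrative.

```r
# Per-species mean and median of sepal length (base R only)
group_means   <- tapply(iris$Sepal.Length, iris$Species, mean)
group_medians <- tapply(iris$Sepal.Length, iris$Species, median)

# Stack both summaries into one small table, one column per species
print(rbind(mean = group_means, median = group_medians))
```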
DENSITY
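library(datasets)  # provides iris (attached by default in R)
library(ggplot2)
# Overlaid density curves, one per species; alpha makes the overlap visible
ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_density(alpha = 0.3)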
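The histogram option mentioned earlier can be sketched the same way. This is only one possible layout: the bin width of 0.2 and the choice to give each species its own panel are assumptions of mine, not anything required by the data.

```r
library(ggplot2)

# One histogram panel per species; binwidth = 0.2 is an arbitrary illustrative choice
p_hist <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.2, colour = "white") +
  facet_wrap(~ Species, ncol = 1)

print(p_hist)
```

Stacking the panels in one column keeps the x axis shared, which makes the shift between species easier to see.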
But if you really want a "number" to guide you, an ANOVA test can give you one. Basically, it tells you whether the differences in the mean of your continuous variable across the categories are "statistically significant" (similar tests exist for attributes other than the mean).
ANOVA
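library(datasets)  # provides iris (attached by default in R)
# One-way ANOVA: does mean sepal length differ across species?
# The result is named fit to avoid masking the base function anova()
fit <- aov(Sepal.Length ~ Species, data = iris)
summary(fit)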
output:
             Df Sum Sq Mean Sq F value Pr(>F)
Species       2  63.21  31.606   119.3 <2e-16 ***
Residuals   147  38.96   0.265
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In this case the null hypothesis that the mean sepal length is equal across species is rejected with a p-value < 2e-16 (essentially zero), so we can say that species is a relevant factor, "correlated" with the sepal length of these plants.
Note: I may have gotten some details wrong there, but I hope this helps.
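The p-value says the difference is real but not how strong the association is. One common way to get a correlation-like number is the correlation ratio (eta squared), the share of total variance explained by the grouping. This is my own addition, sketched by hand from sums of squares rather than taken from the ANOVA object:

```r
# Correlation ratio (eta squared), computed by hand from sums of squares
y <- iris$Sepal.Length
grand_mean <- mean(y)
group_mean <- ave(y, iris$Species)  # each observation's own species mean

ss_between <- sum((group_mean - grand_mean)^2)  # variation explained by species
ss_total   <- sum((y - grand_mean)^2)           # total variation

eta_squared <- ss_between / ss_total
print(round(eta_squared, 3))
```

Using the sums of squares from the table above (63.21 out of 63.21 + 38.96), this comes to roughly 0.62, meaning species accounts for about 62% of the variance in sepal length.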
This is not an R question, but a mathematics/statistics question. Correlation is a calculation that needs numerical values for two variables; it cannot be done with a categorical variable. You can explore other ways of analyzing the data, but I think only with descriptive statistics.
– Molx