Today I was analyzing a data set and noticed something I had never seen before. To visualize a multivariate data set, I ran a PCA and projected the observations onto the first two principal components, using the packages ggplot2 and ggfortify. I will reproduce the results here with a different data set (not the one I am actually analyzing) in which the same phenomenon occurs. The code that produces the two plots is below:
library(ggplot2)
library(ggfortify)
iris.pca <- prcomp(iris[, -5])
# prcomp stores the scores in a matrix; convert it to a data frame for ggplot()
ggplot(as.data.frame(iris.pca$x), aes(x = PC1, y = PC2)) +
  geom_point()
autoplot(iris.pca)
Notice that, qualitatively, both plots show the same result. The difference between them is in the scale: in the plot made with ggplot2, the first principal component (PC1) ranges from approximately -3 to 4, whereas in the plot made with ggfortify the same PC1 ranges from approximately -0.125 to 0.15. The other principal components behave similarly.
I know that the ggplot2 plot is not wrong, because the summary statistics of iris.pca$x match what that plot shows:
summary(iris.pca$x)
      PC1               PC2                PC3                PC4
 Min.   :-3.2238   Min.   :-1.37417   Min.   :-0.76017   Min.   :-0.5054344
 1st Qu.:-2.5303   1st Qu.:-0.32492   1st Qu.:-0.17582   1st Qu.:-0.0778999
 Median : 0.5546   Median : 0.02216   Median :-0.01639   Median : 0.0007274
 Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.0000000
 3rd Qu.: 1.5501   3rd Qu.: 0.32542   3rd Qu.: 0.20550   3rd Qu.: 0.0896801
 Max.   : 3.7956   Max.   : 1.26597   Max.   : 0.69415   Max.   : 0.5053050
So what is happening inside the autoplot function? What transformation is it applying to my data to shrink them to this reduced range? And why does it do this?
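One hypothesis that can be tested with base R alone is that the scores are being divided by a per-component factor, as in a biplot-style scaling. The sketch below assumes, without confirmation from anything above, that the factor is each component's standard deviation multiplied by the square root of the number of observations. The names `n`, `lam`, and `scaled` are introduced here only for illustration.

```r
# Hypothesis (an assumption, not confirmed above): autoplot divides each
# component's scores by sdev * sqrt(n) before plotting.
iris.pca <- prcomp(iris[, -5])

n   <- nrow(iris.pca$x)          # number of observations
lam <- iris.pca$sdev * sqrt(n)   # one hypothetical scaling factor per component

# Divide column j of the score matrix by lam[j]
scaled <- sweep(iris.pca$x, 2, lam, "/")

# Compare these ranges with the axis limits seen in the autoplot figure
range(scaled[, "PC1"])
range(scaled[, "PC2"])
```

If the PC1 range comes out close to the -0.125 to 0.15 read off the autoplot axes, the hypothesis is at least consistent with the plot. In that case, the `scale` argument of autoplot (assumed here to default to 1) would be the setting to look at, and `autoplot(iris.pca, scale = 0)` may reproduce the raw scores.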


