Creating a matrix with variables with different correlations in R?


I need to generate data series with specified correlations in R. I used a method I found here on SO (How to generate correlated variables in R?) and was able to create variables with the desired correlation. However, when I tried to automate the process to produce 1000 estimates for each of several correlations, the result was a 1000x5 matrix with all identical values. The code I'm using is as follows:

set.seed(2423049)
corr = matrix(,1000,5)
for(k in 1:5){
  for (i in 1:5){
    for(j in 1:1000){
      rho = c(-0.7,-0.3,0,0.1,0.5) # correlations I need to use

      xstar = rnorm(1000,2,2) # x* drawn from a normal N(2,2)

      a2 = rnorm(1000,2,2) # auxiliary variable used to build w with correlation rho to x*

      w = rho[k]*xstar+sqrt(1-rho[k]^2)*a2 # w computed from a defined correlation with x*

      corr[j,i] = cor(xstar,w) # matrix of correlations between x* and w
    }
  }
}

With this process, I get a 1000x5 matrix in which every value is 0.5499732.

What am I doing wrong?

1 answer


Peter,

First of all, you have an extra loop in your code. Note that you are filling a 1000-by-5 matrix. You loop over k (correlations), then over i (columns), and then over j (rows). For each k you therefore run through all 5 columns and all 1000 rows, so every k overwrites all the previous results. In the end the matrix only keeps the results of the last k (rho=0.5).
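The overwriting is easier to see on a toy case. The following is only a sketch, with a small 3x2 matrix and made-up values (the names m and the stored value k * 10 are illustrative, not from the original code):

```r
# Toy version of the triple loop: a 3x2 matrix and three values of k.
m <- matrix(NA, nrow = 3, ncol = 2)
for (k in 1:3) {
  for (i in 1:2) {
    for (j in 1:3) {
      m[j, i] <- k * 10  # the stored value depends only on k, not on the column i
    }
  }
}
m  # every cell holds the value written during the last pass, k = 3
```

Because the column index i is unrelated to k, each pass of the outer loop rewrites the whole matrix, and only the final pass survives.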

To avoid this problem, the loop should look something like this:

rho = c(-0.7,-0.3,0,0.1,0.5) # correlations I need to use
corr = matrix(NA,1000,5) # one row per simulation, one column per rho
for (i in 1:5){
  for(j in 1:1000){
    xstar = rnorm(1000,0,1) # x* drawn from a standard normal N(0,1)

    a2 = rnorm(1000,0,1) # auxiliary variable used to build w with correlation rho to x*

    w = rho[i]*xstar+sqrt(1-rho[i]^2)*a2 # w computed from a defined correlation with x*

    corr[j,i] = cor(xstar,w) # matrix of correlations between x* and w
  }
}

However, note that I changed the variables to normal with mean zero and standard deviation one, because the formula you are using, w = rho[k]*xstar+sqrt(1-rho[k]^2)*a2, only applies to Normal(0, 1) variables.
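As a quick sanity check of that formula with standard normal inputs, something like the following sketch can be run. The seed, the sample size n and the name check are arbitrary choices for illustration:

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
n <- 1e5     # large sample so the sample correlation is close to the target
rho <- c(-0.7, -0.3, 0, 0.1, 0.5)

xstar <- rnorm(n)  # standard normal N(0,1)
a2 <- rnorm(n)     # independent standard normal N(0,1)

# For each target rho, build w from the formula and compute cor(xstar, w).
check <- sapply(rho, function(r) cor(xstar, r * xstar + sqrt(1 - r^2) * a2))

# Target and sample correlations side by side.
round(cbind(target = rho, sample = check), 3)
```

The two columns should agree up to sampling noise.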

To generate several variables with arbitrary correlations, you can use the MASS package. For normal variables you can use the function mvrnorm; it would look something like this:

rho = c(-0.7,-0.3,0,0.1,0.5)
library(MASS)

### define a function to generate correlated variables
### rho is the correlation, mu is the vector of means, and var the vector of variances
sim.cor <- function(rho,mu=c(2,2), var=c(2,2), n=1000, sim=1000){
  correlacoes <- numeric(sim) # one sample correlation per simulation
  cov <- rho*sqrt(var[1])*sqrt(var[2]) # covariance implied by rho and the variances
  for (i in 1:sim){
    simulacao <- mvrnorm(n=n, mu=mu, Sigma=matrix(c(var[1],cov, cov, var[2]), ncol=2))
    correlacoes[i] <- cor(simulacao[,1], simulacao[,2])
  }
  correlacoes
}

### apply the function to each rho; the result is a sim x length(rho) matrix
resultados <- mapply(sim.cor, rho)
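
One way to check the output of an approach like this is to look at the shape of the result and the average simulated correlation per column. The sketch below is self-contained and uses a hypothetical helper one_cor with fewer repetitions (200) so it runs quickly. The helper name, the seed and the repetition count are assumptions for illustration:

```r
library(MASS)  # provides mvrnorm

rho <- c(-0.7, -0.3, 0, 0.1, 0.5)

# Hypothetical helper: one simulated sample correlation for a given rho,
# with both variables having mean 2 and variance 2.
one_cor <- function(r, n = 1000, mu = c(2, 2), v = c(2, 2)) {
  covar <- r * sqrt(v[1] * v[2])
  Sigma <- matrix(c(v[1], covar, covar, v[2]), ncol = 2)
  x <- mvrnorm(n = n, mu = mu, Sigma = Sigma)
  cor(x[, 1], x[, 2])
}

set.seed(42)  # arbitrary seed for reproducibility
res <- sapply(rho, function(r) replicate(200, one_cor(r)))

dim(res)                 # rows are simulations, columns are the values of rho
round(colMeans(res), 2)  # column means should sit close to rho
```

Each column should vary from row to row and average out near its target correlation, unlike the constant matrix described in the question.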

  • Thank you so much for your help! I managed to do what I needed!
