How to create a loop that turns columns into variables and returns shapiro.test at the end?

I have several .csv files with a large number of columns. I would like to streamline the work by creating a function that reads the columns and returns the result of the normality test (shapiro.test) for each of them.

data <- read.csv2("C:/Users/z/Desktop/CSVFOREST_WB.csv")

tnorm <- function(x) {
  for (a in x) {
    a = x[[1,]]
    return(shapiro.test(a))
  }
}
tnorm(data)

The code, of course, returns an error. What can I do?

1 answer

R is not a very good language for explicit loops such as for and while. Depending on the number of iterations and their complexity, execution can become very slow.

However, it has some functions that make life easier for anyone who wants to repeat the same calculation many times. Some of these functions belong to the *apply family, such as apply, sapply and lapply.
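As a rough illustration of how these three functions differ, the sketch below computes column means on a tiny made-up data frame (the name `df` and its values are assumptions for the example only). The main difference is the shape of what comes back: lapply returns a list, sapply tries to simplify to a vector, and apply works on a matrix along the chosen dimension.

```r
# Small illustrative data frame (values are made up for the example)
df <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))

# lapply: always returns a list, one element per column
col_means_list <- lapply(df, mean)

# sapply: same computation, simplified to a named numeric vector
col_means_vec <- sapply(df, mean)

# apply: converts the data frame to a matrix first;
# the second argument selects rows (1) or columns (2)
col_means_apply <- apply(df, 2, mean)

print(col_means_list)
print(col_means_vec)
print(col_means_apply)
```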

Take, for example, the data set below. It has 5 columns, each with 100 observations, all drawn from a normal distribution with mean 0 and standard deviation 1:

n <- 100 # sample size
r <- 5   # number of samples (columns)

dados <- data.frame(matrix(rnorm(n*r, mean=0, sd=1), ncol=r))

To test the normality of each column of this data set, just run

apply(dados, 2, shapiro.test)

in which

  • dados: the data set

  • 2: indicates that the function is applied to each column of dados. With 1 instead, the function would be applied to the rows of dados

  • shapiro.test: the function applied to each column of dados (columns because of the 2 in the previous item)
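The effect of the second argument is easier to see on a small matrix. The sketch below uses a made-up 2 x 3 matrix `m` (an assumption for illustration, not part of the data above) and sums it once by row and once by column.

```r
# 2 rows x 3 columns, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)  # 1 = one result per row
col_sums <- apply(m, 2, sum)  # 2 = one result per column

print(row_sums)  # 9 12
print(col_sums)  # 3 7 11
```

In the same way, apply(dados, 2, shapiro.test) runs the test once per column of dados.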

The result obtained is as follows:

$X1

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.98757, p-value = 0.4773


$X2

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.98678, p-value = 0.4228


$X3

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.95448, p-value = 0.001656


$X4

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.98871, p-value = 0.5622


$X5

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.98234, p-value = 0.2015

Note that the Shapiro-Wilk test was applied to each column, giving the test statistic (W) and the p-value associated with it.
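For a CSV with many columns, it may be more convenient to collect just W and the p-value in a table. The following is a minimal sketch, not the only way to do it: the function name `tnorm` is borrowed from the question, while the `example` data frame and its `group` column are made up for illustration. It keeps only numeric columns, because apply converts the data frame to a matrix and a single text column would turn every value into character, which shapiro.test rejects. Note also that shapiro.test only accepts between 3 and 5000 non-missing values per column.

```r
# Hypothetical helper (name borrowed from the question): runs shapiro.test
# on every numeric column and returns one row per column.
tnorm <- function(x) {
  num <- x[sapply(x, is.numeric)]    # keep numeric columns only
  res <- lapply(num, shapiro.test)   # one test result per column
  data.frame(
    column  = names(res),
    W       = sapply(res, function(t) unname(t$statistic)),
    p.value = sapply(res, function(t) t$p.value),
    row.names = NULL
  )
}

set.seed(1)  # only to make this illustration reproducible
example <- data.frame(matrix(rnorm(100 * 5), ncol = 5), group = "a")
result <- tnorm(example)  # the text column "group" is skipped
print(result)
```

With the data frame read in the question, the call would simply be tnorm(data).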
