How to select data in R?

I have a data frame with 1441 rows. I need to split them into groups of 30 and compute the mean of each group. Is there a command that does all of this automatically, something like "every 30 rows, create a new column and calculate the average"? Right now I am separating the data by hand, which takes a lot of time. This is what I am doing:

primeiro <- pott[1:30, 'GPP']    # rows 1-30 of column GPP
segundo  <- pott[31:60, 'GPP']   # rows 31-60 of column GPP

And so on, up to row 1441. This doesn't seem very practical! :s

Thanks in advance to anyone who can help.

2 answers


Indeed, doing this by hand is not a good idea, and neither is filling the global environment with dozens of objects. It is better to keep the sub-data frames together in a list, created for example with split().

set.seed(4577)  # because I'm using 'rnorm' to create the data frame

n <- 1441
pott <- data.frame(GPP = rnorm(n))

fact <- rep(1:(1 + n %/% 30), each = 30)[seq_len(n)]  # group label per row: 1 for rows 1-30, 2 for rows 31-60, ...

lista_pott <- split(pott, fact)
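To see what split() produced, the sketch below reuses the answer's simulated data and checks the group sizes: 1441 = 48 × 30 + 1, so there are 48 full groups plus one group holding the single leftover row. It also builds the same labels with integer division; that formulation and the name fact2 are an illustrative alternative added here, not part of the original answer.

```r
set.seed(4577)
n <- 1441
pott <- data.frame(GPP = rnorm(n))

# Group label for each row: rows 1-30 -> 1, rows 31-60 -> 2, ...
fact <- rep(1:(1 + n %/% 30), each = 30)[seq_len(n)]
lista_pott <- split(pott, fact)

# Hypothetical alternative: integer division yields the same labels
fact2 <- (seq_len(n) - 1) %/% 30 + 1

length(lista_pott)               # 49
table(sapply(lista_pott, nrow))  # how many groups have each size
identical(as.numeric(fact), fact2)
```

Either way of building the labels works; the integer-division form avoids building a longer vector and then truncating it.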

Then, to compute something on each group, use one of the *apply functions:

medias <- sapply(lista_pott, function(x) mean(x$GPP))
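A quick sanity check on the result, as a sketch reusing the answer's simulated data: the first elements of medias should match the means of the blocks extracted by hand in the question (primeiro, segundo). vapply is swapped in here as a type-checked variant of sapply that guarantees exactly one numeric value per group; medias_v is a hypothetical name.

```r
set.seed(4577)
n <- 1441
pott <- data.frame(GPP = rnorm(n))
fact <- rep(1:(1 + n %/% 30), each = 30)[seq_len(n)]
lista_pott <- split(pott, fact)

medias <- sapply(lista_pott, function(x) mean(x$GPP))

# vapply: like sapply, but errors if any group does not return one number
medias_v <- vapply(lista_pott, function(x) mean(x$GPP), numeric(1))

# Compare with the manual approach from the question
primeiro <- pott[1:30, 'GPP']
segundo  <- pott[31:60, 'GPP']
all.equal(unname(medias[1:2]), c(mean(primeiro), mean(segundo)))
```

Note that the last mean comes from a group of a single row, so it is just that row's value; whether to keep or drop such a partial group depends on the analysis.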


Using Rui's sample data, another alternative is:

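tapply(pott$GPP, gl(ceiling(nrow(pott)/30), 30, length = nrow(pott)), mean)

Explanation: gl(n, k, length) creates a factor with n levels, each repeated k times, cut or recycled to the given length, so here each level covers 30 consecutive rows. Since 1441 is not a multiple of 30, the number of groups is rounded up with ceiling(); with the unrounded value 1441/30 ≈ 48.03, the leftover row 1441 would not get a group of its own. tapply then does the split and the sapply in a single step, applying mean to the vector pott$GPP within each group of 30 observations.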
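A small sketch of what gl() builds. Because 1441 is not a multiple of 30, it passes the rounded-up number of groups, ceiling(nrow(pott) / 30), together with length = nrow(pott), so the leftover row forms its own last group. The toy sizes (7 rows, groups of 3) and the names g, grupos, medias_t and medias_s are arbitrary choices for illustration.

```r
# Toy case: 7 rows in groups of 3 -> 3 groups, the last with 1 row
g <- gl(ceiling(7 / 3), 3, length = 7)
as.integer(g)   # 1 1 1 2 2 2 3

# Same idea on the simulated data from the first answer
set.seed(4577)
n <- 1441
pott <- data.frame(GPP = rnorm(n))
grupos <- gl(ceiling(nrow(pott) / 30), 30, length = nrow(pott))
medias_t <- tapply(pott$GPP, grupos, mean)

# Should match split() + sapply() from the first answer
fact <- rep(1:(1 + n %/% 30), each = 30)[seq_len(n)]
medias_s <- sapply(split(pott$GPP, fact), mean)
all.equal(as.vector(medias_t), unname(medias_s))
```

tapply returns a one-dimensional array named by the factor levels; as.vector() drops those attributes before comparing.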
