Splitting a data set with "for" in R

Question


Asked 8 years, 1 month ago


This is my first for loop in R, and I am having trouble using it. I have a data set with a reference date that spans several years, and I would like to split the data set by those dates.

The variable "data" (the date) holds the reference month, ranging from January 1995 (199501) to March 2017 (201703).

I tried to split it as follows, without success:

for (i in 199501:201703) {
  dados[i] <- subset(dados, data == i)
}

Do you know where I can find good material on this construct?

1 answer


by Marcus Nunes • **17,915** points · Answer 1 · 2017-06-29T23:51:10+00:00

Whenever possible, avoid using for in R. It is computationally slow and can lead to silly mistakes. For example, starting a for like this

for(i in 199501:201703)

will make the loop visit 199501, 199502, ..., 199512, 199513, 199514 and so on, including many values that are not valid months. Not a good idea.
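To make the size of the problem concrete, here is a small sketch that compares the integer range with the real year-month values. The names todos, meses and validos are made up for this illustration; building the months from a date sequence is just one possible approach.

```r
# Every integer between the two endpoints, as visited by for (i in 199501:201703)
todos <- 199501:201703

# Only real year-month values, built from an actual monthly date sequence
meses   <- seq(as.Date("1995-01-01"), as.Date("2017-03-01"), by = "month")
validos <- as.integer(format(meses, "%Y%m"))

length(todos)        # 2203 values visited by the loop
length(validos)      # 267 real months
199513 %in% todos    # TRUE: the loop would try month "13"
199513 %in% validos  # FALSE
```

Most of the iterations in the original loop would therefore look for a month that cannot exist.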

Another problem is trying to store something that has two dimensions (subset(dados,data==i)) in a position meant for a single element (dados[i]). This will not work. Ideally, these results should be stored in a list. Also, you were trying to save the new objects inside the original object, which is a recipe for the loop to fail.

Assuming your data set is called dados and it has a date column called data, one way to solve this problem using for is the following:

dadosLista <- list()

for (i in unique(dados$data)){
  dadosLista[[i]] <- subset(dados, data==i)
}

This has the minor inconvenience that the first 199500 positions of the list dadosLista will be NULL, and every position without a corresponding year and month, such as 199533, will be NULL as well. The advantage is that the command

dadosLista[[199803]]

will return the data for March 1998. You can remove the NULL entries by running

dadosLista <- Filter(Negate(is.null), dadosLista)

The problem with doing this is that the link between the list positions and the year-month values is lost. There is no free lunch.
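If you still want to keep the for, one possible way around this trade-off is to index the list by name instead of by number. This is only a sketch and not part of the loop above: the toy data frame and its valor column are invented for illustration, and the data column is assumed to hold integers.

```r
# Toy data standing in for dados (values are made up for illustration)
dados <- data.frame(
  data  = c(199501L, 199501L, 199502L, 199803L),
  valor = c(10, 20, 30, 40)
)

dadosLista <- list()
for (i in unique(dados$data)) {
  # A character index creates a named element instead of filling position i
  dadosLista[[as.character(i)]] <- subset(dados, data == i)
}

length(dadosLista)      # 3, one element per month present in the data
names(dadosLista)       # "199501" "199502" "199803"
dadosLista[["199803"]]  # the rows for March 1998
```

With names as indices there are no NULL entries to filter out, and each piece can still be looked up by its year and month.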

However, there is a better solution. Assuming your data set is called dados and it has a date column called data, do the following:

dadosLista <- split(dados, dados$data)

This will put your data in a list. Each of the separate data sets can then be accessed with commands similar to the following (the backticks are required because the name starts with a digit):

dadosLista$`199501`

Thus, each position in the list is identified by a name identical to the desired year and month, rather than by a number. This makes the code more organized and cleaner and, I believe, faster than using a for.
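As a small end-to-end sketch of the split approach, the example below uses an invented toy data frame (the valor column and the name medias are assumptions for illustration) and then applies the same computation to every piece with sapply.

```r
# Toy data standing in for dados (values are made up for illustration)
dados <- data.frame(
  data  = c(199501L, 199501L, 199502L, 199803L),
  valor = c(10, 20, 30, 40)
)

# One data frame per distinct value of the data column
dadosLista <- split(dados, dados$data)

names(dadosLista)       # "199501" "199502" "199803"
dadosLista[["199501"]]  # rows for January 1995
dadosLista$`199501`     # same element; backticks because the name starts with a digit

# Apply the same computation to every piece, here the mean of valor per month
medias <- sapply(dadosLista, function(d) mean(d$valor))
```

Here medias is a named vector, so a result such as medias["199501"] can be looked up by year and month just like the list elements.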