Split a data frame and save to different directories

3

I have a data frame with 100 rows and two columns (name and quantity). The quantity column holds integers ranging from 1 to 4. How can I split my original data frame into four data frames according to the values in column 2 (quantity)?

In other words, I expect the following result after the split: data frame 01 with 20 rows of quantity 1, data frame 02 with 25 rows of quantity 2, data frame 03 with 30 rows of quantity 3, and data frame 04 with 25 rows of quantity 4. This is a fictional example.

  • It was not clear to me what the "save to different directories" in the question title means. The body of the question makes no reference to saving the created data frames.

3 answers

4

This is the ideal case for the function split(). With split() you can divide your data.frame according to the values in the column Quantidade:

tab_split <- split(tab, tab$Quantidade)

The result of the command above is a list containing the four separate data.frames:

str(tab_split)
    List of 4
     $ 1:'data.frame':  20 obs. of  2 variables:
      ..$ Nome      : Factor w/ 26 levels "A","B","C","D",..: 22 2 12 5 25 22 24 15 20 19 ...
      ..$ Quantidade: num [1:20] 1 1 1 1 1 1 1 1 1 1 ...
     $ 2:'data.frame':  25 obs. of  2 variables:
      ..$ Nome      : Factor w/ 26 levels "A","B","C","D",..: 26 10 20 17 20 21 1 1 14 20 ...
      ..$ Quantidade: num [1:25] 2 2 2 2 2 2 2 2 2 2 ...
     $ 3:'data.frame':  30 obs. of  2 variables:
      ..$ Nome      : Factor w/ 26 levels "A","B","C","D",..: 24 21 1 19 24 13 6 22 25 15 ...
      ..$ Quantidade: num [1:30] 3 3 3 3 3 3 3 3 3 3 ...
     $ 4:'data.frame':  25 obs. of  2 variables:
      ..$ Nome      : Factor w/ 26 levels "A","B","C","D",..: 8 22 25 3 5 21 23 12 5 8 ...
      ..$ Quantidade: num [1:25] 4 4 4 4 4 4 4 4 4 4 ...
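To reproduce this, here is a minimal sketch that builds an example `tab` (the same construction used in another answer on this page, with `set.seed(1)` added as an assumption so the sampled names are repeatable) and checks the size of each piece after the split:

```r
# Example data: 100 rows, Quantidade taking the values 1 to 4.
set.seed(1)  # assumption: only to make the sampled names repeatable
tab <- data.frame(
  Nome = sample(LETTERS, 100, replace = TRUE),
  Quantidade = c(rep(1, 20), rep(2, 25), rep(3, 30), rep(4, 25))
)

# One data.frame per distinct value of Quantidade.
tab_split <- split(tab, tab$Quantidade)

# Number of rows in each piece, named by the Quantidade value.
n_rows <- sapply(tab_split, nrow)
print(n_rows)
```

The list elements are named after the values of Quantidade, so a single piece can be reached with `tab_split[["3"]]`.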

I recommend leaving the four data.frames in the list, since that is easier and more organized to work with. But if you want to put the data.frames in the global environment, just use list2env():

names(tab_split) <- paste0("df", seq_along(tab_split))
list2env(tab_split, envir = globalenv())
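The question title also mentions saving to different directories. Keeping the data in a list makes that straightforward. Below is a minimal sketch of one way it could be done, assuming CSV output and one folder per group under a temporary directory. The names `out_root`, `quantidade_<value>` and `dados.csv` are illustrative and do not come from the question.

```r
# Example data and split (element names are "1", "2", "3", "4").
tab <- data.frame(
  Nome = sample(LETTERS, 100, replace = TRUE),
  Quantidade = c(rep(1, 20), rep(2, 25), rep(3, 30), rep(4, 25))
)
tab_split <- split(tab, tab$Quantidade)

# Hypothetical root folder; replace tempdir() with a real path.
out_root <- file.path(tempdir(), "split_output")

# Write each data.frame to its own directory and collect the file paths.
saved_files <- vapply(names(tab_split), function(q) {
  out_dir <- file.path(out_root, paste0("quantidade_", q))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(out_dir, "dados.csv")
  write.csv(tab_split[[q]], out_file, row.names = FALSE)
  out_file
}, character(1))
```

`saved_files` is a named character vector, so `saved_files[["3"]]` gives the path of the file written for quantity 3.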

3

Here are two other ways to solve the problem. The first uses the dplyr package:

library(dplyr)
tab01 <- tab %>%
  filter(Quantidade==1)
tab02 <- tab %>%
  filter(Quantidade==2)
tab03 <- tab %>%
  filter(Quantidade==3)
tab04 <- tab %>%
  filter(Quantidade==4)

The second uses the base R function subset():

tab01 <- subset(tab, Quantidade==1)
tab02 <- subset(tab, Quantidade==2)
tab03 <- subset(tab, Quantidade==3)
tab04 <- subset(tab, Quantidade==4)
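Both versions repeat one line per value. If the set of values is not known in advance, the same filtering can be looped over the distinct values. Here is a minimal base R sketch, assuming example data shaped like that in the question (the names `valores` and `tabs` are illustrative):

```r
# Example data: 100 rows, Quantidade taking the values 1 to 4.
tab <- data.frame(
  Nome = sample(LETTERS, 100, replace = TRUE),
  Quantidade = c(rep(1, 20), rep(2, 25), rep(3, 30), rep(4, 25))
)

# Distinct values of Quantidade, in ascending order.
valores <- sort(unique(tab$Quantidade))

# One subset() call per value, collected in a list.
tabs <- lapply(valores, function(q) subset(tab, Quantidade == q))

# Name the elements tab01, tab02, ... to mirror the objects above.
names(tabs) <- paste0("tab0", valores)
```

Each piece is then available as, for example, `tabs$tab03`.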

2

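# Example data: 100 rows, with Quantidade taking the values 1 to 4
tab <- data.frame("Nome" = sample(LETTERS, 100, replace = TRUE),
                  "Quantidade" = c(rep(1,20),rep(2,25),rep(3,30),rep(4,25)))
# Keep only the rows whose Quantidade matches each value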
tab1 <- tab[which(tab$Quantidade == 1),]
tab2 <- tab[which(tab$Quantidade == 2),]
tab3 <- tab[which(tab$Quantidade == 3),]
tab4 <- tab[which(tab$Quantidade == 4),]
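Wrapping the condition in `which()` matters when the column contains missing values: plain logical indexing returns an all-`NA` row for every `NA` in the condition, while `which()` keeps only the positions where the condition is `TRUE`. A small sketch with made-up data (the name `tab_na` is illustrative):

```r
# Made-up data with one missing value in Quantidade.
tab_na <- data.frame(
  Nome = c("A", "B", "C", "D"),
  Quantidade = c(1, NA, 1, 2)
)

# Logical indexing: the NA in the condition yields an extra all-NA row.
com_na <- tab_na[tab_na$Quantidade == 1, ]

# which() keeps only positions where the condition is TRUE.
sem_na <- tab_na[which(tab_na$Quantidade == 1), ]

print(nrow(com_na))
print(nrow(sem_na))
```

With the example data in this answer there are no missing values, so both forms give the same result.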
