Remove totalizing (subtotal) rows

My data frame has the following structure:

MES EST.DET1 EST.DET2 EST.DET3 DIAS
2  Curso 1  Turma A    Manha    5
2  Curso 1  Turma A    Tarde    5
2  Curso 1  Turma B     <NA>    5
2  Curso 1     <NA>     <NA>   15
2  Curso 2  Turma A     <NA>    7
2  Curso 2     <NA>     <NA>    7
2  Curso 3     <NA>     <NA>   10
3  Curso 1  Turma A    Manha    6
3  Curso 1  Turma A    Tarde    6
3  Curso 1  Turma B     <NA>    6
3  Curso 1     <NA>     <NA>   18
3  Curso 2  Turma A     <NA>    7
3  Curso 2     <NA>     <NA>    7
3  Curso 3     <NA>     <NA>   13
4  Curso 1  Turma A    Manha    5
4  Curso 1  Turma A    Tarde    5
4  Curso 1  Turma B     <NA>    5
4  Curso 1     <NA>     <NA>   15
4  Curso 2  Turma A     <NA>    6
4  Curso 2     <NA>     <NA>    6
4  Curso 3     <NA>     <NA>   10

Basically, there are 3 courses that run over three months, and courses 1 and 2 have "child" structures. Course 1 has 2 classes (A and B), and class A can take place in the morning or in the afternoon. Course 2 has only class A, and course 3 has no detailed child structures.

The 4th row, and the corresponding Course 1 rows for the other months, is nothing more than the total (sum) of its child structures. The same goes for the 6th row (Course 2).

Is there any way to filter my data so that these totalizing rows are removed? (Note that Course 3 must be kept.)

1 answer

The simplest way is to use a logical vector to select the rows you want to keep:

data <- c("2", "Curso 1", "Turma A", "Manha", "5",
          "2", "Curso 1", "Turma A", "Tarde", "5",
          "2", "Curso 1", "Turma B", "",      "5",
          "2", "Curso 1", "",        "",      "15",
          "2", "Curso 2", "Turma A", "",      "7",
          "2", "Curso 2", "",        "",      "7",
          "2", "Curso 3", "",        "",      "10",
          "3", "Curso 1", "Turma A", "Manha", "6",
          "3", "Curso 1", "Turma A", "Tarde", "6",
          "3", "Curso 1", "Turma B", "",      "6",
          "3", "Curso 1", "",        "",      "18",
          "3", "Curso 2", "Turma A", "",      "7",
          "3", "Curso 2", "",        "",      "7",
          "3", "Curso 3", "",        "",      "13",
          "4", "Curso 1", "Turma A", "Manha", "5",
          "4", "Curso 1", "Turma A", "Tarde", "5",
          "4", "Curso 1", "Turma B", "",      "5",
          "4", "Curso 1", "",        "",      "15",
          "4", "Curso 2", "Turma A", "",      "6",
          "4", "Curso 2", "",        "",      "6",
          "4", "Curso 3", "",        "",      "10")

data <- data.frame(matrix(data, ncol = 5, byrow = TRUE))

names(data) <- c("MES", "EST.DET1", "EST.DET2", "EST.DET3", "DIAS")

In the following call I select the rows that are not empty in column 3 or that contain 'Curso 3' in column 2.

data[data[,3] != '' | data[,2] == 'Curso 3', ]

Result:

   MES EST.DET1 EST.DET2 EST.DET3 DIAS
1    2  Curso 1  Turma A    Manha    5
2    2  Curso 1  Turma A    Tarde    5
3    2  Curso 1  Turma B             5
5    2  Curso 2  Turma A             7
7    2  Curso 3                     10
8    3  Curso 1  Turma A    Manha    6
9    3  Curso 1  Turma A    Tarde    6
10   3  Curso 1  Turma B             6
12   3  Curso 2  Turma A             7
14   3  Curso 3                     13
15   4  Curso 1  Turma A    Manha    5
16   4  Curso 1  Turma A    Tarde    5
17   4  Curso 1  Turma B             5
19   4  Curso 2  Turma A             6
21   4  Curso 3                     10
  • I thought of a solution along these lines, but since I have many variations, I was looking for something more automatic.

  • @Rafaelcunha if you post your problem with the multiple variations, I can try to solve it another way.
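
For the "more automatic" case raised in the comments, one possible sketch (not part of the original answer) is to flag a row as a total whenever a more detailed row exists below it in the same month, instead of naming courses by hand. The helper `is_total_row` and its arguments are hypothetical names introduced here. The sketch assumes that the detail levels are always filled from left to right, that missing levels are stored as NA or as empty strings, and that any row with more detailed rows beneath it is a total. It does not check that DIAS actually equals the sum of the child rows.

```r
# Sample in the layout of the question (months 2 and 3 only), with NA for missing levels.
df <- data.frame(
  MES      = rep(c(2, 3), each = 7),
  EST.DET1 = rep(c("Curso 1", "Curso 1", "Curso 1", "Curso 1",
                   "Curso 2", "Curso 2", "Curso 3"), times = 2),
  EST.DET2 = rep(c("Turma A", "Turma A", "Turma B", NA, "Turma A", NA, NA), times = 2),
  EST.DET3 = rep(c("Manha", "Tarde", NA, NA, NA, NA, NA), times = 2),
  DIAS     = c(5, 5, 5, 15, 7, 7, 10, 6, 6, 6, 18, 7, 7, 13),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns TRUE for rows that have at least one more
# detailed row in the same group (same values in the filled levels, with the
# next level filled in).
is_total_row <- function(d, group = "MES",
                         levels = c("EST.DET1", "EST.DET2", "EST.DET3")) {
  m <- as.matrix(d[levels])
  missing <- is.na(m) | m == ""   # treat NA and "" as "no detail"
  depth <- rowSums(!missing)      # number of filled levels per row
  total <- logical(nrow(d))
  for (k in seq_len(length(levels) - 1)) {
    # Key built from the group column(s) plus the first k levels.
    cols <- c(group, levels[seq_len(k)])
    key <- do.call(paste, c(unname(as.list(d[cols])), sep = "\r"))
    parents <- depth == k
    # A row at depth k is a total if a deeper row shares its key.
    total[parents] <- key[parents] %in% key[depth > k]
  }
  total
}

result <- df[!is_total_row(df), ]
print(result)
```

On this sample the rows kept are the same ones that the hand-written condition above selects for months 2 and 3. Curso 3 survives because no more detailed row exists for it.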
