How to filter data by a substring of a variable?

3

How can I, for example, list only the observations that contain the word Silva in the variable Nome?

Nome                Nota
    João Silva      9
   Pedro Souza      8
     Ana Silva      6
Isabela Cabral      10
  Paulo Santos      5

I would like to print only a table like this:

Nome                Nota
    João Silva      9
     Ana Silva      6

I am new here, so I apologize for the way the problem is presented. Thanks in advance!

3 answers

7


Suppose your dataset is called dados:

dados <- data.frame(Nome=c("João Silva", "Pedro Souza", "Ana Silva",
  "Isabela Cabral", "Paulo Santos"), Nota=c(9, 8, 6, 10, 5))

Use the function grep to find which rows contain the word you are interested in. In this case, rows 1 and 3:

grep("Silva", dados$Nome)
[1] 1 3

Select only these rows in the original dataset and your problem is solved:

dados[grep("Silva", dados$Nome), ]
        Nome Nota
1 João Silva    9
3  Ana Silva    6
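
A possible variation, sketched here on the same dados data frame: grepl returns a logical vector instead of row positions, which is convenient when you also want the rows that do not match. The argument fixed = TRUE treats "Silva" as a literal string rather than a regular expression. The names tem_silva, com_silva and sem_silva are only illustrative.

```r
dados <- data.frame(Nome = c("João Silva", "Pedro Souza", "Ana Silva",
                             "Isabela Cabral", "Paulo Santos"),
                    Nota = c(9, 8, 6, 10, 5))

# grepl() returns TRUE/FALSE for each row instead of row positions
tem_silva <- grepl("Silva", dados$Nome, fixed = TRUE)

# Rows whose Nome contains "Silva"
com_silva <- dados[tem_silva, ]

# Rows whose Nome does NOT contain "Silva"; negating a logical vector
# stays safe even when the pattern matches no rows at all
sem_silva <- dados[!tem_silva, ]
```

By contrast, negative indexing with grep, as in dados[-grep("xyz", dados$Nome), ], returns zero rows when nothing matches, because grep then returns an empty vector.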

7

Building on Marcus' answer, I just want to draw attention to something that often goes unnoticed by "new" users of R: the variable dados$Nome may be of class factor (data.frame converted strings to factors by default before R 4.0.0). This matters for the final result, because after removing the rows you do not want, the levels of the variable are still there. See the code:

dados2 <- dados[grep("Silva", dados$Nome), ]
str(dados2)
'data.frame':   2 obs. of  2 variables:
 $ Nome: Factor w/ 5 levels "Ana Silva","Isabela Cabral",..: 3 1
 $ Nota: num  9 6

dados2$Nome
[1] João Silva Ana Silva 
Levels: Ana Silva Isabela Cabral João Silva Paulo Santos Pedro Souza

If you want to remove these unused levels, you can use the function droplevels.

dados2$Nome <- droplevels(dados2$Nome)
dados2$Nome
[1] João Silva Ana Silva 
Levels: Ana Silva João Silva
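
As a minimal sketch of the same idea in one step: droplevels also accepts a whole data frame, so the filtering and the cleanup can be combined. Here stringsAsFactors = TRUE is set explicitly, as an assumption, so that Nome is a factor regardless of the R version.

```r
dados <- data.frame(Nome = c("João Silva", "Pedro Souza", "Ana Silva",
                             "Isabela Cabral", "Paulo Santos"),
                    Nota = c(9, 8, 6, 10, 5),
                    stringsAsFactors = TRUE)  # force Nome to be a factor

# droplevels() on a data frame drops unused levels in every factor column
dados2 <- droplevels(dados[grep("Silva", dados$Nome), ])

# Only the levels that still occur in the filtered data remain
levels(dados2$Nome)
```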

The other solution is to set the argument stringsAsFactors = FALSE when creating the data.frame dados.

dados <- data.frame(Nome=c("João Silva", "Pedro Souza", "Ana Silva",
  "Isabela Cabral", "Paulo Santos"), Nota=c(9, 8, 6, 10, 5),
  stringsAsFactors = FALSE)   ## the default was TRUE before R 4.0.0

Then just use Marcus' solution.
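
If the data.frame already exists with Nome stored as a factor, a similar effect can be obtained by converting the column with as.character before filtering. The sketch below assumes a factor column, forced here with stringsAsFactors = TRUE for illustration.

```r
dados <- data.frame(Nome = c("João Silva", "Pedro Souza", "Ana Silva",
                             "Isabela Cabral", "Paulo Santos"),
                    Nota = c(9, 8, 6, 10, 5),
                    stringsAsFactors = TRUE)  # Nome starts out as a factor

# Convert the factor column to plain character strings
dados$Nome <- as.character(dados$Nome)

# Filter as before; a character column carries no levels to drop
dados2 <- dados[grep("Silva", dados$Nome), ]
str(dados2)
```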

1

Using the df created by @Marcos, you can also work with the tidyverse, which avoids the factor issue described by @Rui, since a tibble keeps strings as character:

    library(tidyverse)  # also loads stringr, which provides str_which()
    dados <- tibble(Nome=c("João Silva", "Pedro Souza", "Ana Silva",
                           "Isabela Cabral", "Paulo Santos"),
                    Nota=c(9, 8, 6, 10, 5)) %>%
      # keep only the rows whose Nome contains "Silva"
      .[str_which(.$Nome, "Silva"), ]
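
A perhaps more common way to write the same filter in the tidyverse is dplyr's filter combined with stringr's str_detect. This is only a sketch of an alternative, not the approach used above; the name silvas is illustrative.

```r
library(dplyr)
library(stringr)

dados <- tibble(Nome = c("João Silva", "Pedro Souza", "Ana Silva",
                         "Isabela Cabral", "Paulo Santos"),
                Nota = c(9, 8, 6, 10, 5))

# filter() keeps the rows where str_detect() finds "Silva" in Nome
silvas <- dados %>%
  filter(str_detect(Nome, "Silva"))

silvas
```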
