Find an expression in several elements of a list

Guys, I have a problem. I have 200 spreadsheets with survey data that I am importing into R. Because they have different columns, I store each spreadsheet as a separate element of a list. I need to search for a name that may appear in any of the spreadsheets and get back which element of the list contains it. How can I do that?

For example, find out where "José da Silva" is:

df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
              idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo"),
              idade = c(30, 12))

lista <- list()
lista[[1]] <- df1
lista[[2]] <- df2

3 answers

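Using which() inside lapply():

lapply(lista, function(x) which(x == "José da Silva"))
# [[1]]
# [1] 1
#
# [[2]]
# integer(0)

This searches for an exact match, such as "José da Silva" in your example. Note that which() on the logical matrix produced by x == "José da Silva" returns the linear index of the matching cell (counting down the columns), not its row number.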

  • This answer works, but my list has 250 elements; I need something that returns only the positions where the expression occurs.
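Regarding the comment above: the per-element result can be collapsed to a single TRUE/FALSE per data.frame and passed to which(), which then returns only the list positions. A minimal base-R sketch, assuming exact matching as in the answer; `onde_esta()` is a hypothetical helper name, not part of any package:

```r
# Example data from the question
df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
                  idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo"),
                  idade = c(30, 12))
lista <- list(df1, df2)

# Hypothetical helper: positions of the list elements that contain `nome`
# as an exact cell value in any column.
onde_esta <- function(lista, nome) {
  # x == nome gives a logical matrix (one column per data.frame column);
  # any() collapses it to a single TRUE/FALSE, skipping NA cells.
  contem <- vapply(lista, function(x) any(x == nome, na.rm = TRUE), logical(1))
  which(contem)
}

onde_esta(lista, "José da Silva")
# [1] 1
onde_esta(lista, "João Paulo")
# [1] 2
```

vapply() with logical(1) guarantees one plain logical value per list element, so the result is always an integer vector of positions (integer(0) when the name is absent).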

I’d do it this way:

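library(purrr)

# For each data.frame, TRUE if `nome` appears in its first column;
# which() turns that logical vector into list positions.
buscar_nome <- function(lista, nome) {
  map_lgl(lista, ~any(nome %in% .x[[1]])) %>% which()
}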

# > buscar_nome(lista, "Maria da Silva")
# [1] 1
# > buscar_nome(lista, "Mauro Pereira")
# [1] 2

One important assumption here is that the name you are looking for is in the first column of each data.frame. To search all columns instead (at some cost in efficiency), it can be modified as follows:

buscar_nome <- function(lista, nome) {
  map_lgl(lista, ~any(nome %in% as.matrix(.x))) %>% which()  
} 
  • Daniel, I made a small improvement to your function so it can find parts of a name when I don't have the person's exact name: buscar_name2 <- function(list, name) { map_lgl(list, ~any(grepl(name, as.matrix(.x)))) %>% which() }
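A base-R sketch of the same partial-match idea, without purrr. The helper name `buscar_trecho()` is hypothetical, and the case-insensitive, literal (`fixed = TRUE`) matching is an assumption added here, not part of the comment's version:

```r
# Example data from the question
df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
                  idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo"),
                  idade = c(30, 12))
lista <- list(df1, df2)

# Hypothetical helper: positions of list elements where `trecho` occurs
# inside any cell. Both sides are lower-cased and matched with fixed = TRUE,
# so the search is case-insensitive and characters such as "." or "(" in the
# name are taken literally instead of as regex syntax.
buscar_trecho <- function(lista, trecho) {
  alvo <- tolower(trecho)
  contem <- vapply(lista, function(x) {
    celulas <- tolower(as.matrix(x))  # every column converted to character
    any(grepl(alvo, celulas, fixed = TRUE))  # NA cells give FALSE
  }, logical(1))
  which(contem)
}

buscar_trecho(lista, "silva")
# [1] 1
buscar_trecho(lista, "PEREIRA")
# [1] 2
```

Lower-casing both sides keeps fixed = TRUE usable, since fixed = TRUE overrides ignore.case in grepl(). Accents are not normalized, so "Jose" would not find "José".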

I made a small change to your data to get more test cases:

df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
              idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo", "João Pedro"),
              idade = c(30, 12, 1))
df3 <- data.frame(renda = c(1, 2, 3),
              idade = c(3, 2, 9),
              nome_do_cabra = c("Antônio Augusto", "João Marcos", "João Ivo"))

lista <- list()
lista[[1]] <- df1
lista[[2]] <- df2
lista[[3]] <- df3

See if this function solves your problem. It is not very efficient (nested loops over the list elements and their columns), but I believe it works.

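procura_nome <- function(x, pattern){
    list_result <- list()
    element_list_i <- 1
    for(j in seq_along(x)){                    # each data.frame in the list
        for(k in seq_len(ncol(x[[j]]))){        # each column of that data.frame
            # [[k]] extracts the column as a vector. With [, k], a tibble
            # (e.g. from readxl::read_excel) returns a one-column tibble, and
            # grep() would then search one deparsed string, giving wrong rows.
            linhas_result <- grep(x = x[[j]][[k]], pattern = pattern)
            if(length(linhas_result) > 0){
                list_result[[element_list_i]] <- cbind(j, k, linhas_result)
                element_list_i <- element_list_i + 1
            }
        }
    }
    if(length(list_result) > 0){
        matrix_result <- do.call(rbind, list_result)  # stack all (j, k, row) hits
        df_result     <- as.data.frame(matrix_result)
        names(df_result) <- c("numero_lista", "numero_coluna", "numero_linha")
        return(df_result)
    }else{
        return(NULL)
    }
}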

Because the search uses grep() internally, the pattern is a regular expression, so names don't have to match exactly. It could of course be improved to be case-insensitive, ignore accents, etc.

The result is a data.frame with one column giving the element number within the list, another giving the data.frame column, and a third giving the row, as in this example:

procura_nome(lista, "João")
###   numero_lista numero_coluna numero_linha
### 1            2             1            2
### 2            2             1            3
### 3            3             3            2
### 4            3             3            3
  • Rogério, this function is not returning the right row and column for me, and I can't see where to fix it.
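As a cross-check on the row and column positions, here is a sketch of a vectorized variant of procura_nome(): it runs one grepl() per data.frame over as.matrix() and uses which(arr.ind = TRUE) to recover (row, column) pairs. `procura_nome_vet()` is a hypothetical name; extra arguments such as ignore.case = TRUE are passed on to grepl():

```r
# Example data from the answer
df1 <- data.frame(nome = c("José da Silva", "Maria da Silva"),
                  idade = c(45, 54))
df2 <- data.frame(nome_completo = c("Mauro Pereira", "João Paulo", "João Pedro"),
                  idade = c(30, 12, 1))
df3 <- data.frame(renda = c(1, 2, 3),
                  idade = c(3, 2, 9),
                  nome_do_cabra = c("Antônio Augusto", "João Marcos", "João Ivo"))
lista <- list(df1, df2, df3)

# Hypothetical variant of procura_nome(): one grepl() call per data.frame
# instead of one grep() per column.
procura_nome_vet <- function(x, pattern, ...) {
  partes <- lapply(seq_along(x), function(j) {
    m <- as.matrix(x[[j]])  # character matrix (also works for tibbles)
    # grepl() drops dimensions, so rebuild the logical matrix column-major
    achou <- matrix(grepl(pattern, m, ...), nrow = nrow(m), ncol = ncol(m))
    pos <- which(achou, arr.ind = TRUE)  # matrix with columns "row" and "col"
    if (nrow(pos) == 0) return(NULL)
    data.frame(numero_lista  = j,
               numero_coluna = pos[, "col"],
               numero_linha  = pos[, "row"])
  })
  do.call(rbind, partes)  # NULL when nothing matched
}

procura_nome_vet(lista, "João")
```

With the example data, this should list the same four (list, column, row) triples as procura_nome(lista, "João"). Note that as.matrix() formats numeric columns as text, so a pattern such as "1" can also match numbers.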