R - Select rows of a data frame based on part of a string

I have a data frame with several rows, and some of these rows contain a sentence with the word "Anúncio" ("Ad"). I want to create a new data frame containing only the rows whose sentence includes that word. I’ve searched several links on how to do this in R but haven’t found a solution yet.

Example:

#Create a new data frame with the ad-view data to be analyzed
DadosAnuncios=data.frame(Usuário=Dados[,1],DescricaoURL=Dados[,6])
#Keep only the descriptions that belong to ads
DadosAnuncios=DadosAnuncios[grep("Anúncio:", DadosAnuncios$DescricaoURL),]
View(table(DadosAnuncios[,2]))

Example 2:

Usuarios=c("Joao1", "Joao2", "Joao3", "Joao4")
Acessos=c("Página 01", "Página 02", "Anúncio: 01", "Anúncio: 02")
MeusDados=data.frame(Usuarios,Acessos)
DadosAnunciosTeste=MeusDados[grep("Anúncio:", MeusDados$Acessos),]
View(table(DadosAnunciosTeste[,2]))

    It would help if you shared your data, or part of it, to use as an example. Also, what have you tried? Please post your code too.

2 answers

Assuming your data.frame is called dados and the column containing the word "Anúncio" is frase,

dados[grep("Anúncio", dados$frase),]

should solve your problem.
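To make the answer reproducible, here is a minimal sketch on hypothetical sample data modeled on Example 2. The names dados and frase follow the answer; the usuario column and the variable names so_anuncios and so_anuncios_l are made up for illustration. It shows what grep() returns and that grepl() selects the same rows.

```r
# Hypothetical sample data; `dados` and `frase` follow the answer above
dados <- data.frame(
  usuario = c("Joao1", "Joao2", "Joao3", "Joao4"),
  frase   = c("Página 01", "Página 02", "Anúncio: 01", "Anúncio: 02"),
  stringsAsFactors = FALSE  # keep text as character, not factor
)

# grep() returns the positions of the elements that match the pattern
idx <- grep("Anúncio", dados$frase)

# Indexing the rows with those positions keeps only the matching rows
so_anuncios <- dados[idx, ]

# grepl() returns TRUE/FALSE for each element and selects the same rows
so_anuncios_l <- dados[grepl("Anúncio", dados$frase), ]

print(so_anuncios)
```

One reason to prefer grepl(): to drop the ad rows instead, dados[!grepl("Anúncio", dados$frase), ] works even when nothing matches, whereas dados[-grep("Anúncio", dados$frase), ] returns zero rows in that case, because -integer(0) selects nothing.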

  • Thanks Rafael, this solution worked! But when I generate a frequency table of one column of this new data frame with the 'table' function, the table shows the frequencies of the extracted rows and also lists the values from the old data frame's rows, with a frequency of zero. To make this clearer, here is an example:

  • Could you post your data, or part of it? Use the dput command so that we can reproduce your problem.

  • I edited the question and added example code.

  • The example code is fine, but it is important to make the data available too.

  • I added Example 2 to the question as a sample of what my data look like, because I don’t know how to do what you asked, haha. I only started using this site recently.

  • To post your data here, run dput(Dados) in R and copy and paste the output into the site. As for your comment, the zero frequencies appear because MeusDados <- data.frame(Usuarios,Acessos) converts the variables to factors (the default before R 4.0.0), and a factor keeps all of its original levels after subsetting. Convert them to character.

  • I didn’t understand the part about converting the variable to character... When I view the new data frame DadosAnunciosTeste, it has only two rows, but the frequency table shows all four values.

  • Run str(DadosAnuncios) and, if I’m right, it will show that your two variables are factors. What I recommend is, after running DadosAnuncios=data.frame(Usuário=Dados[,1], DescricaoURL=Dados[,6]), converting your variables to character with DadosAnuncios[,1] <- as.character(DadosAnuncios[,1]) and DadosAnuncios[,2] <- as.character(DadosAnuncios[,2]).
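A minimal reproduction of the zero-frequency problem discussed in these comments, using the data from Example 2. It assumes stringsAsFactors = TRUE to mimic the default of R versions before 4.0.0. The droplevels() option is an alternative not mentioned in the thread; the character conversion follows the suggestion above, assigning back to the column rather than to the whole data frame. The tab_* names are made up for illustration.

```r
Usuarios <- c("Joao1", "Joao2", "Joao3", "Joao4")
Acessos  <- c("Página 01", "Página 02", "Anúncio: 01", "Anúncio: 02")

# stringsAsFactors = TRUE reproduces the default of R versions before 4.0.0
MeusDados <- data.frame(Usuarios, Acessos, stringsAsFactors = TRUE)
DadosAnunciosTeste <- MeusDados[grep("Anúncio:", MeusDados$Acessos), ]

# Only 2 rows remain, but the factor still has all 4 levels,
# so table() also lists the two "Página" levels with count 0
tab_factor <- table(DadosAnunciosTeste$Acessos)
print(tab_factor)

# Option 1 (alternative): drop the unused levels after subsetting
tab_dropped <- table(droplevels(DadosAnunciosTeste$Acessos))

# Option 2: convert the column to character, assigning back to the column
# (assigning to DadosAnunciosTeste itself would replace the whole data frame)
DadosAnunciosTeste$Acessos <- as.character(DadosAnunciosTeste$Acessos)
tab_char <- table(DadosAnunciosTeste$Acessos)
print(tab_char)
```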

Here is one way.
First, let’s create an artificial data frame, just for testing.

set.seed(2203)    # makes the results reproducible

s <- 
"Tenho um data frame com várias linhas, nessas linhas tenho uma frase que contém por padrão a palavra Anúncio, quero gerar um novo data frame que contenha somente as linhas que na frase possuem a palavra Anúncio. Já procurei em vários links de como fazer isso no R mas ainda não encontrei a solução"
s <- unlist(strsplit(gsub("[[:punct:]]", "", s), " "))
dados <- data.frame(s = sample(s, 200, TRUE), x = rnorm(200))

Now I’ll use grepl to find the word "anúncio". Since it may appear both capitalized and in lowercase, I also apply tolower, so that differences in case don’t cause a problem.

inx <- grepl("anúncio", tolower(dados$s))
anuncio <- dados[which(inx), ]
row.names(anuncio) <- NULL
anuncio
#        s          x
#1 Anúncio -0.2342417
#2 Anúncio -2.2457881
#3 Anúncio  0.7579141
#4 Anúncio  0.7771827
#5 Anúncio -1.5996622
#6 Anúncio  1.0020413
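As a variation on the approach above, grepl() can also ignore case by itself through its ignore.case argument, and a logical vector can index the rows directly, so which() is optional. The following is a small sketch on hypothetical data with no random sampling, so the result does not depend on the seed.

```r
# Hypothetical test data: the word appears capitalized and in lowercase
dados <- data.frame(
  s = c("Anúncio", "anúncio", "Página", "frase", "Anúncio"),
  x = 1:5,
  stringsAsFactors = FALSE
)

# Approach from the answer: lowercase the text, then search
inx1 <- grepl("anúncio", tolower(dados$s))

# Alternative: let grepl() ignore case itself
inx2 <- grepl("anúncio", dados$s, ignore.case = TRUE)

# A logical vector can index rows directly, without which()
anuncio <- dados[inx2, ]
row.names(anuncio) <- NULL
print(anuncio)
```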
