How to use filter() to select only a part of the string?


2

I want to filter by a term that doesn’t always appear on its own, and the number of possible combinations is huge. My variable LINHAII holds several codes, and I want to keep the rows that contain E119. However, my filter only returns the rows where that code appears alone.

library(tidyverse)

dados <- filter(df_datasus, LINHAII %in% '*E119')

When the value is *E119*I10X, for example, that row is not returned.

How can I filter by just the substring of interest?

2 answers

7


The %in% operator only tests for membership. It checks whether each value is exactly equal to some element of a vector. In your case it matches only strings that are exactly '*E119'.
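To make the exact-match behavior concrete, here is a minimal sketch in base R. The vector `codes` and its values are hypothetical stand-ins for the LINHAII column, modeled on the two values shown in the question.

```r
# Hypothetical sample values standing in for the LINHAII column
codes <- c("*E119", "*E119*I10X", "*I10X")

# %in% compares whole elements, so only the exact string "*E119" matches
exact_match <- codes %in% "*E119"
print(exact_match)
# [1]  TRUE FALSE FALSE
```

The second element contains E119 but is not equal to "*E119", which is why that row disappears from the filtered result.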

If you want to search within strings, I recommend the str_detect function from the stringr package, which has good documentation and is already part of the tidyverse:

dados <- filter(df_datasus, str_detect(LINHAII,'E119'))

This selects every row in which the variable LINHAII contains the term E119.
If you want values that have E119 in a specific position, or with a particular value before or after it, adjust the regular expression, which is the second argument of str_detect.
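As a sketch of what adjusting the regular expression could look like, the example below uses anchors and an escaped asterisk. The sample values are hypothetical, and it assumes the codes in LINHAII are always separated by a literal asterisk, as in the question's example.

```r
library(stringr)

# Hypothetical sample values standing in for the LINHAII column
codes <- c("*E119", "*E119*I10X", "*I10X*E119", "*J189")

# E119 as the first code: anchor at the start and escape the asterisk
first_code <- str_detect(codes, "^\\*E119")
print(first_code)
# [1]  TRUE  TRUE FALSE FALSE

# E119 as the last code: anchor at the end of the string
last_code <- str_detect(codes, "E119$")
print(last_code)
# [1]  TRUE FALSE  TRUE FALSE

# E119 immediately followed by the code I10X
followed_by_i10x <- str_detect(codes, "E119\\*I10X")
print(followed_by_i10x)
# [1] FALSE  TRUE FALSE FALSE
```

Any of these patterns can be passed to str_detect inside filter() in the same way as the plain 'E119' pattern.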

5

One option is grepl, which returns TRUE or FALSE depending on whether it finds a match for a regular expression. The asterisk must be preceded by \\ because it is a regex metacharacter.

filter(df_datasus, grepl('\\*E119', LINHAII))
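If escaping feels error-prone, a possible alternative is grepl's `fixed = TRUE` argument, which treats the pattern as a literal string instead of a regular expression. The sketch below compares the two on hypothetical sample values; it is an illustration and not part of the original answer.

```r
# Hypothetical sample values standing in for the LINHAII column
codes <- c("*E119", "*E119*I10X", "E119", "*I10X")

# Regex version: the asterisk is escaped so it matches a literal "*"
escaped <- grepl("\\*E119", codes)
print(escaped)
# [1]  TRUE  TRUE FALSE FALSE

# Literal version: fixed = TRUE disables regex interpretation entirely
literal <- grepl("*E119", codes, fixed = TRUE)
print(literal)
# [1]  TRUE  TRUE FALSE FALSE
```

Both calls require the asterisk to be present, so the bare "E119" value is not matched. Drop the asterisk from the pattern if rows without it should also be kept.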
