Filter different texts at different positions in R

5

Good afternoon. I have the following data:

NOME  <- c("MARIA 1001", "MARIA 1002A", "JOSE 1003B", "PEDRO 1003", "CARLOS 1019J", "ANTONIO 50", "MARIA 80")
VALOR <- c(10, 20, 30, 40, 50, 60, 70)
dados <- data.frame(NOME, VALOR)

I need to keep the rows whose number is between 1001 and 1019, regardless of where the number appears in the text (beginning, middle or end).
The expected result drops only the rows "ANTONIO 50" and "MARIA 80". How can I build this filter? Thank you.

2 answers

5


I would do it like this:

library(stringr)
library(dplyr)

dados %>%
  filter(str_extract(NOME, "\\d{1,}") %in% 1001:1019)

The function str_extract extracts the first match of a regex pattern from each string. Here the pattern is \\d{1,}, that is, a run of one or more digits.
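To see what the filter compares against, it can help to run the extraction step on its own. The sketch below rebuilds the NOME vector from the question and shows that str_extract returns character values. The names codigo, codigo_int and manter are only illustrative, and the as.integer conversion is an addition (not part of the answer above) that makes the range test explicitly numeric. The pattern \\d+ is equivalent to \\d{1,}.

```r
library(stringr)

NOME <- c("MARIA 1001", "MARIA 1002A", "JOSE 1003B", "PEDRO 1003",
          "CARLOS 1019J", "ANTONIO 50", "MARIA 80")

# str_extract returns the first run of digits in each string, as character
codigo <- str_extract(NOME, "\\d+")
codigo
# "1001" "1002" "1003" "1003" "1019" "50" "80"

# Convert to integer so the comparison is numeric;
# a string with no digits would give NA, so guard against it
codigo_int <- as.integer(codigo)
manter <- !is.na(codigo_int) & codigo_int >= 1001 & codigo_int <= 1019

NOME[manter]
```

The last line should list the five names from MARIA 1001 to CARLOS 1019J, leaving out ANTONIO 50 and MARIA 80.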

3

Try the following. First we use gsub to strip every non-digit character from dados$NOME, leaving only the numbers. Then we filter with a logical index.

num <- as.numeric(gsub("[^[:digit:]]", "", dados$NOME))
dados2 <- dados[1001 <= num & num <= 1019, ]
rm(num)    # no longer needed

dados2 
#          NOME VALOR
#1   MARIA 1001    10
#2  MARIA 1002A    20
#3   JOSE 1003B    30
#4   PEDRO 1003    40
#5 CARLOS 1019J    50
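
One thing to keep in mind with this approach: gsub removes every non-digit character, so if a name contained more than one number, the digits would be glued together. The sketch below uses a hypothetical input (the second value is not from the question) to show the effect, next to a base R alternative with regmatches and regexpr that takes only the first run of digits. The names x, colado and primeiro are illustrative.

```r
x <- c("MARIA 1001", "JOSE 1003B 2")  # second value is a made-up example

# Stripping all non-digits concatenates separate numbers
colado <- as.numeric(gsub("[^[:digit:]]", "", x))
colado
# 1001 10032

# Taking only the first run of digits instead
# (regmatches drops elements with no match; every element here has one)
primeiro <- as.numeric(regmatches(x, regexpr("[[:digit:]]+", x)))
primeiro
# 1001 1003
```

For the data in the question both approaches give the same numbers, since each name holds a single number.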
