Strsplit regular expression

How do I assign a regular expression to a variable so that I can extract the city names?

cid <- c("cidade1..SP.Brasil", "cidade2...SP.Brasil", "cidade3..SPDF.Brasil", "cidade4...SPDF.Brasil")

In Sublime Text, for example, this works:

\\.{3}[A-Z]{4}|\\.{3}[A-Z]{2}|\\.{2}[A-Z]{4}|\\.{2}[A-Z]{2}

But I cannot get it to work when I assign it to a variable in RStudio.

pattern <- ".{3}[A-Z]{4}|.{3}[A-Z]{2}|.{2}[A-Z]{4}|.{2}[A-Z]{2}" 
pattern <- "\.{3}[A-Z]{4}|\.{3}[A-Z]{2}|\.{2}[A-Z]{4}|\.{2}[A-Z]{2}"
pattern <- "\\.{3}[A-Z]{4}|\\.{3}[A-Z]{2}|\\.{2}[A-Z]{4}|\\.{2}[A-Z]{2}"
pattern <- regex(".{3}[A-Z]{4}|.{3}[A-Z]{2}|.{2}[A-Z]{4}|.{2}[A-Z]{2}")
pattern <- regex("\.{3}[A-Z]{4}|\.{3}[A-Z]{2}|\.{2}[A-Z]{4}|\.{2}[A-Z]{2}")
pattern <- regex("\\.{3}[A-Z]{4}|\\.{3}[A-Z]{2}|\\.{2}[A-Z]{4}|\\.{2}[A-Z]{2}")

c <- strsplit(cid, pattern, fixed = TRUE)

  • Do you want to extract only the SP or SPDF part?

  • I want to get only the names of the cities. I switched fixed to FALSE and it worked, but now I get a list: "cidade1" ".Brasil", "cidade2" ".Brasil", ...

1 answer

I solved the problem without regex.

cid  <-  c("cidade1..SP.Brasil", "cidade2...SP.Brasil", "cidade3..SPDF.Brasil", 
"cidade4...SPDF.Brasil")

primeiro <- function(x){
  return(x[[1]])
}

unlist(lapply(strsplit(cid, split="..", fixed=TRUE), FUN=primeiro))
[1] "cidade1" "cidade2" "cidade3" "cidade4"

I used the string ".." as the separator for the original strings. The strsplit call returns a list with 4 elements, each of which is a character vector of length two. Since the city is always the first element of that vector, I wrote a function called primeiro that returns only the first element of each result vector.
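To make the intermediate structure concrete, here is a minimal sketch that inspects what strsplit returns before the first element is taken. The variable names partes and cidades are hypothetical, introduced only for illustration, and vapply is shown as one possible type-checked alternative to unlist(lapply(...)), not as part of the original answer.

```r
cid <- c("cidade1..SP.Brasil", "cidade2...SP.Brasil",
         "cidade3..SPDF.Brasil", "cidade4...SPDF.Brasil")

# Split on the literal two-character string ".."
# (fixed = TRUE means the dots are not treated as regex metacharacters).
partes <- strsplit(cid, split = "..", fixed = TRUE)

length(partes)   # one list element per input string
lengths(partes)  # number of pieces in each element
partes[[2]]      # the pieces obtained from "cidade2...SP.Brasil"

# Keep the first piece of every element; character(1) asserts that
# each call returns exactly one string.
cidades <- vapply(partes, function(x) x[[1]], character(1))
cidades
```

Note that when the input has three dots, the second piece keeps the leftover dot (".SP.Brasil"), which does not matter here because only the first piece is used.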

lapply applies the function primeiro to each element of the list created by strsplit, and unlist flattens the result into a single character vector.
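For comparison, the regex route from the question can also work. With fixed = TRUE, strsplit treats the pattern as a literal string, so no regular expression is ever evaluated; and inside an R string literal the backslash must itself be escaped, which is why "\\." is needed rather than "\.". The sketch below is only an illustration under those assumptions: the shortened pattern and the name cidades are hypothetical, and it assumes a city name never contains two consecutive dots.

```r
cid <- c("cidade1..SP.Brasil", "cidade2...SP.Brasil",
         "cidade3..SPDF.Brasil", "cidade4...SPDF.Brasil")

# Option 1: delete everything from the first run of two or more dots onward.
# "\\.{2,}" = two or more literal dots, ".*$" = the rest of the string.
cidades <- sub("\\.{2,}.*$", "", cid)
cidades

# Option 2: strsplit with a regex. fixed is left at its default (FALSE)
# so that the pattern is interpreted as a regular expression.
pattern <- "\\.{2,3}[A-Z]{2,4}"   # 2-3 dots followed by 2-4 capital letters
partes_regex <- strsplit(cid, pattern)
cidades_split <- vapply(partes_regex, function(x) x[[1]], character(1))
cidades_split
```

Option 1 avoids the intermediate list altogether, which may be simpler when only the city name is needed.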
