Extract prices from a string


I have the following string and would like to extract only the prices from it.

Thanks in advance.

 a<-  " Scarpin Sofia Salto Bloco Slingback TurtleR$269,90   Scarpin Sofia 
 Nobuck Salto Bloco Slingback PretoR$269,90   Scarpin Sofia Nobuck Salto 
 Bloco Slingback Natural WoodR$269,90   Scarpin Sofia Nobuck Salto Bloco 
 Slingback New SalmonR$269,90   Scarpin Sofia Nobuck Salto Bloco Slingback 
 MandarineR$269,90   Scarpin Sofia Nobuck Salto Bloco Slingback 
 MostardR$269,90   Sandália Nobuck  Corda Salto Alto Gergelim e Verde LimeR$299,90  "



You can extract the prices by combining gregexpr and regmatches.

Below are two solutions. Which one to use depends on how prices are written in Brazil, which I am not sure about (I am Portuguese).

If some prices may lack the decimal part, use this first regular expression:

m <- gregexpr("\\$[[:digit:]]+,{0,1}[[:digit:]]{0,2}", a)
regmatches(a, m)
#[[1]]
#[1] "$269,90" "$269,90" "$269,90" "$269,90" "$269,90" "$269,90"
#[7] "$299,90"
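The optional comma and decimal digits in the first pattern are what let it accept a price without cents. A small check on a hypothetical string (not taken from the question) with one price of each kind:

```r
# Hypothetical sample: one price with cents and one without
x <- "Sapato AR$120,50 Sapato BR$99 "

# In this pattern the comma and the decimal digits are both optional
m <- gregexpr("\\$[[:digit:]]+,{0,1}[[:digit:]]{0,2}", x)
regmatches(x, m)
#[[1]]
#[1] "$120,50" "$99"
```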

If prices always have a comma followed by two digits, use this second regular expression:

m <- gregexpr("\\$[[:digit:]]+,[[:digit:]]{2}", a)
regmatches(a, m)
#[[1]]
#[1] "$269,90" "$269,90" "$269,90" "$269,90" "$269,90" "$269,90"
#[7] "$299,90"
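The matches are still character strings such as "$269,90". A minimal sketch for turning them into numbers, assuming the comma is always the decimal separator and no thousands separator appears (with "R$1.299,90", the first pattern would return "$1" and the second would find nothing). The string x below is an excerpt of the question's a:

```r
# Excerpt of the question's string: two of the seven products
x <- " Scarpin Sofia Salto Bloco Slingback TurtleR$269,90   Sandália Nobuck  Corda Salto Alto Gergelim e Verde LimeR$299,90  "

# Extract the prices with the second regular expression
precos <- regmatches(x, gregexpr("\\$[[:digit:]]+,[[:digit:]]{2}", x))[[1]]

# Drop the "$", turn the decimal comma into a dot, then convert
valores <- as.numeric(sub(",", ".", sub("$", "", precos, fixed = TRUE), fixed = TRUE))
valores
#[1] 269.9 299.9
```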


If you need a data.frame with the names and prices:

library(magrittr) # for the pipe operators

dados <- strsplit(a, "(?<=[0-9] )", perl = TRUE) %>%
         unlist() %>%
         strsplit("R\\$") %>%
         do.call(rbind.data.frame, .)
names(dados) <- c("item", "preço")
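The first strsplit call relies on a lookbehind: (?<=[0-9] ) matches the empty position right after a digit followed by a space, so nothing is consumed and each piece keeps its full "nameR$price" text (perl = TRUE is needed because R's default regex engine has no lookbehind). A small sketch on a hypothetical two-item string:

```r
# Hypothetical two-item string in the same "nameR$price" shape
x <- "Item AR$10,00 Item BR$25,50 "

# Split at the empty position after each "digit + space"
partes <- strsplit(x, "(?<=[0-9] )", perl = TRUE)[[1]]
partes
#[1] "Item AR$10,00 " "Item BR$25,50 "

# Splitting each piece on the literal "R$" separates name and price
pares <- strsplit(partes, "R$", fixed = TRUE)
```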

Then clean up the data:

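dados$item %<>% gsub("\n ", "", .) %>%   # join names split across lines
                gsub("  ", " ", .) %>%   # collapse double spaces
                trimws()                 # drop leading/trailing spaces

dados$preço %<>% gsub(" ", "", .) %>%    # remove stray spaces
                 gsub(",", ".", .) %>%   # decimal comma -> decimal point
                 as.numeric()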

dados <- dados[complete.cases(dados), ]  # drop the leftover row with no price (NA)

> dados
                                                      item preço
1               Scarpin Sofia Salto Bloco Slingback Turtle 269.9
2         Scarpin Sofia Nobuck Salto Bloco Slingback Preto 269.9
3  Scarpin Sofia Nobuck Salto Bloco Slingback Natural Wood 269.9
4    Scarpin Sofia Nobuck Salto Bloco Slingback New Salmon 269.9
5     Scarpin Sofia Nobuck Salto Bloco Slingback Mandarine 269.9
6       Scarpin Sofia Nobuck Salto Bloco Slingback Mostard 269.9
7   Sandália Nobuck Corda Salto Alto Gergelim e Verde Lime 299.9
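A base-R sketch of the same steps without magrittr, on a hypothetical excerpt shaped like the question's string (including a line break inside a name). Two substitutions are illustrative choices, not the answer's method: a single [[:space:]]+ pattern collapses newlines and repeated spaces in one pass, and the columns are built with vapply instead of do.call(rbind.data.frame, ...), so pieces without "R$" get an explicit NA price:

```r
# Hypothetical excerpt shaped like the question's string,
# with a line break inside the second product name
x <- " Scarpin Sofia Salto Bloco Slingback TurtleR$269,90   Scarpin Sofia
 Nobuck Salto Bloco Slingback PretoR$269,90   "

# Split after each "digit + space", as in the answer above
partes <- strsplit(x, "(?<=[0-9] )", perl = TRUE)[[1]]

# Separate name and price at the literal "R$"
pares <- strsplit(partes, "R$", fixed = TRUE)

# Pieces without "R$" (e.g. trailing whitespace) get an NA price
item  <- vapply(pares, function(p) p[1], character(1))
preco <- vapply(pares, function(p) if (length(p) >= 2) p[2] else NA_character_,
                character(1))

dados <- data.frame(
  # collapse newlines and runs of spaces, then trim both ends
  item  = trimws(gsub("[[:space:]]+", " ", item)),
  # decimal comma -> decimal point before converting
  preco = as.numeric(sub(",", ".", trimws(preco), fixed = TRUE)),
  stringsAsFactors = FALSE
)

# Drop rows with no price
dados <- dados[complete.cases(dados), ]
dados
```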
