Regex: how to select the whole sentence (without the digits) and wrap it in quotation marks

I got this result from a .csv file:

9   Abraço de tamanduá
9   Abraço fraterno
9   Correr/partir pro abraço (futebol)

I am applying several regex so that the end result is an SQL statement like this:

INSERT INTO `expressoes`(`id_palavra`, `expressao`) VALUES (9, 'Abraço de tamanduá'), (9, 'Abraço fraterno'), (9, 'Correr/partir pro abraço (futebol)')

As I am a newbie in regex, I made a keybinding in VS Code that runs find and replace all from a shortcut, so I can run several regex commands in sequence.

So in the first command I did:

find: \n{1,}
replace all: ), (

Resulting:

9   Abraço de tamanduá), (9 Abraço fraterno), (9    Correr/partir pro abraço (futebol)

The second command:

find: (?<=[0-9])
replace all: ,

Resulting:

9,  Abraço de tamanduá), (9,    Abraço fraterno), (9,   Correr/partir pro abraço (futebol)

Then commands 3, 4 and 5:

find: ^
replace all: (

find: $
replace all: )

find: \s // this does not work through the `ssmacro` plugin that runs the regex command
replace all: ␣ // replace the `tab` with a single space

These only fix the beginning and the end by adding the parentheses. Resulting:

(9, Abraço de tamanduá), (9, Abraço fraterno), (9, Correr/partir pro abraço (futebol))

Now I can't manage to wrap the entire sentence in single quotes. I tried ([a-z,A-z,çéáíãõ]+) (\w+), which did not work either. I got close with this:

find: ([^\d\W]+á*é*í*ó*ú*õ*ã*ç*(\s)*/*)
replace: '$1'

but it still doesn't work. Resulting:

(9, 'Abraç''o ''de ''tamanduá'), (9, 'Abraç''o ''fraterno'), (9, 'Correr/''partir ''pro ''abraç''o '('futebol'))

Q: Any idea how to place the single quotes correctly?

Q: There is also the (...) at the end of some sentences, as in (futebol). I have no idea how to isolate it.

  • Do you want to remove everything that comes before the beginning of the sentence (the digit and the space)?

1 answer

It would be easier to use a programming language to read the CSV, split the fields, and build the INSERTs, but if you want to do it with regex, here goes.

One option is to use (\d+)\s+(.+):

  • \d+: one or more digits
  • \s+: one or more whitespace characters (spaces or tabs)
  • .+: one or more characters (any character except line breaks, so it matches up to the end of the line)

The digits and the characters after the whitespace are in parentheses to form capture groups, so they can be referenced later in the replacement.

In the replacement, use:

INSERT INTO `expressoes`(`id_palavra`, `expressao`) VALUES ($1, '$2');

The references $1 and $2 correspond to the capture groups (the first is the digits and the second is the rest of the text).
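To check the pattern outside the editor, here is a minimal Python sketch that applies the same find and replace to the sample lines from the question. It assumes the id and the expression are separated by a tab. Python's `re.sub` writes the group references as `\1` and `\2` instead of `$1` and `$2`. The variable names are only illustrative.

```python
import re

# Sample lines from the question: id, whitespace (assumed to be a tab), expression.
lines = "9\tAbraço de tamanduá\n9\tAbraço fraterno\n9\tCorrer/partir pro abraço (futebol)"

pattern = re.compile(r"(\d+)\s+(.+)")

# Python uses \1 and \2 where the editor's replace field uses $1 and $2.
replacement = r"INSERT INTO `expressoes`(`id_palavra`, `expressao`) VALUES (\1, '\2');"

# "." does not match line breaks, so each line is replaced separately.
result = pattern.sub(replacement, lines)
print(result)
```

Because `.+` stops at the line break, the parentheses in (futebol) need no special handling: they are simply part of the second group.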

However, this generates several lines with one INSERT each, not a single INSERT as you wanted. If you prefer, use ($1, '$2'), as the replacement (keeping the trailing comma), then manually add INSERT INTO `expressoes`(`id_palavra`, `expressao`) VALUES at the beginning and remove the comma after the last row.
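For the single-statement form, the "use some language" route mentioned at the start could look like the sketch below. It is only an illustration under some assumptions: the input is already available as a string, each line is an id followed by whitespace and the expression, and `build_insert` is a hypothetical helper name. It also doubles any single quote inside the text, which the plain regex replacement does not do.

```python
import re


def build_insert(text: str) -> str:
    """Build one multi-row INSERT from lines shaped like '<id><whitespace><expression>'."""
    rows = []
    for line in text.splitlines():
        match = re.fullmatch(r"\s*(\d+)\s+(.+?)\s*", line)
        if match is None:
            continue  # skip blank or malformed lines
        id_palavra, expressao = match.groups()
        # Double single quotes so they stay inside the SQL string literal.
        expressao = expressao.replace("'", "''")
        rows.append(f"({id_palavra}, '{expressao}')")
    if not rows:
        raise ValueError("no valid lines found")
    return (
        "INSERT INTO `expressoes`(`id_palavra`, `expressao`) VALUES "
        + ", ".join(rows)
        + ";"
    )


sample = "9\tAbraço de tamanduá\n9\tAbraço fraterno\n9\tCorrer/partir pro abraço (futebol)"
print(build_insert(sample))
```

If the statement is going to be executed from code rather than pasted into a client, a parameterized query from the database driver is usually a safer choice than building the string by hand.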
