Regex.Split on commas that are not between "..."

4

Which regular expression should I use to match the commas that are not inside "..." fields? For example:

line1, "line2", "hello,world", 215, X + Y
     ^        ^              ^    ^

I want to match only the marked commas. I'm using the expression (?!.*"), but it doesn't work.

  • Only the ones that are not inside quotes " "?

  • Yes, I want to take the commas that aren’t in quotes.

  • See if it helps: (["'])(?:(?=(\\?))\2.)*?\1

  • That one matches the "..." fields, not the commas.

2 answers

5


The big problem here is that the regular expression engine doesn't know which quote opens a field and which quote closes it. For example:

"line2", "hello,world"

"line2" could be read as one group, ", " as another and "hello,world" as yet another, so an expression that looks only at the commas cannot solve every case. In other words, you need to match each whole group, with or without quotation marks.

My suggestion is to match the comma together with each group, i.e.:

(("[\w\s,]*")(,)?)|([\w\s\+]*(,)?)

What this means:

Match everything inside quotes followed by zero or one comma, or anything without quotes followed by zero or one comma.

See here working.

Once this is done, the comma will always be in the group that follows the field, and the field itself, which is what really matters to your application, in the group before it.
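As an illustration, here is a minimal sketch of how the groups of the expression above could be read. It uses Python's `re` module rather than .NET, and the helper name `fields_and_commas` is hypothetical. In this pattern the quoted field lands in group 2 with its comma in group 3, while an unquoted field lands in group 4 (comma included) with its comma in group 5. The pattern also produces whitespace-only and empty matches between fields, which the sketch skips.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r'(("[\w\s,]*")(,)?)|([\w\s\+]*(,)?)')

def fields_and_commas(line):
    """Return (field, comma_offset) pairs; comma_offset is None when no comma follows."""
    result = []
    for m in PATTERN.finditer(line):
        if m.group(2) is not None:
            # Quoted alternative: field in group 2, optional comma in group 3.
            field, comma_group = m.group(2), 3
        else:
            # Unquoted alternative: group 4 holds the field plus its comma (group 5).
            comma_group = 5
            field = m.group(4)
            if m.group(5) is not None:
                field = field[:-1]  # drop the trailing comma
            field = field.strip()
        has_comma = m.group(comma_group) is not None
        if not field and not has_comma:
            continue  # whitespace-only or empty match between fields
        result.append((field, m.start(comma_group) if has_comma else None))
    return result

sample = 'line1, "line2", "hello,world", 215, X + Y'
for field, comma_at in fields_and_commas(sample):
    print(repr(field), comma_at)
```

For the sample line from the question, the reported comma offsets are 5, 14, 29 and 34, which are the positions marked with ^ there. Quoted fields that contain characters outside `[\w\s,]` are not covered by this pattern.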

3

I put together a different regex from @gypsy's, but the reasoning is the same:

Capture the preceding group in order to identify the comma that follows it.

REGEX:

(?|(['"])[^\1]+?\1(,)?|([^,])(,))

See working.

Here, too, the comma is always in the second group.

  • The strange thing is that your expression works on other platforms, but the .NET Framework does not accept it: http://prntscr.com/a3lno4
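The `(?|...)` branch-reset group used above is a PCRE feature that the .NET regex engine does not support, which would explain the rejection. A different technique, not the one proposed in either answer, is to split on a comma only when an even number of quotes follows it up to the end of the line. The sketch below shows this in Python; the helper name `split_outside_quotes` is hypothetical, and it assumes quotes are balanced and never escaped inside a field. The pattern uses only lookahead and non-capturing groups, so it should also be usable with Regex.Split in .NET, although that is not tested here.

```python
import re

# A comma followed by zero or more complete "..." pairs and then no further
# quotes until the end of the line, i.e. a comma outside any quoted field.
OUTSIDE_QUOTES_COMMA = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_outside_quotes(line):
    """Split on commas outside quotes and trim whitespace around each field."""
    return [part.strip() for part in OUTSIDE_QUOTES_COMMA.split(line)]

sample = 'line1, "line2", "hello,world", 215, X + Y'
print(split_outside_quotes(sample))
```

The groups are non-capturing on purpose: both Python's `re.split` and .NET's Regex.Split include captured groups in the result list.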
