Regular expression to ignore the separator inside quotes


3

I have to read a CSV file that uses ; as the delimiter.

However, some lines contain values such as """Analize;Distribuição""", and the ; inside the quotes creates an extra column. How can I read the file so that a ; inside quotes is not treated as a delimiter?

Example of a line:

105;3650;Fernando;"""Analize;Distribuição""";0;Finalizado

The result generated is:

105
3650
Fernando
"""Analize
Distribuição"""
0
Finalizado

What I want is:

105
3650
Fernando
"""Analize;Distribuição"""
0
Finalizado
  • Do you want to replace the ; with ,, is that it?

  • Yes, in this case. The CSV is separated by ;, and it has a value written as """um texto qualquer; outro texto qualtquer""". That ; is also recognized as a delimiter and ends up creating an extra column. I have seen that there are ways to keep the ; as it is and change the logic of split() instead, but I still can't get it to work.

3 answers

3


A dedicated CSV library is probably the most suitable choice, since such libraries usually handle these special cases better than a regex. That said, here is a suggestion.


You can use this regex:

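using System;
using System.Text.RegularExpressions;

// First alternative: a triple-quoted value containing one ';'. Second: any run of non-';' characters.
var pattern = "\"{3}\\w+;\\w+\"{3}|[^;]+";
MatchCollection matches = Regex.Matches("105;3650;Fernando;\"\"\"Analize;Distribuição\"\"\";0;Finalizado", pattern);
Console.WriteLine("{0} campos encontrados", matches.Count); // "campos encontrados" = "fields found"
foreach (Match match in matches)
{
    Console.WriteLine(match.Groups[0]); // group 0 is the whole match
}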

It uses alternation (the | character), meaning "or". That is, the regex has two possibilities:

  1. \"{3}\\w+;\\w+\"{3}: 3 quotation marks, followed by \w+ (one or more word characters: letters, digits, or _), followed by ;, another \w+, and 3 more quotation marks
  2. [^;]+: one or more characters other than ;

Note that I used + instead of *. The + quantifier requires at least one character, whereas * also accepts zero characters (adjust as needed).

The regex first tries the alternative with quotes and, if that does not match, the alternation tries the second option ([^;]+).

Then just iterate over the matches to obtain the respective fields.

The output is:

6 campos encontrados
105
3650
Fernando
"""Analize;Distribuição"""
0
Finalizado
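
The \w+ in the first alternative does not match spaces, so a value like the one quoted in the comments (text with spaces around the ;) would still be split. The sketch below is a hypothetical variation, not part of the original suggestion: it swaps \w+;\w+ for [^\"]*, which accepts any text between the triple quotes. The names relaxedPattern, line, and fields are made up for illustration, and the code assumes C# 9 or later (top-level statements). Because [^;]+ needs at least one character, empty fields (two ; in a row) are still skipped, which shifts the columns that follow.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical variation of the pattern above: [^\"]* accepts any text
// (spaces, punctuation, several ';') between the triple quotes.
var relaxedPattern = "\"{3}[^\"]*\"{3}|[^;]+";

// Sample line modeled on the value quoted in the question's comments.
var line = "105;\"\"\"um texto qualquer; outro texto\"\"\";0";

// Cast<Match>() keeps this working on older frameworks where
// MatchCollection only implements the non-generic IEnumerable.
string[] fields = Regex.Matches(line, relaxedPattern)
                       .Cast<Match>()
                       .Select(m => m.Value)
                       .ToArray();

foreach (var field in fields)
{
    Console.WriteLine(field);
}
```

With this sample line the quoted value stays in a single field, so three fields are printed.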

1

I imagine there are two problems. The first is replacing ; with ,, which can be solved with texto.Replace(";", ",");. The other is finding these occurrences, which would look more or less like this:

    var texto = "asfdgsdfgsdfg\"\"\"texto;texto\"\"\"fawsgasdfasdfasd\"\"\"texto;texto\"\"\"sadfgsdfgsdfg\"\"\"texto;texto\"\"\"fasdfasdfasdfasdf\"\"\"texto;texto\"\"\"sadf";
    var pattern = @"""""""\w*;\w*""""""";
    var linhas = Regex.Matches(texto,pattern);
    System.Console.WriteLine(linhas.Count);

When you have trouble with a regex, you can use Regex Storm, which works well for .NET, since regex implementations apparently differ from one language to another.

If you want to replace ; with , in each of the occurrences, you can use something along these lines:

    var texto = "asfdgsdfgsdfg\"\"\"texto;texto\"\"\"fawsgasdfasdfasd\"\"\"texto;texto\"\"\"sadfgsdfgsdfg\"\"\"texto;texto\"\"\"fasdfasdfasdfasdf\"\"\"texto;texto\"\"\"sadf";
    var pattern = @"""""""\w*;\w*""""""";
    var textoModificado = Regex.Replace(texto, pattern,
    encontrato =>
    {
        return encontrato.Value.Replace(";", ",");
    });
    System.Console.WriteLine(texto);
    System.Console.WriteLine(textoModificado);

  • Jose Paulo, I edited the question, since it was a bit unclear before. Take a look and see if there is a solution that helps. Thank you.

  • I edited the answer; see if it helps.

1

Folks, I managed to solve it with the code below:

using System.Linq;                       // Select, ToArray
using System.Text.RegularExpressions;    // Regex

// mangled code horribly to fit without scrolling
public static class CsvSplitter
{
    public static string[] SplitWithQualifier(this string text,
                                              char delimiter,
                                              char qualifier,
                                              bool stripQualifierFromResult)
    {
        string pattern = string.Format(
            @"{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
            Regex.Escape(delimiter.ToString()),
            Regex.Escape(qualifier.ToString())
        );

        string[] split = Regex.Split(text, pattern);

        if (stripQualifierFromResult)
            return split.Select(s => s.Trim().Trim(qualifier)).ToArray();
        else
            return split;
    }
}
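
The lookahead in the pattern above accepts a delimiter only when an even number of qualifiers follows it, that is, when the delimiter sits outside any quoted region. For comparison, the same parity idea can be written without a regex by scanning the line once and toggling a flag on every qualifier. This is only a minimal sketch: SplitOutsideQuotes and parts are hypothetical names that do not appear in the answer, the quote characters are kept in the fields, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not from the answer above): splits on the delimiter
// only while outside a quoted region, keeping quote characters in the fields.
static List<string> SplitOutsideQuotes(string text, char delimiter, char qualifier)
{
    var fields = new List<string>();
    var current = new StringBuilder();
    bool insideQuotes = false;

    foreach (char c in text)
    {
        if (c == qualifier)
        {
            insideQuotes = !insideQuotes;   // every qualifier toggles the state
            current.Append(c);
        }
        else if (c == delimiter && !insideQuotes)
        {
            fields.Add(current.ToString()); // a delimiter outside quotes ends the field
            current.Clear();
        }
        else
        {
            current.Append(c);              // ordinary character, or a delimiter inside quotes
        }
    }

    fields.Add(current.ToString());         // the last field has no trailing delimiter
    return fields;
}

var parts = SplitOutsideQuotes(
    "105;3650;Fernando;\"\"\"Analize;Distribuição\"\"\";0;Finalizado", ';', '"');

foreach (var part in parts)
{
    Console.WriteLine(part);
}
```

Three quotes in a row toggle the flag an odd number of times, so the ; between Analize and Distribuição is seen as inside quotes and the line yields six fields. Unlike the [^;]+ approach, empty fields are preserved here.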

Question answered here
