Line break in CSV file causes error during import

-1

The application needs to read a .csv file that is generated automatically by an external application (we have no control over it).

The .csv file has 60 "columns" (after the values are read and split).

The problem is that, in some cases, a line break occurs inside a field. Example of a line with a break:

test;test;test;"breaking of

line"; test;

When it should be:

test;test;test;"line break"; test;

In the example the application expects 6 values; however, because of the line break it receives only 4 and ends up reporting:

Index was outside the bounds of the array.

The file-reading code follows:

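// caminhoDoArquivo: string holding the path to the .csv file
using (var reader = new StreamReader(caminhoDoArquivo)) {
    while (!reader.EndOfStream) {
        // Reads one physical line, so a quoted field that contains a line break is cut in two
        var linha = reader.ReadLine();
        var valores = linha.Split(';');
        var minhaClasse = new MinhaClasse() {
            Valor1 = valores[0],
            Valor2 = valores[1],
            Valor3 = valores[2],
            Valor4 = valores[3],
            Valor5 = valores[4], // throws here when the line yields only 4 values
            Valor6 = valores[5]
        };
    }
}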
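To see where the 4 values come from, the read-and-split step above can be isolated in a small sketch. The class and method names here (SplitDemo, FieldsOnFirstLine) are made up for illustration, and a StringReader stands in for the real file.

```csharp
using System;
using System.IO;

public static class SplitDemo
{
    // Counts the values that ReadLine() followed by Split(';') produces
    // for the first physical line of the given text.
    public static int FieldsOnFirstLine(string csvText)
    {
        using (var reader = new StringReader(csvText))
        {
            // ReadLine() stops at the first line break, even inside quotes.
            string firstLine = reader.ReadLine() ?? "";
            return firstLine.Split(';').Length;
        }
    }
}
```

For the broken sample, SplitDemo.FieldsOnFirstLine("test;test;test;\"breaking of\nline\"; test;") returns 4, while the intact sample "test;test;test;\"line break\"; test;" returns 6, so valores[4] is the first index that fails.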

I have no control over when a line break will appear or which field it will appear in. How can I solve this so that I can read a whole .csv record without worrying about line breaks?

Do I need to change the way I'm reading the file?

2 answers

0


I don't know whether this is the most correct approach, but it is the one that worked for me. (Note: I'm posting this answer only so that the question can be closed.)

Using the library

using Microsoft.VisualBasic.FileIO;

I read the file using TextFieldParser. It ended up like this:

// Reads the file (caminhoDoArquivo: string holding the path to the .csv file).
using (TextFieldParser parser = new TextFieldParser(caminhoDoArquivo)) {

    // Sets the field delimiter.
    parser.Delimiters = new string[] { ";" };

    while (!parser.EndOfData)
    {
        // Reads all fields of the current record and returns them as an array.
        var valores = parser.ReadFields();
    }
}

Again, I don't know whether this is the most appropriate approach, but it solved my problem, because line breaks inside quoted fields no longer split the record.
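As a fuller sketch of the same idea, the parser can be wrapped in a helper that returns every record. This assumes the fields that contain line breaks are enclosed in double quotes, as in the question's example. CsvReadSketch and ReadRecords are illustrative names, and the helper takes a TextReader so it can be tried on an in-memory string as well as on a file.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvReadSketch
{
    // Reads every record from a semicolon-delimited source.
    // A quoted field may span several physical lines.
    public static List<string[]> ReadRecords(TextReader source)
    {
        var records = new List<string[]>();
        using (var parser = new TextFieldParser(source))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            // Treat "..." as a single field, including any line break inside it.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                records.Add(parser.ReadFields());
            }
        }
        return records;
    }
}
```

For a real file, the call would look like CsvReadSketch.ReadRecords(new StreamReader(caminhoDoArquivo)). Each returned array can then be checked for the expected number of columns before it is mapped to MinhaClasse.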

0

Try using a regular expression to remove the line breaks from the variable linha. For example, right after the line that declares linha, add this:

linha = Regex.Replace(linha, @"\r\n?|\n", "");

That way, before linha.Split(';') runs, the line will be "clean" (no line breaks).

  • That doesn't work in this case, because the string never contains a "\r\n" line break. The file has 2 lines for a single record (open it as text and you will see 2 lines for one value). So when I call reader.ReadLine(), it doesn't know that the value continues on the next line and ends up reading them as 2 separate records.
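If staying with StreamReader is preferred, one possible workaround for the situation the comment describes is to keep appending physical lines until the double quotes are balanced. This is only a sketch: it assumes a broken field is always enclosed in double quotes, it joins the pieces with a single space (an arbitrary choice), and LogicalLineReader and ReadLogicalLines are invented names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LogicalLineReader
{
    // Yields one string per record, merging physical lines while a quoted
    // field is still open (that is, while the quote count is odd).
    public static IEnumerable<string> ReadLogicalLines(TextReader reader)
    {
        string pending = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Join the continuation onto what has been read so far.
            pending = pending == null ? line : pending + " " + line;

            // An even number of quotes means every quoted field is closed.
            if (pending.Count(c => c == '"') % 2 == 0)
            {
                yield return pending;
                pending = null;
            }
        }

        // Unterminated quote at end of input: return what is left.
        if (pending != null)
        {
            yield return pending;
        }
    }
}
```

With the question's sample, the two physical lines come back as the single string test;test;test;"breaking of line"; test;, which Split(';') divides into 6 values. The quote characters stay in the values, and a semicolon inside a quoted field would still be split, so a real CSV parser remains the safer option.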
