Transform a string into multiple substrings whose contents are between apostrophes


7

I am creating a program in C# that reads a text file and one of the lines of that file has the following format:

'Column 1' 'Column 2' 'Column 3'

I want to turn that line into one array of strings so that the answer looks like this:

    Colunas[0] = "Column 1"
    Colunas[1] = "Column 2"
    Colunas[2] = "Column 3"

That is, I want it to identify each string between apostrophes and store it in the array. I tried to do this by reading the whole line using the following code:

    string linha = Leitor.ReadLine(); // Leitor is the StreamReader that reads the file

And then I tried the linha.Split method:

    var NomesColunas = linha.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

But then the result was as follows: {"Column", "1", "Column", "2", "Column", "3" }. I also tried using the apostrophe as the char to split on, but that gives a compilation error; I can’t get the syntax right.

  • What did your Split call look like? Edit your question and add it there; that’s better than in the comments, I think.

  • Thanks for the tip! I added it there; I think it’s better explained now.

  • What a most inconvenient file format...

3 answers

5


Using regex would look something like:

'([^']+)'

This way it matches any number of quoted values on the line.

An example would be this (I don’t know C# well, so feel free to point out or correct any flaws):

using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        string linha = "'Oi 1' 'tchau 2' 'hello 3' 'good bye 4'";
        // Capture group 1 holds the text between each pair of apostrophes.
        string regex = @"'([^']+)'";
        MatchCollection match = Regex.Matches(linha, regex);

        string[] dados = new string[match.Count];

        for (int i = 0; i < dados.Length; i++)
        {
            dados[i] = match[i].Groups[1].Value;
            Console.WriteLine(dados[i]);
        }
    }
}

Note that I used string[] instead of a List because I understood that is what you are working with; the idea is the same either way, so adjust as needed.

  • 1

    It worked perfectly! I didn’t know about regular expressions (I’m new to programming). I took the opportunity to study the subject so I could understand the code you wrote!

4

To use an apostrophe as a char you have to escape it ('\''), so that the compiler does not confuse the character-literal delimiter with the character itself. But since your separator pattern has more than one character, you have to use a string and not a char.

You have to remove the initial and final apostrophes and then split (Split()) by the pattern, which is "an apostrophe, a space, an apostrophe". Thus:
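As a minimal sketch of why the escaped apostrophe alone is not enough as a separator, the hypothetical helper below (SplitOnApostrophe is a name made up for illustration, not part of the original answer) splits on '\'' only. The spaces between the quoted values come back as entries of their own:

```csharp
using System;

public static class ApostropheSplit
{
    // Hypothetical helper: splits a line on the escaped apostrophe character only.
    public static string[] SplitOnApostrophe(string line)
    {
        // '\'' is the char literal for an apostrophe; the backslash escapes it.
        return line.Split(new[] { '\'' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the line 'Coluna 1' 'Coluna 2' 'Coluna 3' this returns five entries: "Coluna 1", " ", "Coluna 2", " ", "Coluna 3". The leftover " " entries are the reason a longer, string-based separator is used next.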

using static System.Console;
using System;
                    
public class Program {
    public static void Main() {
        var texto = "'Coluna 1' 'Coluna 2' 'Coluna 3'";
        texto = texto.Substring(1, texto.Length - 2);
        var items = texto.Split(new string[] {"' '"}, StringSplitOptions.None);
        foreach (var item in items) WriteLine(item);
    }
}

See it working on ideone and on .NET Fiddle. I also put it on GitHub for future reference.

This should be the simplest way. It is possible to develop a more elegant algorithm, but it will not be as simple.

  • Got it, thanks for the tip. There is one catch, though: there is not necessarily just one space between the strings I want to read; there can be more. If I have to edit all the input files to leave only one space between the apostrophes, it will cost me a lot...

  • That’s not in the question; I can only answer what was asked. Is there a limit? You can fill the array of patterns with other forms such as "'  '" (two spaces), "'   '" (three spaces), etc. If there can be many, another solution is needed. One option would be to remove the extra spaces beforehand, perhaps with a Replace(), or it may be worth writing your own lexer method, but I don’t know if it pays off, because making it perform well would not be so simple.

  • I really didn’t mention that in the question, I apologize... There is no limit; I want to read the line regardless of the number of spaces between the expressions. The solution with regex worked perfectly.
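One way to sketch the hand-written scanner idea from the comments above, for any number of spaces between the values, is to walk the line with IndexOf instead of using a regex. The names QuotedFields and Extract are made up for illustration, and the sketch assumes the values themselves never contain an apostrophe:

```csharp
using System;
using System.Collections.Generic;

public static class QuotedFields
{
    // Hypothetical helper: returns the text between each pair of apostrophes,
    // ignoring whatever lies between one closing apostrophe and the next opening one.
    public static List<string> Extract(string line)
    {
        var fields = new List<string>();
        int start = line.IndexOf('\'');
        while (start >= 0)
        {
            int end = line.IndexOf('\'', start + 1);
            if (end < 0) break; // unmatched opening apostrophe: stop scanning

            fields.Add(line.Substring(start + 1, end - start - 1));
            start = line.IndexOf('\'', end + 1);
        }
        return fields;
    }
}
```

Calling QuotedFields.Extract("'Coluna 1'    'Coluna 2' 'Coluna 3'") returns the three values "Coluna 1", "Coluna 2" and "Coluna 3". Unlike the regex with [^']+, this sketch keeps an empty pair of apostrophes as an empty string.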

2

The answer above has what you need; I implemented it a little differently here.

using System;

public class Program
{
    public static void Main()
    {
        string linha = "'Coluna 1' 'Coluna 2' 'Coluna 3'";

        linha = linha.Replace("' ",",").Replace("'","");

        string[] linhas = linha.Split(',');
        foreach(var item in linhas)
        {
            Console.WriteLine(item);
        }
    }
}

https://dotnetfiddle.net/W2XOu8

  • 1

    Thanks for the help. However, when there is more than one space between the expressions, the result is not quite right. I forgot to mention this in the question, and I apologize for that. The solution with regex worked the way I needed it.

  • Regex really helps with many things like this, but it is often a little slower. The important thing is that you solved your problem.
