Remove the part of a string after the nth occurrence of a character


Viewed 1,103 times

7

I have a string that always comes back in one of these formats:

"0001>0002>0003>0004>0005"
"abcdef>ghi>jkl>mnopqr>stuvx"

The character > always separates the parts.

Is there a way to erase the third > and everything after it?

For example, I have "0001>0002>0003>0004>0005" and I want just "0001>0002>0003".

Or I have "abcdef>ghi>jkl>mnopqr>stuvx" and I want just "abcdef>ghi>jkl".

  • Does the string always follow this pattern, that is, is the return always like this, or can it vary?

7 answers

12


The right and efficient way to do this:

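using static System.Console;

public class Program {
    public static void Main() {
        var texto = "0001>0002>0003>0004>0005";
        var posicao = -1;
        // Find the 1st, 2nd and 3rd '>' in turn, each search starting right after the previous one.
        for (var i = 0; i < 3; i++) {
            posicao = texto.IndexOf('>', posicao + 1); // char overload: ordinal search, no culture rules
            if (posicao == -1) break; // fewer than 3 '>': nothing to cut
        }
        // Keep only what comes before the third '>' (the single allocation).
        if (posicao > -1) texto = texto.Substring(0, posicao);
        WriteLine(texto);
        // Same thing with fewer than 3 '>': the string is printed unchanged.
        texto = "0001>0002";
        posicao = -1;
        for (var i = 0; i < 3; i++) {
            posicao = texto.IndexOf('>', posicao + 1);
            if (posicao == -1) break;
        }
        if (posicao > -1) texto = texto.Substring(0, posicao);
        WriteLine(texto);
    }
}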

See it working on ideone and on .NET Fiddle. I also put it on GitHub for future reference.

The other approaches are quite inefficient: they make several memory allocations, which puts pressure on the garbage collector and causes pauses in the application. People write code like this without thinking and later complain that they don't know why the application is slow and consumes a lot of memory.

It is still possible to avoid the single remaining allocation by using Span<T>. I did not use it because it can only be used in certain situations, and what the OP put in the question is not enough to know whether their case is a candidate. If it is, a Span can refer to the substring directly inside the original string without any allocation, which is the best of both worlds.
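A minimal sketch of the Span idea, assuming the caller can consume a `ReadOnlySpan<char>` directly (for example, it only writes or parses the result and does not store it). `KeepBeforeNth` is a hypothetical helper name, not a .NET API; `AsSpan`, `Slice` and the span `IndexOf` extension are standard.

```csharp
using System;

// KeepBeforeNth is a hypothetical helper, not a .NET API. It returns a
// ReadOnlySpan<char> that points into the original string, so finding and
// "cutting" the text allocates nothing. n must be at least 1.
static ReadOnlySpan<char> KeepBeforeNth(string text, char separator, int n)
{
    ReadOnlySpan<char> span = text.AsSpan();
    int position = -1;
    for (int i = 0; i < n; i++)
    {
        // Search only the part after the previous separator.
        int next = span.Slice(position + 1).IndexOf(separator);
        if (next == -1) return span;   // fewer than n separators: keep everything
        position += next + 1;          // back to an index into the whole span
    }
    return span.Slice(0, position);    // the characters before the nth separator
}

ReadOnlySpan<char> result = KeepBeforeNth("0001>0002>0003>0004>0005", '>', 3);

// ToString() creates a new string; only call it if you really need one.
Console.WriteLine(result.ToString());
```

The allocation is only avoided if the rest of the code can work with the span itself; converting it back with `ToString()` brings the allocation back.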

  • Whoever downvoted, can you say what is wrong with the answer?

  • I'd also like to know why mine was downvoted, but they didn't explain why!

  • @Joãomartins I found a mistake in mine and fixed it. If that is why the person downvoted, they can now remove the downvote, or they can say where there is another error so I can correct it. I always want to do my best and teach the right way so people can program better. Some people like kludges. Someone reversed their vote, which was fair, but they could have just commented.

  • +1. About "without thinking": it is one of the problems I see most often. People write code using the first ready-made function they find in the language without thinking at all about memory allocation. Your solution uses exactly what is already in memory, with virtually no overhead and zero allocation. Maybe Substring doesn't even allocate, if C# does copy-on-write (I don't know the function's internals, but I think it's very likely).

  • @Joãomartins Yours, although not very efficient, is correct and works in all situations (that I could notice); the other answers have errors and either break or give incorrect results. One of them had a lot of votes for a wrong answer. I'll point it out on each one, but something tells me they'll consider it abusive to point out the mistakes. Some people complain when nobody comments on the error, others complain when someone does. There's no pleasing everyone, so I side with those who want what is right.

  • @Bacco There is only one allocation, and the only way to avoid it is with a newer feature, but depending on what the person is going to do with the result, it cannot be used. I'm adding that to the answer.

  • @Maniero Your solution has an error in the output format (https://dotnetfiddle.net/a1IJ7Z): the result is abcdef>ghi>jkl>, but what was requested is abcdef>ghi>jkl.

  • @Hudsonph True, corrected. When I fixed the problem of taking the end instead of the beginning, I must have touched this and forgotten. Thank you, it helped get it right. If you or someone else finds anything else, please say so.

  • @Maniero The logic and execution are perfect, +1, but you've reinvented the wheel. Regex does exactly that.

  • @Augustovasques Except that mine is absurdly more efficient overall, and it took me less time to write than it would have with Regex.

  • @Maniero I did not question the efficiency, so much so that I gave +1. It's just that with Regex it's two lines of code, and if you want to change the search pattern you only change a string.


4

Among so many alternatives, here is one more. I did not compare its efficiency with the others, but here it is anyway:

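string dados = "0001>0002>0003>0004>0005";
//string dados = "0001>0002>0003";

// Split is used only to count the parts: more than 3 parts means at least 3 '>'.
if (dados.Split('>').Length > 3)
{
    // Index of the 3rd '>': each IndexOf starts searching just after the previous '>'.
    int index = dados.IndexOf('>', dados.IndexOf('>', dados.IndexOf('>') + 1) + 1);

    string antes = dados.Substring(0, index);   // "0001>0002>0003"
    string depois = dados.Substring(index + 1); // "0004>0005"
    Console.WriteLine(antes);
    Console.WriteLine(depois);
}
else
    Console.WriteLine(dados);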

Among its various overloads, IndexOf has one that takes the character to search for first and then the position from which to start searching.

I also check that ">" occurs at least 3 times.
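The nested `IndexOf(char, startIndex)` calls can be generalized to any count with a small loop. This is a sketch; `NthIndexOf` is a hypothetical helper name. Because it returns -1 when there are fewer than n occurrences, it can also stand in for the `Split('>').Length` check, which allocates an array just to count.

```csharp
using System;

// NthIndexOf is a hypothetical helper: it applies IndexOf(char, startIndex)
// n times, each search starting just after the previous hit, and returns the
// index of the nth occurrence, or -1 if there are fewer than n.
static int NthIndexOf(string text, char value, int n)
{
    int index = -1;
    for (int i = 0; i < n; i++)
    {
        index = text.IndexOf(value, index + 1);
        if (index == -1) break; // fewer than n occurrences
    }
    return index;
}

string dados = "0001>0002>0003>0004>0005";
int cut = NthIndexOf(dados, '>', 3);

if (cut != -1)
{
    Console.WriteLine(dados.Substring(0, cut));  // 0001>0002>0003
    Console.WriteLine(dados.Substring(cut + 1)); // 0004>0005
}
else
{
    Console.WriteLine(dados);
}
```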

2

Implement the following method to generalize the behavior you want:

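// Take is a LINQ method: this needs using System; and using System.Linq;
private string DevolvePartes(string strTexto, int intPartes, string strSeparador)
{
    // StringSplitOptions.None keeps empty parts, so "a>>b>c" yields "a>>b",
    // the text before the third separator; RemoveEmptyEntries would give "a>b>c".
    return string.Join(strSeparador, strTexto.Split(
        new string[] { strSeparador }, StringSplitOptions.None).Take(intPartes));
}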

Then just use it, in any circumstance, as follows:

string strTexto = "0001>0002>0003>0004>0005";
string strTextoPartido = DevolvePartes(strTexto, 3, ">");

This way you can use separators other than ">" and keep as many parts as you want.
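A possible variant of the same generalization, sketched with the `Split(string[], int, StringSplitOptions)` overload instead of LINQ. `KeepParts` is a hypothetical name. Passing a count of `parts + 1` makes Split stop after that many pieces, so the tail of the string stays in the last piece instead of being split further.

```csharp
using System;

// KeepParts is a hypothetical variant of DevolvePartes. Asking Split for at most
// parts + 1 pieces leaves the tail of the string unsplit in the last piece.
static string KeepParts(string text, int parts, string separator)
{
    string[] pieces = text.Split(new[] { separator }, parts + 1, StringSplitOptions.None);

    // At most `parts` pieces means there is nothing to cut.
    if (pieces.Length <= parts) return text;

    // Join only the first `parts` pieces back together.
    return string.Join(separator, pieces, 0, parts);
}

Console.WriteLine(KeepParts("0001>0002>0003>0004>0005", 3, ">"));    // 0001>0002>0003
Console.WriteLine(KeepParts("abcdef>ghi>jkl>mnopqr>stuvx", 3, ">")); // abcdef>ghi>jkl
Console.WriteLine(KeepParts("0001>0002", 3, ">"));                   // 0001>0002
```

It still allocates the array and the pieces, so it trades some readability for fewer splits; it is not an allocation-free option.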

  • This is a good solution; I tried to give a more 'didactic' one.

  • I like the logic of this code.

2

My apologies to the contributors who used String.Split overloads or other devices, but using Regex.Match is the simplest and most efficient way to solve this problem, where:

^[^>]*>([^>]*>)([^>]*)

is the pattern that produces the required match.
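To make the pattern easier to read, here is the same expression written with `RegexOptions.IgnorePatternWhitespace`, which lets each piece carry its own comment. It is only a restatement of the pattern above, not a different search; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern, one piece per line. With IgnorePatternWhitespace the
// engine ignores unescaped whitespace outside character classes and treats '#'
// as the start of a comment that runs to the end of the line.
var pattern = new Regex(@"
    ^          # start of the string
    [^>]*>     # first part and the '>' after it
    ([^>]*>)   # group 1: second part and the '>' after it
    ([^>]*)    # group 2: third part, up to but not including the next '>'
", RegexOptions.IgnorePatternWhitespace);

Match m = pattern.Match("abcdef>ghi>jkl>mnopqr>stuvx");
Console.WriteLine(m.Success ? m.Value : "no match"); // abcdef>ghi>jkl
Console.WriteLine(m.Groups[1].Value);                 // ghi>
Console.WriteLine(m.Groups[2].Value);                 // jkl
```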

Detailed pattern search code:

using System;
using System.Text.RegularExpressions;

// string entrada = "0001>0002>0003>0004>0005";
string entrada = "abcdef>ghi>jkl>mnopqr>stuvx";

// Create the Regex with your search pattern
Regex reg = new Regex("^[^>]*>([^>]*>)([^>]*)");

// Run the search and keep the result in correspondencia.
Match correspondencia = reg.Match(entrada);

// If the search succeeded, keep the match in resultado; otherwise keep the string "nothing was found."
string resultado = correspondencia.Success ? correspondencia.Value : "nothing was found.";

// Display the search result.
Console.WriteLine(resultado);

The same code as a function:

// Create the Regex inside the class and outside the method, since it will be reused many times.
private static readonly Regex reg = new Regex("^[^>]*>([^>]*>)([^>]*)");

public string Busca(string entrada)
{
    Match correspondencia = reg.Match(entrada);
    return correspondencia.Success ? correspondencia.Value : "";
}
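If the number of parts should be a parameter, the pattern can be built at run time. This is a sketch; `FirstParts` is a hypothetical helper, and it deliberately uses a `{0,n-1}` quantifier (an assumption about the desired behavior) so that a string with fewer than n parts still matches and comes back unchanged instead of producing "nothing was found".

```csharp
using System;
using System.Text.RegularExpressions;

// FirstParts is a hypothetical helper that builds the pattern for any n >= 1.
// For n = 3 the pattern is ^(?:[^>]*>){0,2}[^>]*
// The {0,n-1} quantifier also accepts fewer separators, so there is always a
// match, and a string with fewer than n parts comes back unchanged.
static string FirstParts(string input, int n)
{
    string pattern = $"^(?:[^>]*>){{0,{n - 1}}}[^>]*";
    return Regex.Match(input, pattern).Value;
}

Console.WriteLine(FirstParts("0001>0002>0003>0004>0005", 3));    // 0001>0002>0003
Console.WriteLine(FirstParts("abcdef>ghi>jkl>mnopqr>stuvx", 3)); // abcdef>ghi>jkl
Console.WriteLine(FirstParts("0001>0002", 3));                   // 0001>0002
```

The static `Regex.Match` keeps a small cache of recently used patterns; if many different values of n are used, creating and storing the `Regex` objects yourself may be preferable.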
  • Perhaps those more accustomed to regex syntax may prefer the pattern ^([^>]*>){2}[^>]* for the same search.

  • You can also use ^([^>]+>){1,2}[^>]+ in case there are fewer than three > in the string, and use + to require something between the > characters, although it was not mentioned whether there is always something there or whether you can have >>>, for example. Anyway, it is a valid solution; I only question "simplest and most efficient". Someone who is not familiar with regex may take longer to write and/or understand it, and as for efficiency, only testing will tell (with few strings it makes no difference, but regex usually adds overhead and, depending on the case, it will not always be more efficient).

  • @hkotsubo Thanks for the addendum. I will take this into account in the future.

1

One more solution:

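var texto = "0001>0002>0003>0004>0005";

// Walk backwards from the end, counting '>' in sinal; stop after passing two of them.
// corte remembers the last position visited, so only one Substring is needed.
int corte = texto.Length;
for (int i = texto.Length - 1, sinal = 0; i >= 0 && sinal < 2; i--)
{
    sinal += texto[i] == '>' ? 1 : 0;
    corte = i;
}
texto = texto.Substring(0, corte);

Console.WriteLine(texto);
Console.ReadKey();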

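This removes characters from the end of the string until it has passed two > characters. Note that the result equals "everything before the third >" only when the string has exactly five parts, as in the examples.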
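The same "count from the end" idea can be sketched with `LastIndexOf(char, startIndex)`, so the string is cut once instead of being rebuilt character by character. `CutBeforeKthFromEnd` is a hypothetical name, and unlike the loop above it returns the string unchanged when there are fewer than k separators; that is an assumption about the desired behavior.

```csharp
using System;

// CutBeforeKthFromEnd is a hypothetical helper. It looks for separators from the
// right with LastIndexOf(char, startIndex) and cuts once with Substring.
// Assumption: if there are fewer than k separators, the text is returned unchanged.
static string CutBeforeKthFromEnd(string text, char separator, int k)
{
    int index = text.Length;
    for (int i = 0; i < k; i++)
    {
        if (index == 0) return text;                     // nothing left to search
        index = text.LastIndexOf(separator, index - 1);  // search leftwards from just before the last hit
        if (index == -1) return text;                    // fewer than k separators
    }
    return text.Substring(0, index);
}

Console.WriteLine(CutBeforeKthFromEnd("0001>0002>0003>0004>0005", '>', 2)); // 0001>0002>0003
```

With `k = 2` it reproduces the question's examples, but, like the loop, it only means "everything before the third >" when the string has exactly five parts.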

1

With an array:

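using System;

class Program
{
    static void Main(string[] args)
    {
        string test = "0001>0002>0003>0004>0005";

        var testSplit = test.Split('>');

        string testJoin = string.Empty;

        // Take at most 3 parts, so a string with fewer parts doesn't throw IndexOutOfRangeException.
        int count = Math.Min(3, testSplit.Length);

        for (int i = 0; i < count; i++)
        {
            // Put '>' only between parts instead of appending it and trimming later:
            // Trim('>') would also strip '>' characters that belong to the text, such as a leading one.
            if (i > 0) testJoin += ">";
            testJoin += testSplit[i];
        }

        Console.WriteLine(testJoin);

        Console.ReadKey();
    }
}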

If you turn it into a list, it's easier:

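using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        string test = "0001>0002>0003>0004>0005";

        List<string> list = new List<string>(test.Split('>'));

        // Drop everything after the third part; the guard keeps RemoveRange
        // from throwing when the string has fewer than 3 parts.
        if (list.Count > 3)
            list.RemoveRange(3, list.Count - 3);

        Console.WriteLine(string.Join(">", list));

        Console.ReadKey();
    }
}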
  • Thanks, it helped a lot!

  • @Emmanuelsales happy to have helped :)

  • There is a problem with this answer: https://dotnetfiddle.net/N5gtEW. It would be nice to fix it, at least to justify it being accepted.

1

Since your string will always be divided by the > character, there is a pattern, so we can split it using the Split() method.

// Here is the original string, with its parts separated by the '>' character
string padrao = "0001>0002>0003>0004>0005";
// Here we split it into an array of strings with the following contents:
// "0001"
// "0002"
// "0003"
// "0004"
// "0005"
string[] padraoDividido = padrao.Split('>');

Now that everything is split, we check whether there are at least 3 parts and, if so, concatenate the first three:

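string padraoFormatado = string.Empty;

if (padraoDividido.Length >= 3)
    padraoFormatado = string.Format("{0}>{1}>{2}", padraoDividido[0], padraoDividido[1], padraoDividido[2]);
else
    padraoFormatado = padrao; // fewer than 3 parts: keep the original string instead of an empty one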
  • There is a problem in this answer: https://dotnetfiddle.net/ZVxTWU. It would be good to fix it, to justify so many upvotes.
