Regex to capture from one point to another within a text

5

I have the following text:

From: .... blabla bla
Message: blablabalab

//linha em branco

From: .... blabla bla
Message: blablabalab

//linha em Branco

From: .... blabla bla
Message: blablabalab

How do I get my regex to start matching at a From and stop just before the next From begins?

So far I have the following regex: From\s\-\s\w{3}\s\w{3}([^\n]*\n+)+. What I want is for each part of the text that starts at a From and runs until the next From begins to end up in its own group. However, my regex is matching everything up to the end of the text.

Does anyone know how I can do it?

  • Does it need to be RegEx?

  • What’s another way? @bigown

  • I didn't post an alternative only because you said you wanted RegEx and because I didn't fully understand the pattern. Maybe a simple Split would do. If that doesn't solve it, you would need a loop that finds the break points with IndexOf and extracts what matters with Substring. I find that more straightforward, although I understand that other people are more comfortable with RegEx. Of course, without a clear pattern, any solution is difficult.

  • Well, after you asked whether it needed to be regex, I ended up using Regex.Split. Then I modified the regex to match only the From part. I use regex because the From line contains dates and they always change. So I got exactly what I wanted. Thank you.

  • Glad it's solved, but I think string.Split() would already do the job. Post it as an answer.
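
The IndexOf/Substring loop suggested in the comments could look roughly like this. It is only a sketch: the names IndexOfSplitter and SplitAtMarker and the sample text are made up for illustration, and it assumes the marker is a fixed, non-empty string such as "From:" (it does not handle the dated header, which changes on every message).

```csharp
using System;
using System.Collections.Generic;

static class IndexOfSplitter
{
    // Returns the blocks of text that each begin with the given marker.
    // Hypothetical helper, not part of any library.
    public static List<string> SplitAtMarker(string text, string marker)
    {
        if (string.IsNullOrEmpty(marker))
            throw new ArgumentException("marker must not be empty", nameof(marker));

        var blocks = new List<string>();
        int start = text.IndexOf(marker, StringComparison.Ordinal);
        while (start >= 0)
        {
            // Look for the next marker after the current one.
            int next = text.IndexOf(marker, start + marker.Length, StringComparison.Ordinal);
            int end = next >= 0 ? next : text.Length;
            blocks.Add(text.Substring(start, end - start));
            start = next;
        }
        return blocks;
    }

    static void Main()
    {
        // Sample data invented for the example.
        string text = "From: a\nMessage: one\n\nFrom: b\nMessage: two\n";
        foreach (string block in SplitAtMarker(text, "From:"))
            Console.WriteLine("===\n" + block);
    }
}
```

Unlike Split, this keeps the marker at the start of every block.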

5 answers

6

I put together another regular expression for you, which looks like this:

From:\s*([\.\w\d\s]*)\nMessage:\s*([\.\w\d\s]*)\n

I made a proof of concept here.
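
For anyone who wants to try that expression locally, here is a minimal sketch of how it could be used from C#. The class name FromMessagePairs and the shortened sample text are assumptions for the example. Two things follow from the pattern itself: it needs a newline after the Message line, so the last record only matches if the text ends with one, and the character classes only accept dots, word characters and whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

static class FromMessagePairs
{
    // Pattern copied from the answer: group 1 is the From text, group 2 the Message text.
    const string Pattern = @"From:\s*([\.\w\d\s]*)\nMessage:\s*([\.\w\d\s]*)\n";

    public static MatchCollection Find(string text) => Regex.Matches(text, Pattern);

    static void Main()
    {
        // Two records from the question; note the trailing newline after the last one.
        string text =
            "From: .... blabla bla\nMessage: blablabalab\n\n//linha em branco\n\n" +
            "From: .... blabla bla\nMessage: blablabalab\n";

        foreach (Match m in Find(text))
        {
            // Group 2 can carry a trailing newline, so it is trimmed before printing.
            Console.WriteLine("From: " + m.Groups[1].Value);
            Console.WriteLine("Message: " + m.Groups[2].Value.Trim());
        }
    }
}
```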

5


I managed to solve it using Regex.Split() with the regex @"From\s\-\s\w{3}\s\w{3}[^\n]*\n+".

I am using regex because I want to capture from the first From - Fri Mar 13 10:58:58 2015 up to the beginning of the next one, and in that interval there are other From lines in the middle of the message, only without the date.

So it ended up like this:

string[] split = Regex.Split(texto, rgxSplit);

It returns an array with all the text found between one From - ... line and the next.
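
A self-contained sketch of that call, for reference. The sample input (the From: and Message: lines and the second timestamp) is invented for the example, and so are the names DatedFromSplitter and SplitMessages; only the pattern comes from the answer. Since the text starts with a header line, Regex.Split returns an empty first entry, which the sketch filters out.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DatedFromSplitter
{
    // Pattern from the answer: a "From - Ddd Mmm ..." header line plus the newlines after it.
    const string RgxSplit = @"From\s\-\s\w{3}\s\w{3}[^\n]*\n+";

    public static string[] SplitMessages(string texto)
    {
        // The matched header lines are the delimiters, so they are not part of the result.
        // Empty entries (such as the one before the first header) are dropped.
        return Regex.Split(texto, RgxSplit).Where(s => s.Length > 0).ToArray();
    }

    static void Main()
    {
        // Hypothetical mailbox-style input.
        string texto =
            "From - Fri Mar 13 10:58:58 2015\nFrom: alice\nMessage: hello\n\n" +
            "From - Fri Mar 13 11:02:10 2015\nFrom: bob\nMessage: hi\n";

        foreach (string mensagem in SplitMessages(texto))
            Console.WriteLine("===\n" + mensagem);
    }
}
```

The undated From: lines inside each message are left alone because the pattern requires whitespace and a hyphen right after From.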

4

There is the Regex.Split method, which can be used for this.

using System;
using System.Text.RegularExpressions;

class Program {
    public static void Main() {
        string texto = @"From: .... blabla bla
Message: blablabalab

//linha em branco
From: .... blabla bla
Message: blablabalab

//linha em Branco
From: .... blabla bla
Message: blablabalab";

        // Split on "From:" plus the whitespace after it. The first entry is empty
        // because the text starts with the delimiter.
        string[] pedacos = Regex.Split(texto, "From:\\s+");
        foreach (var pedaco in pedacos) {
            Console.WriteLine(pedaco);
        }
        Console.ReadLine();
    }
}

In this case, though, this method isn't necessary, because the delimiter is a fixed string rather than a pattern, so the traditional Split is a better fit here.

If multiple delimiters are required, the following can be done:

string[] pedacos = texto.Split(new string[] { "From: ", "Message: " }, StringSplitOptions.RemoveEmptyEntries);
foreach (var pedaco in pedacos){
    Console.WriteLine(pedaco);
}

Functional example here.

  • In my case I have two kinds of From: one like From - Fri Mar 13 10:58:58 2015 and, before the next one in that same pattern begins, a From: line. That's why I'm using Regex.Split().
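
If the dated From - line should stay attached to its block instead of being thrown away as the delimiter, one option (not something proposed in this thread) is to split on a zero-width lookahead. This is a rough sketch: the names HeaderKeepingSplitter and SplitKeepingHeader and the sample input are assumptions, and the pattern simply treats every line starting with "From - " as a header.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class HeaderKeepingSplitter
{
    // Zero-width lookahead: matches the position right before a line that starts
    // with "From - " and consumes nothing, so the header stays in the result.
    static readonly Regex Boundary = new Regex(@"(?=^From - )", RegexOptions.Multiline);

    public static string[] SplitKeepingHeader(string text)
    {
        // A boundary at position 0 produces an empty first entry, which is dropped.
        return Boundary.Split(text).Where(part => part.Length > 0).ToArray();
    }

    static void Main()
    {
        // Hypothetical input with both kinds of From lines.
        string text =
            "From - Fri Mar 13 10:58:58 2015\nFrom: alice\nMessage: hello\n\n" +
            "From - Fri Mar 13 11:02:10 2015\nFrom: bob\nMessage: hi\n";

        foreach (string block in SplitKeepingHeader(text))
            Console.WriteLine("===\n" + block);
    }
}
```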

3

Folks, I decided it was a shame that I had never written a program in C#. So (this is a solemn moment!) I installed Mono and tried to see whether the Perl idea was applicable. It is. I was so pleased that I decided to write a new answer!

Since these are my first 10 lines of C#, constructive suggestions are welcome.

To differentiate, I enriched the regular expression to separate components (from/message):

using System;
using System.Text.RegularExpressions;

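class Program {
    static void Main() {
        // A verbatim string (@"...") does not interpret \n, so the final newline is appended separately.
        string text = @"From:.....(cut & paste the example from the question)....balab" + "\n";

        // Group 1: the whole block; group 2: the rest of the From line;
        // group 3: everything after it, up to the next "From:" or the end of the text.
        Regex r = new Regex(@"(From:(.*)\n((.|\n)+?\n)(?=From:|$))");
        MatchCollection m = r.Matches(text);
        foreach (Match k in m) {
            // Console.WriteLine("##Full# " + k.Groups[1].Value);
            Console.WriteLine("##From# " + k.Groups[2].Value);
            Console.WriteLine("##Mesg# " + k.Groups[3].Value);
        }
    }
}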

After running $ gmcs regexp.cs ; regexp.exe the output was:

##From#  .... blabla bla
##Mesg# Message: blablabalab

//linha em branco

##From#  .... blabla bla
##Mesg# Message: blablabalab

//linha em Branco

##From#  .... blabla bla
##Mesg# Message: blablabalab

0

Okay, okay, it's not quite C# or .NET, but here's a regular expression

/(.+?\n)(?=From:|$)/s

that I think is within the spirit of the question. Example of use:

perl -n0E 'for $mail ( m/(.+?\n)(?=From:|$)/sg ){ 
                  print "===\n", $mail
           }' file

With the example from the question, it gives:

===
From: .... blabla bla
Message: blablabalab

//linha em branco

===
From: .... blabla bla
Message: blablabalab

//linha em Branco

===
From: .... blabla bla
Message: blablabalab
