Getting text within another text

I want to get a List(Of String) of text blocks, but I ran into a problem. Given this input:

{ --isso é um bloco;
  echo "Aqui tem um } no meio";}

I would like to get this:

--isso é um bloco;
echo "Aqui tem um } no meio";

but my method returns this:

--isso é um bloco;
echo "Aqui tem um 

and stops there. Is there a way to get several blocks while ignoring braces that appear inside "..." strings?

Here is my method's code:

Dim tmpBlocks() As String = theFile.Split({"{"}, StringSplitOptions.RemoveEmptyEntries)
Dim Blocks As New List(Of String)

For i As Integer = 0 To tmpBlocks.Length - 1
    Blocks.Add(tmpBlocks(i).Split(New [String]() {"}"}, 
    StringSplitOptions.RemoveEmptyEntries)(0))
Next

' Blocks holds the text of each block

Note: I accept answers in C#, VB, or regular expressions (regex).

  • Can you post your method?

  • What is the expected output for the input { "\"}" }? Or for the input { """}" }? Are you doing this to parse JSON or some programming language?

  • My own language, @ctgPi.

  • @qmechanik Done.

  • Isn't this what you want to do?

  • If I understand correctly, not exactly: consider the input { echo "{"; }.

  • @qmechanik Not exactly; a new bug appears, this time with the {. https://dotnetfiddle.net/DwGiAk


1 answer

Without knowing the grammar of your programming language, it is hard to solve the general case. You should seriously consider stopping what you are doing, picking up a book on compilers and a book on formal languages and automata and, once you have studied both topics, looking at a library such as ANTLR, which generates parsers for C# and Java.


If you absolutely insist on writing a compiler without studying the theory behind it, the first thing you need to know is that your language is not regular but context-free, and therefore cannot be parsed in general by regular expressions (a web search turns up plenty of material if you want to study the subject).

You will need something along these lines if you insist on writing the code by hand (and note that I did not handle escapes inside strings, {} inside {}, ...).
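As a rough illustration of what such hand-written code could look like (this is a sketch, not the answerer's actual code), the scanner below walks the text one character at a time and tracks whether it is inside a "..." string, so a } inside quotes does not end the block. The names BlockScanner and FirstBlock are hypothetical, and the sketch has the same limitations listed above: no escapes inside strings and no nested {}.

```csharp
using System;

static class BlockScanner
{
    // Hypothetical helper: returns the text between the first '{' and the
    // first '}' that is not inside a "..." string, or null if none is found.
    // Assumptions: no escaped quotes inside strings, no nested blocks.
    public static string FirstBlock(string source)
    {
        int start = -1;        // index just after the opening '{', -1 if not seen yet
        bool inString = false; // true while between two '"' characters

        for (int i = 0; i < source.Length; i++)
        {
            char c = source[i];

            if (c == '"')
            {
                inString = !inString;   // entering or leaving a string
            }
            else if (!inString && c == '{' && start < 0)
            {
                start = i + 1;          // block content begins after the brace
            }
            else if (!inString && c == '}' && start >= 0)
            {
                return source.Substring(start, i - start).Trim();
            }
        }
        return null; // no complete block
    }

    static void Main()
    {
        string input = "{ --isso é um bloco;\n  echo \"Aqui tem um } no meio\";}";
        Console.WriteLine(FirstBlock(input));
    }
}
```

With the input from the question, the } inside the quoted text is skipped because inString is true at that point, so the block ends only at the final brace.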

  • My language works perfectly; I just found this annoying bug and would like to at least try to fix it. The language is quite similar to C#: statements are separated by the character ;, and strings are the classic kind delimited by quotation marks. All I want is a method (any method) that does not split on the character ; when it is inside a string, nothing more.

  • It worked, but is there a way to get several { } blocks into an array? Any ideas?

  • Can you be more specific with an example? I think it would be ideal if you asked a new question (and put a link to this one, if you think it’s appropriate).

  • I don't know if that's necessary. Your code worked perfectly, but what if I have a file with several blocks and want each one in an element of an array? Is that possible?

  • You can use .Substring() to throw away the block that Parse() found. The version I had started writing accepted an additional parameter in Parse(), int p, instead of initializing p = 0, so you could specify the value of p to look for a block in another part of the string.

  • I didn't quite understand. Here is what I tried: https://dotnetfiddle.net/MNGtBc
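The idea of giving Parse() a start position p, so the search can resume after each block it finds, could be sketched as follows. This is an assumption-laden illustration rather than the code from the answer: here p is passed by ref and advanced past the closing brace (the answer only mentions an int p parameter), and BlockReader and AllBlocks are hypothetical names. As before, escapes inside strings and nested {} are not handled.

```csharp
using System;
using System.Collections.Generic;

static class BlockReader
{
    // Hypothetical variant of Parse(): looks for the next { ... } block
    // starting at index p, ignoring braces inside "..." strings.
    // On success, p is moved just past the closing '}'.
    public static string Parse(string source, ref int p)
    {
        int start = -1;        // index just after the opening '{'
        bool inString = false; // true while inside a "..." string

        for (int i = p; i < source.Length; i++)
        {
            char c = source[i];
            if (c == '"') inString = !inString;
            else if (inString) continue;            // braces in strings are plain text
            else if (c == '{' && start < 0) start = i + 1;
            else if (c == '}' && start >= 0)
            {
                p = i + 1;                          // resume after this block next time
                return source.Substring(start, i - start).Trim();
            }
        }
        p = source.Length; // nothing left to scan
        return null;
    }

    // Collects every block in the text by calling Parse() repeatedly.
    public static List<string> AllBlocks(string source)
    {
        var blocks = new List<string>();
        int p = 0;
        while (p < source.Length)
        {
            string block = Parse(source, ref p);
            if (block == null) break;
            blocks.Add(block);
        }
        return blocks;
    }

    static void Main()
    {
        string file = "{ echo \"a } b\"; } { echo \"{\"; }";
        foreach (string block in AllBlocks(file))
            Console.WriteLine(block);
    }
}
```

Advancing p instead of cutting the string with .Substring() avoids copying the remaining text on every iteration; either approach should yield one list element per block.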

