REGEX - Small details that don’t match

4

I have this expression:

(?:[ \t]*[a-z][)]\s*)?([^\r\n<]+(?:(?:\r?\n(?!\s*[a-z][)])|<(?!br\s*\/?>(?:\s*<br\s*\/?>)*\s*(?:\s+[a-z][)]|\s*$)))[^\r\n<]*)*)(?:<br\s*\/?>\s*)*

It matches the text below and removes the letters a), b), c), d), e) and the <br> tags, but only the ones at the end:

<strong>Preencha</strong> a lacuna e assinale a alternativa correta. <br /><br />
I - capacitação técnico-profissional: Comprovação do licitante de possuir em seu quadro permanente, na data prevista para entrega da proposta, ________________, detentor de atestado de responsabilidade técnica por execução de obra ou serviço de características semelhantes, limitadas estas exclusivamente às parcelas de maior relevância e valor significativo do objeto da licitação, vedadas as exigências de quantidades mínimas ou prazos máximos (Lei 8.666/1993 Art N° 30).<br />
<br />
a)<strong>profissional</strong> de nível superior<br />
b)profissional de nível superior ou outro devidamente reconhecido pela entidade competente<br />
c)profissional capacitado<br />
d)profissional de nível minimamente técnico<br />
e)profissional especializado no objeto da <strong>licitação</strong>

Currently the result looks like this:

strong>Preencha</strong> a lacuna e assinale a alternativa correta.<br /><br />
I - capacitação técnico-profissional: Comprovação do licitante de possuir em seu quadro permanente, na data prevista para entrega da proposta, ________________, detentor de atestado de responsabilidade técnica por execução de obra ou serviço de características semelhantes, limitadas estas exclusivamente às parcelas de maior relevância e valor significativo do objeto da licitação, vedadas as exigências de quantidades mínimas ou prazos máximos (Lei 8.666/1993 Art N° 30).
a)<strong>profissional</strong> de nível superior
profissional de nível superior ou outro devidamente reconhecido pela entidade competente
profissional capacitado
profissional de nível minimamente técnico
profissional especializado no objeto da <strong>licitação</strong>

It can be seen here: https://regex101.com/r/MDstG4/4

But as the link shows, when a formatting tag is inserted at the beginning of the question or at the beginning of an answer, the expression does not match correctly. Note the <strong> that is cut off at the beginning of the question, and the letter a) that is kept in the first answer. It should come out clean, like the other answers.

Keep in mind that I take the question and each answer separately, to insert each one into its own field in the database.

The goal is to:

  1. Take everything up to a) and delete all <br>s, but only at the end.
  2. Take a), b)... each one up to the next letter, and delete all <br>s, but only at the end.

ASP code. I do it this way because sometimes there are 4 answers and sometimes 5.

questao = Request.Form("editor")

Set re = New RegExp
re.Global = True
re.IgnoreCase = True
re.Pattern = "(?:[ \t]*[a-z][)]\s*)?([^\r\n<]+(?:(?:\r?\n(?!\s*[a-z][)])|<(?!br\s*\/?>(?:\s*<br\s*\/?>)*\s*(?:\s+[a-z][)]|\s*$)))[^\r\n<]*)*)(?:<br\s*\/?>\s*)*"

Set matches = re.Execute(questao)

' matches(0) is the question; matches(1) onward are the answers
numRespostas = matches.Count - 1

' sometimes there are 4 answers, sometimes 5
If numRespostas = 4 Or numRespostas = 5 Then
    pergunta = matches(0).SubMatches(0)
    resposta_a = matches(1).SubMatches(0)
    resposta_b = matches(2).SubMatches(0)
    resposta_c = matches(3).SubMatches(0)
    resposta_d = matches(4).SubMatches(0)
    If numRespostas = 5 Then
        resposta_e = matches(5).SubMatches(0)
    End If
End If

Set matches = Nothing
Set re = Nothing
  • My opinion, although it does not directly solve the problem, is that you are going about this the wrong way. If you build this HTML yourself, add extra tags to make the elements easier to capture. For parsing HTML, especially in more elaborate cases like yours, regex is generally not the right tool. HTML parsers exist precisely for this reason.

  • I don't build the HTML; it comes from users who know nothing about this.

  • In fact, I'm almost at the solution. Only these details that I can't adjust are missing.

  • How about this? https://regex101.com/r/Xoyimh/1

  • @Marcelouchimura it looks perfect, but matches.Count does not work with it in ASP. I will put the code in the question so you can see.

  • @Rod Does this regex do what you need? (?![a-z]?\)).+(?=<br\s*\/) There is also a demo. I saw that ASP is very similar to VBA; if necessary I can create a VBA example, because I don't know ASP.

  • @danieltakeshi looks perfect, I’ll test it. Thanks anyway.

  • I see here that the question produces two matches. It would have to be a single match, regardless of how many line breaks it contains.

  • Only the question? Can the lettered options be separate matches?

  • @danieltakeshi Yes: one match for the question, regardless of how many <br>s it has in the middle, excluding only the ones at the end, and one match for each letter, which can also have <br>s in the middle but not at the end. That way I can take each one separately and store it in the database. Thanks for answering, my friend.


2 answers

0

Why not simplify the code by using more than one, simpler regular expression?

To capture the answer items,

Const patternQuestao = "[a-z]\)(.+?)\<br\s+\/\>"

To capture the question's title,

Const patternTitulo = "((.|[\r\n])+?)[a-z]\)"

And to remove the line break,

Const patternQuebra = "\<br\s+\/\>\s*$"

For example, your code could become the following:

Set r = New RegExp
r.IgnoreCase = True

'--- collect the title
Dim titulo, matchQuestao
r.Pattern = patternTitulo
r.Global = False
Set matchQuestao = r.Execute(entrada)
If matchQuestao.Count = 1 Then
    titulo = matchQuestao(0).SubMatches(0)
End If

'--- remove the line breaks from the title
r.Pattern = patternQuebra
r.Global = True
titulo = r.Replace(titulo, "")

'--- collect the answers
Dim resposta_a, resposta_b, resposta_c, resposta_d, resposta_e, matchQuestoes
r.Pattern = patternQuestao
r.Global = True
r.Multiline = True
Set matchQuestoes = r.Execute(entrada)
' indexes 0 to 3 are read below, so at least 4 matches are required
If matchQuestoes.Count >= 4 Then
    resposta_a = matchQuestoes(0).SubMatches(0)
    resposta_b = matchQuestoes(1).SubMatches(0)
    resposta_c = matchQuestoes(2).SubMatches(0)
    resposta_d = matchQuestoes(3).SubMatches(0)
    If matchQuestoes.Count > 4 Then
        resposta_e = matchQuestoes(4).SubMatches(0)
    End If
End If

'--- remove the line breaks from the answers
r.Pattern = patternQuebra
r.Global = True
resposta_a = r.Replace(resposta_a, "")
resposta_b = r.Replace(resposta_b, "")
resposta_c = r.Replace(resposta_c, "")
resposta_d = r.Replace(resposta_d, "")
resposta_e = r.Replace(resposta_e, "")
  • Thank you very much, but I am disappointed. I have been struggling with this for a long time and I can't get it to work at all. Your code has several problems in ASP 3. The idea is great, but I couldn't get it right: I modified almost everything to try to make it run and still failed.

  • In addition, the <br>s should be removed only at the end, not in the middle.

  • For example, capturing the title: https://regex101.com/r/fn1d7g/4

  • If I get a regex that removes <br>s only from the end of the string, I think I can manage what I want to do.
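
The trailing-only removal asked about in the last comment could look roughly like this. It is a minimal sketch in Python rather than ASP, and the names TRAILING_BR and strip_trailing_br are made up for illustration. The pattern uses only non-capturing groups, \s and $, which the pattern in the question already relies on, but it has not been tested in ASP 3 here.

```python
import re

# One or more <br>, <br/> or <br /> tags, with optional whitespace around
# them, anchored to the very end of the string (no multiline flag).
TRAILING_BR = re.compile(r"(?:\s*<br\s*/?>)+\s*$", re.IGNORECASE)


def strip_trailing_br(text):
    """Remove <br> tags at the end of text; leave the ones in the middle."""
    return TRAILING_BR.sub("", text)


title = "First line.<br /><br />\nSecond line.<br />\n<br />\n"
print(repr(strip_trailing_br(title)))
# 'First line.<br /><br />\nSecond line.'
```

Because $ is used without the multiline option, only the run of <br>s that closes the whole string is removed; a <br> followed by more text never satisfies the anchor.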

0

Regex

This regular expression solves the example in the question: (?:[a-z]\))?([\s\S]+?)(?:<br\s*\/>\s*)?(?=[a-z]{1}\)|$)

Group 1 of each match holds the text without the leading letter of the alternative.

Use only the global option (/g), without multiline (/m).

A demo is available on Regex101 and a visualization on Debuggex.

Explanation

Each match takes all characters up to the next [a-z]\) marker, for example a), b), ..., z), or up to the end of the string ($).

A match may or may not begin with [a-z]\) and may or may not end with <br />; both are left outside capture group 1.
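
To make the behaviour of group 1 concrete, here is a small sketch that runs the pattern above over shortened, made-up input. It is written in Python only for illustration (finditer stands in for the /g option), and the helper name capture_groups is hypothetical.

```python
import re

# Pattern copied from the answer; finditer plays the role of the /g flag.
PATTERN = re.compile(r"(?:[a-z]\))?([\s\S]+?)(?:<br\s*\/>\s*)?(?=[a-z]{1}\)|$)")


def capture_groups(html):
    """Return group 1 of every match, in order."""
    return [m.group(1) for m in PATTERN.finditer(html)]


sample = (
    "<strong>Question</strong> text<br />\n"
    "a)<strong>first</strong><br />\n"
    "b)second<br />\n"
    "c)third"
)
print(capture_groups(sample))
# ['<strong>Question</strong> text', '<strong>first</strong>', 'second', 'third']

# Only one trailing <br /> is optional in the pattern, so a run of two
# leaves the first one inside group 1.
print(capture_groups("Question<br />\n<br />\na)first"))
# ['Question<br />\n', 'first']
```

Two things to keep in mind with this pattern: any lowercase letter directly followed by ")" inside the text is treated as the start of a new alternative, and when several <br />s end a piece, only the last one is dropped, so a separate cleanup pass may still be needed.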
