REGEX - Allow HTML at the beginning of the string

This expression cuts off the HTML only when the tag is at the very beginning of the text. How can I fix this?

(?:[ \t]*[a-z][)]\s*)?([^\r\n<]+(?:(?:\r?\n(?!\s*[a-z][)])|<(?!br\s*\/?>(?:\s*<br\s*\/?>)*\s*(?:\s+[a-z][)]|\s*$)))[^\r\n<]*)*)(?:<br\s*\/?>\s*)*

https://regexr.com/3q1ph

<%
questao = Request.Form("questao")

'RegEx
Set re = New RegExp
re.Global = true
re.IgnoreCase = true
re.Pattern = "(?:[ \t]*[a-z][)]\s*)?([^\r\n<]+(?:(?:\r?\n(?!\s*[a-z][)])|<(?!br\s*\/?>(?:\s*<br\s*\/?>)*\s*(?:\s+[a-z][)]|\s*$)))[^\r\n<]*)*)(?:<br\s*\/?>\s*)*"

Set matches = re.Execute(questao)
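If matches.Count > 0 Then

    ' PERGUNTA (question): the first match
    pergunta = matches(0).SubMatches(0)
    Response.Write(pergunta)

    ' RESPOSTAS (answers): the remaining matches
    For m = 1 To matches.Count - 1
        Response.Write(matches(m).SubMatches(0))
    Next

    ' Assign the five answers once, and only if all of them were matched;
    ' indexing past matches.Count - 1 raises a runtime error.
    If matches.Count >= 6 Then
        resposta_a = matches(1).SubMatches(0)
        resposta_b = matches(2).SubMatches(0)
        resposta_c = matches(3).SubMatches(0)
        resposta_d = matches(4).SubMatches(0)
        resposta_e = matches(5).SubMatches(0)
    End If
End If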

Set matches = Nothing
Set re = Nothing
%>
  • Maybe I expressed myself badly. I want the match to include the "<" of the <strong> tag. Notice that it is left outside the match.

  • But what is the purpose of the regex? Perhaps there is a simpler solution.

  • This expression separates the question from all the answers (a, b, c, d, e), which I will then insert into the database. But when I format, for example, the first word of the question, the beginning of the HTML ("<") is cut off. Got it?

  • Put the text that the regex is to be applied to here in the question, so that its purpose is clearer. Also detail what you currently get versus what you wanted to get. The more detail you give, the more likely you are to get a satisfactory answer.

  • @Isac There is a link in the question that goes straight to the expression and the text, all set up. If you click on it, you will understand.

  • I saw the link, but that doesn't invalidate what I said. I not only have to open a link on a new page, I also have to guess what you're trying to capture and, from that, guess what is going wrong. The more you clarify, the better the answers you will get.

  • @Isac Okay, buddy, I'm sorry.

  • Do these questions come from a relational database? If they do, I think storing the question and all its options together in the same field is a poor database design. Personally, I would make a table for the questions related to a table for the options, so that removing or editing options would not require parsing with regex.

  • @Guilhermenascimento It is just the opposite of what you're saying. I take the whole question, paste it into an editor such as fckeditor, format it, split it with the regex, and insert it into the database in separate fields. That is my idea: field_question, answer_1, answer_2, and so on.

  • @Rod That was indeed the intention, but I don't think it is invalid to try this approach. I even have a suggestion; maybe I will write up an answer tomorrow.


1 answer

The problem is the first negated character class, which excludes the < symbol (as well as the carriage return \r and the line feed \n):

                             ↓
(?:[ \t]*[a-z][)]\s*)?([^\r\n<]+ ...

Just remove the < and it will work, but I have the impression that this regex is overcomplicated. On its own, ([^\r\n])+ already captures each line as a separate match:

var string = '<strong>Acerca</strong> dos atos notariais é correto afirmar:\n'
+'\r'
+'a) O testamento público não pode ser celebrado por relativamente incapaz maior de 16 e menor de 18 anos, sem a participação de assistente.\n'
+'b) Não é possível a lavratura de pacto antenupcial no regime da separação parcial de bens, mesmo quando os noivos pretendam alterar ou disciplinar algum aspecto específico do regime de bens, pois esta avença descaracterizaria preceito de ordem pública\n'
+'c) O aspecto temporal da emissão do documento é o critério essencial na diferenciação entre traslado e certidão\n'
+'d) Os requisitos formais a serem observados pelo Tabelião nas escrituras públicas e nas atas notariais são exatamente os mesmos, pois não há diferenças extrínsecas entre estes instrumentos públicos.\n'
+'e) Os requisitos formais a serem osasabservados pelo Tabelião nas escrituras públicas e nasasas atas notariais são exatamente os mesmos, pois não há diferenças extrínsecas entre estsases instrumentos <strong>públicos.</strong>'

// With the g flag, match() returns every full match, one per line of text.
var matches = string.match(/([^\r\n])+/g);

for (var item of matches) {
   console.log(item);
}
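
To make the effect of the < in the negated class easier to see, here is a minimal sketch that uses a deliberately simplified pattern rather than the full expression from the question. The sample text and the names `text`, `withLt` and `withoutLt` are made up for illustration.

```javascript
// Simplified illustration only; `text`, `withLt` and `withoutLt` are hypothetical names.
var text = '<strong>Acerca</strong> dos atos\na) primeira\nb) segunda';

// The class excludes "<", so a match can neither start at nor contain a tag opener.
var withLt = text.match(/[^\r\n<]+/g);

// Without "<" in the class, each line is matched whole, tags included.
var withoutLt = text.match(/[^\r\n]+/g);

console.log(withLt[0]);    // strong>Acerca
console.log(withoutLt[0]); // <strong>Acerca</strong> dos atos
```

In the first case the leading "<" is skipped and the line is broken up at every tag, which is the symptom described in the question.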

  • The question is entered in fckeditor, and that negated "<" is what strips the generated <br /> from the end of each answer. Removing it worked as you said, but now the answers come with <br /> at the end.

  • I wanted to post the code I'm working on here, but I don't know where to put it.

  • I didn't know that detail. I'll reevaluate...

  • There is a link to edit the question. Put the code there.

  • Yes. Here is what I did now: I took out the "<" as you said, and used a replace in ASP directly on the fields to strip the <br>.

  • Now I will see whether I can do the following: the question comes out all messy when I copy it from the PDF and paste it into fckeditor. I would like to tidy it up so that the regex can split it and I can insert it into the database correctly, because the text needs to follow the pattern.

  • Can this be done with a jQuery load event?

  • Try this regex: ((.*?)\n){1}

  • It didn't work; the question spilled into the answers. Each piece ended up in a different answer.

  • When you paste in the Editor, it looks like this, exactly like this image? https://i.stack.Imgur.com/34Ynp.jpg

  • No, it looks exactly like this: https://regexr.com/3q2ad Do you see the spaces before the letters?

  • To match my regex I have to remove those leading spaces. I can do it by hand, but automating it would be better.

  • I managed to do it without regex; give it a test at this link.
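
The cleanup discussed in the comments above (dropping the trailing <br /> tags and the spaces before the letters) could be sketched in JavaScript roughly as follows. This is only an illustration, not the code behind the link mentioned in the last comment: the function name `splitQuestion`, the sample input, and the assumption of one item per line with labels "a)" to "e)" are all hypothetical.

```javascript
// Hypothetical helper: splits editor output into a question and its answers.
// Assumes one item per line, answers labelled "a)" to "e)", and optional
// <br> or <br /> tags at the end of each line.
function splitQuestion(raw) {
  var lines = raw
    .split(/\r?\n/)
    .map(function (line) {
      return line
        .replace(/(?:\s*<br\s*\/?>)+\s*$/i, '') // drop trailing <br> tags
        .trim();                                 // drop leading and trailing spaces
    })
    .filter(function (line) { return line.length > 0; });

  var result = { question: '', answers: {} };
  var lastKey = null;

  lines.forEach(function (line) {
    var m = /^([a-e])\)\s*(.*)$/i.exec(line);
    if (m) {
      // A labelled line starts a new answer.
      lastKey = m[1].toLowerCase();
      result.answers[lastKey] = m[2];
    } else if (lastKey) {
      // An unlabelled line after an answer is treated as its continuation.
      result.answers[lastKey] += ' ' + line;
    } else {
      // Unlabelled lines before the first answer belong to the question.
      result.question += (result.question ? ' ' : '') + line;
    }
  });
  return result;
}

// Made-up input resembling what the editor might produce.
var sample = '<strong>Acerca</strong> dos atos notariais:<br />\n' +
             '   a) primeira opção<br />\n' +
             '   b) segunda opção<br /><br />';
var parsed = splitQuestion(sample);
console.log(parsed.question);
console.log(parsed.answers);
```

Because the lines are split first and cleaned afterwards, the "<" at the start of the question never has to be excluded from a character class, so the leading tag is preserved.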
