Question about a regex and line breaks


I created the following expression:

"<strike>.*?</strike>" 

to get all the struck-through text, but it does not work because the source code contains line breaks (as in the example below).

<p style="margin-top: 0; margin-bottom: 0"><a name="6"></a><strike>Art. 
6º São direitos sociais a educação, a saúde, o
trabalho, o lazer, a segurança, a previdência social, a proteção à maternidade e à
infância, a assistência aos desamparados, na forma desta Constituição.</strike></p>

<p style="margin-top: 0; margin-bottom: 0">
<strike><a name="art6"></a>Art.     6<sup>o</sup> São direitos sociais a educação, a saúde, o
trabalho, a moradia, o lazer, a segurança, a previdência social, a proteção à
maternidade e à infância, a assistência aos desamparados, na forma desta
Constituição.<a href="Emendas/Emc/emc26.htm#1">(Redação dada pela Emenda
Constitucional nº 26, de 2000)</a></strike></p>

I’m using the regex in Notepad++.

How do I make the regex also match across line breaks?

  • This will depend on your programming language.

  • What language are you using, and how are you instantiating your regular expression?

  • I’m not using any language, just Notepad++.

  • This information is very important, because the syntax changes depending on the environment where you use the regular expression.

  • When you have an answer, post it as an answer. In this case, since you have found a solution, write an answer explaining what the solution is and, if it is not self-evident, why it works.

4 answers

5

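In most engines, . means "any character except a line break". There is usually an option that removes this restriction: in Perl, PHP (PCRE), JavaScript (ES2018+), Python and Java it is the s flag (dotall, also called single-line mode), while in Ruby the same behavior is enabled with m. In most other engines, m means multiline mode, which only changes how ^ and $ match. A typical syntax is:

/<strike>.*?<\/strike>/s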

But the details vary between languages and implementations, so check the documentation of the engine you are using.
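As an illustration, here is a minimal sketch in Python, whose re module calls this option re.DOTALL (short form re.S). The sample string is a shortened, hypothetical version of the HTML from the question:

```python
import re

# Shortened sample modeled on the question's HTML: the struck-through
# text spans more than one line.
html = "<p><strike>Art. 6\nSão direitos sociais...</strike></p>"

pattern = r"<strike>.*?</strike>"

# Default mode: "." does not match "\n", so nothing is found.
without_flag = re.findall(pattern, html)

# DOTALL mode: "." also matches "\n", so the whole element is found.
with_flag = re.findall(pattern, html, flags=re.DOTALL)

print(without_flag)  # []
print(with_flag)
```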

3

This depends on the language you are using. Each one has a way to specify that the dot should also match line-break characters.

The dotall option makes the dot also match line breaks, but we must not forget that by default some languages do not interpret multiple lines, so we must specify that the expression is multiline.

Java

In Java, you pass java.util.regex.Pattern.DOTALL and java.util.regex.Pattern.MULTILINE when creating the Pattern:

Pattern.compile("<strike>.*?</strike>", Pattern.MULTILINE | Pattern.DOTALL);

JavaScript

JavaScript did not originally have a dotall flag (the s flag was only added in ES2018), but according to this answer on Stack Overflow in English, you can use [\s\S] instead of the dot to achieve the same result:

  • \s matches whitespace, including line breaks and tabs
  • \S matches anything that is not whitespace (the opposite)

Therefore, [\s\S] matches any character.
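The same trick is not specific to JavaScript. A minimal sketch in Python (used here only for illustration) showing that [\s\S] crosses a line break with no flags set:

```python
import re

text = "<strike>first line\nsecond line</strike>"

# [\s\S] is a class containing every whitespace character and every
# non-whitespace character, i.e. any character, including "\n".
# It therefore works without any flag.
match = re.search(r"<strike>[\s\S]*?</strike>", text)

print(match.group(0) == text)  # True
```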

PHP

In PHP you can use the s (dotall) and m (multiline) modifiers. Example:

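<?php
$subject = "abcdef";
// s = dotall ("." also matches line breaks), m = multiline (^ and $ match at each line)
$pattern = '/(.*)/sm';
// Search starting at offset 3, also capturing the offset of each match
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);
?>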

Python

In Python we have the constants re.DOTALL and re.MULTILINE:

import re
pattern = r"<strike>.*?</strike>"
regex = re.compile(pattern, flags=re.MULTILINE | re.DOTALL)
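Python also accepts the flag inline, as (?s) at the start of the pattern, which is handy when the pattern string is the only thing you can pass. A minimal sketch; the helper name strip_struck_text is hypothetical:

```python
import re

# "(?s)" at the very start of the pattern is equivalent to passing re.DOTALL.
STRIKE = re.compile(r"(?s)<strike>.*?</strike>")


def strip_struck_text(html: str) -> str:
    """Hypothetical helper: remove every <strike>...</strike> element,
    even when it spans several lines."""
    return STRIKE.sub("", html)


sample = "<p><strike>old\nwording</strike>new wording</p>"
print(strip_struck_text(sample))  # <p>new wording</p>
```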
  • +1 for yours and Bernal’s. Although the question is about Notepad++, this page will soon be indexed by Google, and we know what will happen.

  • "some languages do not interpret multiple lines, so we must specify that the expression is multiline." <- This is wrong... Java, JavaScript, PHP and Python have no problem with multiple lines... Multiline only changes the behavior of ^ and $ so that they match at the beginning/end of each line instead of the beginning/end of the string. And you do not need the multiline flag in this case. See https://www.regular-expressions.info/anchors.html#multi
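To see the distinction the comment describes, here is a minimal Python sketch: re.MULTILINE alone does not let the dot cross a line break; it only changes where ^ and $ can match.

```python
import re

text = "<strike>line one\nline two</strike>"
pattern = r"<strike>.*?</strike>"

# MULTILINE only: "." still stops at "\n", so there is no match.
only_multiline = re.search(pattern, text, flags=re.MULTILINE)

# DOTALL only: "." crosses the line break, so the element is matched.
only_dotall = re.search(pattern, text, flags=re.DOTALL)

# What MULTILINE actually changes: "^" can now match after every "\n".
line_starts = re.findall(r"^\w+", "alpha\nbeta\ngamma", flags=re.MULTILINE)

print(only_multiline)           # None
print(only_dotall is not None)  # True
print(line_starts)              # ['alpha', 'beta', 'gamma']
```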

3


Just check the ". matches newline" box.


1

The expression that met the requirement was:

 <strike>(.|\n)*?</strike>
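The alternation (.|\n) lets each repetition take either "any character except a newline" or a newline itself, so it needs no flag. A minimal Python sketch (Python is used only for illustration; Notepad++ uses its own regex engine) checking that it matches the same text as the dotall version:

```python
import re

text = "<p><strike>Art. 6\nfirst part\nsecond part</strike></p>"

# Alternation: each step takes either a non-newline character or "\n".
alternation = re.search(r"<strike>(.|\n)*?</strike>", text)

# Equivalent search using the dotall flag.
dotall = re.search(r"<strike>.*?</strike>", text, flags=re.DOTALL)

print(alternation.group(0) == dotall.group(0))  # True
```

If the group is only needed for alternation, the non-capturing form (?:.|\n) avoids storing a capture on each repetition.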

I could also use the expression I mentioned earlier:

<strike>.*?</strike>

and, under Search Mode in the Find dialog, check the ". matches newline" box.
