Capture and filter result

3

I have a string:

<div></div>
[........]
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<p></p>
[........]

How do I get the contents of the first <p>, namely "Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim",

then remove from that content everything between parentheses (including the parentheses themselves), and finally keep only the text up to the first period, leaving this final result:

Ola meu nome é pseudomatica, etc.

2 answers

6

Get the value of the first <p>

One practical way is to take this HTML and build a DOM from it with PHP's DOMDocument class:

$html = '<div></div>
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<p></p>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// of all the <p> elements, keep the text of the first one
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;

// split the text on '.' and keep the first part
$texto = explode(".", $p)[0];

Example in Ideone:

var_dump(explode(".", $p)[0]); // string(48) "Ola meu nome é pseudomatica (sou normal), etc"

Remove parentheses and their contents

Then you can use a regular expression to remove the text in parentheses, including the parentheses themselves:

$texto = explode(".", $p)[0];
$textoFinal = preg_replace("/\([^)]+\)/","", $texto);

Example in Ideone:

var_dump(preg_replace("/\([^)]+\)/","", $texto));  // Ola meu nome é pseudomatica , etc
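Note that the two steps above leave a space before the comma and drop the final period, so the output differs slightly from the result asked for in the question. Below is a minimal sketch of one way to close that gap, assuming the input is UTF-8. The function name firstSentenceWithoutParens is hypothetical, and prepending an XML encoding declaration is a common workaround (not part of the answer above) so that loadHTML reads the markup as UTF-8.

```php
<?php
// Hypothetical helper: the name and the UTF-8 handling are assumptions for illustration.
function firstSentenceWithoutParens(string $html): ?string
{
    $dom = new DOMDocument;

    // collect parser warnings internally instead of printing them
    libxml_use_internal_errors(true);
    // the XML declaration is a workaround that hints the encoding to the parser
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $p = $dom->getElementsByTagName('p')->item(0);
    if ($p === null) {
        return null; // no <p> in the document
    }

    // remove each parenthesised group together with the whitespace before it
    $text = preg_replace('/\s*\([^)]*\)/', '', $p->nodeValue);

    // keep everything up to and including the first period, if there is one
    $pos = strpos($text, '.');
    return $pos === false ? $text : substr($text, 0, $pos + 1);
}

$html = '<div></div>
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<p></p>';

echo firstSentenceWithoutParens($html), PHP_EOL;
```

The parentheses are removed before cutting at the period, which follows the order described in the question and avoids cutting at a period that sits inside the parentheses.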

2

I confess that without the example of how the string should look after being captured and cleaned, this would have been almost impossible to answer.

And these "placeholders" [....] made it even more difficult.

Well, first you have to locate all the text, whatever it is, up to the first period:

preg_match( '/<p>(.*?\.).*?<\/p>/', $string, $text );

If this paragraph is found, the variable $text will have two indexes: index 0 holds everything that was matched, and index 1 holds only the captured group, that is, the text inside the <p> up to and including the first period.

Captured, you clean up:

preg_replace( '/\s\(.*?\)/', '', $text[ 1 ] );

The cleaning works by locating a space, followed by an opening parenthesis, anything inside it, and a closing parenthesis.

Once located, the whole fragment is removed, resulting in the string:

Ola meu nome é pseudomatica, etc.
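To see why the leading \s matters, here is a small comparison of this pattern with the one from the other answer, run on the captured text. The variable names are only illustrative.

```php
<?php
$captured = 'Ola meu nome é pseudomatica (sou normal), etc.';

// pattern from the other answer: only the parentheses and their contents are removed
$keepsSpace = preg_replace('/\([^)]+\)/', '', $captured);

// pattern from this answer: the whitespace before the parenthesis is removed as well
$dropsSpace = preg_replace('/\s\(.*?\)/', '', $captured);

echo $keepsSpace, PHP_EOL; // Ola meu nome é pseudomatica , etc.
echo $dropsSpace, PHP_EOL; // Ola meu nome é pseudomatica, etc.
```

The trade-off is that '/\s\(.*?\)/' only matches a parenthesis preceded by whitespace, so a group at the very start of the text or glued to the previous word would be left in place.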

The complete code:

$string = '<div></div>
[........]
<p>Ola meu nome é pseudomatica (sou normal), etc. Meu nome é assim pq sim</p>
<p></p>
[........]';

if( preg_match( '/<p>(.*?\.).*?<\/p>/', $string, $text ) != 0 ) {

    echo preg_replace( '/\s\(.*?\)/', '', $text[ 1 ] );
}
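One caveat, offered as a sketch rather than as part of the original answer: by default the dot in a PCRE pattern does not match a newline, so the expression above finds nothing when the paragraph is split across lines. Assuming the real HTML may contain such line breaks, adding the s modifier lets the dot cross them. The sample string and variable names below are made up for illustration.

```php
<?php
// same paragraph as above, but with a line break inside it (assumed input)
$string = "<p>Ola meu nome é pseudomatica\n(sou normal), etc. Meu nome é assim pq sim</p>";

// without the s modifier the dot stops at the newline, so there is no match
$withoutS = preg_match('/<p>(.*?\.).*?<\/p>/', $string, $ignored);

// with the s modifier the dot also matches newlines
$withS = preg_match('/<p>(.*?\.).*?<\/p>/s', $string, $text);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)

// \s also matches the newline, so the cleaning step works unchanged
$clean = preg_replace('/\s\(.*?\)/', '', $text[1]);
echo $clean, PHP_EOL; // Ola meu nome é pseudomatica, etc.
```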
  • And how can I get a div by the class it has?

  • That would be another question, but in order not to let you down, I suggest you read this guide. It is simply the best material about Regular Expressions available in Portuguese.
