preg_replace to remove Font Awesome icons from text

Question

Asked 5 years, 10 months ago

1

I need to write a PHP preg_replace that removes all Font Awesome icons from a string. For example:

Texto de teste <i class="fab fa-accusoft"></i> Teste Lorem Ipsim

should become

Texto de teste Teste Lorem Ipsim

I have tried several regular expressions, but I am hopeless at this and still do not understand the basics of how they work. If anyone could explain how to build this regex, I would be most grateful.

Do you want to remove them from the document on the server side, or just hide them on the screen?

– hugocsl

2019/09/02 at 19:14

2 answers

3

It may be easier to use an HTML parser, such as DOMDocument. For example:

$dom = new DOMDocument;
$dom->loadHTML($html); // $html is a string containing the HTML
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//i[contains(@class, 'fa-')]") as $i) { // find i elements whose class contains fa-
    $i->parentNode->removeChild($i);
}
echo $dom->saveHTML();

Here I look for i elements whose class attribute contains fa-, and remove them from the HTML.
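Note that contains(@class, 'fa-') is a plain substring test, so it would also match a class such as alfa-beta. A minimal sketch of a stricter check follows; the sample markup and the variable names $loose and $strict are made up for illustration. It pads the attribute with a leading space so that ' fa-' can only match at the start of a class name.

```php
<?php
// Sample markup (hypothetical): one Font Awesome icon and one unrelated <i>.
$html = '<p>Texto <i class="fab fa-accusoft"></i> e <i class="alfa-beta"></i> fim</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring test: "alfa-beta" also contains "fa-", so both elements match.
$loose = $xpath->query("//i[contains(@class, 'fa-')]");

// Token test: prefix the normalized class list with a space and look for " fa-",
// which only matches when "fa-" starts a class name.
$strict = $xpath->query("//i[contains(concat(' ', normalize-space(@class)), ' fa-')]");

echo $loose->length, ' ', $strict->length, PHP_EOL; // prints "2 1"
```

Swapping the stricter expression into the loop above should leave unrelated i elements untouched.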

With regex it is a little more complicated, but we can assume that the i tag is always in this format (<i class="classes"></i>): no content between the opening and closing tags, and only a class attribute holding the class names (no other attributes). The regex could then be something like:

$str = 'Texto de teste <i class="fab fa-accusoft"></i> Teste Lorem Ipsim <i class="fab fa-accusoft"></i> abc etc';
echo preg_replace('/<i class="[^"]*\bfa-[^"]*"><\/i>/', '', $str);

I use a negated character class ([^"]), which means any character that is not a ". This lets me check whether the quotes contain a class whose name is fa-(anything).

I use the shortcut \b (word boundary) to ensure that no alphanumeric character comes before the f, thus avoiding false positives (such as a class named alfa-alguma-coisa).
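The effect of \b can be checked with a short sketch. The variable names and the second pattern (the same regex without \b) are only there for comparison and are not part of the answer's solution.

```php
<?php
$pattern         = '/<i class="[^"]*\bfa-[^"]*"><\/i>/'; // regex from the answer
$withoutBoundary = '/<i class="[^"]*fa-[^"]*"><\/i>/';   // same regex, \b removed

$icon  = '<i class="fab fa-accusoft"></i>';
$other = '<i class="alfa-alguma-coisa"></i>';

var_dump(preg_match($pattern, $icon));          // int(1): "fa-" follows a space
var_dump(preg_match($pattern, $other));         // int(0): "fa-" follows the letter "l"
var_dump(preg_match($withoutBoundary, $other)); // int(1): false positive without \b
```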

The result is that the i tags are removed:

Texto de teste  Teste Lorem Ipsim  abc etc

Note that the spaces before and after the tag are not removed (although this makes no difference when the HTML is rendered). If you really want to remove them, you can switch to:

preg_replace('/ *<i class="[^"]*\bfa-[^"]*"><\/i>/', '', $str);

Notice that now, before the <i there is a space followed by the quantifier *, meaning "zero or more occurrences". Thus, the spaces before the tag i are also removed.
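As a quick check against the question's expected output, here is a small sketch. The trim() call and the second sample string are additions for illustration: they cover the case where an icon sits at the very start of the text and would otherwise leave a leading space.

```php
<?php
$pattern = '/ *<i class="[^"]*\bfa-[^"]*"><\/i>/';

$str   = 'Texto de teste <i class="fab fa-accusoft"></i> Teste Lorem Ipsim';
$clean = trim(preg_replace($pattern, '', $str));
echo $clean, PHP_EOL; // Texto de teste Teste Lorem Ipsim

// Icon at the beginning: the space after the tag remains, so trim() removes it.
$leading = trim(preg_replace($pattern, '', '<i class="fas fa-home"></i> Home'));
echo $leading, PHP_EOL; // Home
```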

If you only have these simple cases, this regex should be enough. For more complicated cases, it is probably better to use DOMDocument after all. Regex is not the best tool for working with HTML.

by Sam • **79,597** points · Answer 1 · 2019-09-02T19:36:59+00:00

The problem with using DOMDocument is that it adds extra markup to the result. For example, when removing the tags from this string:

$string = 'Texto de teste <i class="fab fa-accusoft">xxx</i> Teste Lorem Ipsim <i class="fab fa-accusoft">xxx</i>';

The result will be:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Texto de teste  Teste Lorem Ipsim </p></body></html>

It does remove the tags you want, but notice that it adds a DOCTYPE and the html, body, and p tags.

To solve this, you have to use other functions to strip the extra markup and return only the string without the Font Awesome tags.

You also have to collapse the double spaces left in the "hole" where each Font Awesome tag was, and remove any spaces at the beginning or end of the string.

To do this, pass the options LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD to loadHTML, and remove the rest with replacements and trim():

<?php
$string = 'Texto de teste <i class="fab fa-accusoft">xxx</i> Teste Lorem Ipsim <i class="fab fa-accusoft">xxx</i>';
$doc = new DOMDocument();
// These options stop libxml from adding the DOCTYPE and the html/body wrappers
$doc->loadHTML($string, LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);

$selector = new DOMXPath($doc);
// Remove every i element whose class attribute contains "fa-"
foreach($selector->query('//i[contains(attribute::class, "fa-")]') as $e){
    $e->parentNode->removeChild($e);
}

// Collapse double spaces, strip the remaining <p> wrapper, and trim the ends
$string = trim(preg_replace('/<p>|<\/p>/', '', str_replace('  ', ' ', $doc->saveHTML())));

echo $string;
?>

Check on IDEONE

Without the cleanup steps mentioned above, the result looks like this:

IDEONE
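One possible variation, sketched here under assumptions rather than taken from the answer: wrap the fragment in a div before loading it, then serialize only that wrapper's children, so there is no p tag to strip afterwards. The function name stripFaIcons is hypothetical, and the sketch assumes ASCII input (DOMDocument needs extra care with other encodings).

```php
<?php
// Hypothetical helper: removes Font Awesome <i> elements from an HTML fragment.
function stripFaIcons(string $html): string
{
    $doc = new DOMDocument();
    // The wrapper <div> becomes the document root, so libxml has no loose text to wrap in <p>.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//i[contains(@class, "fa-")]') as $icon) {
        $icon->parentNode->removeChild($icon);
    }

    // Serialize only the children of the wrapper, leaving the <div> itself out.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    // Collapse the runs of spaces left where the icons were, then trim the ends.
    return trim(preg_replace('/ {2,}/', ' ', $out));
}

$result = stripFaIcons('Texto de teste <i class="fab fa-accusoft">xxx</i> Teste Lorem Ipsim <i class="fab fa-accusoft">xxx</i>');
echo $result, PHP_EOL;
```

The trade-off is one extra loop for serialization in exchange for not having to pattern-match the wrapper tags out of the output.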