Help with a PHP regular expression

I need to remove a tag together with its content and its closing tag, but only when it is a script tag. Example: $variavel = "Something";

I want to remove all of this from the HTML:

<script>alguma funcao javascript</script>

The idea is like this

$variavel = preg_replace('<script*>*</script>', '', $variavel);

But it didn’t work. To explain how the regex should behave: it should replace everything from the opening <script tag through the closing </script>.

  • Use the strip_tags function.

  • strip_tags doesn’t suit me, because I only want to remove the script tags and keep the others, and I don’t want to remove just the tag: for script I want to remove everything, tag and content together.

2 answers

1


Parsing HTML with regex is not recommended. For the example given it is more or less simple, and something like /<script[^>]*>[^<]*<\/script>/ could work.

A better solution would be to use strip_tags, which is made for this and takes the allowed tags as its second argument.
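
The pattern above can be tried directly with preg_replace. This is only a minimal sketch: the sample string and the variable names are made up for illustration, and the /i flag is an addition that is not part of the pattern as written in the answer.

```php
<?php
// Sample input modeled on the question: one script block between ordinary tags.
$html = "<p>algo</p>\n<script type=\"text/javascript\">alguma funcao javascript</script>\n<div>algo</div>";

// Pattern from the answer; the /i flag is an addition so that <SCRIPT> also matches.
$pattern = '/<script[^>]*>[^<]*<\/script>/i';

// Removes the opening tag, the content and the closing tag in one pass.
$html_limpo = preg_replace($pattern, '', $html);
echo $html_limpo, "\n";

// Limitation: [^<]* stops at the first "<", so a script body that
// contains "<" does not match and is left untouched.
$com_menor = '<script>if (a < b) { alert(1); }</script>';
$nao_alterado = preg_replace($pattern, '', $com_menor);
echo $nao_alterado, "\n";
```

The second call shows why this only suits simple cases: any "<" inside the script body prevents the match.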

An example would be like this:

$html = "
<p>algo</p>
<script>alguma funcao javascript</script>
<div>algo</div>
";

$html_limpo = strip_tags($html, '<p><div>');
echo $html_limpo;

The result would be:

<p>algo</p>
alguma funcao javascript
<div>algo</div>
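
As the result shows, strip_tags keeps the text of the script and needs a list of allowed tags. If the goal is to drop only the script elements together with their content, one alternative to regex is a DOM parser. The following is a rough sketch, assuming PHP's dom extension is available; $sem_script is a hypothetical variable name, and the input is ASCII only (loadHTML may need an explicit encoding hint for other input).

```php
<?php
$html = "
<p>algo</p>
<script>alguma funcao javascript</script>
<div>algo</div>
";

// Collect parser warnings instead of printing them (real pages are rarely valid HTML).
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName returns a live list, so copy it before removing nodes.
foreach (iterator_to_array($dom->getElementsByTagName('script')) as $script) {
    $script->parentNode->removeChild($script);
}

// loadHTML wraps the fragment in <html><body>; serialize only the body's children.
$sem_script = '';
$body = $dom->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $sem_script .= $dom->saveHTML($child);
}
echo $sem_script;
```

All other tags are kept without having to list them, which strip_tags cannot do.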

  • But I need to allow all tags except script.

  • @Megaanim I gave 2 examples: one with a regex that does what you want (it removes the script tag and its content), and another where you have to list the "safe" tags.

  • @MegaAnim https://ideone.com/uzwO8P

  • Thanks, it worked xD

  • @Megaanim great. If you want you can click and mark the answer as accepted.

  • You must use the /i flag, otherwise ScRiPt would get through. In any case, if this is meant to prevent XSS, it is useless.

  • Oops, it looks like something went wrong... <script type="text/javascript"> got through.

  • @Megaanim https://ideone.com/dE6F2F It depends on how you are getting that content; it works for me. Can you show the whole script you want to remove and what you want to fix with it?

  • Well, I’ve made a huge mess here testing several functions, lol. I got it working with this: $str = preg_replace('/<script[^>]*>([\s\S]*?)<\/script[^>]*>/', '', $str);

  • Sergio, to help you understand what I’m doing: it is a page translator. I need to remove the script tags and their content to reduce the size of the page code. Since I will replace the content inside an iframe, the scripts have already been loaded by then, so removing them won’t affect how the page works. But I don’t think I’ll be able to finish my project, because there are other complications: removing script, style, and noscript did not reduce the page size enough, because I still need to read the page character by character to detect where the tags are.

  • @Megaanim regarding the question itself, I think you can mark one of the answers as accepted, even if you ended up solving your case another way, because both answers try to solve what you asked. As for your underlying problem, it is hard to help further without seeing all the code on the page.

  • I still can’t mark a best answer... I never found that option here.

  • @MegaAnim https://pt.meta.stackoverflow.com/q/1078/129


0

/!\ I don’t recommend using this!

You can use a regex:

/<script.*?>(.*?<\/script.*?>)?/i

With it:

<ScrIpT>alert()</ScrIpT>
<script type="text/javascript">alguma funcao javascript</script>
<Script src="site.com">

Will be removed.
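
A quick way to check this is to run the pattern over the three examples with preg_replace, which accepts an array of subjects. This is only a sketch with made-up variable names. The multi-line case and the /s flag are additions for illustration and are not part of the answer.

```php
<?php
// Pattern exactly as given in the answer.
$regex = '/<script.*?>(.*?<\/script.*?>)?/i';

$exemplos = [
    '<ScrIpT>alert()</ScrIpT>',
    '<script type="text/javascript">alguma funcao javascript</script>',
    '<Script src="site.com">',
];

// With an array subject, preg_replace returns an array of results.
$resultado = preg_replace($regex, '', $exemplos);
var_dump($resultado);

// Without the /s flag "." does not match newlines, so for a script that
// spans several lines only the opening tag is removed.
$multilinha = "<script>\nalert()\n</script>";
$sem_s = preg_replace($regex, '', $multilinha);

// Adding /s (an assumption, not in the answer) lets ".*?" cross line breaks.
$com_s = preg_replace('/<script.*?>(.*?<\/script.*?>)?/is', '', $multilinha);
var_dump($sem_s, $com_s);
```

Note that with /s an unclosed <script src="..."> tag would also swallow everything up to the next </script>, so which flags fit depends on the input.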


Now, it depends on your goal with this regex. If it is to protect against XSS, forget it; the following would still be possible:

<img onerror="alert('XSS')">
<svg/onload=alert('XSS')>

Those are only two of thousands of examples.

  • No, it’s not for XSS. I am building a translator, and I need to remove the script tags because their content should not be translated; I want to translate only the text visible on the screen. So first I remove script and style, then I turn the rest into an array with all the separate texts, translate them one by one, and then replace them directly inside the iframe of the site being translated.
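
The steps described in this comment (skip script and style, collect the remaining texts into an array) could be sketched with DOMDocument and DOMXPath instead of regex. This is only an illustration under assumptions: the sample HTML and the names $textos and $nodes are invented, the dom extension must be available, and the translation step itself is left out.

```php
<?php
// Hypothetical sample page fragment.
$html = "<p>algo</p><style>p { color: red; }</style><script>alguma funcao javascript</script><div>outro texto</div>";

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Select every text node that is not inside script, style or noscript.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query(
    '//text()[not(ancestor::script) and not(ancestor::style) and not(ancestor::noscript)]'
);

// Build the array of separate texts, ignoring whitespace-only nodes.
$textos = [];
foreach ($nodes as $node) {
    $texto = trim($node->nodeValue);
    if ($texto !== '') {
        $textos[] = $texto;
    }
}
print_r($textos);
```

Because each entry comes from a DOM text node, a translated string could be written back through that node's nodeValue rather than by searching the page character by character.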
