Remove HTML tags with regexp in query

Viewed 264 times

I am trying to query MySQL 5.7 using REGEXP so that HTML tags are ignored. So far I have the following query:

SELECT * FROM question WHERE question.enunciation REGEXP '[^<|</]center[^\>]' = true

Given the following data in the database:

ID  ENUNCIATION
1   <center>Random question</center>
2   center Text
3   Text center
4   <center>Random center question</center>
5   <p>Random question</p> <strong>center</strong>

When I run this query, I expect it to return these rows:

ID  ENUNCIATION
2   center Text
3   Text center
4   <center>Random center question</center>
5   <p>Random question</p> <strong>center</strong>

But only the following rows are returned:

ID  ENUNCIATION
4   <center>Random center question</center>
5   <p>Random question</p> <strong>center</strong>

What am I missing in the regexp, and how can I fix it?
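The behaviour can be reproduced with a minimal Python sketch over the same sample rows. It assumes that Python's re treats these constructs (bracket expressions, ^, $ and alternation) the same way MySQL 5.7's REGEXP does, which should hold for a pattern this simple but is not guaranteed for regexes in general. Each bracket expression such as [^<|</] or [^\>] matches exactly one character (inside brackets, | and / are plain characters, not alternation or a two-character sequence), so the original pattern can never match center at the very start or the very end of the text. That is why rows 2 and 3 are missing. Allowing the start or end of the string as an alternative fixes those two rows; the suggested MySQL condition would be question.enunciation REGEXP '(^|[^</])center([^>]|$)', which I expect to behave the same in MySQL 5.7, though that is an assumption worth checking.

```python
import re

# Sample rows from the question: (id, enunciation)
ROWS = [
    (1, "<center>Random question</center>"),
    (2, "center Text"),
    (3, "Text center"),
    (4, "<center>Random center question</center>"),
    (5, "<p>Random question</p> <strong>center</strong>"),
]

# Pattern from the question. Each bracket expression consumes exactly one
# character, so "center" at the start or end of the text can never match.
ORIGINAL = r"[^<|</]center[^\>]"

# Suggested fix (an assumption, not tested in MySQL): before the term accept
# the start of the string or any character other than "<" and "/"; after it
# accept any character other than ">" or the end of the string.
FIXED = r"(^|[^</])center([^>]|$)"


def matching_ids(pattern: str) -> list[int]:
    """Return the ids whose text matches, like REGEXP in a WHERE clause."""
    return [row_id for row_id, text in ROWS if re.search(pattern, text)]


print(matching_ids(ORIGINAL))  # [4, 5]
print(matching_ids(FIXED))     # [2, 3, 4, 5]
```

The fixed pattern only inspects the single characters next to the term, so it does not really understand tags: it would still match center inside an attribute value such as class="x center", and inside longer words such as centered.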

I would like to solve this without having to create a function in MySQL.

One possible solution I have come up with so far is to save the contents of this column, with the HTML tags removed, into another column and filter on that new column.

Basically, I want a filter that works like LIKE but ignores the searched value whenever it appears inside a tag; if the value also appears outside a tag, the row should be included in the result.
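The extra-column workaround can be sketched in Python as follows. The assumptions: the stripped text would be stored in a new column (called enunciation_text here, a hypothetical name), and a naive regex is good enough to find the tags in this data. Each tag is replaced with a space so that words separated only by a tag, for example by <br>, do not run together; the downside is that a word split by inline markup, such as <b>cen</b>ter, would no longer be found. The lookup on the new column would then be an ordinary enunciation_text LIKE '%center%'.

```python
import re

# Sample rows from the question: (id, enunciation)
ROWS = [
    (1, "<center>Random question</center>"),
    (2, "center Text"),
    (3, "Text center"),
    (4, "<center>Random center question</center>"),
    (5, "<p>Random question</p> <strong>center</strong>"),
]

# Naive tag pattern: "<", then any characters except ">", then ">".
# Assumption: no ">" inside attribute values, comments or <script> blocks.
TAG = re.compile(r"<[^>]*>")


def strip_tags(html: str) -> str:
    """Replace every tag with a space; the result is what the extra column
    (hypothetically named enunciation_text) would store."""
    return TAG.sub(" ", html)


def like_outside_tags(html: str, term: str) -> bool:
    """Case-insensitive substring test, like LIKE '%term%', on the tag-free text."""
    return term.lower() in strip_tags(html).lower()


print([row_id for row_id, text in ROWS if like_outside_tags(text, "center")])  # [2, 3, 4, 5]
```

The case-insensitive comparison is meant to mirror the case-insensitive collations MySQL commonly uses for LIKE; with a binary or case-sensitive collation the lower() calls would be dropped.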

  • Do you want to ignore all "tags", or only the specific tag "<center>"?

  • All tags, really; center was just an example. Basically my problem is this: the searched term might also be an HTML tag, and it would make no sense to return a question that does not actually contain the term the user searched for.

  • Try: '<(/)?.{1,}>' = false

  • @Sam I tried it like this: SELECT * FROM Question WHERE Question.enunciation REGEXP '<(/)?.{1,}>' = false and Question.enunciation like '%center%'. But if the value is 1 <center>Random center Question</center>, that row is not returned. I would need the filter to ignore only what is inside the tag itself; if the value also appears outside a tag, the row should be returned. Thank you very much for trying to help me.

  • @Fabiobueno It would help if you edited the question and put these examples there as well. For example, if the text is abc <center> text</center>, do you want abc to be returned? (It was not very clear to me.) Do the texts always have only one tag, or can tags be nested inside others (and in that case, should everything inside them be ignored)? Can there be malformed tags? And what about tags without a closing tag, like <br>, <img>, etc.: do they have to be removed too? Anyway, depending on the answers, it is easier to handle this outside SQL, especially since regex support in MySQL is rather limited...

  • @hkotsubo thank you so much for the tip, I’ll try to tidy up the question. One solution I have so far is to save the ENUNCIATION, without any HTML tags it may contain, in a new column and filter on that column.

  • After the edit, I still don’t understand why row 4 is returned and row 1 is not, since both have the whole text inside the tag. And row 5 is malformed, is that right? Can the tag only be center, or can it be any tag?

  • The difference is that in row 1 the term center only appears as a tag, while in row 4 the term center also appears outside a tag, so I want that row returned because it contains the searched term. It can be any tag, and I have already fixed row 5; the tag was wrong, thanks.

  • But row 4 also has all of its text inside the center tag. Should it still be returned, just because the tag name also appears in the middle of the text?

  • Yes, I need the tags to be ignored, but if the value also appears outside a tag, the row should still be returned.

  • In this case I really recommend doing it outside SQL, for two reasons: 1) processing HTML is easier with specific libraries (HTML parsers or other dedicated libraries); 2) it is possible to write a regex that detects these cases (and works for the simplest ones), but it needs features that MySQL does not support (unless you install a library for that). I don’t think it’s worth it; I find it easier to process the HTML outside the database...

  • @hkotsubo Thank you so much for the help, I think this really is the best solution.
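Following hkotsubo's recommendation, the filtering can be done outside SQL with an HTML parser. Below is a minimal sketch using Python's standard-library html.parser; the choice of Python and the helper names visible_text and contains_outside_tags are illustrative assumptions, not something from the thread. The parser passes only the text between tags to handle_data, so tag names and attribute values are skipped regardless of nesting or of whether a tag has a closing counterpart (such as <br> or <img>).

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps only the text between tags; tag names and attributes never reach handle_data."""

    def __init__(self) -> None:
        # convert_charrefs=True (the default) also decodes entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the text content of an HTML fragment (hypothetical helper)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


def contains_outside_tags(html: str, term: str) -> bool:
    """Case-insensitive LIKE '%term%' applied to the visible text only (hypothetical helper)."""
    return term.lower() in visible_text(html).lower()


# Sample rows from the question: (id, enunciation)
ROWS = [
    (1, "<center>Random question</center>"),
    (2, "center Text"),
    (3, "Text center"),
    (4, "<center>Random center question</center>"),
    (5, "<p>Random question</p> <strong>center</strong>"),
]

print([row_id for row_id, text in ROWS if contains_outside_tags(text, "center")])  # [2, 3, 4, 5]
```

One possible arrangement is to keep LIKE '%center%' in the SQL query as a cheap pre-filter and apply contains_outside_tags only to the rows it returns, so that the parser runs on candidate rows rather than the whole table.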

No answers