2
How do I delete part of a text in Python?
I have the following string:
"""
texto textinho
outro texto
<div dir 'ltr'><div><div> bla bla ....
"""
I want to delete all HTML.
I’m using Python 2.7.
1
You can use a regular expression to delete everything between the "<" and ">" markers at once.
>>> string = """
... texto textinho
... outro texto
...
... <div dir 'ltr'><div><div> bla bla ....
...
... """
>>>
>>> import re
>>> print re.sub(r"<.+?>", "", string)
texto textinho
outro texto
bla bla ....
Note in particular the replacement by "" (the empty string) and the use of ? in the regular expression, which makes each match stop at the first closing ">" it finds. Without it, the expression would take all the text from the opening of the first tag to the closing of the last.
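To make the effect of the ? concrete, here is a minimal sketch comparing the two patterns. The sample string is a made-up illustration, not the text from the question, and the code is written to run on both Python 2.7 and Python 3.

```python
import re

# Hypothetical sample with text between an opening and a closing tag.
sample = "<b>bold</b> text"

# Non-greedy (.+?): each match ends at the first ">" after its "<",
# so only the tags themselves are removed.
lazy = re.sub(r"<.+?>", "", sample)

# Greedy (.+): one match runs from the first "<" to the last ">",
# so the text between the tags is removed as well.
greedy = re.sub(r"<.+>", "", sample)

print(repr(lazy))
print(repr(greedy))
```

With the non-greedy pattern the word between the tags survives; with the greedy one it is swallowed along with them.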
0
I settled with the following REGEX
import re
m = re.findall("[<][\w|\W]*[>]*", str(corpo), re.IGNORECASE)
for i in m:
    corpo = corpo.replace(i, "")
That erases EVERYTHING from the first "<" onward:
<ANYTHING> THIS TOO <OOPS, THIS TOO>
Thanks for the help.
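As a rough check of what this pattern captures, the sketch below runs it on a short made-up string (the sample and the variable names `matches` and `cleaned` are illustrative assumptions). Because `[\w|\W]*` accepts any character, including newlines, and is greedy, the single match extends from the first "<" to the end of the text.

```python
import re

# Hypothetical sample; "corpo" in the answer would be the real text.
sample = "a <X> b <Y> c"

# [\w|\W]* matches any character and is greedy, so one match
# covers everything from the first "<" to the end of the string.
matches = re.findall(r"[<][\w|\W]*[>]*", sample, re.IGNORECASE)

cleaned = sample
for m in matches:
    cleaned = cleaned.replace(m, "")

print(matches)
print(repr(cleaned))
```

So this approach only fits cases where everything after the first tag is meant to be discarded.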
-1
I got it like this
import re
string = """
texto textinho
outro texto
<div dir 'ltr'><div><div> bla bla ....
"""
r = re.search("[<].*[>]",string)
# returns "<div dir 'ltr'><div><div>"
r.group(0)
result = string.replace(r.group(0),"")
# result will contain ' \n texto textinho \n outro texto\n\n bla bla ....\n\n '
Thank you William, I will test and post the results.
– Welington Carlos
This approach is not good: without the "?" after the .* (see my answer), the expression matches everything from the first tag in the string to the last one, including the text between them. If you fix that, the way you are doing it would delete only the first tag; all the others would still be there. If you iterated over all the matches, calling string.replace for each one, it would work, but it would be much slower, since Python would have to search, replace, and create a new version of the string for each tag in the HTML. – jsbueno
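The two approaches described in this comment can be sketched side by side. The function names `strip_tags_single_pass` and `strip_tags_loop` are hypothetical, chosen only for this illustration; both use the non-greedy pattern from the accepted answer and are written to run on Python 2.7 and Python 3.

```python
import re

# Non-greedy tag pattern from the answer above, compiled once.
TAG = re.compile(r"<.+?>")

def strip_tags_single_pass(text):
    # One call: the regex engine scans the text once and builds one new string.
    return TAG.sub("", text)

def strip_tags_loop(text):
    # One search-and-replace (and one new string) per tag found.
    for tag in TAG.findall(text):
        text = text.replace(tag, "")
    return text

sample = "texto textinho\noutro texto\n<div dir 'ltr'><div><div> bla bla ....\n"

single = strip_tags_single_pass(sample)
looped = strip_tags_loop(sample)
print(repr(single))
```

Both functions give the same result on this sample; the difference is only in how much work is done per tag.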