Delete part of a text with Python


How do I delete part of a string in Python?

I have the following string:

 """ 
    texto textinho 
    outro texto

    <div dir 'ltr'><div><div> bla bla ....

 """

I want to delete all of the HTML.

I'm using Python 2.7.

3 answers

1

You can use a regular expression to delete everything from each "<" to the next ">" (the markers included) in a single call.

>>> string = """ 
...     texto textinho 
...     outro texto
... 
...     <div dir 'ltr'><div><div> bla bla ....
... 
...  """
>>> 
>>> import re
>>> print re.sub(r"<.+?>", "", string)

    texto textinho 
    outro texto

     bla bla ....

Note in particular the replacement by "" (the empty string) and the use of ? in the regular expression: it makes the match non-greedy, so each match stops at the first closing ">" it finds. Without it, the expression would match all the text from the opening of the first tag to the closing of the last one.
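A small sketch to make the greedy/non-greedy difference concrete; the sample strings are made up for illustration. It also shows one limitation that I'd assume matters for real HTML: `.` does not match a newline, so a tag split across lines survives `<.+?>`, whereas the negated class `<[^>]+>` (a swapped-in alternative, not part of the answer above) removes it. The code runs on both Python 2.7 and 3.

```python
import re

html = "keep <b>this</b> text"

# Greedy: ".+" runs to the LAST ">" on the line, so the text between the tags is lost
greedy = re.sub(r"<.+>", "", html)

# Non-greedy: ".+?" stops at the FIRST ">" after each "<"
lazy = re.sub(r"<.+?>", "", html)

# Hypothetical tag split across two lines: "." does not match "\n",
# so the non-greedy pattern cannot match it, while [^>] does match "\n"
multiline = "keep <div\n class='x'>this</div> text"
lazy_ml = re.sub(r"<.+?>", "", multiline)
negated_ml = re.sub(r"<[^>]+>", "", multiline)

print(repr(greedy))      # 'keep  text'
print(repr(lazy))        # 'keep this text'
print(repr(lazy_ml))     # "keep <div\n class='x'>this text"
print(repr(negated_ml))  # 'keep this text'
```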

0

I settled on the following regex:

import re
# corpo holds the text to be cleaned
m = re.findall(r"[<][\w|\W]*[>]*", str(corpo), re.IGNORECASE)

for i in m:
    corpo = corpo.replace(i, "")

This erases EVERYTHING from the first tag onward, for example all of this:

    <ANYTHING> THIS TOO <OOPS, THIS TOO> 

Thanks for the help.
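How I read this pattern: `[\w|\W]` is a character class that contains every character (newlines included; the `|` is just a literal), and `*` is greedy, so the first match runs from the first `<` to the end of the string. `findall` therefore returns at most one item, and the loop amounts to cutting the text at its first `<`. A minimal sketch checking that equivalence on a made-up sample:

```python
import re

corpo = "some text\n<div>bla</div>\nmore text"

# The answer's approach: the single greedy match swallows the rest of the string
matches = re.findall(r"[<][\w|\W]*[>]*", corpo, re.IGNORECASE)
cleaned_loop = corpo
for m in matches:
    cleaned_loop = cleaned_loop.replace(m, "")

# Two more direct ways to drop everything from the first "<" onward
cleaned_sub = re.sub(r"<[\s\S]*", "", corpo)
cleaned_split = corpo.split("<", 1)[0]

print(len(matches))        # 1
print(repr(cleaned_loop))  # 'some text\n'
```

If the text after the tags should be kept, this pattern is not the right tool; the non-greedy pattern from the first answer keeps it.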

-1

I did it like this:

import re

string = """ 
    texto textinho 
    outro texto

    <div dir 'ltr'><div><div> bla bla ....

 """

r = re.search("[<].*[>]",string)

# returns "<div dir 'ltr'><div><div>"
r.group(0)

result = string.replace(r.group(0),"")

# result will contain ' \n    texto textinho \n    outro texto\n\n     bla bla ....\n\n '
  • Thank you, William. I will test it and post the results.

  • This approach is not good: without the "?" after the .* (see my answer), the expression matches everything from the first tag in the string to the last one, including any text in between. If you fix that but keep doing it the way you are, you would delete only the first tag; all the others would remain. If you iterate over all the matches, calling string.replace for each one, it would work, but it would be much slower, since Python would have to search, replace, and create a new version of the string for each tag in the HTML.
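A sketch of the three variants the comment describes, on a hypothetical sample with two separate tag pairs. It uses the non-greedy pattern from the first answer; no timings are claimed here, only the difference in results and in how many new strings each variant builds.

```python
import re

html = "a <b>bold</b> and <i>italic</i> text"
tag = re.compile(r"<.+?>")

# re.search returns only the first match, so only "<b>" goes away
# (str.replace would also drop any identical copies of that exact tag)
first = tag.search(html)
only_first = html.replace(first.group(0), "") if first else html

# Looping with str.replace works, but builds a new string for each match
looped = html
for t in tag.findall(html):
    looped = looped.replace(t, "")

# re.sub removes every tag in a single pass
subbed = tag.sub("", html)

print(repr(only_first))  # 'a bold</b> and <i>italic</i> text'
print(repr(looped))      # 'a bold and italic text'
print(repr(subbed))      # 'a bold and italic text'
```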
