Replace text with BeautifulSoup

I have the following HTML:

<p>Um acidente no cruzamento da rua ... </p>
<div id="marca"></div>
<p>Um acidente no cruzamento ......</p>
<div id="marca2"></div>

I’m trying to do it like this:

def text_view(self):

    soup = BeautifulSoup(self.text)
    try:
        marca1 = BeautifulSoup(self.get_images())
        soup.find("div", {"id": "marca"}).replaceWith(marca1)
    except:
        pass

    try:
        marca2 = BeautifulSoup(self.get_images())
        soup.find("div", {"id": "marca2"}).replaceWith(marca2)
    except:
        pass

    return soup

But it only replaces the first div. What could be causing this?

  • What is self.get_images? What does it return? And is the second snippet (the one using findAll) still the same, searching by id? Note that in your code the div has the class marca2, not the id. Or was that just a mistake when writing the question?

  • self.get_images() returns an HTML list of images. I have corrected the code. After several more attempts I still haven't managed to get it working ):

  • OK, and does this list of images have ids? I noticed that you are using the same value (the return of self.get_images()) in two different places. If there are repeated ids, that could be a problem, although I'm not sure whether it would already show up in BeautifulSoup itself or only later, when the HTML is rendered.

1 answer


According to that question on SOen (Stack Overflow in English), when you insert a BeautifulSoup object into another BeautifulSoup object, find stops working properly: it ends the search prematurely (a bug in the implementation of find). One workaround is to do all the searches first and all the replacements afterwards:

soup = BeautifulSoup(self.text)

# First, do all the searches
ponto1 = soup.find("div", {"id": "marca"})
ponto2 = soup.find("div", {"id": "marca2"})

# Then do all the replacements
marca1 = BeautifulSoup(self.get_images())
marca2 = BeautifulSoup(self.get_images())

ponto1.replaceWith(marca1)
ponto2.replaceWith(marca2)
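
For reference, the search-first idea can be tried outside the class with a self-contained sketch. It makes several assumptions that are not in the original code: it uses the bs4 package with the built-in "html.parser" (the snippets above look like the older BeautifulSoup 3 API), the function name replace_markers and the IMAGES_HTML string are hypothetical stand-ins for the method and for whatever self.get_images() returns, and it moves the parsed nodes into the tree one by one rather than inserting a whole BeautifulSoup object.

```python
from bs4 import BeautifulSoup

# Sample document taken from the question.
HTML = (
    '<p>Um acidente no cruzamento da rua ... </p>'
    '<div id="marca"></div>'
    '<p>Um acidente no cruzamento ......</p>'
    '<div id="marca2"></div>'
)

# Hypothetical stand-in for the value returned by self.get_images().
IMAGES_HTML = '<ul><li><img src="a.png"/></li></ul>'


def replace_markers(html, images_html, marker_ids=("marca", "marca2")):
    """Replace each marker div with a freshly parsed copy of images_html."""
    soup = BeautifulSoup(html, "html.parser")

    # First, do all the searches while the tree is still untouched.
    targets = [soup.find("div", {"id": marker_id}) for marker_id in marker_ids]

    # Then do all the replacements.
    for target in targets:
        if target is None:
            continue  # marker not present in this document
        fragment = BeautifulSoup(images_html, "html.parser")
        # Move the fragment's top-level nodes in front of the marker,
        # then drop the marker itself.
        for node in list(fragment.contents):
            target.insert_before(node)
        target.decompose()

    return str(soup)


result = replace_markers(HTML, IMAGES_HTML)
```

Checking for None explicitly also avoids the bare try/except, which would otherwise hide any unrelated error raised during the replacement.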

Another option, as suggested in an answer to the linked question, is to rebuild the BeautifulSoup object before doing the second find:

soup = BeautifulSoup(self.text)
try:
    marca1 = BeautifulSoup(self.get_images())
    soup.find("div", {"id": "marca"}).replaceWith(marca1)
except:
    pass

# Rebuild the object
soup = BeautifulSoup(soup.renderContents())

try:
    marca2 = BeautifulSoup(self.get_images())
    soup.find("div", {"id": "marca2"}).replaceWith(marca2)
except:
    pass
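
The rebuild approach can also be written as a loop that re-parses the document before every search. This is only a sketch under assumptions: it uses the bs4 package with "html.parser" and str(soup) in place of renderContents(), and the name replace_with_reparse and the sample strings are hypothetical.

```python
from bs4 import BeautifulSoup


def replace_with_reparse(html, marker_ids, images_html):
    """Replace one marker per pass, re-parsing the HTML before each search."""
    for marker_id in marker_ids:
        # Build a fresh tree so find() never runs on a modified one.
        soup = BeautifulSoup(html, "html.parser")
        target = soup.find("div", {"id": marker_id})
        if target is not None:
            fragment = BeautifulSoup(images_html, "html.parser")
            for node in list(fragment.contents):
                target.insert_before(node)
            target.decompose()
        # Serialize back to text for the next pass.
        html = str(soup)
    return html


sample = '<div id="marca"></div><p>x</p><div id="marca2"></div>'
reparsed = replace_with_reparse(sample, ["marca", "marca2"], '<img src="a.png"/>')
```

Re-parsing costs one extra parse per marker, which is negligible for a short page but worth keeping in mind for large documents.
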
  • It worked like a charm
