How to create the <!DOCTYPE html> tag in Beautiful Soup (bs4)


I want to create the <!DOCTYPE html> tag in Beautiful Soup (bs4), and wrote the following:

from bs4 import Doctype

tag = Doctype('html')

But running the excerpt above does not create the <!DOCTYPE html> tag.

How to proceed?

2 answers

Create the Doctype with Beautiful Soup's element classes:

>>> from bs4 import Doctype
>>> tag = Doctype('html')
>>> type(tag)
<class 'bs4.element.Doctype'>
>>> print(tag)
html
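Printing a Doctype shows only its value because, like the other Beautiful Soup string types, it is a str subclass; the <!DOCTYPE ...> markup is added only when it is rendered as part of a document. A minimal check, assuming bs4 4.x, where Doctype inherits an output_ready() method from PreformattedString:

```python
from bs4 import Doctype

tag = Doctype('html')

# Doctype is a str subclass: its plain string value is only the doctype name.
print(str(tag))  # html

# output_ready() renders it the way it appears inside a document,
# wrapped in the "<!DOCTYPE " prefix and ">" suffix.
print(tag.output_ready())
```

So the object in the question is created correctly; it just has to be placed in a soup to show up as a tag.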

Insert it into an HTML document:

>>> from bs4 import Doctype
>>> from bs4 import BeautifulSoup

>>> html = '''<html><body></body></html>'''
>>> soup = BeautifulSoup(html, 'html.parser')

>>> tag = Doctype('html')
>>> type(tag)
<class 'bs4.element.Doctype'>
>>> tag
'html'
>>> soup.insert(0, tag)
>>> soup
<!DOCTYPE html>
<html><body></body></html>
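Inserting at position 0 puts the doctype before the <html> element. If the same code may run on documents that already have a doctype, a small guard avoids adding a second one. A sketch, where ensure_html5_doctype is a hypothetical helper name, not part of bs4:

```python
from bs4 import BeautifulSoup, Doctype


def ensure_html5_doctype(soup):
    """Insert <!DOCTYPE html> at the top of the soup unless a Doctype is already there."""
    # A doctype is always a top-level node, so only the direct children are checked.
    if any(isinstance(node, Doctype) for node in soup.contents):
        return soup
    soup.insert(0, Doctype('html'))
    return soup


soup = BeautifulSoup('<html><body></body></html>', 'html.parser')
ensure_html5_doctype(soup)
ensure_html5_doctype(soup)  # second call finds the Doctype and changes nothing
print(soup)
```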
  • I think the second snippet got a little confusing, since it apparently mixed real code and program output. I believe doing the same as in the first excerpt (prefixing the code with ">>> " and leaving the output unprefixed) would be ideal.

  • Sorry, I lost my network connection and couldn't edit it. ;)

If the goal is in fact to generate .html files, I believe the following works.

You can install html5lib with pip:

pip install html5lib

Then use html5lib as the parser, like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p></p>', 'html5lib')

soup.body.append(soup.new_tag("a", href="https://answall.com"))

print(soup)

Of course, the result will not contain a doctype; the prettified output (soup.prettify("utf-8")) looks something like:

b'<html>\n <head>\n </head>\n <body>\n  <p>\n  </p>\n  <a href="https://answall.com">\n  </a>\n </body>\n</html>'

To solve that, it is enough to prepend the HTML5 doctype string, for example:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p></p>', 'html5lib')

soup.body.append(soup.new_tag("a", href="https://answall.com"))

source = soup.prettify("utf-8")

with open("output.html", "wb") as file:
    file.write(b'<!DOCTYPE html>\n')
    file.write(source)

print(source)

I don’t know html5lib in depth, but maybe it offers something for this.
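Another option, combining this with the first answer, is to insert a Doctype object into the soup before serializing, so the whole file comes from bs4 instead of a hand-written byte prefix. A sketch under that assumption; it parses an explicit <html>/<body> skeleton with 'html.parser' only so it runs without html5lib, and 'html5lib' can be used instead as shown above:

```python
from bs4 import BeautifulSoup, Doctype

# 'html.parser' does not add <html>/<body> on its own, so the skeleton is given explicitly.
soup = BeautifulSoup('<html><head></head><body><p></p></body></html>', 'html.parser')
soup.body.append(soup.new_tag("a", href="https://answall.com"))

# Put the doctype into the tree itself instead of writing it separately.
soup.insert(0, Doctype('html'))

source = soup.prettify("utf-8")  # bytes, now starting with the doctype

with open("output.html", "wb") as f:
    f.write(source)

print(source)
```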

  • This is the alternative I’m currently using, but I want to create the object via bs4.

  • @britodfbr I don’t understand the need for this. Unless it were a custom doctype, since this is the native HTML doctype, whether or not it exists inside the object seems irrelevant to me.

  • I’m going to update a batch of 168k files, and I’m writing a script to automate the editing.
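For a batch job like the one described (many files edited by a script), the Doctype approach from the first answer can be applied file by file. A sketch, assuming UTF-8 .html files under a single directory; add_doctype_to_files is a hypothetical helper name:

```python
from pathlib import Path

from bs4 import BeautifulSoup, Doctype


def add_doctype_to_files(root):
    """Add <!DOCTYPE html> to every .html file under root that lacks one.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in Path(root).rglob("*.html"):
        soup = BeautifulSoup(path.read_text(encoding="utf-8"), "html.parser")
        # Skip files whose top-level nodes already include a doctype.
        if any(isinstance(node, Doctype) for node in soup.contents):
            continue
        soup.insert(0, Doctype("html"))
        path.write_text(str(soup), encoding="utf-8")
        changed += 1
    return changed
```

For example, add_doctype_to_files("site") would return how many files were modified ("site" is a placeholder path). Re-serializing through Beautiful Soup can change formatting such as whitespace, attribute quoting or entity spelling, so it is worth running it on a copy of the files first.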
