0
I want to create the <!DOCTYPE html> tag in Beautiful Soup (bs4), and wrote the following:
from bs4 import Doctype
tag = Doctype('html')
I tried the excerpt above, but it does not create the <!DOCTYPE html> tag.
How should I proceed?
1
Create a Doctype with Beautiful Soup's element classes:
>>> from bs4 import Doctype
>>> tag = Doctype('html')
>>> type(tag)
<class 'bs4.element.Doctype'>
>>> print(tag)
html
Insert it into an HTML document:
>>> from bs4 import Doctype
>>> from bs4 import BeautifulSoup
>>> html = '''<html><body></body></html>'''
>>> soup = BeautifulSoup(html, 'html.parser')
>>> tag = Doctype('html')
>>> type(tag)
<class 'bs4.element.Doctype'>
>>> tag
'html'
>>> soup.insert(0, tag)
>>> soup
<!DOCTYPE html>
<html><body></body></html>
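When the same insertion has to run over many documents, some of which may already carry a doctype, it can help to check first. The sketch below assumes a hypothetical helper named ensure_doctype (it is not part of bs4) and relies only on Doctype, soup.contents and soup.insert:

```python
from bs4 import BeautifulSoup, Doctype


def ensure_doctype(soup, name="html"):
    """Insert a Doctype at the top of `soup` unless one already exists.

    Hypothetical helper for illustration; not part of the bs4 API.
    """
    # A parsed doctype shows up as a Doctype node among the top-level contents.
    has_doctype = any(isinstance(node, Doctype) for node in soup.contents)
    if not has_doctype:
        soup.insert(0, Doctype(name))
    return soup


# One document without a doctype, one that already has it.
without = ensure_doctype(BeautifulSoup("<html><body></body></html>", "html.parser"))
already = ensure_doctype(BeautifulSoup("<!DOCTYPE html><html></html>", "html.parser"))

print(str(without).startswith("<!DOCTYPE html>"))
print(str(already).count("<!DOCTYPE"))
```

Calling the helper twice on the same soup should leave a single doctype, since the second call finds the node inserted by the first.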
1
If the intention is in fact to generate .html files, I believe you can install html5lib with pip:
pip install html5lib
And then use html5lib as the parser, like this:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p></p>', 'html5lib')
soup.body.append(soup.new_tag("a", href="https://answall.com"))
print(soup.prettify("utf-8"))
Of course, the output will be something like:
b'<html>\n <head>\n </head>\n <body>\n <p>\n </p>\n <a href="https://answall.com">\n </a>\n </body>\n</html>'
But to solve this, it would be enough to prepend a string with the HTML5 doctype, for example:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p></p>', 'html5lib')
soup.body.append(soup.new_tag("a", href="https://answall.com"))
source = soup.prettify("utf-8")
with open("output.html", "wb") as file:
    file.write(b'<!DOCTYPE html>\n')
    file.write(source)
print(source)
I don't know html5lib in depth, but maybe something can be done with it.
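If the string approach is applied to files that may already start with a doctype, a small guard avoids writing it twice. This is only a sketch using the standard library; with_doctype and DOCTYPE are hypothetical names, and the check assumes the doctype, when present, is the first non-whitespace content:

```python
DOCTYPE = b"<!DOCTYPE html>\n"


def with_doctype(source: bytes) -> bytes:
    """Return `source` with the HTML5 doctype prepended, unless one is already there.

    Hypothetical helper; assumes any existing doctype is the first
    non-whitespace content of the document.
    """
    if source.lstrip().lower().startswith(b"<!doctype"):
        return source
    return DOCTYPE + source


print(with_doctype(b"<html></html>"))
```

The result can be written with file.write(with_doctype(source)) in place of the two separate writes.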
This alternative is the one I'm using. But I want to create the object via bs4.
@britodfbr I can't see the need for this, unless it is a custom doctype. Since it is the native HTML doctype, whether or not it exists inside the object seems irrelevant to me.
I'm going to update a batch of 168k files, and I'm creating a script to automate the editing.
I think the second code snippet got a little confusing, since it apparently mixed real code and program output. I believe that doing something similar to the first excerpt (putting ">>>" before the code and nothing before the output) is ideal. – Jefferson Quesado
Sorry. I lost my network connection and couldn't edit. ;) – britodfbr