Problem: Develop the class MyHTMLParser as a subclass of HTMLParser which, when fed an HTML file, prints the names of the start and end tags in the order they appear in the document, with an indentation proportional to the depth of the element in the document tree. Ignore HTML elements that do not require an end tag, such as p and br.
The HTML file used: https://easyupload.io/d45c52
The expected output is:
html start
head start
title start
title end
head end
body start
h1 start
h1 end
h2 start
h2 end
ul start
li start
...
a end
body end
html end
What I did:
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # print the name of every start tag
        print(tag, "start")

    def handle_endtag(self, tag):
        # print the name of every end tag
        print(tag, "end")

infile = open("w3c.html", "r")
content = infile.read()
infile.close()
myparser = MyHTMLParser()
myparser.feed(content)
My output was:
html start
head start
title start
title end
head end
body start
h1 start
h1 end
p start
br start
p end
h2 start
h2 end
...
a start
a end
body end
html end
How can I fix the code so that the output is indented according to the depth of each element?
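One possible approach, sketched below, is to keep a depth counter in the parser: increase it after printing a start tag and decrease it before printing an end tag. This is only a sketch under some assumptions. The class name IndentingParser, the IGNORED_TAGS set (the exercise names only p and br; the other entries are common void elements added for illustration), the four-space indent, and the inline sample document are all hypothetical and not part of the original exercise.

```python
from html.parser import HTMLParser

# Assumption: tags to skip entirely. The exercise mentions only p and br;
# the remaining entries are common void elements added for illustration.
IGNORED_TAGS = {"p", "br", "hr", "img", "meta", "link", "input"}


class IndentingParser(HTMLParser):
    """Prints start/end tag names indented by their depth in the tree."""

    def __init__(self, indent="    "):  # indent width is an assumption
        super().__init__()
        self.indent = indent
        self.depth = 0    # current nesting level
        self.lines = []   # collected output, handy for checking results

    def emit(self, text):
        line = self.indent * self.depth + text
        self.lines.append(line)
        print(line)

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_TAGS:
            return
        self.emit(f"{tag} start")
        self.depth += 1  # children of this element are one level deeper

    def handle_endtag(self, tag):
        if tag in IGNORED_TAGS:
            return
        # step back out first so the end tag lines up with its start tag
        self.depth = max(self.depth - 1, 0)
        self.emit(f"{tag} end")


# Small inline document used instead of w3c.html so the sketch is self-contained.
sample = (
    "<html><head><title>T</title></head>"
    "<body><h1>H</h1><p>x<br></p></body></html>"
)
parser = IndentingParser()
parser.feed(sample)
parser.close()
```

To use it with the file from the question, read w3c.html as before and pass its content to feed(). Because both the start and the end handler skip the ignored tags, a p element with an explicit end tag does not disturb the depth counter.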