In my project I need to read an HTML file whose source code is structured like XML. I need to read this file, get the values of the XML tags in it, and run a whole process to save that data in my database.
My system already reads XML files without problems, but I need it to be able to read an HTML file as well.
How can I do that? I have no idea where to start.
Structure of my HTML file
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head><body><certidao>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
<subtag></subtag>
</certidao>
</body></html>
I need to read everything inside the root tag certidao and disregard the HTML tags.
The HTML page is saved on the computer, so there is no need to access a link, only the file path.
Reading HTML and reading XML are analogous. Do you have code that reads XML?
– Leonel Sanches da Silva
To make it easier for you all: is there a way to remove the HTML tags and leave only the certidao tag? The source of the HTML page is really XML, that is, everything inside the certidao tag.
– Érik Thiago
Couldn't the system open the file, edit it to remove the unwanted part, and then save the file as XML?
– Andrew Paes
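A minimal sketch of that idea in C#, assuming the file contains exactly one <certidao>...</certidao> block and that the block is well-formed XML on its own. The class name, the method names, and the file path in the usage note are hypothetical. Instead of saving a second file, it cuts the block out of the text by position and hands it to XDocument.Parse:

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class CertidaoExtractor
{
    // Returns the text from "<certidao" up to and including "</certidao>".
    // Assumption: the block appears once and is well-formed XML by itself.
    public static string ExtractBlock(string html)
    {
        const string closeTag = "</certidao>";
        int start = html.IndexOf("<certidao", StringComparison.OrdinalIgnoreCase);
        int end = html.IndexOf(closeTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0 || end < start)
            throw new InvalidDataException("No <certidao> block found.");
        return html.Substring(start, end + closeTag.Length - start);
    }

    // Reads the saved HTML page from disk and parses only the XML block.
    public static XDocument LoadFromHtmlFile(string path)
    {
        string html = File.ReadAllText(path);
        return XDocument.Parse(ExtractBlock(html));
    }
}
```

With this in place, something like CertidaoExtractor.LoadFromHtmlFile(@"C:\some\folder\page.html").Root.Elements() should enumerate the subtag elements, and the surrounding html, head, meta, and body tags never reach the XML parser.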
That's because the logic I have here, @Andrewpaes, reads an XML file. Only the requirement has changed: now I need to read an HTML file whose content is XML.
– Érik Thiago
Wouldn't it be possible to use XDocument or XmlDocument to read the file? From there, you would only need to extract the contents of <certidao>.
– brazilianldsjaguar
Unless XmlDocument needs the <?xml?> declaration at the top.
– brazilianldsjaguar
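For what it's worth, the <?xml ...?> declaration is optional in XML 1.0, and XmlDocument.LoadXml accepts text without it as long as there is a single root element. A small sketch to check this, with a hypothetical helper name:

```csharp
using System.Xml;

public static class DeclarationCheck
{
    // Loads XML text that has no <?xml ...?> declaration and returns
    // the number of <subtag> children directly under the root element.
    public static int CountSubtags(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the text is not well-formed
        return doc.DocumentElement.SelectNodes("subtag").Count;
    }
}
```

Passing it a bare certidao block, with no declaration in front, should load without complaint.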
I already use XDocument, @brazilianldsjaguar. But when it iterates over the elements, it goes straight through and does not read the tags.
– Érik Thiago
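Without the code this is only a guess, but one possible cause is that the document root is html, and Elements() only returns direct children, so asking the root for subtag finds nothing. A sketch of reaching the tags through Descendants() instead, assuming the markup handed to XDocument is well-formed (the unclosed <meta> tag in the sample above would make XDocument.Load throw an XmlException). The class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CertidaoReader
{
    // Descendants() searches the whole tree, so it finds <certidao>
    // even when it is nested under <html><body>. Elements() is then
    // used to list the direct children of <certidao>.
    public static List<XElement> ReadSubtags(XDocument doc)
    {
        XElement certidao = doc.Descendants("certidao").FirstOrDefault();
        return certidao == null
            ? new List<XElement>()
            : certidao.Elements().ToList();
    }
}
```

Each returned XElement exposes its tag name through Name.LocalName and its text through Value, which is what the save-to-database step would presumably consume.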
I get it. Can you post this code? (And sorry for my Portuguese, it's not my native language!)
– brazilianldsjaguar
I'm doing the same here as in that other question of mine, only using Replace calls to strip the HTML tags and leave just the XML ones, that is, the ones inside certidao.
– Érik Thiago