Code abstraction with multiple Replaces

I have this block of code:

 xml = xml.Replace("<html>", "");
 xml = xml.Replace("<head>", "");
 xml = xml.Replace("</head>", "");
 xml = xml.Replace("<body>", "<certidoes>");
 xml = xml.Replace("</body>", "");
 xml = xml.Replace("</html>", "</certidoes>");

My question is: is there any way to abstract this block so that it is simpler and easier to read?

I need to remove the HTML tags and insert the <certidoes> tag.

Note: xml holds the XML that is embedded in the content of the HTML page.

1 answer

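Well, I would try something that is not much simpler but is cleaner and more robust, because your file may have other tags or "junk" inside the tags. For example, the <html> tag in the sample below carries an xmlns attribute, so Replace("<html>", "") would not match it.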

I’ve seen your other questions about this HTML file of yours that has an XML inside.

string arquivoHTML = @"<html xmlns='http://www.w3.org/1999/xhtml'>
                        <head>
                        <meta charset='UTF-8' />
                        </head><body>
                        </body>
                        </html>";

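// Mark where the body starts and ends with single-character placeholders.
// Note: this assumes '|' and '#' do not occur anywhere else in the file.
arquivoHTML = arquivoHTML.Replace("<body>", "|").Replace("</body>", "#");
// Drop everything before the start marker, then turn the marker into the opening root tag.
arquivoHTML = arquivoHTML.Substring(arquivoHTML.IndexOf('|')).Replace("|", "<certidoes>");
// Drop the end marker and everything after it, then append the closing root tag.
arquivoHTML = String.Concat(arquivoHTML.Substring(0, arquivoHTML.IndexOf('#')), "</certidoes>");

Console.WriteLine(arquivoHTML);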
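
If the file might contain '|' or '#' itself, a variation that searches for the body tags directly may be safer. This is only a rough sketch, not a tested drop-in: BodyExtractor and WrapBody are hypothetical names, and the <certidao> element in the sample string is made up for illustration. It also assumes the first tag starting with "<body" is the real body tag.

```csharp
using System;

static class BodyExtractor
{
    // Hypothetical helper: returns whatever sits between <body ...> and </body>,
    // wrapped in a new root element named rootName.
    public static string WrapBody(string html, string rootName)
    {
        // Find the opening tag by prefix, so attributes on <body> do not matter.
        int open = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (open < 0)
            throw new ArgumentException("No <body> tag found.", nameof(html));

        // The content starts right after the '>' that closes the opening tag.
        int tagEnd = html.IndexOf('>', open);
        if (tagEnd < 0)
            throw new ArgumentException("The <body> tag is not closed.", nameof(html));
        int contentStart = tagEnd + 1;

        int close = html.IndexOf("</body>", contentStart, StringComparison.OrdinalIgnoreCase);
        if (close < 0)
            throw new ArgumentException("No </body> tag found.", nameof(html));

        string inner = html.Substring(contentStart, close - contentStart).Trim();
        return "<" + rootName + ">" + inner + "</" + rootName + ">";
    }
}

class Program
{
    static void Main()
    {
        // Sample input; the <certidao> content is invented for this example.
        string html = @"<html xmlns='http://www.w3.org/1999/xhtml'>
                        <head><meta charset='UTF-8' /></head>
                        <body><certidao>1</certidao></body>
                        </html>";

        // prints: <certidoes><certidao>1</certidao></certidoes>
        Console.WriteLine(BodyExtractor.WrapBody(html, "certidoes"));
    }
}
```

Throwing on a missing tag is a design choice for the sketch; the original version would fail inside Substring with a less helpful message in the same situation.
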
  • Dude, your solution worked well, but the <certidoes> tag is not in the HTML file, which is why I added it with Replace. Is there any way to add it within your solution? And would it be possible to include a brief explanation of what is happening?

  • What do you mean by "put it"? Do you want to insert the <certidoes> tag into the HTML and then remove the HTML? Can you explain the whole process you are doing? Then we can try to come up with a simpler solution.

  • Well, what I do is take a string, even a multiline one (the @ prefix lets you write multiline strings), to simulate opening a file and reading all of it into a string. Then I replace the <body> and </body> tags with markers that are easier to find. Then I remove everything before "|" and everything after "#". The result is only the <certidoes> tag and what is inside it.

  • Andrew, I'll explain, and I hope to be clear, haha. In order to read my XML, I need to have a root tag. This root tag is <certidoes>. But when I download the HTML file, the <certidoes> tag is not in the document. So what do I do? I use a replace like this: xml = xml.Replace("<body>", "<certidoes>"). That removes the HTML <body> tag and puts the <certidoes> tag in its place. The same happens in this line: xml = xml.Replace("</html>", "</certidoes>"). I hope that explains it!

  • Simple, man! This helps a lot! But I hope I was clear in the other comment!

  • And inside the <body> tag you have an XML, right? So why do you need to create this <certidoes> tag, instead of simply stripping the HTML, keeping the XML tags that are already there, and using that XML to populate a collection of certificates in C#?

  • But if you want, I can edit the answer to include the <certidoes> tag.

  • Because of how my XML-reading logic works: when the reader reaches the <certidao> tag, it skips it and cannot read the XML. So I need a root tag for it to skip instead, namely <certidoes>, so that it reads everything inside the <certidao> tag.

  • I'd like that, man. If you could, it would be really good!

  • IT WORKED, MAN! THANKS A LOT!
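
The root-tag requirement discussed in the comments above can be illustrated with a small sketch. A well-formed XML document has exactly one root element, so several sibling <certidao> elements need a wrapper before a standard parser accepts them. The asker's actual reader is not shown in the thread; XDocument from System.Xml.Linq is used here only as an assumption, and the fragment contents are invented.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical fragment: two sibling <certidao> elements with no common root.
        string fragment = "<certidao>A</certidao><certidao>B</certidao>";

        try
        {
            // Parsing fails because the text has more than one top-level element.
            XDocument.Parse(fragment);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Without a root: " + ex.Message);
        }

        // Wrapping the fragment in a single <certidoes> root makes it well-formed.
        XDocument doc = XDocument.Parse("<certidoes>" + fragment + "</certidoes>");
        foreach (XElement certidao in doc.Descendants("certidao"))
        {
            Console.WriteLine(certidao.Value); // prints A, then B
        }
    }
}
```
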
