Code abstraction with multiple Replaces

Question

Code abstraction with multiple Replaces

Asked 10 years ago

Viewed 45 times

1

I have this code pad:

 xml = xml.Replace("<html>", "");                                    
 xml = xml.Replace("<head>", "");
 xml = xml.Replace("</head>", "");
 xml = xml.Replace("<body>", "<certidoes>");
 xml = xml.Replace("</body>", "");
 xml = xml.Replace("</html>", "</certidoes>");

My question is: Is there any way to abstract this block in a way that is more friendly and simple ?

I need you to remove html tags and tag certificates.

Note: xml = is an xml I have in the content of the HTML page

1 answer

Browser other questions tagged c# html xml regex

You are not signed in. Login or sign up in order to post.

by Andrew Paes • **268** points · Answer 1 · 2015-07-15T13:13:13+00:00

2

Well, I would try something that is not so much simpler, but is much cleaner and more effective, because your file may have other tags or "junk" inside the tags.

I’ve seen your other questions about this HTML file of yours that has an XML inside.

string arquivoHTML = @"<html xmlns='http://www.w3.org/1999/xhtml'>
                        <head>
                        <meta charset='UTF-8' />
                        </head><body>
                        </body>
                        </html>";

arquivoHTML = arquivoHTML.Replace("<body>", "|").Replace("</body>", "#");
arquivoHTML = arquivoHTML.Substring(arquivoHTML.IndexOf('|')).Replace("|", "<certidoes>");
arquivoHTML = String.Concat(arquivoHTML.Substring(0, arquivoHTML.IndexOf('#')), "</certidoes>");

Console.WriteLine(arquivoHTML);

Dude, your solution worked well, but the <certidoes> tag is not in the HTML file, so I put it in replace. Within this your solution, is there any way to put it? And it would be possible to put a brief explanation of what is happening ?

– Érik Thiago

2015/07/15 at 13:32
What do you mean put it? You want to insert the <certidoes> tag into an HTML and then remove the HTML. Can you explain to me the whole process you are doing? So we can try to come up with a simpler solution.

– Andrew Paes

2015/07/15 at 13:44
1

Well, what I do is take a string even if it’s multiline ( @ lets you write multilines ), so I simulate that you opened a file and passed it all to a string. Then I take this string and replace the <body> tags to find it easier. Then I remove everything before "|" and after "#". Resulting only the <certidoes> tag and what is inside it.

– Andrew Paes

2015/07/15 at 13:47
Andrew I’ll explain, I hope to be clear.. kkk. What happens is that in order for me to be able to read my xml, I need to have a root tag. This root tag is the certificates. But when I download the HTML file the tag <certidoes> is not in the document, so what do I do ? I give a replace that way: xml = xml.Replace("<body>", "<certidoes>"). So I remove the HTML tag <body> and tag <certidoes> in place. And likewise it happens in this line: xml = xml.Replace("</html>", "</certidoes>"). I think I can explain!

– Érik Thiago

2015/07/15 at 13:52
Simple guy! This helps a lot! But I hope I was clear in the other comment!

– Érik Thiago

2015/07/15 at 13:54
And inside the <body> tag you have an XML, right? So then why do you need to create this <certidoes> tag instead of simply taking the html and leaving the XML tags that are already there, and with this feed XML in C# a collection that will be the certificates?

– Andrew Paes

2015/07/15 at 13:56
But if you want I edit the answer to include the <certidoes tag>

– Andrew Paes

2015/07/15 at 13:57
Because in the logic of scanning xml, at the time of reading, when the reader arrives at the <certidao> tag he ignores it and cannot read the xml. So I need a root tag so that it ignores it, that is, the <certidoes tag> and read everything that is in the <certidao tag>.

– Érik Thiago

2015/07/15 at 13:58
I’d like to, man, if I could, it would be really good!

– Érik Thiago

2015/07/15 at 13:59
IT WORKED MAN! IT WAS WORTH TOO !

– Érik Thiago

2015/07/15 at 14:06

Show 5 more comments