Formatting a string after it is converted from HTML

I wrote code that renders all of my HTML into a string. However, the resulting string comes out like this:

<div class=\"page\">\r\n<div class=\"bloco\">\r\n   <table id=\"canhoto\">\r\n

I can already remove the \r\n characters. Now I need a way to remove the backslashes that appear, for example, in the class attribute of the div. I would like it to be class="page", but every attribute comes out as class=\"page\". I would like to process the string somehow so that it no longer looks like this and ends up in the correct form.

string HTMLemString = RenderizaHtmlComoString("~/Views/Item/Item.cshtml", id);
// Strip <script> and <style> blocks and <link> tags from the rendered HTML.
var regex = new Regex("(\\<script(.+?)\\</script\\>)|(\\<style(.+?)\\</style\\>)|(<link[^>]*>)",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);
HTMLemString = regex.Replace(HTMLemString, "");
// Remove any null characters.
HTMLemString = HTMLemString.Replace("\0", "");

That is the part where I process the HTML string.
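The cleanup asked about above can be sketched as follows. This is only an illustration under stated assumptions: `HtmlStringCleanup` and `Clean` are hypothetical names that do not appear in the original code, and the sketch assumes the backslashes might be real characters in the string. If the `\"` sequences are only visible in the debugger or in a C# string literal, they are just the escaped display of plain quotes and the string itself contains no backslashes, so nothing needs to be removed.

```csharp
using System;

public static class HtmlStringCleanup
{
    // Hypothetical helper (not from the original post): removes line breaks
    // and, only if they are really present, backslashes placed before quotes.
    public static string Clean(string html)
    {
        return html
            .Replace("\r\n", "")     // Windows line breaks
            .Replace("\n", "")       // any remaining bare line feeds
            .Replace("\\\"", "\"");  // literal backslash + quote becomes a quote
    }

    public static void Main()
    {
        // In a regular literal, \" is an escape for a plain quote:
        // the resulting string contains no backslash at all.
        string escapedInSource = "<div class=\"page\">\r\n<div class=\"bloco\">\r\n";
        Console.WriteLine(escapedInSource.Contains("\\"));

        // In a verbatim literal the backslashes are real characters.
        string realBackslashes = @"<div class=\""page\"">";

        Console.WriteLine(Clean(escapedInSource));
        Console.WriteLine(Clean(realBackslashes));
    }
}
```

Both calls to `Clean` produce markup with ordinary quotes, so the helper is safe to apply whether or not the backslashes actually exist in the string.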

string CSSdocumento = CSSemString();
byte[] bytes;

using (var ms = new MemoryStream())
{
    using (var doc = new Document())
    {
        using (var writer = PdfWriter.GetInstance(doc, ms))
        {
            doc.Open();
            var HTMLconversão = HTMLemString;
            var CSSconversão = CSSdocumento;

            // Wrap the CSS and HTML strings in UTF-8 streams for XMLWorker.
            using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(CSSconversão)))
            {
                using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(HTMLconversão)))
                {
                    iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                }
            }

            doc.Close();
        }
    }

    // Copy the generated PDF bytes out of the memory stream.
    bytes = ms.ToArray();
}

// Write the PDF to the desktop for testing.
var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "teste.pdf");
System.IO.File.WriteAllBytes(testFile, bytes);

And above is the code where I generate the PDF.

  • Please show the code and explain what the end goal is, because it may not make sense to create a function that converts HTML to a string.

  • Oh yes, I need it in order to convert to PDF, and I am using iTextSharp for that. The problem is that the PDF it creates is blank, so I think it is something related to the HTML, since it arrives as a string. I am following this example: http://stackoverflow.com/questions/25164257/how-to-convert-html-to-pdf-using-itextsharp

  • I don't think so, but I'm not sure. Are you using iTextSharp for a particular reason?

  • Yes. In this case I generate a string of HTML and use iTextSharp to turn that string into the PDF, as in the example I linked.

1 answer

https://stackoverflow.com/questions/2822843/itextsharp-html-to-pdf

From what I saw, there seems to be a bug with that. There is a solution in the answer linked above:

Document document = new Document();
try
{
    PdfWriter.GetInstance(document, new FileStream("c:\\my.pdf", FileMode.Create));
    document.Open();
    // Download the rendered HTML page as a string.
    WebClient wc = new WebClient();
    string htmlText = wc.DownloadString("http://localhost:59500/my.html");
    Response.Write(htmlText);
    // Parse the HTML into iTextSharp elements and add each one to the document.
    List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
    for (int k = 0; k < htmlarraylist.Count; k++)
    {
        document.Add((IElement)htmlarraylist[k]);
    }

    document.Close();
}
catch
{
    // Note: any exception is silently swallowed here.
}

  • But with this approach, how would I apply the CSS? In the other one I pass the entire CSS as a string.

  • Put the CSS inline in the HTML, for example. Or is there a reason you keep the CSS separate?
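
The suggestion in the last comment, putting the CSS into the HTML itself, could be sketched as plain string handling like this. `InlineCss` and `Embed` are hypothetical names introduced only for illustration. Whether the parser in use honors an embedded `<style>` element is not verified here, and the step would have to run after the regex in the question, since that regex strips `<style>` blocks.

```csharp
using System;

public static class InlineCss
{
    // Hypothetical helper: places the CSS in a <style> element right after <head>,
    // or in front of the markup when there is no <head> tag.
    public static string Embed(string html, string css)
    {
        string styleBlock = "<style type=\"text/css\">" + css + "</style>";
        int headIndex = html.IndexOf("<head>", StringComparison.OrdinalIgnoreCase);
        if (headIndex < 0)
        {
            return styleBlock + html;
        }

        int insertAt = headIndex + "<head>".Length;
        return html.Insert(insertAt, styleBlock);
    }

    public static void Main()
    {
        string html = "<html><head></head><body><div class=\"page\">Item</div></body></html>";
        string css = ".page { font-size: 12pt; }";
        Console.WriteLine(Embed(html, css));
    }
}
```

The sketch only matches a bare `<head>` tag; a `<head>` with attributes would fall through to the fallback branch and get the style block prepended instead.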
