When exporting an HTML file to PDF with iTextSharp and XMLWorker, an error sometimes occurs saying that a certain tag is not closed. While searching, I found the post "How to Convert HTML to Valid XHTML?" (although it is about JavaScript), which suggests converting the input to XHTML first, because XHTML guarantees that the tags are properly closed.
My application queries a SQL table that stores saved HTML files. When I try to convert them to PDF, the error says that a certain tag is not closed. Below is the code I use to export to PDF:
public ActionResult GetPdfFileZiped(ProcessamentoRegistros pProcessamentoRegistros)
{
    // XMLWorkerHelper.GetInstance().ParseXHtml(pw, doc, srHtml);
    // fails because the HTML structure is sometimes not well formed.
    pProcessamentoRegistros.IdProcessamentoDiario = 1;
    pProcessamentoRegistros.IdRegistro = 1;
    pProcessamentoRegistros.IdServico = 2;
    ProcessamentoRegistros _processamento = _IRepositorio.ObterProcessamentoRegistros(pProcessamentoRegistros);
    var doc = new Document(PageSize.A4.Rotate());
    var stream = new MemoryStream();
    var pw = PdfWriter.GetInstance(doc, stream);
    doc.Open(); // the document must be open before content is added
    var minhaStringHTML = _processamento.DocumentoHtml.Trim();
    using (var srHtml = new StringReader(minhaStringHTML))
    {
        XMLWorkerHelper.GetInstance().ParseXHtml(pw, doc, srHtml); // <-- THE ERROR OCCURS HERE
    }
    doc.Close(); // flushes the finished PDF into the stream
    using (var compressedFileStream = new MemoryStream())
    {
        using (var zipArchive = new ZipArchive(compressedFileStream, ZipArchiveMode.Update, false))
        {
            var zipEntry = zipArchive.CreateEntry("MeuPDFZipado.pdf");
            using (var originalFileStream = new MemoryStream(stream.ToArray()))
            using (var zipEntryStream = zipEntry.Open())
            {
                // Copy the PDF bytes into the zip entry.
                originalFileStream.CopyTo(zipEntryStream);
            }
        }
        return new FileContentResult(compressedFileStream.ToArray(), "application/zip") { FileDownloadName = "Filename.zip" };
    }
}
For example, the img tag below is not closed, and I have no control over its formatting. The same error occurs with some other tags:
<IMG border="0" src="https://www.sifge.caixa.gov.br/Empresa/Crf/images/caixa.gif" width=180 height=44>
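To illustrate why a strict parser rejects this tag, here is a minimal sketch that feeds it to .NET's own XML parser (System.Xml.Linq) instead of iTextSharp. It assumes that XMLWorker fails for the same well-formedness reasons (unquoted attribute values and a tag that is never closed); the class and method names are hypothetical and are not part of my application.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class WellFormedCheck
{
    // Hypothetical helper: returns true when the fragment parses as XML,
    // otherwise returns false and reports the parser's message.
    public static bool IsWellFormedXml(string fragment, out string error)
    {
        try
        {
            XElement.Parse(fragment);
            error = string.Empty;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        // The tag as it is stored: unquoted width/height values and no closing slash.
        var asStored = "<IMG border=\"0\" src=\"https://www.sifge.caixa.gov.br/Empresa/Crf/images/caixa.gif\" width=180 height=44>";

        // The same tag rewritten by hand as XHTML: quoted values and self-closed.
        var asXhtml = "<img border=\"0\" src=\"https://www.sifge.caixa.gov.br/Empresa/Crf/images/caixa.gif\" width=\"180\" height=\"44\" />";

        string error;
        Console.WriteLine(IsWellFormedXml(asStored, out error) ? "well formed" : "rejected: " + error);
        Console.WriteLine(IsWellFormedXml(asXhtml, out error) ? "well formed" : "rejected: " + error);
    }
}
```

The stored tag is rejected and the hand-corrected one is accepted, which is the kind of fix every tag in the stored HTML would need before parsing.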
Below is the full HTML:
<META NAME="GENERATOR" Content="Microsoft Visual Studio 6.0">
<script language=javascript>
//function MudarPagina() {
// window.history.back();
<!--body bgcolor=white onBlur=MudarPagina();-->
<body bgcolor=white>
<FORM method="post" style="BACKGROUND-COLOR: white">
<!--FORM name="Imprimir" method="post" style="BACKGROUND-COLOR: white"-->
<td align=center><a href="javascript:window.print();"><IMG src="https://www.sifge.caixa.gov.br/Empresa/Crf/images/botimprimir.gif" border=0></a>
<a href="javascript:window.history.back();"><IMG src="https://www.sifge.caixa.gov.br/Empresa/Crf/images/botvoltar.gif" border=0></a></td>
<table width="75%" CELLSPACING=0 CELLPADDING=10 border=1 align=center bordercolorlight="#FFFFFF" bordercolordark="#CCCCCC">
<TABLE WIDTH=100% BORDER=0 CELLSPACING=0 CELLPADDING=0 style="color: black" class=txtcentral>
<td align=left><IMG border="0" src="https://www.sifge.caixa.gov.br/Empresa/Crf/images/caixa.gif" width=180 height=44></td>
<tr><td colspan=2> </td></tr>
<td align=rigth><span style="font-size: 13pt" align=center><strong>Certificado de Regularidade do FGTS - CRF</strong></span></td>
<TABLE WIDTH=100% BORDER=0 CELLSPACING=0 CELLPADDING=0 style="color: black" class=txtcentral>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<TD width=22%><font style=" font-family: Verdana;font-size:10pt"><strong>Inscrição:</strong></font></TD>
<TD ><font style=" font-family: Verdana;font-size:8pt">08439659/0001-50</font></TD>
<td width=22% valign=top nowrap><font style=" font-family: Verdana;font-size:10pt"><strong>Razão Social:</strong></font></TD>
<td><font style=" font-family: Verdana;font-size:8pt">CPFL ENERGIAS RENOVAVEIS S A</font></TD>
<td width=22% nowrap><font style=" font-family: Verdana;font-size:10pt"><strong>Nome Fantasia:</strong></font></TD>
<td ><font style=" font-family: Verdana;font-size:8pt">CPFL RENOVAVEIS</font></TD>
<td width=22% valign=top><font style=" font-family: Verdana;font-size:10pt"><strong>Endereço:</strong></font></TD>
<td ><font style=" font-family: Verdana;font-size:8pt">AV DOUTOR CARDOSO DE MELO 1184 ANDAR 7 / VILA OLIMPIA / SAO PAULO / SP / 4548-004</font></TD>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<TD colspan=2 style="text-align: justify"><font style=" font-family: Verdana;font-size:10pt">A Caixa Econômica Federal, no uso da atribuição que lhe confere o Art. 7, da
Lei 8.036, de 11 de maio de 1990, certifica que, nesta data, a empresa acima identificada
encontra-se em situação regular perante o Fundo de Garantia do Tempo de Serviço - FGTS.
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<td style="text-align: justify" colspan=2><font style=" font-family: Verdana;font-size:10pt">O presente Certificado não servirá de prova contra cobrança de quaisquer débitos referentes
a contribuições e/ou encargos devidos, decorrentes das obrigações com o FGTS.</font>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<td colspan=2><font style=" font-family: Verdana;font-size:10pt"><strong>Validade: </strong>28/02/2017 a 29/03/2017</font></TD>
<tr><td colspan=2> </td></tr>
<td colspan=2><font style=" font-family: Verdana;font-size:10pt"><strong>Certificação Número: </strong>2017022805233090232330</font></TD></TR>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<TD colspan=2><font style=" font-family: Verdana;font-size:10pt">Informação obtida em 15/03/2017, às 17:14:51.</font></TD>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<TD style="text-align: justify" colspan=2><font style=" font-family: Verdana;font-size:10pt">A utilização deste Certificado
para os fins previstos em Lei está condicionada à verificação de
autenticidade no site da Caixa: <strong>www.caixa.gov.br</strong></font></TD>
<script language=javascript>
How do I get around this problem? Can I parse the HTML and convert it to XHTML? Is there any other free alternative for converting this HTML to PDF while keeping the tags' styles?
For information purposes only, please link the other question that brought you to this new problem.
– George Wurthmann