I'm trying to create a program that applies OCR to a non-OCR PDF, using the iText pdfOCR library for .NET.
However, I am getting the following error:
"Pdfparser.exe" (CLR v4.0.30319: DefaultDomain): Loaded "C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: DefaultDomain): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\Pdfparser.exe". Symbols loaded.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Windows.Forms\v4.0_4.0.0.0__b77a5c561934e089\System.Windows.Forms.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System\v4.0_4.0.0.0__b77a5c561934e089\System.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Drawing\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Drawing.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Configuration\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Configuration.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Xml\v4.0_4.0.0.0__b77a5c561934e089\System.Xml.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\itext.pdfocr.tesseract4.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\itext.pdfocr.api.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\itext.io.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\assembly\GAC_MSIL\Accessibility\v4.0_4.0.0.0__b03f5f7f11d50a3a\Accessibility.dll".
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\itext.kernel.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\Tesseract.dll". Module was built without symbols.
C:\Users\client\Downloads\deposito-imagens\0001.jpg
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\BouncyCastle.Crypto.dll".
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\Common.Logging.Core.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\Common.Logging.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
no configuration section <common/logging> found - suppressing logging output
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "InteropRuntimeImplementer.LeptonicaApiSignaturesInstance".
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "InteropRuntimeImplementer.TessApiSignaturesInstance".
The program "[14980] Pdfparser.exe" has exited with code -1073741795 (0xc000001d) 'Illegal Instruction'.
I'm new to C#, but the code is as follows:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Drawing;
using System.Windows.Forms;
using System.IO;
// iText pdfOCR
using iText.Kernel.Pdf;
using iText.Pdfocr;
using iText.Pdfocr.Tesseract4;

namespace PDFParser
{
    public partial class ConvertePesquisavel : UserControl
    {
        private static readonly Tesseract4OcrEngineProperties tesseract4OcrEngineProperties = new Tesseract4OcrEngineProperties();
        public static string PATH_ROOT = System.Environment.CurrentDirectory;
        // Tesseract language data is expected in a TESSDATA folder under the working directory.
        private static string TESS_DATA_FOLDER = PATH_ROOT + @"\TESSDATA";
        private string OUTPUT_PDF;
        private static IList<FileInfo> LIST_IMAGES_OCR = new List<FileInfo>();

        // Lets the user pick the JPG image that will be run through OCR.
        private void btnOcrSelecionarArquivo_Click(object sender, EventArgs e)
        {
            ocrFileDialog.Filter = "Arquivo jpg | *.jpg";
            if (ocrFileDialog.ShowDialog() != DialogResult.Cancel)
            {
                lblOcrArquivoSelecionado.Text = "Documento selecionado: \n" + ocrFileDialog.FileName;
                this.btnConverterOcr.BackColor = Color.Green;
            }
        }

        // Runs OCR on the selected image and writes the result to a PDF.
        void OcrConvert()
        {
            var tesseractReader = new Tesseract4LibOcrEngine(tesseract4OcrEngineProperties);
            tesseract4OcrEngineProperties.SetPathToTessData(new FileInfo(TESS_DATA_FOLDER));

            var properties = new OcrPdfCreatorProperties();
            properties.SetPdfLang("por");

            var ocrPdfCreator = new OcrPdfCreator(tesseractReader);

            // The output name is the input path with "ocr.pdf" appended.
            OUTPUT_PDF = ocrFileDialog.FileName + "ocr.pdf";
            LIST_IMAGES_OCR.Add(new FileInfo(ocrFileDialog.FileName));
            Console.WriteLine(LIST_IMAGES_OCR[0]);

            using (var writer = new PdfWriter(OUTPUT_PDF))
            {
                ocrPdfCreator.CreatePdf(LIST_IMAGES_OCR, writer).Close();
            }
        }

        private void btnConverterOcr_Click(object sender, EventArgs e)
        {
            OcrConvert();
        }
    }
}
The problem occurs when the OcrConvert function is called.
Any help is welcome!
Could you format the code so it is easier to read?
– Danizavtz
Thanks, everyone, I was able to solve the problem. The issue was actually the .NET version. I created a new project targeting version 4.8 and everything worked. Thank you!
– Guilherme Santos
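For anyone hitting the same 0xc000001d crash: the fix reported above was to build against .NET Framework 4.8. The thread does not say which version the original project targeted, so the following is only a sketch of where that setting lives, assuming a classic (non-SDK-style) .NET Framework project. Retargeting an existing project this way, rather than creating a new one, is an assumption and was not tested in the thread.

```xml
<!-- Fragment of the project's .csproj file (rest of the file omitted):
     the framework version the project is compiled against. -->
<PropertyGroup>
  <TargetFrameworkVersion>v4.8</TargetFrameworkVersion>
</PropertyGroup>

<!-- App.config in the same project: the runtime the executable asks for at startup. -->
<configuration>
  <startup>
    <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.8" />
  </startup>
</configuration>
```

In Visual Studio the same change can be made under the project's Properties, Application tab, "Target framework"; the installed NuGet packages may need to be reinstalled afterwards so that the matching assemblies are referenced.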