Cannot apply OCR to a file with iText pdfOCR

I’m trying to create a program that applies OCR to a PDF that has no text layer, using the iText pdfOCR library in .NET.

The program crashes, and the debugger shows the following output:

"Pdfparser.exe" (CLR v4.0.30319: DefaultDomain): Loaded "C:\Windows\Microsoft.Net\Assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: DefaultDomain): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\Pdfparser.exe". Symbols loaded.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\Assembly\GAC_MSIL\System.Windows.Forms\v4.0_4.0.0.0__b77a5c561934e089\System.Windows.Forms.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\Assembly\GAC_MSIL\System\v4.0_4.0.0.0__b77a5c561934e089\System.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\Assembly\GAC_MSIL\System.Drawing\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Drawing.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\Assembly\GAC_MSIL\System.Configuration\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Configuration.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\Assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\Assembly\GAC_MSIL\System.Xml\v4.0_4.0.0.0__b77a5c561934e089\System.xml.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\itext.pdfocr.tesseract4.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\itext.pdfocr.api.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\itext.io.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "C:\Windows\Microsoft.Net\Assembly\GAC_MSIL\Accessibility\v4.0_4.0.0.0__b03f5f7f11d50a3a\Accessibility.dll".
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\itext.kernel.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\Tesseract.dll". Module was built without symbols.
C:\Users\client\Downloads\deposito-imagens\0001.jpg
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\Bouncycastle.Crypto.dll".
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\Common.Logging.Core.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "E:\Pdfparser\Pdfparser\bin\Debug\Common.Logging.dll". Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
no configuration section <common/logging> found - suppressing logging output
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "InteropRuntimeImplementer.LeptonicaApiSignaturesInstance".
"Pdfparser.exe" (CLR v4.0.30319: Pdfparser.exe): Loaded "InteropRuntimeImplementer.TessApiSignaturesInstance".
The program "[14980] Pdfparser.exe" has exited with code -1073741795 (0xc000001d) 'Illegal Instruction'.

I’m new to C#, but the code is as follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Drawing;
using System.Windows.Forms;
using System.IO;

//ItextOCR

using iText.Kernel.Pdf;
using iText.Pdfocr;
using iText.Pdfocr.Tesseract4;

namespace PDFParser
{
    public partial class ConvertePesquisavel : UserControl
    {
        private static readonly Tesseract4OcrEngineProperties tesseract4OcrEngineProperties = new Tesseract4OcrEngineProperties();
        public static string PATH_ROOT = System.Environment.CurrentDirectory;
        private static string TESS_DATA_FOLDER = PATH_ROOT + @"\TESSDATA";
        private string OUTPUT_PDF;

        private static IList<FileInfo> LIST_IMAGES_OCR = new List<FileInfo>();

        private void btnOcrSelecionarArquivo_Click(object sender, EventArgs e)
        {
            ocrFileDialog.Filter = "Arquivo jpg | *.jpg";

            if (ocrFileDialog.ShowDialog() != DialogResult.Cancel)
            {
                lblOcrArquivoSelecionado.Text = "Documento selecionado: \n" + ocrFileDialog.FileName;
                this.btnConverterOcr.BackColor = Color.Green;
                
            }
        }

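        void OcrConvert()
        {
            // Set the tessdata path before constructing the engine,
            // in case the engine copies the properties at construction time.
            tesseract4OcrEngineProperties.SetPathToTessData(new FileInfo(TESS_DATA_FOLDER));
            var tesseractReader = new Tesseract4LibOcrEngine(tesseract4OcrEngineProperties);

            var properties = new OcrPdfCreatorProperties();
            properties.SetPdfLang("por");

            // Pass the creator properties in; otherwise the PDF language set above is never used.
            var ocrPdfCreator = new OcrPdfCreator(tesseractReader, properties);

            OUTPUT_PDF = ocrFileDialog.FileName + "ocr.pdf";

            // The list is static, so clear it to avoid re-processing images from earlier clicks.
            LIST_IMAGES_OCR.Clear();
            LIST_IMAGES_OCR.Add(new FileInfo(ocrFileDialog.FileName));

            Console.WriteLine(LIST_IMAGES_OCR[0]);

            using (var writer = new PdfWriter(OUTPUT_PDF))
            {
                ocrPdfCreator.CreatePdf(LIST_IMAGES_OCR, writer).Close();
            }
        }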

        private void btnConverterOcr_Click(object sender, EventArgs e)
        {
            OcrConvert();
        }
    }
}

The crash occurs when the OcrConvert method is called.
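The exit code in the debug output above is printed both as a signed decimal and as hex. As a small sketch (the class and method names here are hypothetical, not part of any library), this shows that the two numbers are the same 32-bit value, which is the Windows NTSTATUS code STATUS_ILLEGAL_INSTRUCTION:

```csharp
using System;

static class ExitCodeDemo
{
    // Reinterprets a signed process exit code as an unsigned 32-bit value
    // and formats it as 8 hex digits, the way NTSTATUS codes are usually written.
    public static string ToHex(int exitCode)
    {
        return "0x" + unchecked((uint)exitCode).ToString("X8");
    }

    static void Main()
    {
        // Exit code taken from the debug output in the question.
        Console.WriteLine(ToHex(-1073741795)); // prints 0xC000001D
    }
}
```

An illegal-instruction status means the process tried to execute a CPU instruction that could not be run, which points at native code rather than at a managed exception that a try/catch could handle.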

Any help is welcome!

  • Could you format the code so it is easier to read?

  • Thanks, everyone, I was able to solve the problem. It was actually the .NET version. I created a new project targeting version 4.8 and everything worked. Thank you!
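Since the fix turned out to be the .NET version, it may help to print what the process is actually running on when comparing the old and new projects. This is only a diagnostic sketch; the `RuntimeInfo` class and `Describe` method are hypothetical names, and only standard `System.Environment` members are used:

```csharp
using System;

static class RuntimeInfo
{
    // Builds a one-line summary of the runtime and bitness of the current process.
    public static string Describe()
    {
        return string.Format(
            "CLR {0}, {1}-bit process, {2}-bit OS",
            Environment.Version,
            Environment.Is64BitProcess ? 64 : 32,
            Environment.Is64BitOperatingSystem ? 64 : 32);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Note that on .NET Framework 4.x, `Environment.Version` reports a 4.0.30319.* CLR version for every 4.x release, so the Target framework setting in the project properties is still the place to confirm that the project targets 4.8.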

No answers
