I'm looking for a way to read PDFs automatically. I receive hundreds of invoices and want to automate extracting the data from them. What I have tried:
Programs that convert the PDF to plain text, which is not very effective because some values get garbled.
Programs that extract by coordinates, which fail when the x,y position changes. For example, a field in the PDF usually takes up one line, but when it takes up two, the rest of the layout shifts.
I'm trying to find some pattern to rely on, such as an ID. After reading this documentation, http://webcheatsheet.com/php/reading_clean_text_from_pdf.php, I wanted to see whether I could get at the PDF dictionary; maybe the amount I want sits in the same dictionary on every invoice. Does anyone know of a library that would let me retrieve both the dictionary and the text? I believe the most complete library is Pdfparser (pdfparser.org), which supports more encodings, but the most it seems to offer is extracting metadata.
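To make the "pattern instead of coordinates" idea concrete, here is a minimal Python sketch of the kind of lookup I have in mind. It assumes the text has already been extracted by some tool, and that the amount follows a label such as `Total`; the label, the `find_total` helper and the sample strings are all hypothetical, not taken from my real invoices.

```python
import re
from decimal import Decimal

# Hypothetical label: real invoices may word this differently.
# \s* also matches newlines, so the amount may sit on the next line.
AMOUNT_RE = re.compile(r"\bTotal\b\s*:?\s*\$?\s*([\d,]+\.\d{2})")

def find_total(text):
    """Return the amount that follows the 'Total' label, or None if absent."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to an exact decimal.
    return Decimal(match.group(1).replace(",", ""))

# Made-up samples: the same field on one line and split over two lines.
one_line = "Invoice 001\nTotal: 1,234.50\n"
two_lines = "Invoice 002\nTotal:\n  987.00\n"

print(find_total(one_line))
print(find_total(two_lines))
```

Because the lookup is anchored on the label rather than on a position, a field that wraps onto a second line should not break it, although it still depends on the text extraction keeping the label and the value in reading order.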