I need to read a .pdf file and generate reports based on the data obtained from that file.
I once built a dedicated system for a printer that interpreted the answer sheets of candidates in a public competition. It read the candidates' answer markings, cross-checked them against the test's answer key, and generated a report from that data.
Anyway, it turns out that "reading" a .pdf that contains text, numbers, and so on is much more complex.
What is the best way to approach this type of problem? Can anyone point me to some material or offer any other pointers?
One alternative for reading a PDF or an image is OpenCV (a computer vision library that has a port for Java). Depending on what you are going to do and how the information is laid out in the document, you may need a more specialized tool, such as Tesseract for OCR.
– Marcus Martins
In fact, I think your problem is better described as extracting text from a document. See, for example, an implementation that uses OpenCV to extract text from business cards. It would help to post a sample of the document in question, tagged [tag:opencv], showing how the information is arranged, so that the community can help more.
– Marcus Martins
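Whichever tool does the extraction, the report step usually comes down to parsing the plain text it returns. Below is a minimal sketch of that step only. It assumes the extracted text arrives as "Label: value" lines, which is a guess and not something the question states. The names parse_fields and format_report, and the sample data, are hypothetical.

```python
import re

# Assumed layout: one "Label: value" pair per line of extracted text.
# A real document will probably need a different pattern.
FIELD_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_fields(text):
    """Collect 'Label: value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def format_report(fields):
    """Render the parsed fields as a plain-text report, sorted by label."""
    return "\n".join(f"{label}: {value}" for label, value in sorted(fields.items()))


if __name__ == "__main__":
    # Hypothetical text, standing in for the output of an OCR or PDF text extractor.
    sample = "Name: Ana Souza\nScore: 87\n\nfree text without a label\n"
    print(format_report(parse_fields(sample)))
```

Keeping the parsing separate from the extraction means the same report code can be reused whether the text comes from OCR or from a PDF that already has a text layer.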