Reading the PDF directly is possible, but it is only feasible if the PDF keeps a "clean" format (well-defined rows and columns, no multiline cells, etc.). Even then, a change in layout can break all the code written to read it.
In most cases a viable solution is to convert the PDF into another format: HTML, TXT, XLS, etc.
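To illustrate why the layout matters, here is a minimal sketch of reading a table from a PDF that was converted to TXT. It assumes one table row per line and columns separated by two or more spaces; the class and method names (`PdfTextTable`, `ParseRows`) are hypothetical, and a real export may need different rules.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: splits text exported from a PDF into table cells.
// Assumes one table row per line and columns separated by 2+ spaces.
public static class PdfTextTable
{
    public static List<string[]> ParseRows(string text, int expectedColumns)
    {
        var rows = new List<string[]>();
        foreach (var rawLine in text.Split('\n'))
        {
            var line = rawLine.Trim();
            if (line.Length == 0) continue; // skip blank lines

            // Columns are assumed to be separated by two or more spaces.
            var cells = Regex.Split(line, @"\s{2,}");

            // A wrapped (multiline) cell or a layout change shows up here
            // as a row with the wrong number of columns.
            if (cells.Length != expectedColumns)
                throw new FormatException("Unexpected layout in line: " + line);

            rows.Add(cells);
        }
        return rows;
    }
}
```

Failing loudly on an unexpected column count is deliberate: with a file that is replaced every month, it is probably better to reject a changed layout than to import shifted values silently.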
Here is a good online tool for PDF to HTML conversion, which makes the result easy to read from various languages (including C#). See an example of how your document would look:
Document converted to HTML:
Since the tables in the document do not follow a well-defined pattern, the resulting HTML is difficult to parse, for example with HtmlAgilityPack.
One tool for converting PDF into a format that is "readable" from a programming language is Able2Extract.
See the settings and how your document was converted to XLS. It is the best option for conversion because it lets you align and select only the required text.
Setup: select only the table and columns for conversion
A free tool to extract PDF data: PDF Multitool Utility
Converted table; now just write the code to read the XLS
The code to read XLS is certainly much more practical than the code to read PDF:
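using System;
using System.Data.OleDb;

// Jet reads the legacy .xls format; HDR=Yes treats the first row as column headers.
string con = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\temp\test.xls;Extended Properties='Excel 8.0;HDR=Yes;'";
using (OleDbConnection connection = new OleDbConnection(con))
{
    connection.Open();
    // [Sheet1$] is the worksheet name followed by $
    OleDbCommand command = new OleDbCommand("select * from [Sheet1$]", connection);
    using (OleDbDataReader dr = command.ExecuteReader())
    {
        while (dr.Read())
        {
            // First column of the current row
            var row1Col0 = dr[0];
            Console.WriteLine(row1Col0);
        }
    }
}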
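The loop above prints only the first column. As a rough sketch of reading every column, the helper below copies each row into a dictionary keyed by column name. The names `SheetReader` and `ReadRows` are hypothetical; the method takes an `IDataReader`, which `OleDbDataReader` implements, so the reader returned by `command.ExecuteReader()` could be passed in directly.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: copies every row of a data reader into dictionaries
// keyed by column name.
public static class SheetReader
{
    public static List<Dictionary<string, object>> ReadRows(IDataReader reader)
    {
        var rows = new List<Dictionary<string, object>>();
        while (reader.Read())
        {
            var row = new Dictionary<string, object>();
            for (int i = 0; i < reader.FieldCount; i++)
            {
                // With HDR=Yes the column names come from the first sheet row.
                // Empty cells are returned as DBNull.Value.
                row[reader.GetName(i)] = reader.GetValue(i);
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

Inside the `using` block of the reader, this would be called as `var rows = SheetReader.ReadRows(dr);`, after which a cell can be looked up by its header, assuming the sheet's headers are unique.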
Some of the many examples available on the web: Here and Here
Very good, thanks for the tip! My main problem is that I need the person to enter the PDF into the system, because it is updated every month, so it has to be a very simple procedure.
– Ronaldo Asevedo
You're welcome, @Ronaldoasevedo. Mark the answer as accepted if it is satisfactory.
– rubStackOverflow
My main problem is that I need the person to enter the PDF into the system, because it is updated every month, so it has to be a very simple procedure.
– Ronaldo Asevedo
I understand. Reading PDFs is not simple because of the format's complexity, as your case shows. There are SDKs, mostly paid, that can read PDFs, but as I explained, it is complicated to read the data reliably. I have already tried to do something similar to what you asked; in my case the best solution was to convert the PDF. One question: will the PDF always follow the same "model", or will there be others? @Ronaldoasevedo
– rubStackOverflow
There are several tables, but they do not change; only the values change or more items are added. I added another, more complex table called SICRO to the main post.
– Ronaldo Asevedo