Text extraction problem using Pdfreader API

Question


Asked 11 years, 1 month ago

Viewed 73 times


I'm experimenting with the PDFreader class to extract text from a PDF document. I created a very simple test document; my script only includes the file PDFreader.class.php and passes the PDF path, using the call shown in the example file in the examples folder.

When I run this file to get the PDF text, the following notice appears:

Notice: Undefined index: Font in C: Setti www dg t2 Pdfreader Pdfpage.class.php on line 317

Here is the code in my file:

<html xmlns="http://www.w3.org/1999/xhtml" lang="pt-br" xml:lang="pt-br">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<body>
<?php
include ('PDFreader.class.php');

$PDF = new PDFreader();
$text = array(); // stays empty if open() or readText() throws
try {
    $PDF->open('t1t.pdf');
    $text = $PDF->readText();
}
catch(PDFexception $e) {
    echo '<p style="color: #FF0000; font-weight: bold; text-align: center;">';
    echo "$e</p>\n";
}

echo "<h2>Decoded text</h2>
<p>\n";
foreach ($text as $row) {
    echo "$row<br />\n";
}
echo "</p>\n";
?>
</body>
</html>

How can I fix this?

1 answer


by brasofilo • **6,560** points · Answer 1 · 2014-06-11T16:59:42+00:00

The code you posted works on OS X. The package page says the following:

Emphasis mine: "might work on Windows".
And given that the last update was in 2010, I think getting it to work on your system (C:\) is going to be... complicated.
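
As for the notice itself: "Undefined index: Font" means that some code read an array key named Font that was not present. The library source around line 317 is not shown in the question, so the following is only a minimal sketch of that situation and of the usual guard; the `$resources` array and its contents are hypothetical stand-ins, not the library's actual data structure.

```php
<?php
// Hypothetical stand-in for a page's resource dictionary; the real
// structure inside Pdfpage.class.php is not shown in the question.
$resources = array('ProcSet' => array('PDF', 'Text'));

// Reading $resources['Font'] directly would raise the notice on the PHP
// versions of that era (newer versions report a warning instead).
// Checking with isset() first avoids it.
$fonts = isset($resources['Font']) ? $resources['Font'] : array();

echo count($fonts) . " font(s) found\n";
```

A guard like this would only hide the notice; whether the library can then extract text from that particular file is a separate question.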

Options:

Searching for this, I found a Q&A on Stack Overflow with several suggestions: "Is there a PDF parser for PHP?". In addition to the visible answers, there are a couple of deleted ones whose links may be useful here: