Since the PDF is an image, extracting its text requires an OCR package (keep in mind that these packages may not recognize everything correctly). There are several of them for Python; for what you want, a very interesting one that works on Python 2.7 and 3.4 is textract.
Here is an example:
import textract

# textract.process returns bytes, so decode before printing
text = textract.process("orcamento.pdf")
print(text.decode("utf-8"))

Output:
Clicar para incluir o cabeçalho
EXEMPLO DE ORÇAMENTO: Exemplos de Itens Detalhados
OBSERVAÇÃO : Este é somente um exemplo. Nem todos os orçamentos terão todos os exemplos listados abaixo. Favor usar somente os itens que dizem
respeito ao seu projeto proposto.
I. SALÁRIOS
Diretor Executivo
Diretor de Projeto
Contador
Editor Sênior
Editor
Salário Anual
5000
4000
2000
750
500
Porcentagem
50%
100%
50%
20%
45%
I used this PDF as an example. Of course, I copied only part of the result, just for demonstration.
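Note that in the output above the table arrives column by column (all the roles, then all the salaries, then all the percentages) rather than row by row. If you need the values per role, a minimal sketch like the one below could regroup them. The function name `rebuild_rows` and the idea of treating the column headers as separators are my own assumptions for illustration; a real document may need different rules.

```python
def rebuild_rows(lines, headers):
    """Split a flat list of lines into columns at each header line,
    then zip the columns together into rows."""
    columns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line in headers:
            columns.append([])  # a header line starts a new column
        elif columns:
            columns[-1].append(line)  # text before the first header is ignored
    # zip stops at the shortest column, so ragged columns are truncated
    return list(zip(*columns))


# Part of the extracted text shown above, one entry per line
sample = [
    "I. SALÁRIOS", "Diretor Executivo", "Diretor de Projeto", "Contador",
    "Editor Sênior", "Editor",
    "Salário Anual", "5000", "4000", "2000", "750", "500",
    "Porcentagem", "50%", "100%", "50%", "20%", "45%",
]
rows = rebuild_rows(sample, {"I. SALÁRIOS", "Salário Anual", "Porcentagem"})
for role, salary, percent in rows:
    print(role, salary, percent)
```

This only works when every column has the same number of entries and the headers are recognized exactly, so treat it as a starting point.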
Notes:
- In your case, you would have to download the PDF to a local directory and then follow the example above.
- To install on Python 3, see this link.
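One more practical point: textract does not do the extraction itself; it shells out to external programs, and a missing program typically shows up as a ShellError with exit code 127 (the shell's "command not found" code). The sketch below, for Python 3, checks for the needed programs first and then asks textract for its OCR method explicitly. The helper names `missing_tools` and `extract_text` are hypothetical, and the list of required binaries (`pdftoppm` and `tesseract`) is my assumption about what the `tesseract` method calls; adjust it for your setup.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def extract_text(pdf_path, method="tesseract", language="por",
                 required=("pdftoppm", "tesseract"), encoding="utf-8"):
    """Hypothetical helper: OCR a PDF with textract and return a str.

    Assumption: `required` lists the binaries the chosen method shells
    out to; use ("pdftotext",) if you rely on textract's default method.
    """
    missing = missing_tools(required)
    if missing:
        raise RuntimeError("Install these tools first: " + ", ".join(missing))

    import textract  # imported lazily so the check above works without it

    raw = textract.process(pdf_path, method=method, language=language)
    # textract returns bytes; decode them so printing shows readable text
    return raw.decode(encoding) if isinstance(raw, bytes) else raw


# Example call (needs textract, poppler-utils and tesseract installed):
# print(extract_text("orcamento.pdf"))
```

The `language="por"` argument selects Tesseract's Portuguese data, which must also be installed; drop it or change it for documents in other languages.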
textract.exceptions.ShellError: The command
pdftotext oi.pdf -
failed with exit code 127
------------- stdout -------------
------------- stderr -------------
– Luan pedro
'cause I’m the one :(
– Luan pedro
@Luanpedro It would help to show the context in which this happens. How about posting a fragment of the code?
– Sidon
My code > https://pastebin.com/pdMbxhXa Bulletin used > https://www.sendspace.com/file/08khoa ERROR: http://prntscr.com/kwg16g My question about this > https://pt.stackoverflow.com/questions/329889/pegar-valores-emboletim-usando-python-ocr
– Luan pedro
@Luanpedro Ugh... chasing after images on external sites is complicated; it is better to post everything here. But if I find some time, I will take a look. :-)
– Sidon