Python lib to convert PDF/EPUB to txt

I'll be using NLTK, but I need plain text. My material is in EPUB and PDF, so I need a library that converts one of these formats to plain text (.txt). Does anyone know of one? Thank you.

1 answer

I've tried several libraries for this, and the one that gave me the best results was Tika. To install it:

pip install tika

Usage:

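from tika import parser

file = 'path/to/file'

# Parse the file (by default tika-python downloads and starts a local Tika server, which requires Java)
data = parser.from_file(file)

# Print the extracted text ('content' is None when no text could be extracted)
text = data['content'] or ''
print(text)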
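The Tika route needs Java, since tika-python runs a Tika server in the background. For EPUB input, a rough standard-library-only alternative is also possible, because an EPUB is a ZIP archive whose XHTML chapters are listed in reading order (the "spine") in a package `.opf` file. This is a minimal sketch, not Tika and not any library's API: `epub_to_text` and `_TextExtractor` are hypothetical names, and it assumes a DRM-free EPUB with UTF-8 chapters. It does nothing for PDF.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import unquote

# Hypothetical helper names; assumes a DRM-free EPUB with UTF-8 XHTML chapters.
# Namespaces of the EPUB container file and the package (.opf) document
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}
SKIP_TAGS = {"head", "script", "style"}  # content that is not body text
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects the visible text of one (X)HTML chapter."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside SKIP_TAGS elements

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")  # block elements start a new line

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

    def text(self):
        # Collapse whitespace inside each line and drop empty lines
        lines = (" ".join(line.split()) for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def epub_to_text(source):
    """Return an EPUB's text in reading (spine) order.

    `source` is a path or a binary file object, as accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(source) as zf:
        # META-INF/container.xml points at the package (.opf) document
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        opf_dir = posixpath.dirname(opf_path)

        # Manifest: item id -> file path inside the ZIP (hrefs are relative to the .opf)
        manifest = {
            item.get("id"): posixpath.normpath(
                posixpath.join(opf_dir, unquote(item.get("href")))
            )
            for item in opf.findall("opf:manifest/opf:item", NS)
        }

        # Spine: the chapters in reading order
        chapters = []
        for itemref in opf.findall("opf:spine/opf:itemref", NS):
            extractor = _TextExtractor()
            raw = zf.read(manifest[itemref.get("idref")])
            extractor.feed(raw.decode("utf-8-sig"))  # utf-8-sig also strips a BOM
            extractor.close()
            chapters.append(extractor.text())
        # Skip chapters with no text (e.g. a cover page that is only an image)
        return "\n\n".join(c for c in chapters if c)
```

Usage: `text = epub_to_text('path/to/book.epub')`, and the result can go straight to NLTK. Breaking lines only on a handful of block tags is a deliberate simplification; Tika's extraction is likely more thorough, so compare both outputs on your own books before choosing.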
