I'm going to use NLTK, but I need plain text. The material I have is in EPUB and PDF, so I need a library that converts one of these formats to txt. Does anyone know of something like this? Thank you.
I've tried several libraries for this, and the one that gave me the best results was Tika. To install it:
pip install tika
Usage:
from tika import parser
file = 'path/to/file'
# Parse the file
data = parser.from_file(file)
# Print the parsed content
text = data['content']
print(text)
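Since the goal is a .txt file to feed into NLTK, here is a minimal sketch of tidying the extracted text and saving it to disk. It assumes that data['content'] may come back as None when no text could be extracted, and that the raw text can contain long runs of blank lines. The helper names clean_extracted_text and save_as_txt are hypothetical, not part of Tika.

```python
import re
from pathlib import Path
from typing import Optional


def clean_extracted_text(content: Optional[str]) -> str:
    """Tidy text taken from data['content'] (hypothetical helper)."""
    # Assumption: content may be None if nothing could be extracted.
    if content is None:
        return ""
    # Drop trailing whitespace on every line.
    lines = [line.rstrip() for line in content.splitlines()]
    text = "\n".join(lines)
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def save_as_txt(content: Optional[str], out_path: str) -> Path:
    """Write the cleaned text to a UTF-8 .txt file and return its path."""
    path = Path(out_path)
    path.write_text(clean_extracted_text(content) + "\n", encoding="utf-8")
    return path
```

With the snippet above, calling save_as_txt(data['content'], 'book.txt') would leave you with a plain text file that you can read back and pass to NLTK.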