I wrote a scraper in Python that takes the URL of any PDF, reads it, and returns its text, but with some PDFs the output comes back with characters like this:
".\nO xc3 xb3rg xc3 xa3o tamb xc3 xa9m divulga result nGH xc3 x80QLWLYR x03GRV x03FDQGLGDWRV x03TXH x03VH x03GHFODUDP x03FRP x03GH xc3 x80FLrQFLD x03H x03GRV x03SHGLGRV x03 nde special service granted. The competition is aimed at providing of 150 vacancies for the class (Class A) of delegate Pol xc3 xadcia Civil, whose vacancies are xc3 xa3o n nprovidas as a order of clasVL xc3 x80FDomR x03H x03D x03QHFHVLGDGH x03GR x03VHUYLoR X11 on"
From what I can see, this happens when the document contains an accented character, a column layout, or even a dash.
I also noticed that if the PDF contains an image, it returns strange characters! Does anyone have a solution or an idea that could help me?
Have you tried .unicode('utf-8')? (utf8) I don't really remember... – RFL
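For reference, sequences such as "xc3 xb3" in the sample look like UTF-8 bytes that were printed as escapes instead of being decoded. A minimal sketch in Python 3, assuming the reader hands back a `bytes` object (the `raw` value below is a hypothetical stand-in, not the output of any particular PDF library):

```python
# Hypothetical raw bytes, standing in for what the PDF reader returns.
raw = b"O \xc3\xb3rg\xc3\xa3o tamb\xc3\xa9m divulga"

# Calling str(raw) would keep the \xc3\xb3 escapes in the output.
# Decoding the bytes as UTF-8 turns each pair back into one accented letter.
text = raw.decode("utf-8")

print(text)  # O órgão também divulga
```

This only covers the "xc3 x.." parts of the output; it does nothing for runs of capital letters such as "GHFODUD".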
Guys, thanks for the help. Encode and decode really do help fix these UTF-8 characters... But part of the text still doesn't come out right, for example in excerpts like this one: "GHFODUD".
– Vanessa Nunes
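One observation about excerpts like "GHFODUD": in the sample above, every character appears to sit exactly 29 code points below the letter it should be, and "x03" lands on the space character. That pattern is possibly what you get when a PDF font stores glyph numbers without a mapping back to Unicode, but that cause is a guess. The sketch below is only a workaround under that assumption; the function name `shift_glyph_codes` and the offset of 29 are inferred from this one sample and will likely differ for other fonts or files.

```python
def shift_glyph_codes(text: str, offset: int = 29) -> str:
    """Shift every character up by `offset` code points.

    Assumption: the garbled run uses glyph numbers that are a fixed
    distance from the real characters. 29 fits the sample in the question.
    """
    return "".join(chr(ord(ch) + offset) for ch in text)


print(shift_glyph_codes("GHFODUD"))           # declara
print(shift_glyph_codes("GHFODUDP\x03FRP"))   # declaram com
```

It should only be applied to the garbled runs, not to the parts that already read correctly, and accented letters inside those runs do not seem to follow the same offset.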