0
I am trying to create a script that reads a PDF identify some points call, then mark the pages that are these keys, and split the page itself. And then make a merge thus generating a new summary PDF.
import PyPDF2
pdf_file = open('nomefatura.pdf','rb')
pdf_reader = PyPDF2.PdfFileReader(pdf_file)
page = pdf_reader.getPage(1)
text = page.extractText()
print(text)
But that’s the way out:
5G=AK;"A?" BCP;<>;<. -/78.7:-1Y. ,7.77-ZABT;"A?" SJB? @>? 6Y-ZABT;"A?" L=>GK==J;K ! N==" *=J;K ! N==" *=J;K ! N===" *=J;K ! 5%4FHGRCGP$D6DGD54RHDG636VC3C7CC7C -1+>7C3CC -1+>C3CC5%4FHGRCG%4C$8D6DGD54RHG64%CGC36G64%CGC3V56GG%6F635D7R3FG -1+>7C3CC -1+>C3CC5%4FHGRCG%44$8D6DGD54RHFR%DHV3H4R%DVC3C6GH563GC7R3GCG -1+>C3CC -1+>C3CC -1+>C3CC%C3CC%CC%CHCGRCG%46$8D6DGDGDGD54RH54R%VG54R%VVG4R%VVGC6%%1+>7C3CC -1+>C3CC5%4FHGRCG%4R$8D6DGD54RHH6D%D653VC6D%DRC3GC6GF%4C53FR7R3R3FG -1+>7C3CC -1+>C3CCNO$ Pj#DV3C4N@ (Tpj#66G3D4(*#+10,I2*0,1-#f#T02#46%5D4E6C46&Y#. -,1)2_<2bc0Y#+-#NO$ #g4kh#0#NO@ (T#gC3Gkh#,=-#Y=-#)0Z+YY+I+Y#fY#1+)2\+Y%4? LBSB? @>? <"3GAB>BR;<"?" A="Y=J="JBTG? (")'))"]F9"$H')"RB=">? J? L;@?" LBO;"S;C"044%3>? @ABC? @>;"=;"SJB? @>?" 0./D"E9FF";G"9)H#%@+#/-)+#I0#. -*Z>01+)#Y<+Y#>2;+bc0Y#I0#>-,;+#I2Y1l,. 2+#'-. m#Z)0.2Y+#I2;21+)#-#. nI2;-#I0#<*+#-Z0)+I-)+#o<0#Z)0Y10#0Y10#Y0)'2b-#,+#Y<+#)0;2=-%#? -,/0b+#1-I-Y#0#\+b+#Y<+#0Y. ->/+B !" #$%&''#$%# -I-#-#9)+Y2>#p4G#7# 0>0\q,2.+ #7# -I-#-#9)+Y2>#p64#7#? >+)-#7# -I-#-#9)+Y2>#pR4# 0>0+)#7# -I-#-#9)+Y2>#p4D#7#L2#7#$83#8P3#$? 3#P$3#"$3#9&3#$83#"M3#ML3#PA#p5G#7#W2Z^+r#7#? nI2;-#,+. 2-,+>#DRp#46#7#&>;+)#7#"M#gY01-)#Rh3#$8#gY01-)#RRh3#"$#gY01-)#66h3#ML#gY01-)#6Gh3#pV4#7#! 8#? LP8#7# -I-#-#9)+Y2>#pHG#7# 0>0.-#FG#7#? nI2;-#,+. 2-,+>#FG3#pdv#7#?+_)2I;0#7#$8#gY01-)#R4hp#6F#7#! Q #9)+Y2>#7#$83#PA3#"M3#8P3#P$#p#8P76D#7#$0).-10>9+,. -Y#. -,'0,2+I-YB#s#9&$#9+,. -#I+#&+Jq,2+#s#9+,. -#I-#@-)I0Y10#s#9&@($ ($#t#9+,. -#I-#(Y1+I-#I-#(YZe)21-#$+,1-#s#$+,1+,I0)#s#9&@8&P&#t#9+,. -#I-#(Y1+I-#I-#8+)X#s#9&@($(#t#9+,. -#I-#(Y1+I-#I0#$0);2Z0#s#9P9#t#9+,. -#I0#9)+Ye>2+#s#9+,. -#! @ (P#s#O@! 8P!" (#s#? (? P(Q#s#9)+I0Y.-#s#! 1+a#s#9+,. -#"0).+ ,12>#s#9+,. -#$+)+#s# )2_+,. -#s#9&@$!? P(Q#s#9+,. --_#s#8+;NX.2>#s#9+,. -#I-#9)+Y2>#s#?+ 2]+#(. -,q*2.+ #N0I0)+>#s#9+,)2Y<>#s#&,+10>#4RR4N+1<)+N+1<)+#I0#8+;+0,1-B#D6DGFFR5HF !"#$%&%&'%()+,-#"+)./01123456#7#8+)10#9#:;<+#9)+,.+ #7#$=-#8+<>-7$8? @8AB#C6%D64%D64ECCF76F#7#! %(%B#44G%FCH%CFG%44C? >20.10B#5%4FHGRCG? 8NE? @8AB#4CDCDH6CCCC4GD(*2YY=-B#C6ECGE6C6C#8-Y1+;0*B#4DECGE6C6CP0 0)m,. 2+B#"&! E6C#80)e-i-B#C4ECDE6C#+#RCECDE6CQ`_21-#+<1-*X12.-B#CCCCCV4RRRG4HR5GC4G8X;2,+#6#I0#6
I did a lot of research and couldn’t find a solution yet.
At first as I understood it is not possible to make a PDF download itself, for being binary, 'rb'
, if I just tell you to do the good reading too.
Attempts to convert:
print(text.encode('latin-1'))
print(text.encode('utf-8'))
print(text.encode('ISO-8859-1'))
Opa, already yes, some PDF does the reading correctly. I just forgot to comment again here, I found another solution for what I needed to do.
– Dieinimy Maganha