Python help: converting a PDF to Excel

Hi everyone,

I can't get the code below to work at all. I'm trying to convert a PDF file to Excel.

import re
import pdfplumber
import pandas as pd
from collections namedtuple

titulos_hvi =   namedtuple ('1a, 2a, 3a, 4a, 5a, 6a, 7a, 8a, 9a, 10a, 11a, 12a, 13a, 14a, 15a, 16a, 17a, 18a, 19a')

with pdfplumber.open(r'C:\Users\User1\pdf1.pdf') as pdf:
    page = pdf.pages[0]
    text = page.extract_text()

dados_re = re.compile(r'\d{20} [\d,]+\. \d{2}') 

line_items = []
for line in text.split('\n'):
    if lote_re.match(line):
        lote, dois_pontos, *num_lote = line.split() 

    line = dados_hvi_re.search(line):
    if line: 
        1a = line.group(1)
        2a = line.group(2)
        3a = line.group(3)
        4a = line.group(4)
        5a = line.group(5)
        6a = line.group(6)
        7a = line.group(7)
        8a = line.group(8)
        9a = line.group(9)
        10a = line.group(10)
        11a = line.group(11)
        12a = line.group(12)
        13a = line.group(13)
        14a = line.group(14)
        15a = line.group(15)
        16a = line.group(16)
        17a = line.group(17)
        18a = line.group(18)
        19a = line.group(19)
        line_items.append(titulos_hvi(num_lote, 1a, 2a, 3a, 4a, 5a, 6a, 7a, 8a, 9a, 10a, 11a, 12a, 13a, 14a, 15a, 16a, 17a, 18a, 19a,))

df = pd.DataFrame(line_items)
df.head()
  • Everyone, just to add: it is also raising a syntax error: `from collections namedtuple` SyntaxError: invalid syntax

  • You can fix that error simply by using the correct syntax: `from collections import namedtuple`. That said, if this is the error you are getting, it belongs in the question, not in a comment.

  • You will get more syntax errors on the lines with the variables `1a`, `2a`, and so on: these are not valid Python variable names, because identifiers cannot start with a digit. In any case, you should use a list instead of creating the variables one by one.
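
Putting the comments together, here is a minimal sketch of how the parsing could look with valid field names and a list in place of the 19 separate variables. Several things in it are assumptions, not taken from the question: the question only defines `dados_re` but uses `lote_re` and `dados_hvi_re`, so both patterns below (`LOTE_RE`, `DADOS_RE`) are hypothetical and need adapting to the real PDF layout; the field names `lote` and `col_1a` to `col_19a`, the type name `TitulosHvi`, and the helpers `parse_lines` and `pdf_to_excel` are illustrative too. Note that `namedtuple` also needs a type name as its first argument, which the original call is missing.

```python
import re
from collections import namedtuple

# Field names must be valid identifiers, so "1a" becomes "col_1a", and so on.
# "lote" comes first because the question passes num_lote as the first value.
FIELDS = ["lote"] + [f"col_{i}a" for i in range(1, 20)]
TitulosHvi = namedtuple("TitulosHvi", FIELDS)

# Hypothetical patterns: the real ones depend on how the PDF text looks.
LOTE_RE = re.compile(r"^Lote\s*:\s*(\S+)")
# Assumes a data line is exactly 19 whitespace-separated numeric tokens.
DADOS_RE = re.compile(r"^\s*((?:[\d.,]+\s+){18}[\d.,]+)\s*$")


def parse_lines(text):
    """Return one TitulosHvi per data line, tagged with the current lote."""
    line_items = []
    num_lote = None
    for line in text.split("\n"):
        lote_match = LOTE_RE.match(line)
        if lote_match:
            num_lote = lote_match.group(1)
            continue
        dados_match = DADOS_RE.match(line)
        if dados_match:
            # A list of values replaces the variables 1a, 2a, ..., 19a.
            values = dados_match.group(1).split()
            line_items.append(TitulosHvi(num_lote, *values))
    return line_items


def pdf_to_excel(pdf_path, xlsx_path):
    """Extract the first page's text, parse it, and write an .xlsx file."""
    # Third-party imports are local so parse_lines can be tried without them.
    import pandas as pd
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        text = pdf.pages[0].extract_text() or ""
    df = pd.DataFrame(parse_lines(text), columns=FIELDS)
    df.to_excel(xlsx_path, index=False)  # needs an Excel writer such as openpyxl
    return df


# Made-up sample text, only to exercise the parser.
sample = "Lote : 42\n" + " ".join(str(n) for n in range(1, 20))
rows = parse_lines(sample)
```

With the real file, something like `pdf_to_excel(r'C:\Users\User1\pdf1.pdf', 'saida.xlsx')` would be the entry point (the output file name is made up). The values stay as strings here; converting them to numbers depends on the decimal separator the PDF uses.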

No answers
