I am unable to make this code below work in any way... I am trying to convert a PDF file in Excel.
import re
import pdfplumber
import pandas as pd
from collections namedtuple
titulos_hvi = namedtuple ('1a, 2a, 3a, 4a, 5a, 6a, 7a, 8a, 9a, 10a, 11a, 12a, 13a, 14a, 15a, 16a, 17a, 18a, 19a')
with pdfplumber.open(r'C:\Users\User1\pdf1.pdf') as pdf:
page = pdf.pages[0]
text = page.extract_text()
dados_re = re.compile(r'\d{20} [\d,]+\. \d{2}')
line_items = []
for line in text.split('\n'):
if lote_re.match(line):
lote, dois_pontos, *num_lote = line.split()
line = dados_hvi_re.search(line):
if line:
1a = line.group(1)
2a = line.group(2)
3a = line.group(3)
4a = line.group(4)
5a = line.group(5)
6a = line.group(6)
7a = line.group(7)
8a = line.group(8)
9a = line.group(9)
10a = line.group(10)
11a = line.group(11)
12a = line.group(12)
13a = line.group(13)
14a = line.group(14)
15a = line.group(15)
16a = line.group(16)
17a = line.group(17)
18a = line.group(18)
19a = line.group(19)
line_items.append(titulos_hvi(num_lote, 1a, 2a, 3a, 4a, 5a, 6a, 7a, 8a, 9a, 10a, 11a, 12a, 13a, 14a, 15a, 16a, 17a, 18a, 19a,))
df = pd.DataFrame(line_items)
People, just to complement, is giving syntax error also: ```from Collections namedtuple Syntaxerror: invalid syntax
– Gabriel143
this error there you fix simply with the correct syntax: "from Collections import namedtuple" - now, if it is the error you are having, it should be in the question, not as a comment.
– jsbueno
you will get more syntax errors in the lines with variavids "1a, 2a" - these are not valid Python variable names: variables cannot start with digits. But you should use a list instead of doing it there.
– jsbueno