Good afternoon. I am having a problem converting a CSV file to an Excel file with openpyxl. The code is meant to convert a PDF to Excel and paste the PDF contents into a sheet of an existing, pre-formatted Excel spreadsheet.
What I tried:
import PyPDF2
import pandas as pd
from openpyxl import Workbook, load_workbook
import string
import csv
pdfFileObj=open(r".\pasta_460\pdf_460.pdf",'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
paginas_pdf = []
for page in pdfReader.pages:
    ddd = page.extractText()
    paginas_pdf.append(ddd)
df = pd.DataFrame(paginas_pdf)
df.to_csv(r".\pasta_460\pdf_em_csv_460.csv",encoding='utf-8')
book = load_workbook(r".\teste_template_planilha.xlsx")
writer = pd.ExcelWriter(r".\teste_template_460_modelada.xlsx", engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, 'Sheet12')
df.to_excel(writer, 'Sheet12', startrow=1, startcol=1, header=False, index=False)
writer.save()
It worked and generated the spreadsheet, with Sheet12 containing the data I expected. However, all the data ends up on a single row in Excel. I believe this is because the data is stored as CSV inside a list (paginas_pdf), but I can't find a solution to the problem.
I would like the data to be written to Sheet12 line by line, that is, with the ":" delimiter splitting the information so that each piece goes on its own row.
To save to Excel, this would be enough:
df.to_excel("output.xlsx")
and you can delete every line from book = onward. – Paulo Marques
Even using
df.to_excel("output.xlsx")
the data still comes out on a single row. I also tried df.replace('\n', ' ') and the single-row structure remains, as if it were an array. The code from book = onward is there because I want to write the sheet into the same spreadsheet I indicated as the template. – gabriel_santos
Run df.head() and update the post
– Paulo Marques
@Paulomarques, see whether the images make the problem easier to identify. Thanks for your attention.
– gabriel_santos
Before the line
paginas_pdf.append(ddd)
add
items = ddd.split("\n")
then replace paginas_pdf.append(ddd) with paginas_pdf += items. That should solve it. – Paulo Marques
You’re a genius! ahahha Thanks for the help. The solution worked perfectly.
– gabriel_santos
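For anyone landing here later, the fix from the comments can be sketched roughly as below. This is a minimal illustration, not the exact code from the question: the helper name split_pages_into_lines and the sample page strings are made up for the example, and it uses splitlines() and drops blank lines, which is a small variation on the ddd.split("\n") suggested above.

```python
def split_pages_into_lines(pages):
    """Flatten a list of page texts into one list of individual lines."""
    lines = []
    for page_text in pages:
        # One list element per text line instead of one element per page;
        # blank lines are skipped (an assumption, not part of the original fix).
        items = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
        lines += items
    return lines


# Hypothetical sample standing in for what page.extractText() returns.
sample_pages = ["Name: Ana\nCity: Recife\n", "Name: Rui\nCity: Porto"]
linhas = split_pages_into_lines(sample_pages)
print(linhas)
# → ['Name: Ana', 'City: Recife', 'Name: Rui', 'City: Porto']
```

Passing a list like linhas to pd.DataFrame gives one row per text line, because each list element becomes a row. That is why the original list, with one long string per page, ended up with a whole page in a single cell.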