I have a PDF file that contains a table, and I need to extract it and turn it into a data.frame. I'm trying to use the pdftools package.
I'm using the following code, which returns the table as text, but I don't know how to format that text into a data.frame:
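library(pdftools)
# pdf_text() returns a character vector with one string per page
pdf <- pdf_text("https://www.imf.org/~/media/Files/Publications/WEO/2020/January/English/text.ashx?la=en")
# cat() prints the pages; capture.output() collects the printed text, one element per line
pdf <- capture.output(cat(pdf))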
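One way to go from that line vector to a data.frame is sketched below, assuming the table's columns are separated by runs of two or more spaces and that each data row is a label followed by a fixed number of numeric values. The sample lines and the helper lines_to_df() are hypothetical; real pdf_text() output will probably need extra filtering (page headers, footnotes, wrapped labels).

```r
# Hypothetical sample standing in for a few elements of capture.output(cat(pdf));
# the labels and numbers are made up for illustration only.
txt_lines <- c(
  "                       2018   2019   2020",
  "Row A                   1.0    2.0    3.0",
  "Row B label             4.5    5.5    6.5",
  "",
  "Row C                   7.0    8.0    9.0"
)

# Hypothetical helper: split each line on runs of 2+ spaces and keep only
# rows made of a label followed by exactly n_values fields.
lines_to_df <- function(x, n_values) {
  x <- trimws(x)
  x <- x[nzchar(x)]                              # drop blank lines
  parts <- strsplit(x, "[[:space:]]{2,}")        # single spaces inside labels survive
  parts <- Filter(function(p) length(p) == n_values + 1, parts)
  mat <- do.call(rbind, parts)                   # character matrix, one row per data line
  out <- data.frame(label = mat[, 1], stringsAsFactors = FALSE)
  for (j in seq_len(n_values)) {
    out[[paste0("V", j)]] <- as.numeric(mat[, j + 1])
  }
  out
}

tbl <- lines_to_df(txt_lines, n_values = 3)
```

Here the header line has only three fields, so it is filtered out; the year labels could be assigned afterwards with names(tbl). On the real document you would call lines_to_df(pdf, n_values = ...) with the number of value columns in the table you want.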
The linked PDF contains a lot of content besides the table. Edit the PDF so that it has only the page you are interested in, or convert the PDF to an Excel spreadsheet; that is much easier.
– Izak Mandrak
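Instead of editing the PDF by hand, the relevant page can be kept in R, since pdf_text() returns one string per page. A minimal sketch, assuming the page can be recognised by the text "Table 1" (the title mentioned in the comments); the pages vector is a hypothetical stand-in for the real pdf_text() result.

```r
# Hypothetical stand-in for the value of pdf_text(url): one string per page.
pages <- c(
  "Introduction text",
  "Table 1. Overview\nRow A   1.0   2.0\nRow B   3.0   4.0",
  "Closing text"
)

# Index of the first page whose text contains the (assumed) table title.
k <- which(grepl("Table 1", pages, fixed = TRUE))[1]

# Split that page into lines; the pattern "\r?\n" also copes with Windows line endings.
table_lines <- strsplit(pages[k], "\r?\n")[[1]]
```

The resulting table_lines can then be parsed like the full capture.output(cat(pdf)) output, but without the unrelated pages.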
Try the tabulizer package instead, function extract_tables. But the PDF actually has 6 datasets in the same Table 1, separated by blank lines. The result of tab <- extract_tables("https://etc"); tab <- tab[[1]] must be processed to obtain one or more tables. – Rui Barradas
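The post-processing step can be sketched as splitting the extracted matrix at its blank rows, assuming those separating rows come back from extract_tables() as rows of empty strings. The matrix below is a hypothetical stand-in for tab[[1]], and split_at_blank_rows() is a hypothetical helper.

```r
# Hypothetical stand-in for tab[[1]]: a character matrix in which
# sub-tables are separated by rows whose cells are all empty.
tab <- rbind(
  c("Row A", "1.0", "2.0"),
  c("Row B", "3.0", "4.0"),
  c("",      "",    ""),
  c("Row C", "5.0", "6.0"),
  c("",      "",    ""),
  c("Row D", "7.0", "8.0"),
  c("Row E", "9.0", "10.0")
)

split_at_blank_rows <- function(m) {
  # TRUE for rows where every cell is empty or whitespace.
  blank <- apply(m, 1, function(r) all(!nzchar(trimws(r))))
  # Each blank row increments the group id; consecutive blanks just skip ids.
  group <- cumsum(blank)
  rows <- which(!blank)
  pieces <- split(rows, group[rows])
  # One data.frame (columns V1, V2, ...) per block of non-blank rows.
  unname(lapply(pieces, function(idx) {
    as.data.frame(m[idx, , drop = FALSE], stringsAsFactors = FALSE)
  }))
}

tables <- split_at_blank_rows(tab)
```

Each element of tables is one dataset; column names and numeric conversion (as.numeric) can then be applied per table.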
Thanks for the tips! I've tried the tabulizer package; it extracts the table, but it detects one column fewer than the table has, and I don't know how to fix this. – Alexandre Sanches
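One possible cause of a missing column is that two adjacent columns were merged into a single cell during extraction. Assuming that is what happened (the comment does not say which column is affected), a post-processing fix is to split the merged column on whitespace. The matrix and the helper split_merged_column() are hypothetical. tabulizer's extract_tables() also has a columns argument, used together with guess = FALSE, for giving column boundaries explicitly; check the package documentation for its exact format.

```r
# Hypothetical extracted matrix: the last two value columns came back merged
# into one cell (e.g. "2.0 3.0"), so the table appears to have one column too few.
tab <- rbind(
  c("Row A", "1.0", "2.0 3.0"),
  c("Row B", "4.5", "5.5 6.5")
)

# Hypothetical helper: replace column `col` of character matrix `m`
# with the whitespace-separated pieces of its cells.
split_merged_column <- function(m, col) {
  pieces <- strsplit(trimws(m[, col]), "[[:space:]]+")
  # Every merged cell must split into the same number of values.
  stopifnot(length(unique(lengths(pieces))) == 1)
  extra <- do.call(rbind, pieces)
  cbind(m[, seq_len(col - 1), drop = FALSE],   # columns before `col`
        extra,                                  # the split columns
        m[, -seq_len(col), drop = FALSE])      # columns after `col`
}

fixed <- split_merged_column(tab, col = 3)
```

This only works when the merged values never contain spaces themselves; if a text label is the merged column, the column boundaries have to be fixed at extraction time instead.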