I have a PDF file that contains a table. I need to extract it and turn it into a data.frame. I'm trying to use the package pdftools.
I'm using the following code, which does return the table as text, but I don't know how to format it into a data.frame:
library(pdftools)
pdf <- pdf_text("https://www.imf.org/~/media/Files/Publications/WEO/2020/January/English/text.ashx?la=en")
pdf <- capture.output(cat(pdf))
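One possible way to get from the raw text to a data.frame is to split a page into lines and then split each line on runs of two or more spaces. This is only a minimal sketch in base R, not a tested solution for the IMF file: page_text is a made-up stand-in for one element of the pdf_text() result (which returns one string per page), the values are placeholders, and the column count and column names are assumptions.

```r
# Stand-in for a single page returned by pdf_text(); values are placeholders.
page_text <- "Region A      1.0   2.0   3.0\nRegion B      4.0   5.0   6.0\n\nRegion C      7.0   8.0   9.0\n"

# Split the page into lines, trim them, and drop the empty ones.
lines <- strsplit(page_text, "\n", fixed = TRUE)[[1]]
lines <- trimws(lines)
lines <- lines[nzchar(lines)]

# Split each line on runs of two or more spaces, so that labels
# containing single spaces (e.g. "Region A") stay in one field.
fields <- strsplit(lines, "\\s{2,}")

# Keep only rows with the expected number of fields (assumed to be 4 here).
n_cols <- 4
fields <- fields[lengths(fields) == n_cols]

# Bind the rows into a data.frame and convert the value columns to numeric.
df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(df) <- c("label", "x1", "x2", "x3")  # hypothetical column names
df[-1] <- lapply(df[-1], as.numeric)
```

On a real page, header lines, footnotes, and wrapped labels would likely need extra filtering before the rows line up.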
The PDF at that link contains a lot of content besides the table. Either edit the PDF down to only the page you are interested in, or convert the PDF to an Excel spreadsheet, which is much easier.
– Izak Mandrak
Try the package tabulizer instead, function extract_tables. But the PDF actually has 6 datasets in the same Table 1. These datasets are separated by blank lines. The result of tab <- extract_tables("https://etc"); tab <- tab[[1]] must be processed to obtain one or more tables.
– Rui Barradas
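The processing step mentioned above could look roughly like the following base R sketch, which splits a matrix into separate tables wherever a blank row occurs. It assumes tab[[1]] is a character matrix (the default output of extract_tables, as far as I know); the matrix below is a made-up stand-in with placeholder values, not the actual IMF data.

```r
# Stand-in for tab[[1]]: a character matrix in which blank rows separate datasets.
tab <- rbind(
  c("a", "1", "2"),
  c("b", "3", "4"),
  c("",  "",  ""),
  c("c", "5", "6"),
  c("",  "",  ""),
  c("d", "7", "8"),
  c("e", "9", "10")
)

# A row is blank when every cell is empty after trimming whitespace.
is_blank <- apply(tab, 1, function(row) all(trimws(row) == ""))

# Each blank row starts a new group; a running count numbers the groups.
group <- cumsum(is_blank)

# Drop the blank rows, then split the remaining rows by group.
keep <- !is_blank
blocks <- split(
  as.data.frame(tab[keep, , drop = FALSE], stringsAsFactors = FALSE),
  group[keep]
)
```

blocks is then a list with one data.frame per dataset, which can be cleaned up or combined individually.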
Thanks for the tips! I've tried the package tabulizer. It extracts the table, but ends up recognizing one column too few, and I don't know how to fix this problem.
– Alexandre Sanches