Extract a table from a PDF file in R

I have a PDF file that contains a table. I need to extract it and turn it into a data.frame. I'm trying to use the pdftools package.

I'm using the following code, which does return the text of the table, but I don't know how to format it into a data.frame:

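library(pdftools)

# pdf_text() returns a character vector with one string per page
pdf <- pdf_text("https://www.imf.org/~/media/Files/Publications/WEO/2020/January/English/text.ashx?la=en")
# Break the page strings into individual lines of text
pdf <- capture.output(cat(pdf))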
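
For what it's worth, once the text is split into lines, one possible way to get a data.frame is to split each line on runs of two or more spaces and keep only the rows with the expected number of fields. This is only a rough sketch in base R: the `lines` vector, the column names, and the helper `parse_table_lines()` are all hypothetical placeholders, not the real contents of the IMF table, and it assumes the columns are separated by at least two spaces while labels contain only single spaces.

```r
# Placeholder lines standing in for the output of capture.output(cat(pdf)).
# The labels and numbers are made up for illustration only.
lines <- c(
  "                 Y1      Y2      Y3",
  "Group A         1.0     2.0     3.0",
  "  Item A1       1.5     2.5     3.5",
  "",
  "  Item A2      -0.5     0.0     0.5"
)

# Hypothetical helper: turn whitespace-aligned text lines into a data.frame.
parse_table_lines <- function(lines, col_names) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                       # drop blank lines
  fields <- strsplit(lines, "\\s{2,}", perl = TRUE)   # split on 2+ spaces
  # Keep only rows with one field per expected column (drops the header here)
  fields <- fields[lengths(fields) == length(col_names)]
  df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(df) <- col_names
  df[-1] <- lapply(df[-1], as.numeric)                # all but the label column
  df
}

df <- parse_table_lines(lines, c("label", "first", "second", "third"))
str(df)
```

On a real PDF the page of interest would have to be selected first (for example by indexing the result of `pdf_text()` by page), and the spacing rule will probably need adjusting to the actual layout.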
  • 1

    The linked PDF contains a lot of content besides the table. Either trim the PDF down to only the page you are interested in, or convert the PDF to an Excel spreadsheet, which is much easier.

  • 2

    Try the tabulizer package instead, specifically the function extract_tables. Note that the PDF actually has 6 datasets inside the same Table 1, separated by blank lines. The result of tab <- extract_tables("https://etc"); tab <- tab[[1]] must be processed further to obtain one or more tables.

  • Thanks for the tips! I've tried the tabulizer package. It extracts the table, but it ends up recognizing one column too few, and I don't know how to fix this problem.
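
The post-processing mentioned in the comments (several datasets in one extracted table, separated by blank lines) could be sketched roughly like this in base R. The matrix `tab` below is a made-up placeholder shaped like one element of the list that extract_tables returns, and `split_on_blank_rows()` is a hypothetical helper. The sketch assumes each block starts with its own header row and that blank separator rows come through as empty strings or NA.

```r
# Placeholder character matrix; the values are invented for illustration.
tab <- rbind(
  c("Block 1", "Y1",  "Y2"),
  c("Row A",   "1.0", "2.0"),
  c("Row B",   "3.0", "4.0"),
  c("",        "",    ""),
  c("Block 2", "Y1",  "Y2"),
  c("Row C",   "5.0", "6.0")
)

# Hypothetical helper: split a character matrix into one data.frame per block.
split_on_blank_rows <- function(m) {
  # A row is blank when every cell is NA or empty after trimming
  blank <- apply(m, 1, function(r) all(is.na(r) | trimws(r) == ""))
  group <- cumsum(blank)                      # counter increases at each blank row
  pieces <- split(which(!blank), group[!blank])
  lapply(unname(pieces), function(idx) {
    block <- m[idx, , drop = FALSE]
    # First row of each block is treated as its header
    df <- as.data.frame(block[-1, , drop = FALSE], stringsAsFactors = FALSE)
    names(df) <- block[1, ]
    df[-1] <- lapply(df[-1], as.numeric)      # all but the label column
    df
  })
}

tables <- split_on_blank_rows(tab)
length(tables)
tables[[2]]
```

This does not address the missing column itself; it only shows how the blank-line split might be done once the columns come out right.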

No answers
