Extract a table from a PDF file in R

Question

Asked 5 years, 6 months ago

Viewed 113 times

I have a PDF file that contains a table. I need to extract it and turn it into a data.frame. I’m trying to use the pdftools package.

I’m using the following code, which does return the text of the table, but I don’t know how to format it into a single data.frame.

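library(pdftools)

# pdf_text() returns a character vector with one element per page
pdf <- pdf_text("https://www.imf.org/~/media/Files/Publications/WEO/2020/January/English/text.ashx?la=en")
# Print the text and capture it again as one string per line
pdf <- capture.output(cat(pdf))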

The linked PDF contains a lot of content besides the table. Either trim the PDF down to only the page you are interested in, or convert the PDF to an Excel spreadsheet, which is much easier.

– Izak Mandrak

2020/01/23 at 19:57
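
As a rough illustration of the "only the page that interests you" idea: pdf_text() returns one string per page, so a single page can be picked by index and parsed on its own. The sketch below uses base R only and a made-up page string, because the thread does not say which page holds the table. The sample text, the column names and the "line ends in a number" filter are all assumptions for illustration, not the layout of the real file.

```r
# Hypothetical stand-in for one element of pdf_text()'s result, i.e. the text
# of a single page. With the real file this would be pages[k] for whichever
# page k holds the table (k is not given in the thread).
page <- paste(
  "Table 1. Example layout (made-up values)",
  "Region A          1.0    2.0    3.0",
  "Region B          4.5    5.5    6.5",
  "",
  "Region C         -0.5    0.0    0.5",
  sep = "\n"
)

# Split the page into lines, trim them, and drop empty ones.
txt_lines <- trimws(strsplit(page, "\n", fixed = TRUE)[[1]])
txt_lines <- txt_lines[nzchar(txt_lines)]

# Keep only lines that end in a number; this drops the title line here.
rows <- grep("-?[0-9.]+$", txt_lines, value = TRUE)

# Columns are separated by runs of spaces, so split on two or more spaces
# to keep multi-word labels such as "Region A" in one piece.
fields <- strsplit(rows, " {2,}")
stopifnot(length(unique(lengths(fields))) == 1)  # all rows need the same width

df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(df) <- c("region", "v1", "v2", "v3")  # assumed column names
df[-1] <- lapply(df[-1], as.numeric)
```

On a real page the filter and the separator pattern would probably need adjusting, for example when a label contains a double space or a cell is empty.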

Try the tabulizer package instead, specifically the function extract_tables. Note that the PDF actually has 6 datasets in the same Table 1, separated by blank lines. The result of tab <- extract_tables("https://etc"); tab <- tab[[1]] must be processed further to obtain one or more tables.

– Rui Barradas

2020/01/23 at 23:32
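
The post-processing step mentioned above could look roughly like this. It is a sketch in base R that assumes tab is a character matrix (the default output of extract_tables) in which all-empty rows separate the stacked datasets. The matrix below is a made-up stand-in, not the real extraction result, and it has three blocks rather than six to keep it short.

```r
# Hypothetical stand-in for tab <- extract_tables(...)[[1]]: a character
# matrix where all-empty rows separate the individual datasets.
tab <- rbind(
  c("Series A1", "1.0", "2.0"),
  c("Series A2", "3.0", "4.0"),
  c("",          "",    ""),
  c("Series B1", "5.0", "6.0"),
  c("",          "",    ""),
  c("Series C1", "7.0", "8.0"),
  c("Series C2", "9.0", "10.0")
)

# A row is a separator when every cell is empty after trimming.
blank <- apply(tab, 1, function(r) all(!nzchar(trimws(r))))

# Counting the separators seen so far gives a block id for each data row.
group <- cumsum(blank)[!blank]

# Split the non-blank rows into one data.frame per block.
body <- as.data.frame(tab[!blank, , drop = FALSE], stringsAsFactors = FALSE)
blocks <- split(body, group)

# Convert every column except the first (the label) to numeric.
blocks <- lapply(blocks, function(d) {
  d[-1] <- lapply(d[-1], as.numeric)
  d
})
```

blocks is then a list with one data.frame per dataset, which can be inspected or combined as needed.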
Thanks for the tips! I’ve tried the tabulizer package. It extracts the table, but it ends up recognizing one column fewer than the table has, and I don’t know how to fix this.

– Alexandre Sanches

2020/01/24 at 13:14

No answers
