I am working on a college assignment and would like to get the revenue and attendance for each match of the Brazilian championship in recent years. CBF publishes this data as a series of links; one example is the borderô (the match's financial report) used in the code below. For similar problems I use the tabulizer package, as in the code below:
library(tabulizer)
url <- 'https://conteudo.cbf.com.br/sumulas/2014/1421b.pdf'
d <- extract_tables(url, encoding = "UTF-8")
This works perfectly for tables that were generated directly as PDF, but it does not work for this type of PDF (which was probably printed, scanned, and then saved as a PDF): the code returns a list with 0 elements. Are there any ideas or packages I could use?
@Flavio Silva In this case the problem is not extracting data from a PDF but extracting text from an image. Note that this PDF has no structure, only the image. You need a program that extracts text from images (OCR). – Flavio Barros
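One possible way to follow the suggestion in the comment above is to render each page to an image and run OCR on it. The sketch below assumes the pdftools and tesseract packages are installed and that Portuguese language data has been downloaded once with tesseract::tesseract_download("por"). The helper name ocr_pdf and its parameters are hypothetical and do not come from the original post.

```r
# Hypothetical helper: download a scanned PDF and run OCR on every page.
# Assumes the pdftools and tesseract packages are installed.
ocr_pdf <- function(pdf_url, lang = "por", dpi = 300) {
  # Save the PDF locally; mode = "wb" keeps the binary file intact on Windows
  pdf_file <- tempfile(fileext = ".pdf")
  download.file(pdf_url, pdf_file, mode = "wb")

  # Render each page to a PNG; a higher dpi usually helps OCR accuracy
  n_pages <- pdftools::pdf_info(pdf_file)$pages
  png_files <- file.path(tempdir(), sprintf("page_%03d.png", seq_len(n_pages)))
  pdftools::pdf_convert(pdf_file, format = "png", dpi = dpi,
                        filenames = png_files)

  # Run OCR with the chosen language model; returns one string per page
  engine <- tesseract::tesseract(lang)
  tesseract::ocr(png_files, engine = engine)
}

# Example usage with the URL from the question (requires network access):
# url <- 'https://conteudo.cbf.com.br/sumulas/2014/1421b.pdf'
# text <- ocr_pdf(url)
# cat(text[1])
```

The result is plain text, not a table, so the revenue and attendance figures would still have to be located in the text (for example with regular expressions), and recognition quality depends on how clean the scan is.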