Is there a way to search for an excerpt or a word in a PDF that is hosted on the internet? I looked into cURL and some libraries, but found nothing. What I want is roughly this:
I have a website where a user would enter a name, for example: John. My script would then check whether the name John appears in the file http://www.bu.ufsc.br/ArtigoCientifico.pdf and tell me whether it does or not.
Is there a way to do that? Does anyone know a library, or can someone point me in the right direction?
That’s right, thanks man!
– João Neto
Just one more question. Suppose I have a page where the PDF link is not a direct one, for example www.bu.ufsc.br/downloadArtigo.php?id=19898454, and the PDF is only downloaded when that link is clicked. Does file_get_contents work the same way in that case?
– João Neto
file_get_contents will work the same way in this case, because it collects the content without caring about the extension.
– Jairo Correa
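Since the URL extension says nothing about what comes back, it can help to check the downloaded bytes themselves. A minimal sketch, assuming the server returns the PDF directly; looksLikePdf is a hypothetical helper name, not part of PHP:

```php
<?php
// Hypothetical helper: reports whether raw bytes start with the PDF signature.
function looksLikePdf(string $bytes): bool
{
    // A PDF file normally begins with the marker "%PDF-" followed by a version number.
    return strncmp($bytes, '%PDF-', 5) === 0;
}

// Usage (requires allow_url_fopen; the URL is the example from the comment above):
// $bytes = file_get_contents('http://www.bu.ufsc.br/downloadArtigo.php?id=19898454');
// if ($bytes !== false && looksLikePdf($bytes)) { /* treat $bytes as a PDF */ }
```

If the check fails, the script probably received an HTML page (an error or a login form) instead of the document.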
Hmmm, got it, thank you very much!
– João Neto
Now I have another question; if you have an answer, I’ll create the question so you can answer it there. I noticed that the site generates the PDF download id automatically, and I don’t want to have to enter it into the system every time. Is there some way to call file_get_contents with the link only up to id= and still have it download, some kind of hack?
– João Neto
If the id identifies which document to download, you need to know the id. But if there is a page that contains a link to the article, and that page is not generated dynamically by JavaScript, you can fetch the page with file_get_contents and find the link with preg_match or preg_match_all, or, if the page is very complex, with https://code.google.com/p/phpquery/
– Jairo Correa
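The link extraction described above could look roughly like this. It is only a sketch: findDownloadLink is a hypothetical helper, and the pattern assumes the link appears in an href attribute in the downloadArtigo.php?id=... form shown earlier in this thread.

```php
<?php
// Hypothetical helper: returns the first href pointing at downloadArtigo.php
// with a numeric id, or null when the page contains no such link.
function findDownloadLink(string $html): ?string
{
    // Assumed link shape: href="...downloadArtigo.php?id=<digits>"
    $pattern = '/href=["\']([^"\']*downloadArtigo\.php\?id=\d+)["\']/i';
    if (preg_match($pattern, $html, $matches) === 1) {
        // Attribute values may contain HTML entities such as "&amp;", so decode them.
        return html_entity_decode($matches[1]);
    }
    return null;
}

// Usage: $html = file_get_contents($pageUrl); $link = findDownloadLink($html);
```

preg_match_all with the same pattern would collect every matching link instead of just the first one.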
Gee, that would be ideal, but the damn page uses JavaScript.
– João Neto
Since the page is built by JavaScript, I don’t know how to do it with PHP alone. I only know of http://casperjs.org/ together with http://phantomjs.org/. They are not PHP tools, but you can write scripts for them and run those from PHP with functions like exec. However, your hosting must allow installing and running programs. PhantomJS is a standalone program with no graphical interface that loads pages and executes JavaScript, and CasperJS is a frontend for PhantomJS with conveniences for common tasks.
– Jairo Correa
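Calling PhantomJS from PHP might be sketched as follows. The assumptions: the phantomjs binary is on the PATH, exec is not disabled on the host, and the script path passed in refers to a PhantomJS script you write yourself that prints the rendered page. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: builds the shell command that runs a PhantomJS script
// with the target URL as its argument.
function buildPhantomCommand(string $scriptPath, string $url): string
{
    // escapeshellarg() quotes each argument so it is passed as a single safe token.
    return 'phantomjs ' . escapeshellarg($scriptPath) . ' ' . escapeshellarg($url);
}

// Hypothetical wrapper: runs the command and returns its output lines,
// or null when the process exits with a non-zero status.
function runPhantom(string $scriptPath, string $url): ?array
{
    $output = [];
    $exitCode = 0;
    exec(buildPhantomCommand($scriptPath, $url), $output, $exitCode);
    return $exitCode === 0 ? $output : null;
}
```

The returned lines can then be joined and searched with preg_match in the same way as a page fetched with file_get_contents.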
I’ll take a look. Thanks!
– João Neto