Is there a way to search for an excerpt or a word inside a PDF that is hosted on the internet? I looked into cURL and a few libraries, but found nothing. It would work roughly like this:
I have a website where the user would enter a name, for example: John. My site's script would then check whether the name John appears in the file http://www.bu.ufsc.br/ArtigoCientifico.pdf and tell me whether it does or not.
Is there a way to do that? Does anyone know a library, or can anyone point me in the right direction?
That’s right, thanks man!
– João Neto
Just one more question: suppose I have a page where the PDF link is not a direct one, for example www.bu.ufsc.br/downloadArtigo.php?id=19898454, and the PDF is only downloaded when that link is clicked. Does file_get_contents work the same way or not?
– João Neto
file_get_contents will work the same way in this case, because it collects the content without caring about the extension.
– Jairo Correa
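To illustrate the point about the extension, here is a minimal sketch that checks the downloaded bytes instead of the URL. It assumes the server returns the PDF directly and that the file starts with the usual "%PDF-" signature. The helper name looksLikePdf is hypothetical, and the http:// scheme in the commented usage is an assumption.

```php
<?php
// Hypothetical helper: true when the bytes begin with the PDF signature "%PDF-".
function looksLikePdf(string $bytes): bool
{
    return strncmp($bytes, '%PDF-', 5) === 0;
}

// Usage (network call, left commented out; the http:// scheme is assumed):
// $bytes = file_get_contents('http://www.bu.ufsc.br/downloadArtigo.php?id=19898454');
// if ($bytes !== false && looksLikePdf($bytes)) {
//     // hand $bytes to a PDF text extractor before searching for the name
// }
```

Note that a plain string search over the raw bytes will often miss the name, since text inside a PDF is usually compressed; the bytes would normally go through a PDF text extractor first.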
Hmmm, got it, thank you very much!
– João Neto
Now I have another question; if you have an answer, I'll post it as a separate question so you can answer it there. I noticed that the site generates the PDF download ID automatically, and I didn't want to have to enter it into the system every time. So is there any way to call file_get_contents with just the link up to id= and still have it download, some kind of workaround?
– João Neto
If the id identifies which document to download, you need to know the id. But if there is a page that contains a link to the article, and that page is not generated dynamically by JavaScript, you can fetch its content with file_get_contents and find the link with preg_match or preg_match_all, or, if the page is very complex, with https://code.google.com/p/phpquery/
– Jairo Correa
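The link-finding step could be sketched as follows. It assumes the listing page is static HTML and that the download links appear in href attributes in the same form as the downloadArtigo.php?id=... URL mentioned above. The helper name findDownloadLinks and the $pageUrl variable are hypothetical.

```php
<?php
// Hypothetical helper: collect every downloadArtigo.php?id=... link found in an HTML string.
function findDownloadLinks(string $html): array
{
    // Capture the href value (single or double quoted) that points at the download script.
    $pattern = '/href=["\']([^"\']*downloadArtigo\.php\?id=\d+)["\']/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // empty array when nothing matched
}

// Usage (network call, left commented out; $pageUrl is the page that lists the article):
// $html = file_get_contents($pageUrl);
// $links = $html === false ? [] : findDownloadLinks($html);
```

The returned links may be relative, so they might need to be joined with the site's base URL before being passed to file_get_contents.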
Gee, that would be ideal, but the damn page uses Javascript.
– João Neto
Since the page is built by JavaScript, I don't know how to do it with PHP alone. The only tools I know are http://casperjs.org/ together with http://phantomjs.org/. They are not PHP tools, but you can write scripts for them and run those scripts from PHP with functions like exec. Your hosting must allow you to add and run programs, though. PhantomJS is a standalone program without a graphical interface that loads pages and executes JavaScript, and CasperJS is a front end for PhantomJS with conveniences for common tasks.
– Jairo Correa
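The PHP side of that approach might look like the sketch below. It assumes the phantomjs binary is installed and on the host's PATH, and that a script (here given the hypothetical name extract-link.js) prints the PDF link it finds on the page. The helper name buildPhantomCommand and the $pageUrl variable are also hypothetical.

```php
<?php
// Hypothetical helper: build the shell command that runs a PhantomJS script against a URL.
// Both arguments are shell-escaped so characters like "&" or "?" in the URL cannot break the command.
function buildPhantomCommand(string $scriptPath, string $pageUrl): string
{
    return 'phantomjs ' . escapeshellarg($scriptPath) . ' ' . escapeshellarg($pageUrl);
}

// Usage (requires phantomjs on the host; left commented out):
// exec(buildPhantomCommand('extract-link.js', $pageUrl), $outputLines, $exitCode);
// if ($exitCode === 0) {
//     $pdfUrl = trim(implode("\n", $outputLines));
// }
```

Escaping the arguments matters here because the page URL may end up coming from user input.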
I’ll take a look. Thanks!
– João Neto