Finding out the real URL behind a link in Python


Viewed 1,067 times

2

I'm building a web scraper in Python, and sometimes I come across links and/or buttons that don't carry the real URL, so clicking them redirects you somewhere else.

In this case, clicking downloads a PDF file, but I only want to get the URL of the file.

In the case of links: sometimes JavaScript appears in place of the address.

In my current problem, it is a button without a form.

*** When I download the file and look at its URL, I can't access it directly (by copying and pasting it into the browser's address bar).

I’m using Selenium and requests!

Does anyone have any idea what this is and how to fix it?

  • Share the code you’ve already written

1 answer

1

I recommend using Beautiful Soup to manipulate the HTML.
You can also try using JSON.

From what I understand, you need a basic grasp of HTTP.
You can try installing the Firebug add-on for Firefox and analyzing the page's behavior.
When you click the button to download the PDF, you can inspect in Firebug what was done. For example, I click a download button with Firebug open and can see that a POST was made; I take that POST to understand what I need to manipulate when making a GET.
I opened www.meusite.com/list-pdf
and clicked the download button.
In Firebug:
POST download.do? id25/pdf1/file.pdf
So all I have to do is use:
www.meusite.com/download.do? id25/pdf1/file.pdf
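
The replay idea above can be sketched in Python. This is only a minimal sketch under assumptions: the captured path is taken to be `download.do?id25/pdf1/file.pdf` (written without the space), the site is assumed to be served over `http://`, and the server is assumed to accept a plain GET. `build_download_url` and `fetch_pdf` are hypothetical helper names; `requests` is the third-party library already mentioned in the question.

```python
from urllib.parse import urljoin


def build_download_url(page_url, captured_path):
    """Resolve the path seen in the network panel against the page URL."""
    return urljoin(page_url, captured_path)


def fetch_pdf(url, out_path):
    """Download the file at `url` to `out_path` (assumes a GET is enough)."""
    import requests  # third-party; imported here so the URL helper works without it

    with requests.Session() as session:
        response = session.get(url, timeout=30)
        response.raise_for_status()
        with open(out_path, "wb") as fh:
            fh.write(response.content)


# Page where the button lives and the path captured in the network panel
# (both taken from the example above; the scheme is an assumption).
download_url = build_download_url(
    "http://www.meusite.com/list-pdf",
    "download.do?id25/pdf1/file.pdf",
)
print(download_url)
```

If the server only answers the POST seen in the network panel, `session.post` with the same form data would be the thing to try instead. A link that fails when pasted into the address bar often depends on session cookies or on the request being a POST.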

  • I liked the Firebug tip! I already use Beautiful Soup, but it doesn't show the link either. The link I want looks more or less like this on the site: <input type="button" value="Visualizar" onclick="javascript:__doPostBack('ctl00$ConteudoPagina$VerPub','Select$0')" class="button"> I noticed that when the click happens, it generates a new JavaScript script on the page that defines one of the URL's parameters. I was trying to do it with requests (in this case they are POSTs), but it isn't working. I think I'll have to do everything in Selenium anyway :/

  • Do you have the link to the website?

  • I do, here it is: http://www.santoandre.sp.gov.br/publicacao/edica/consultaedicao.aspx Look at the "Launch" link: it doesn't show me the URL, just JavaScript with a doPostBack, and I don't know what that is.
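
For context, `__doPostBack(eventTarget, eventArgument)` is the standard ASP.NET Web Forms helper: it copies its two arguments into the hidden fields `__EVENTTARGET` and `__EVENTARGUMENT` and submits the page's form back to the same URL. A rough sketch of rebuilding that form data in Python follows. The helper names and the sample HTML (including the `__VIEWSTATE` and `__EVENTVALIDATION` values) are made up for illustration; on a real page the hidden fields would come from the HTML fetched with the same session.

```python
import re
from html.parser import HTMLParser

# Matches __doPostBack('target','argument') inside an onclick attribute.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)")


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback_payload(html, onclick):
    """Build the form data that clicking the button would submit."""
    match = POSTBACK_RE.search(onclick)
    if match is None:
        raise ValueError("no __doPostBack call found")
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload["__EVENTTARGET"] = match.group(1)
    payload["__EVENTARGUMENT"] = match.group(2)
    return payload


# Hypothetical page fragment; real values are long opaque strings.
sample_html = """
<form method="post" action="consultaedicao.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
</form>
"""
onclick = "javascript:__doPostBack('ctl00$ConteudoPagina$VerPub','Select$0')"
payload = build_postback_payload(sample_html, onclick)
print(payload)
```

The resulting dictionary could then be sent with `session.post(page_url, data=payload)` using `requests`. Since the click reportedly injects a new script that defines part of the final URL, the response to that POST would still need to be searched for that script. Whether this works without a real browser depends on the site, so Selenium remains a reasonable fallback.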
