How do I get the final URL of a JS redirect?

I was trying to write some code to get the final URL of a few redirecting links. I managed to do it for most of the links I needed, but for this one I could not: https://redir.lomadee.com/v2/987163d4

All other links worked with urllib2 or requests.

    import requests

    s = requests.Session()
    r = s.get(lili[i], headers=headers)  # lili is my list of links to check
    if lili[i] != r.url:
        print i, r.url

or

    import urllib2

    response = urllib2.urlopen(lili[i])
    if lili[i] != response.geturl():
        print i, response.geturl()

Does anyone know how to solve this? I would rather not use Selenium for it; that is not feasible (it takes too long).

1 answer

Curiously, the strategy this kind of service uses is designed precisely to prevent what you want to do.

Here’s what’s going on: it looks like a redirect, but it is not an HTTP redirect (status 301). When analyzing the body of the response I could (luckily) see what was happening:

    setTimeout(location.href='https://www.walmart.com.br/dvd-automotivo-pioneer-avh-3880-com-usb-frontal-e-tela-de-7/3820066/pr?utm_term=22696088&utm_campaign=lomadee&utm_medium=afiliados&utm_source=lomadee&lmdsid='+new Date().getTime().toString().slice(8,12)+'29157007', 500);

This is indeed a redirect, but it only fires after the page has reached the client side and the JavaScript has been interpreted, so with requests you cannot see it happen. This serves to "guarantee" that the request was made from a browser.
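
For example, a quick check with requests (a minimal sketch, reusing the URL from the question) confirms that there is no HTTP-level redirect at all; the "redirect" only exists inside the page's JavaScript:

    import requests

    r = requests.get('https://redir.lomadee.com/v2/987163d4')
    print(r.status_code)              # 200 - no 301/302 was returned
    print(r.history)                  # []  - requests followed no redirects
    print('location.href' in r.text)  # True - the redirect lives in the body's JS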

Here’s a workaround to get the URL that this specific service redirects to, using requests (with urllib2 it would be the same thing):

    import requests, re

    req = requests.get('https://redir.lomadee.com/v2/987163d4')
    # capture the URL assigned to location.href in the response body
    redi_url = re.findall(r'(?<=location\.href=["\'])https?://.+?(?=["\'])', req.text)

    if redi_url:
        print(redi_url[0])  # https://www.walmart.com.br/dvd-automotivo-pioneer-avh-3880-com-usb-frontal-e-tela-de-7/3820066/pr?utm_term=22696088&utm_campaign=lomadee&utm_medium=afiliados&utm_source=lomadee&lmdsid=

I believe colleagues who are better with regular expressions than I am can improve this; in this context regex does not seem like the best approach (the whole body of the response is searched just to reach the setTimeout that does the redirect), so feel free to edit the answer.
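
An alternative sketch with the same limitation (it assumes the single quotes seen in the response body above and is just as tied to this particular redirector) is to skip the regex and read the quoted URL right after the location.href= assignment:

    import requests

    req = requests.get('https://redir.lomadee.com/v2/987163d4')

    # find the JS assignment and take everything up to the closing quote
    marker = "location.href='"
    start = req.text.find(marker)
    if start != -1:
        start += len(marker)
        end = req.text.find("'", start)
        print(req.text[start:end])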

  • That’s basically it (but it will only work for this particular service). In other situations the link information will appear in different contexts. To be compatible with multiple redirectors you would need either an engine that actually runs JS, or a crawler implemented for each vendor (one for lomadee, one for redirector X, etc.), and you would still have to keep updating it whenever a case is detected where the link is not found (which is a sign that that particular redirector has changed its page structure).

  • Exactly @Bacco, Selenium + PhantomJS would give a solution, but the OP stressed that they did not want to take that route. This workaround is fragile, though, and I can only guarantee it works for this source URL.

  • I hope the questioner is doing this precisely to eliminate short-link intermediaries. I suggest that, besides removing the intermediary, he also remove the tracking data from the final link; otherwise he will simply be "eternalizing" the intermediary's partner ID. (I do this in my own systems: whenever possible I strip everything Analytics-related, such as ?utm= and a series of others, from all the links I process.)

  • @Bacco yes, from here it would just be a matter of removing from redi_url everything after the "?"; a split would suffice.

  • @Bacco That’s right, I filter the utm_ parameters that appear in these links, but it is hard to predict because it varies from site to site. Miguel, unfortunately it is not as simple as splitting on the question mark, since that may drop parameters that need to stay, such as the site’s search parameters. Does anyone have tips on deleting only the parameters that set partner cookies? I would like to know more about that :) (one approach is sketched after these comments)

  • @Leo04 it would have to be case by case. A little table relating the source URL to the code that captures the URL, which then also removes whatever is specific to each one, since you have to customize the code anyway. What sucks is that you have to keep doing maintenance afterwards. The best thing is something that tells you when it cannot find the information, because then you have to "fix it fast".

  • I have some services that unfortunately depend on a crawler, because whoever made the source site did not provide an API, so besides capturing what I need, I added code to alert me when something is structurally abnormal. At least when one of the sources changes its code, I find out in time. Relevant detail: it is not about improper consumption, quite the contrary; my crawlers are much lighter than a manual query, which carries a lot of extra stuff (that the end user would have to access anyway).

  • @leo04 what would be the ideal URL to return in this case (which parameters do you need or not need to keep)? Then I can try to adjust the answer. Keep in mind it will only work for this redirector, but it is viable until its structure changes.
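
Following up on the tracking-parameter discussion in the comments, here is a minimal sketch of one approach; the parameters to drop (the utm_* family and lmdsid) are an assumption based on this particular link and would need adjusting per site:

    # Python 3; in Python 2 the same functions live in urlparse / urllib
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    def strip_tracking(url, drop_prefixes=('utm_',), drop_names=('lmdsid',)):
        """Remove tracking query parameters, keeping the rest of the URL intact."""
        parts = urlsplit(url)
        kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                if not k.startswith(drop_prefixes) and k not in drop_names]
        return urlunsplit((parts.scheme, parts.netloc, parts.path,
                           urlencode(kept), parts.fragment))

    url = ('https://www.walmart.com.br/dvd-automotivo-pioneer-avh-3880-com-usb-frontal-e-tela-de-7/3820066/pr'
           '?utm_term=22696088&utm_campaign=lomadee&utm_medium=afiliados&utm_source=lomadee&lmdsid=')
    print(strip_tracking(url))
    # https://www.walmart.com.br/dvd-automotivo-pioneer-avh-3880-com-usb-frontal-e-tela-de-7/3820066/pr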
