Detect if URL is a URL shortener


I have a text input in an HTML form where the user pastes a link copied from the browser's address bar. I need to accept any kind of link except shortened URLs, such as "goo.gl". How can I create a function to validate and verify this? Is it possible?

  • Do you have a list of shortened URLs that you won’t accept? If not, the first step is to list all you find so you can create an efficient regular expression.

  • You can also create a rule: "domain with at least X characters"; URLs in the format "domain/key" get blocked. You can look for other criteria, observing the shorteners that exist, to avoid false positives and keep the validation manageable and efficient.

  • Another way is to send a request to the posted URL; if it is redirected, you can get the real URL, or block it, because it is not a direct link (a sketch follows after these comments).

  • I agree with the replies of @user5978. The big problem is that the site the user links to may itself do a redirect. In that case you could check whether the redirect was made to the same domain, but it would still be risky. Say I entered the URL http://foo.com/bar, which redirects users from Europe to http://foo.com.eu/bar; the system would block this type of URL, and this is a kind of redirect that even Google does to handle access from users in different countries.

  • Yes, if you opt for a generic solution, the system will have to have a certain tolerance for false positives. That’s the "price" for not having to constantly update the list of shorteners, hehe.

  • This site knows whether or not a URL is "shortened": http://www.checkshorturl.com/expand.php. It is not an API, but your script can submit the URL and analyze the response.
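
As a rough illustration of the comment about requesting the posted URL, here is a minimal sketch in PHP, assuming the cURL extension is available; the function name respondsWithRedirect and the example URL are placeholders, not part of the question or comments:

    <?php
    // Send a request to the posted URL and report whether it answers with an
    // HTTP redirect. Returns the redirect target, or false for a direct link.
    function respondsWithRedirect($url)
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not print the response
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first response
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_exec($ch);
        $status   = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL); // target of the redirect, if any
        curl_close($ch);

        return ($status >= 300 && $status < 400) ? $location : false;
    }

    $target = respondsWithRedirect('https://example.com/abc'); // placeholder URL
    if ($target !== false) {
        echo 'Redirects to ' . $target; // may be a shortener, or a legitimate redirect
    } else {
        echo 'Direct link';
    }
    ?>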


2 answers

2


A suggestion is to request the URL. If a redirect is detected, it may be a "shortener", but it may also not be, in which case you would be blocking legitimate URLs.

From there, compare the number of characters: if it is less than X, the probability of it being a shortener increases, and larger URLs that redirect but are not shorteners are discarded.

It looks good so far, but in practice, it doesn’t work.

Let’s go to a real example:

tinyurl.com <- is a URL shortener
globo.com <- is shorter than tinyurl.com but is not a shortener.
r7.com <- is even shorter.

So the logic of comparing the size of the URL or domain is not valid.

Perhaps you should ask yourself, what is the purpose of detecting if it is a URL shortener?

If it’s because they redirect to another URL, then you could just validate the redirect, regardless of whether it’s a shortener or not.
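
A minimal sketch of that idea, assuming PHP with cURL: follow the whole redirect chain and compare the final host with the submitted one. The function name finalHost and the example URL are illustrative only, and, as noted in the comments, country-specific redirects such as foo.com -> foo.com.eu would still be flagged.

    <?php
    // Follow the whole redirect chain and return the host of the final URL.
    function finalHost($url)
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow every redirect
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_exec($ch);
        $effective = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // URL after all redirects
        curl_close($ch);

        return strtolower((string) parse_url($effective, PHP_URL_HOST));
    }

    $input = 'https://tinyurl.com/abc123'; // placeholder URL for illustration
    $submittedHost = strtolower((string) parse_url($input, PHP_URL_HOST));

    if (finalHost($input) !== $submittedHost) {
        echo 'The link redirects to another domain.';
    } else {
        echo 'Direct link.';
    }
    ?>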

  • As already commented here, there is a solution, even if it is brute force, which is not such a bad one even if there is an extremely high number of sites of this type. However, a generic solution would be best, but there are some problems regarding the criteria to be used, as you yourself noted.

  • The number of shorteners is enormous. At the very least, there must be hundreds of thousands. To say that doing X or Y is a solution to something that is impossible to do is, at the least, misleading. An example of such an "error" on Stack Overflow itself: http://stackoverflow.com/questions/9557116/regex-php-code-to-check-if-a-url-is-a-short-url

  • This proves that if it is not "feasible" it is at least possible to implement: http://www.checkshorturl.com/expand.php.

  • Note that these are very distinct terms. Saying "it is not feasible" does not mean "it is not possible". The site you posted only expands URLs that are redirected. It doesn’t mean it actually "detects" a URL shortener... If you prove here that it is feasible, I myself will downvote what I posted, haha, if that’s possible, haha.

  • Are you saying that it is impossible to validate whether a URL is allowed, based on a list of URLs? I don’t know; I believe many applications work that way. It’s like validating whether an email is already registered to a user, based on a list of other entries. The list would need to be extremely large to make validation unviable, and even then there is the possibility of using indexes. Anyway, what makes this a not very good solution is the work of "hunting" new shorteners.

  • Again, I have not said at any time that it is impossible. Only that it is UNWORKABLE. Again, the term "unworkable" is different from "impossible"..

  • I said that the site proves it is possible, not that you said it is impossible :D. As for feasibility, I don’t see what makes it difficult to register a list of URLs in a database and, when a user provides a URL, do a search to check whether it is on the "black list". I venture to say that many sites do similar things.

  • There are several sources for this: http://bit.do/list-of-url-shorteners.php https://www.techmaish.com/list-of-230-free-url-shorteners-services/ https://www.hashtags.org/platforms/twitter/list-of-url-shorteners/

  • Just to finish and make it clearer: it’s impossible to create a list of shorteners because you don’t even know how many there are. The world is not limited to sites in the US, Europe and Brazil. And even in these three regions there are tens, if not hundreds, of thousands of shorteners, and every day 1 or 20 new ones appear. Anyway, there is no pattern, there is no control. Nothing. -> unviable.

  • Those lists of links you posted are a joke, haha. As you said in the comment above, there may be hundreds of thousands. It’s not 20, 40, 200 or 300... it’s more than 100,000 or 300,000 in the West alone.

  • That number is a bit exaggerated. But even so, in an indexed table it would work perfectly. In reality, only the more popular services would be needed; if another URL starts to appear frequently in the entries, just add it to the list as well. As I have no idea what the context of the problem is, there is no way to state whether any "mistakes" would be acceptable.

  • I agree with @Danielomine. Creating a table is not a definitive solution, but it should not be discarded; it can be used together with another approach. Example: the JavaScript has a short list of banned shorteners, which prevents normal users from publishing, while on the server side the same list may exist, but that is not the focus. The server side can request the URL through cURL; Facebook does this, Google does this. This way it could check whether goo.gl/xxxxx pointed to meusite.com, taking the last Location obtained. In addition, I would also look for http-equiv="refresh", etc. (a sketch follows after these comments).

  • PS: This method also prevents someone from creating their own pseudo-shortener site. Imagine that meusite.com is banned and cannot be shared. Then I create a goo.gl link for it; goo.gl is on the list and will already be blocked, saving processing. If I create meusite2.com, redirecting to meusite.com, it will also be blocked, because the system does not depend on the list.
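
As a rough sketch of the extra check mentioned in the comment above (pages that redirect via http-equiv="refresh" instead of an HTTP Location header), assuming PHP with cURL; the function name hasMetaRefresh, the simplified regex and the example URL are assumptions for illustration:

    <?php
    // Fetch the page body and check for a <meta http-equiv="refresh"> redirect,
    // which does not show up as an HTTP Location header.
    function hasMetaRefresh($url)
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        $body = curl_exec($ch);
        curl_close($ch);

        if ($body === false) {
            return false;
        }

        // Matches e.g. <meta http-equiv="refresh" content="0;url=http://...">
        return preg_match('/<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?/i', $body) === 1;
    }

    if (hasMetaRefresh('https://example.com/abc')) { // placeholder URL
        echo 'The page redirects via meta refresh.';
    }
    ?>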


2

What you can do is insert into an array or a MySQL table (depending on the approach you will use) some of the URLs that are shorteners, for example goo.gl, tinyurl.com, etc.

Example code in PHP:

    <?php
    $urls = array('goo.gl', 'tinyurl.com'); // hosts that are shorteners

    $url_input = trim(strtolower($_POST['url'])); // URL the user submitted

    // parse_url() only fills 'host' when the URL includes a scheme (http:// or https://)
    $filtra_url = parse_url($url_input);
    $url = isset($filtra_url['host']) ? $filtra_url['host'] : '';

    if (in_array($url, $urls)) {
        echo 'You entered a shortened URL. We do not allow that!';
    } else {
        echo 'URL authorized';
    }
    ?>
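
For the MySQL variant mentioned at the start of this answer, here is a hedged sketch using PDO; the table name shorteners, its host column and the connection credentials are assumptions for illustration only:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'password');

    $url_input = trim(strtolower($_POST['url'])); // URL the user submitted
    $host = parse_url($url_input, PHP_URL_HOST);  // needs a scheme (http:// or https://)

    // Look the host up in the blacklist table (hypothetical schema: shorteners(host))
    $stmt = $pdo->prepare('SELECT 1 FROM shorteners WHERE host = ? LIMIT 1');
    $stmt->execute([$host]);

    if ($stmt->fetchColumn()) {
        echo 'You entered a shortened URL. We do not allow that!';
    } else {
        echo 'URL authorized';
    }
    ?>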
