Check if URL exists


I would like to know how to validate social network URLs, that is, to check whether they exist. I am using AngularJS, AJAX and HTTP requests. I can get the status of a URL I created in a mock, but I cannot verify an external URL.

$http({
  method: 'GET',
  // Tested with each of these URLs in turn:
  url: 'http://private-e5528d-alugueme.apiary-mock.com/api/v1/categories/1'
  // url: '/'
  // url: 'https://twitter.com/pmargreff'
}).then(function successCallback(response) {
  console.log(response);
}, function errorCallback(response) {
  console.log(response);
});

When the URL is the first one (my mock), the response logged to the console is:

Object { data: Object, status: 200, headers: fd/<(), config: Object, statusText: "OK" }

When I request a public URL, such as Stack Overflow or my own Twitter profile, the response is:

Object { data: null, status: -1, headers: fd/<(), config: Object, statusText: "" }

Despite this, the Network tab of my browser shows that the URL was requested, with status 200 when it exists or 404 when it does not. At first I thought AngularJS itself was blocking something, so I tried the same check with jQuery AJAX:

$.ajax({
  // Tested with each of these URLs in turn:
  url: 'http://private-e5528d-alugueme.apiary-mock.com/api/v1/categories/1',
  // url: '/',
  // url: 'https://twitter.com/pmargreff',
  type: 'HEAD',
  error: function() {
    alert('does not exist');
  },
  success: function() {
    alert('exists');
  }
});

I got the same kind of response: validating my own mock works, but external URLs fail, while the Network tab continues to show the correct status.

I also tried the promise-style callbacks, and the result was the same:

$.get(url)
    .done(function() {
      alert('exists');
    }).fail(function() {
      alert('does not exist');
    });

Am I making a mistake in the code, or are the errors caused by the sites themselves blocking the request? If it is the latter, is there a way around it?


I’m trying not to use the Facebook and Twitter APIs, so I’d like an answer that doesn’t rely on them.

2 answers

3


You cannot do this directly.

By default, a page in a modern browser can only read responses from its own origin. For another site to read responses from yours, your server has to send the Access-Control-Allow-Origin header.
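
As a rough illustration of this rule, the decision a browser makes about a cross-origin response can be modelled as below. This is a simplified sketch, not the full CORS algorithm: it ignores credentials and preflight requests, and `isOriginAllowed` and the `meusite.example` origin are hypothetical names used only for illustration.

```javascript
// Simplified model of the browser's decision: may a script running on
// `pageOrigin` read a cross-origin response whose
// Access-Control-Allow-Origin header has the value `allowOrigin`?
// Hypothetical helper, not a browser API.
function isOriginAllowed(allowOrigin, pageOrigin) {
  if (allowOrigin == null) {
    return false; // header absent: the response is hidden from the script
  }
  if (allowOrigin === '*') {
    return true; // wildcard: any origin may read the response
  }
  return allowOrigin === pageOrigin; // otherwise the origins must match exactly
}

console.log(isOriginAllowed(null, 'https://meusite.example')); // false
console.log(isOriginAllowed('*', 'https://meusite.example'));  // true
```

This also fits what the question describes: the request itself is sent, so the Network tab shows 200 or 404, but the script receives no usable response when the header is missing.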


What solution?

The solution would be to add the necessary header, for example Access-Control-Allow-Origin: *.

PHP:

header('Access-Control-Allow-Origin: *');  

But, as you may have noticed, the header has to be sent by the site being requested, so you would have to change Twitter's code to add Access-Control-Allow-Origin. That is obviously not possible!


Game over?

Not exactly. This limitation applies only on the client side, meaning a page on your site cannot read another site from the browser. It does not prevent your server from connecting to another server.

So you can do this:

PHP:

function verificarURL($url) {

    // Initialize cURL
    $curl = curl_init($url);
    curl_setopt_array($curl, [
            // Return the response instead of printing it:
            CURLOPT_RETURNTRANSFER => 1,

            // Make curl_exec() return false for HTTP status codes >= 400:
            CURLOPT_FAILONERROR => 1,
            
            // Follow `Location` redirects:
            CURLOPT_FOLLOWLOCATION => 1,

            // Limit the number of `Location:` redirects to follow:
            CURLOPT_MAXREDIRS => 2,

            // Set the `Referer` automatically when following a redirect:
            CURLOPT_AUTOREFERER => 1,

            // Verify the website's SSL certificate (protects against MITM):
            CURLOPT_SSL_VERIFYPEER => 1,
            CURLOPT_SSL_VERIFYHOST => 2,
            
            // Path to the CA bundle (the trusted authorities; it can be downloaded from https://curl.haxx.se/ca/cacert-2017-06-07.pem):
            CURLOPT_CAINFO => __DIR__ . DIRECTORY_SEPARATOR . 'cacert-2017-06-07.pem',

            // Restrict to HTTP/HTTPS (blocks other protocols, such as `file://`), both for the initial request and for redirects:
            CURLOPT_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
            CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,

            // Require TLSv1.2:
            CURLOPT_SSLVERSION => CURL_SSLVERSION_TLSv1_2,

            // Set timeouts in seconds (against Slow HTTP attacks and the like):
            CURLOPT_TIMEOUT => 4,
            CURLOPT_CONNECTTIMEOUT => 2,
            //CURLOPT_LOW_SPEED_LIMIT =>
            //CURLOPT_LOW_SPEED_TIME =>
        ]
    );
    
    // Execute the request:
    $dados = curl_exec($curl);
    // Close cURL
    curl_close($curl);

    // false if the request failed or the HTTP status code was >= 400:
    return $dados !== false;
}

verificarURL('http://seusite.com');
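
The same server-side idea could be sketched in JavaScript, assuming Node.js 18 or later, where a global `fetch` and `AbortSignal.timeout` are available. The names `statusMeansExists` and `urlExists` are hypothetical, the 4-second timeout simply mirrors the PHP example, and this sketch does not reproduce the redirect limit or the protocol restrictions configured above.

```javascript
// Roughly mirrors CURLOPT_FAILONERROR: a status of 400 or above counts
// as "does not exist".
function statusMeansExists(status) {
  return status >= 200 && status < 400;
}

// Server-side check; assumes Node.js 18+ (global fetch).
// `fetchImpl` can be replaced with a stub for testing.
async function urlExists(url, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(url, {
      method: 'HEAD',                    // headers only, no body download
      redirect: 'follow',                // follow `Location` redirects
      signal: AbortSignal.timeout(4000), // give up after 4 seconds
    });
    return statusMeansExists(response.status);
  } catch (err) {
    return false; // DNS failure, timeout, TLS error and similar
  }
}

// Usage: urlExists('http://seusite.com').then(console.log);
```

Some servers reject HEAD requests, so switching to GET may reduce false negatives at the cost of downloading the body.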

/!\ SECURITY:

With the options above, most of the common cURL security pitfalls are covered, and the function is reasonably safe for public use, where the user supplies the $url.

However, some problems remain. The IP of your server will be exposed to the target of the request, which may be a problem if you use Cloudflare or similar services that hide your server's IP. Another problem is that a redirect (or the domain itself) can point to a device on your local network. For example, https://malicioso.com sends a Location: 192.0.0.1, your code follows it and reports that "192.0.0.1" exists, which may leak relevant information.
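
One possible mitigation for the local-network problem is to refuse addresses in private or special-purpose ranges before connecting. The sketch below is an assumption about how such a filter could look, not part of the original answer: `isBlockedIPv4` is a hypothetical helper, it handles IPv4 literals only, and a real filter would also have to resolve hostnames and re-check the address at every redirect hop.

```javascript
// Returns true when `ip` is an IPv4 literal in a range that a public URL
// checker would normally refuse to contact. Hypothetical helper: it does
// not cover IPv6 and does not resolve hostnames.
function isBlockedIPv4(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p))) {
    return false; // not a dotted-quad IPv4 literal
  }
  const [a, b, c, d] = parts.map(Number);
  if ([a, b, c, d].some((n) => n > 255)) {
    return false; // octet out of range, so not a valid address
  }
  return (
    a === 0 ||                            // 0.0.0.0/8, "this network"
    a === 10 ||                           // 10.0.0.0/8, private
    a === 127 ||                          // 127.0.0.0/8, loopback
    (a === 169 && b === 254) ||           // 169.254.0.0/16, link-local
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12, private
    (a === 192 && b === 168) ||           // 192.168.0.0/16, private
    (a === 192 && b === 0 && c === 0)     // 192.0.0.0/24, IETF protocol assignments
  );
}

console.log(isBlockedIPv4('192.0.0.1')); // true
console.log(isBlockedIPv4('8.8.8.8'));   // false
```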


Is there another alternative?

Unfortunately, the request has to be made on the server side; you cannot make the client do it.

But... You can "outsource" the service using Yahoo!

Yahoo has a service called YQL that supports XPath; at least that is what I found on the subject, see https://developer.yahoo.com/yql/guide/yql-select-xpath.html. Note that XPath is not a Yahoo technology, Yahoo merely lets you use it. In short, XPath is a language for querying XML.

In this case you can make a request using the following query:

SQL/YQL:

Note: this API has been discontinued. You must use the htmlstring table instead, but it is very unstable.

select * from html where url="http://seusite.com"

This returns the following (because seusite.com exists):

{"query":{"count":1,"created":"2016-04-18T12:16:44Z","lang":"pt-BR","results":{"body":{"script":{"language":"JavaScript","src":"js/redirect-min.js","type":"text/javascript"}}}}}

The results field tells you whether the URL exists or not.

Therefore:

$(':button').click(function() {
  var url = $(':input').val();
  // Encode the query so that spaces and quotes survive in the request URL
  var query = encodeURIComponent('select * from html where url="' + url + '"');

  $.ajax({
    url: 'https://query.yahooapis.com/v1/public/yql?q=' + query + '&format=json',
    type: "get",
    dataType: "json",
    success: function(data) {
      alert(data.query.results != null ? 'Exists' : 'Does not exist');
    }
  });
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" value="http://stackexchange.com">
<button>CHECK</button>

This performs the query shown above and inspects the result: if results is null, the URL does not exist.

However, this produces false negatives, such as https://facebook.com, which is reported as non-existent. This would not occur with the first solution.

  • Well-formulated answer, with alternatives and a demonstrative example. Thanks :)

0

For security reasons, the browser does not let a page read responses from other servers. Likewise, by default, servers do not grant other sites access to their resources in this way.

1) If the website you want to check is yours, you could enable CORS on it.

2) You could use JSONP to access external resources, provided the target server supports it.

3) These limitations apply only on the client side. Through AJAX you could call a resource on your own domain, passing the URL you want to check as a parameter; that back-end resource would check the URL and return true or false, for example:

$.ajax({
  url: 'http://seudominio.com.br/ValidadorSite/',
  method: 'get',
  data: {
    urlParam1: "sitequevocequerverificar.org.br"
  },
  success: function(data) {
    if (data.url1) {
      alert("Site 1 existe");
    }
  }
});

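Java

import java.net.HttpURLConnection;
import java.net.URL;

// urlParam1 holds the value of the request parameter sent by the client
URL url = new URL(urlParam1);
HttpURLConnection con = (HttpURLConnection) url.openConnection();
// Prints the HTTP status code, e.g. 200 or 404
System.out.println(con.getResponseCode());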

The good thing about this last practice is that you could send a set of url to be checked and perform a specific action on top of each return "exists or does not exist".
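
As a minimal sketch of that idea on the client side, suppose the endpoint answers with one boolean per URL, keyed `url1`, `url2` and so on. That response shape is an assumption extending the `data.url1` field used in the example, and `partitionResults` is a hypothetical helper name.

```javascript
// Splits a batch response such as { url1: true, url2: false } into the
// keys whose URL exists and the keys whose URL does not.
function partitionResults(data) {
  const existing = [];
  const missing = [];
  for (const [key, exists] of Object.entries(data)) {
    (exists ? existing : missing).push(key);
  }
  return { existing, missing };
}

const summary = partitionResults({ url1: true, url2: false, url3: true });
console.log(summary.existing); // [ 'url1', 'url3' ]
console.log(summary.missing);  // [ 'url2' ]
```

Each list can then drive its own action, for example marking the missing profiles in the form.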
