How to get all the hrefs from an external page

I would like to access an external page (Google, for example) and run a script on it to capture all the hrefs. I have read that browsers do not allow a GET request to fetch the HTML of another site, but I believe there must be a way to do this. This is the code I have so far. I also read about Googlebot, and I would like to try doing something similar in JS.

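$.ajax({
    url: 'http://google.com',
    type: 'GET',
    // Runs only if the browser allows the cross-origin request (see CORS).
    success: function(res) {
        // Parse the returned HTML into a detached element instead of searching the current page.
        var $page = $('<div>').append($.parseHTML(res));
        $page.find('a').each(function() {
            // `this` is the current <a> element.
            alert($(this).attr('href'));
        });
    }
});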
  • "but I believe there must be a way to do this" - if the site does not want to allow this (it sends no CORS headers), the browser will not hand the response to your script, for security/privacy reasons. The only option is to do it on the server.

  • Is there really no way to simulate a user or a browser? I can access the page normally, so is there a way to simulate that?

  • If there were a way to simulate the user, you could trigger clicks on ads or YouTube likes via ajax and cheat the system. The restriction exists to prevent that. Take a look at CORS: https://answall.com/a/145493/129
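
To make the CORS point in the comments concrete, here is a minimal sketch of how a server that you control could opt in to cross-origin reads. It only illustrates the mechanism: a site like Google does not send this header for arbitrary origins, which is why the browser blocks the ajax call. The names ALLOWED_ORIGINS, corsHeadersFor and handler, and the origin http://localhost:8080, are assumptions made up for this example.

```javascript
const http = require('http');

// Hypothetical allow-list: origins permitted to read responses from this server.
const ALLOWED_ORIGINS = ['http://localhost:8080'];

// Returns the CORS headers to send for a given request Origin header.
function corsHeadersFor(origin, allowedOrigins) {
  if (origin && allowedOrigins.includes(origin)) {
    // "Vary: Origin" tells caches that the response depends on the Origin header.
    return { 'Access-Control-Allow-Origin': origin, 'Vary': 'Origin' };
  }
  // No CORS header: the browser will not expose the response to the page's script.
  return {};
}

// Request handler: plain-text reply, plus CORS headers when the origin is allowed.
function handler(req, res) {
  const headers = Object.assign(
    { 'Content-Type': 'text/plain' },
    corsHeadersFor(req.headers.origin, ALLOWED_ORIGINS)
  );
  res.writeHead(200, headers);
  res.end('hello');
}

// The server is created but not started; call server.listen(3000) to try it.
const server = http.createServer(handler);
```

A page served from http://localhost:8080 could then read this server's responses via ajax, while a page from any other origin would still be blocked by the browser.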

1 answer

You cannot do this from the frontend alone, for the reasons explained in the comments on the question. You can do it with Node.js. The code below fetches a page, selects all of its a tags, and prints the href attribute of each one.

var request = require('request');
var cheerio = require('cheerio');

// URL of the site to scrape
var url = 'http://www.meusite.com.br';

request(url, function(err, resp, body){
  // Stop if the request failed
  if (err) {
    console.error(err);
    return;
  }

  // Load the HTML
  var $ = cheerio.load(body);
  var links = $('a'); // Select all a tags, just like jQuery

  // Iterate over all the tags selected above.
  links.each(function(i, link){
    // Print the link text and its href attribute to the console.
    console.log($(link).text() + ':\n  ' + $(link).attr('href'));
  });
});
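
One thing to watch for: the href values collected this way are often relative (/about, contact.html) or not web links at all (mailto:, javascript:). As a rough sketch, assuming you want absolute http(s) URLs without duplicates, the built-in URL class in Node can resolve each value against the page address. The function name toAbsoluteLinks is made up for this example and is not part of cheerio or request.

```javascript
// Resolves raw href values against the page URL and keeps only unique http(s) links.
// toAbsoluteLinks is a hypothetical helper name used for illustration.
function toAbsoluteLinks(hrefs, baseUrl) {
  var seen = new Set();
  var result = [];

  hrefs.forEach(function (href) {
    // Skip a tags that have no href attribute (or an empty one).
    if (!href) return;

    var resolved;
    try {
      // Relative values are resolved against baseUrl; absolute ones are kept as-is.
      resolved = new URL(href, baseUrl);
    } catch (e) {
      return; // Skip values that cannot be parsed as a URL.
    }

    // Drop mailto:, javascript:, tel: and similar schemes.
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') return;

    // Keep only the first occurrence of each URL.
    if (!seen.has(resolved.href)) {
      seen.add(resolved.href);
      result.push(resolved.href);
    }
  });

  return result;
}
```

Inside the request callback you could collect the raw values with $(link).attr('href') into an array and then call toAbsoluteLinks(thatArray, url).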
