Web scraping with pure JavaScript

Viewed 695 times

4

I want to write a web scraper that reads an XML page and extracts a certain value that is in "name". I am not sure whether this is possible, since I only found how to do it with Node.js. Can it be done with pure JavaScript, with no external libraries or frameworks?

2 answers

3

Nothing stops you from downloading an XML file and parsing its contents. The only problem with doing this in a browser is the same-origin policy, which prevents you from accessing arbitrary addresses via JavaScript.
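To make the restriction concrete, here is a minimal sketch of what "same origin" means: two URLs share an origin only when scheme, host, and port all match. The helper name `isSameOrigin` and the example addresses are assumptions made up for illustration; the browser performs this comparison internally, so you would not normally write it yourself.

```javascript
// Hypothetical helper: reports whether two URLs share an origin
// (scheme + host + port), which is what the same-origin policy compares.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Example page address (an assumption for illustration only).
var pageUrl = "https://example.com/app/index.html";

// Same scheme, host, and port: a script on the page may read this.
console.log(isSameOrigin(pageUrl, "https://example.com/data/feed.xml")); // true

// Different host: the browser blocks reading the response unless the
// other server explicitly allows it (for example through CORS headers).
console.log(isSameOrigin(pageUrl, "https://other.example/feed.xml")); // false
```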

  • It will be for an add-on script. Once I understand it better I will try it, and if it does not work I will resort to workarounds. It is an interdisciplinary project, so I think it will not apply.

  • What exactly are the requirements?

  • My idea for the add-on is to read the dates and titles of the posts on this site: http://feed43.com/ufs.xml and, if there are any on the date the user selects, show the title and link.

  • Well... There is no safe way, or any way that does not involve setting up a proxy server, to do what you want in a browser. I think it is time to change plans. :(

  • @Pabloalmeida Extensions are not limited by the same-origin policy: https://developer.chrome.com/extensions/xhr
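The add-on idea described in the comments (show the title and link of posts published on a date the user picks) could be sketched roughly as follows. This assumes the feed entries have already been turned into plain objects; the field names `title`, `link`, and `pubDate`, the sample data, and the function name `postsOnDate` are all hypothetical, and the real feed may use different element names or date formats.

```javascript
// Hypothetical post objects; in the add-on they would be built from the
// entries of the feed (title, link, and pubDate are assumed names).
var posts = [
  { title: "Post A", link: "http://example.com/a", pubDate: "Mon, 06 Mar 2017 10:00:00 GMT" },
  { title: "Post B", link: "http://example.com/b", pubDate: "Tue, 07 Mar 2017 15:30:00 GMT" }
];

// Returns the posts published on the given calendar day (in UTC).
// selectedDay is a "YYYY-MM-DD" string, such as the value of an
// <input type="date"> element.
function postsOnDate(items, selectedDay) {
  return items.filter(function (item) {
    var published = new Date(item.pubDate);
    // Skip entries whose date cannot be parsed.
    if (isNaN(published.getTime())) return false;
    // toISOString() starts with the UTC date as "YYYY-MM-DD".
    return published.toISOString().slice(0, 10) === selectedDay;
  });
}

postsOnDate(posts, "2017-03-06").forEach(function (post) {
  console.log(post.title + " - " + post.link);
});
```

Comparing on the UTC day keeps the sketch short; an add-on that should follow the user's local calendar would need to compare local date parts instead.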

3

Yes, it is possible. For example:

// Build an in-memory XML document from the <template> below, so the
// request has something to fetch without leaving the page.
var tmplXML = document.getElementById("tmplXML");
var blobXML = new Blob([tmplXML.innerHTML], { type: 'text/xml' });
var urlXML = URL.createObjectURL(blobXML);

var httpRequest = new XMLHttpRequest();

httpRequest.open("GET", urlXML, true);
httpRequest.onreadystatechange = function () {
  // readyState 4 means the request has completed.
  if (httpRequest.readyState === 4 && httpRequest.status === 200) {
    // responseXML is the response already parsed into an XML document.
    var xml = httpRequest.responseXML;
    // Print the text of the first <p> element.
    console.log(xml.getElementsByTagName("p")[0].textContent);
  }
};
httpRequest.send();

<template id="tmplXML">
  <?xml version="1.0" encoding="UTF-8"?>
  <text>
    <p>Lorem ipsum dolor sit amet</p>
    <p>Nihil cumque vero</p>
    <p>Impedit quibusdam fuga</p>
    <p>Magnam ad maiores omnis</p>
    <p>Aliqua omnis laborum</p>
  </text>
</template>

However, as Pablo has already said, the same-origin policy may make your work more difficult.

Source: Ajax reading XML
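When the XML is already available as a string, `DOMParser` can parse it directly, without going through a Blob and an `XMLHttpRequest`. The sketch below is one possible way to collect the text of every element with a given tag name. The helper `readTagTexts`, the sample XML, and the assumption that "name" is an element (rather than an attribute) are illustrative guesses, not something confirmed by the question.

```javascript
// Collects the text of every element with the given tag name.
// xmlDoc is any XML document object, for example the result of
// DOMParser.parseFromString or XMLHttpRequest.responseXML.
function readTagTexts(xmlDoc, tagName) {
  var nodes = xmlDoc.getElementsByTagName(tagName);
  var texts = [];
  for (var i = 0; i < nodes.length; i++) {
    texts.push(nodes[i].textContent);
  }
  return texts;
}

// Browser-only part: DOMParser is not available in plain Node.js,
// so this branch is skipped outside a browser.
if (typeof DOMParser !== "undefined") {
  // Sample XML made up for this example.
  var xmlText = "<text><name>Lorem ipsum</name><name>Nihil cumque</name></text>";
  var doc = new DOMParser().parseFromString(xmlText, "text/xml");
  console.log(readTagTexts(doc, "name"));
}
```

Keeping the extraction in its own function means the same code works whether the document came from `DOMParser` or from `responseXML`.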

  • 1

    Great example. I took the liberty of merging your two snippets and creating the XML in memory, so that your AJAX request does not return an error.

  • I will start studying these methods. If I succeed, I will finish the post today. :D
