How to read content from a website using JavaScript?

Viewed 4,778 times

7

I would like to know how to read content from other web pages using only JavaScript or a JavaScript library.

For example, from a remote news site such as www.terra.com.br.

I would like to create a web app that reads the latest news shown on that page, which has no RSS feed or anything similar.

I am aware of the problems that can occur if the pages I am reading from change their layout or anything like that.

Is there a way to do this?

  • Which OS? Does it have to be a web app?

  • Does the OS matter? I can make a web page and run it in a webview.

  • That was my question: whether it was for one specific OS or for all of them.

  • Okay, all right, all right :)

1 answer

10


You can directly make an AJAX request to the server, like this:

var xmlhttp = new XMLHttpRequest();

xmlhttp.onreadystatechange = function() {
    // readyState 4 means the request has finished; status 200 is HTTP OK
    if (xmlhttp.readyState === 4 && xmlhttp.status === 200) {
        var html = xmlhttp.responseText;
        processPage(html);
    }
};

xmlhttp.open("GET", "http://www.terra.com.br/", true);
xmlhttp.send();
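
The request above hands the HTML to processPage(), which is left for you to write. A minimal sketch of one way to do it follows, assuming the code runs in a browser where DOMParser is available. The helper name extractTexts and the selector "div.news h2" are hypothetical; inspect the real page to find the selector you need. DOMParser builds a detached document, so scripts inside the fetched HTML are not executed.

```javascript
// Parse an HTML string and return the trimmed text of every element
// matching the given CSS selector.
function extractTexts(html, selector) {
    // The parsed document is detached from the page; its scripts do not run.
    var doc = new DOMParser().parseFromString(html, "text/html");
    var nodes = doc.querySelectorAll(selector);
    var texts = [];
    for (var i = 0; i < nodes.length; i++) {
        texts.push(nodes[i].textContent.trim());
    }
    return texts;
}

// Callback used by the request above.
function processPage(html) {
    // "div.news h2" is a made-up selector, used only for illustration.
    var headlines = extractTexts(html, "div.news h2");
    console.log(headlines);
    return headlines;
}
```

Keeping the selector as a parameter means that when the remote site changes its layout, only the selector string has to be updated.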

Note that, depending on the server, you may run into problems with cross-origin requests. One simple way around this is to use a service like whateverorigin.org. In that case the call becomes:

xmlhttp.open("GET", "http://whateverorigin.org/get?url=" + 
                    encodeURIComponent("http://www.terra.com.br/"), true);
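
The two variants above can be folded into one helper that also reports failures, since a blocked cross-origin request otherwise fails silently. This is only a sketch: requestPage, proxyPrefix, onSuccess and onError are names invented here, and what a proxy sends back (raw HTML or some wrapper format) depends on the service, so the response text is passed along unchanged.

```javascript
// Fetch a page, optionally through a proxy that takes the encoded target URL
// appended to proxyPrefix (pass null to request the URL directly).
function requestPage(url, proxyPrefix, onSuccess, onError) {
    var target = proxyPrefix ? proxyPrefix + encodeURIComponent(url) : url;
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function () {
        if (xhr.readyState !== 4) {
            return; // not finished yet
        }
        if (xhr.status === 200) {
            onSuccess(xhr.responseText);
        } else {
            // A status of 0 usually means a network failure or a blocked
            // cross-origin request.
            onError(xhr.status);
        }
    };
    xhr.open("GET", target, true);
    xhr.send();
}
```

For example, requestPage("http://www.terra.com.br/", "http://whateverorigin.org/get?url=", processPage, console.error) would route the request through the proxy mentioned above.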

  • OK, perfect... but how do I manipulate the content? For example, what if I want to copy the content of some DIV, say all the news content, and put it in my web app?

  • Another site that also allows this is http://www.corsproxy.com/, and it is even easier to use: xmlhttp.open('GET', 'http://www.corsproxy.com/www.terra.com.br/', true);

  • @Arilsoncarmo Implement your processPage(). There you can use an HTML parser or even regex (only if it is something very simple, and I do not recommend it). It depends on what you want to extract.

  • In this case I want to extract all the films from http://www.jcnet.com.br/cinema.php, taking the image and the showtimes. Would you have an example on JSFiddle? If you could, of course.

  • @Guilhermebernal We are talking about JavaScript, not another language. Extracting HTML from text is a matter of two lines in JavaScript.

  • Note that inserting HTML into the DOM so that the browser parses it will cause scripts and the like to execute. Not a good solution.

  • Yes, exactly! JavaScript is what we are talking about! I want to extract ALL the content from the page and manipulate it. I don't understand how to do that... OK, so the Ajax request gets me the HTML and the rest? For example, how do I take the content of a particular tag?

  • @Guilhermebernal It does not run until it enters the DOM, but it is parsed. // I remember now: the first version of a website I made used it. Tip: don't use it; one change on the site and your program stops working.

  • OK, but I didn't see a way to manipulate the content. The Ajax request, fine. But how do I get at the part I actually need?

  • Look at the example: http://jsfiddle.net/TG8Bm/

  • BTW, I have been researching how jQuery handles HTML strings without running scripts: .parseHTML, .buildFragment. It's similar to my example, only using documentFragments and with bug fixes.

  • Wow, I was just looking at it... what a horrible page. I don't know whether the movie section is an iframe or something else, but I can't extract the content. The way you did it is exactly right, thank you, but it would have to be the content of cinema.php, and that doesn't work! Bizarre!

  • Ah, I see what they did... every movie is its own table '-'
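
For the case discussed in these comments, where each movie sits in its own table, a rough sketch could collect one record per table. The layout assumed here (a table element containing an img plus text cells) is a guess and has not been checked against cinema.php; extractMovies and baseUrl are names made up for this example. It parses with DOMParser, so nothing is inserted into the live DOM and no scripts from the fetched page run. If the page also uses tables for layout, the "table" selector would need to be narrowed.

```javascript
// Build one { image, text } record per <table> found in the HTML string.
// baseUrl is the address the HTML was fetched from; it is used to turn
// relative image paths into absolute ones.
function extractMovies(html, baseUrl) {
    var doc = new DOMParser().parseFromString(html, "text/html");
    var tables = doc.querySelectorAll("table");
    var movies = [];
    for (var i = 0; i < tables.length; i++) {
        var img = tables[i].querySelector("img");
        var src = img ? img.getAttribute("src") : null;
        movies.push({
            // Resolve relative paths against the page they came from.
            image: src ? new URL(src, baseUrl).href : null,
            // Collapse runs of whitespace so the schedule text stays readable.
            text: tables[i].textContent.replace(/\s+/g, " ").trim()
        });
    }
    return movies;
}
```

Reading the src attribute with getAttribute and resolving it explicitly matters because the parsed document does not have the remote site as its base address.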
