Extracting a news article from its URL

Viewed 67 times

0

I recently saw a video on YouTube in which the author took the URL of a news article from UOL, Globo, and similar sites and extracted the article's title, body, and images, along with the formatting used. The application was written with the Laravel framework.

What kind of resources are used to build an application like this? Is there a Laravel library that makes this easier?

  • 1

    I don’t know, but with PHP you can download the page using cURL or file_get_contents.

  • 1

    Which video?

1 answer

2


To do this you will inevitably need to parse the page's HTML, and for that you will need a DOM parser.

A DOM parser takes the HTML you downloaded and turns it into a DOM object that you can navigate to get the information you need.
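As a rough illustration of that idea, here is a minimal sketch in Python using only the standard library. Note that `html.parser` is an event-based parser rather than a full DOM parser, so this is a swapped-in technique and not the approach from the video. The class name `ArticleExtractor`, the function `extract_article`, and the sample HTML are all made up for the example, and it assumes the title is the first `<h1>` and the body is made of `<p>` elements.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collects the first <h1>, every <p> text and every <img src>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self.images = []
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)
        elif tag in ("h1", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside inline tags such as <b> is kept as part of the parent.
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "h1" and not self.title:
                self.title = text
            elif tag == "p" and text:
                self.paragraphs.append(text)
            self._current = None


def extract_article(html):
    """Hypothetical helper: returns title, body paragraphs and image URLs."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "body": parser.paragraphs,
        "images": parser.images,
    }


# Made-up page standing in for the HTML you would download.
SAMPLE_HTML = """
<html><body>
  <h1>Example headline</h1>
  <p>First paragraph.</p>
  <img src="/img/photo.jpg" alt="photo">
  <p>Second <b>paragraph</b>.</p>
</body></html>
"""

article = extract_article(SAMPLE_HTML)
print(article)
```

In a real project the HTML string would come from an HTTP download, and the choice of tags would depend on the site being read.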

I have done a few projects of this kind myself, and the biggest problems you will run into are basically two:

1) Each site (and sometimes different sections or articles of the same site) has a different HTML structure, so you have to build a different DOM map for each section or site.

2) Websites, even big ones like UOL and Terra, have poorly formed, error-ridden HTML. This can eventually cause an error when parsing the DOM, which will complicate your life.

The key is to find a parser that either pre-processes the HTML to correct errors or is error-tolerant.
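The two problems above can be sketched together: one small map of rules per site, plus a parser that does not give up on unclosed tags. This is only an illustration under stated assumptions. The host names, the CSS class names in `SITE_RULES`, and the helpers `RuleBasedExtractor` and `extract` are all hypothetical, and Python's standard `html.parser` stands in for whatever error-tolerant parser you end up choosing.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical per-site maps: which CSS class marks the title and the body
# paragraphs on each site. Real sites need their own hand-checked values.
SITE_RULES = {
    "news.example.com": {"title": "headline", "paragraph": "story-text"},
    "portal.example.org": {"title": "post-title", "paragraph": "content"},
}


class RuleBasedExtractor(HTMLParser):
    """Extracts fields by CSS class and tolerates unclosed elements."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.title = ""
        self.paragraphs = []
        self._field = None  # field currently being collected, if any
        self._buffer = []

    def _flush(self):
        # Store whatever text was collected, collapsing runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if self._field == "title" and text and not self.title:
            self.title = text
        elif self._field == "paragraph" and text:
            self.paragraphs.append(text)
        self._field = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field, css_class in self.rules.items():
            if css_class in classes:
                self._flush()  # also ends a previous element left unclosed
                self._field = field
                return

    def handle_data(self, data):
        if self._field:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Only block-level end tags finish a field; inline ones are ignored.
        if tag in ("p", "h1", "h2", "div"):
            self._flush()

    def close(self):
        super().close()
        self._flush()


def extract(url, html):
    """Picks the rule map for the URL's host and applies it to the HTML."""
    host = urlparse(url).hostname
    rules = SITE_RULES.get(host)
    if rules is None:
        raise KeyError(f"no extraction rules for {host}")
    parser = RuleBasedExtractor(rules)
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "body": parser.paragraphs}


# Deliberately malformed sample: <b> and both <p> tags are never closed.
BROKEN_HTML = """
<div class="wrap">
  <h1 class="headline">Example <b>headline</h1>
  <p class="story-text">First paragraph
  <p class="story-text">Second paragraph
</div>
"""

result = extract("https://news.example.com/2020/some-story", BROKEN_HTML)
print(result)
```

Keeping the rules in a plain dictionary means that supporting a new site, or a new section of an existing one, is a matter of adding an entry rather than changing the parsing code.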

The last time I worked on a project like this, I wrote a small bot in Java, because Java has a ready-made library that is perfect for this: it lets you query the data in the HTML as if you were using jQuery. It’s pretty cool!

https://jsoup.org/

Cheers and good luck!

  • 1

    Thank you very much.
