Crawler to scan websites


Hi everyone, how's it going?

I would like to create a crawler that scans a few specific sites every day and collects their homepage stories into a spreadsheet or something similar. In my case, I want to sweep news portals.

I am a layman on the subject, so I would like to know what resources I need (a database, a server, something of that kind) and what the best language is for developing this kind of task.

Thank you very much.

1 answer


You can use the following resources:

1 - Python for the crawler, using one of these libraries: Scrapy or BeautifulSoup (a minimal sketch follows after this list);

2 - A database of your choice (MySQL, PostgreSQL, ...); if you are not familiar with any database yet, I suggest a non-relational one (MongoDB, Cassandra, ...), because depending on the amount of data it can handle the load more nimbly;

3 - Deploy to a server so the program can run 24 hours a day (Heroku, for example).
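
To give an idea of what items 1 and 2 look like in practice, here is a minimal sketch using requests and BeautifulSoup that collects headline links from a homepage and stores them in MongoDB with pymongo. The URL, the CSS selector and the database/collection names are placeholders of my own, not from any real portal; every news site marks up its headlines differently, so you will need to inspect the page and adjust the selector.

    import requests
    from bs4 import BeautifulSoup
    from pymongo import MongoClient

    # Placeholder portal: replace with the news site you want to sweep
    URL = "https://example-news-portal.com"

    response = requests.get(URL, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")

    # Placeholder selector: inspect the homepage and adjust it
    # (headlines are often marked up as <h2><a href="...">Title</a></h2>)
    headlines = [
        {"title": a.get_text(strip=True), "url": a.get("href")}
        for a in soup.select("h2 a")
        if a.get("href")
    ]

    # Store the result in a local MongoDB instance (item 2);
    # "news" and "headlines" are example database/collection names
    client = MongoClient("mongodb://localhost:27017")
    if headlines:
        client["news"]["headlines"].insert_many(headlines)

    print(f"Saved {len(headlines)} headlines")

Run a script like this once a day (a cron job, or a scheduled worker on the server from item 3) and the collection grows with each sweep.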

It is not crucial, but besides (or instead of) the database, if you want to store the information in a spreadsheet, that is very simple to do in Python with the openpyxl library, as in the sketch below.
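
A short sketch, assuming the same list of headline dictionaries produced by the crawler above (the sample row and the file name are just examples):

    from openpyxl import Workbook

    # "headlines" is assumed to be a list of {"title": ..., "url": ...} dicts
    headlines = [
        {"title": "Example story", "url": "https://example-news-portal.com/story"},
    ]

    wb = Workbook()
    ws = wb.active
    ws.title = "Headlines"
    ws.append(["Title", "URL"])  # header row

    for item in headlines:
        ws.append([item["title"], item["url"]])

    wb.save("headlines.xlsx")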

If you need a reference, here is a personal project of mine on GitHub that deals with exactly this subject: https://github.com/VictorAlessander/Smith
