How do I generate an Excel file with the data obtained from a web crawler?


I’m making a web crawler that should extract the name and price of the iPhones that appear in a search on the Amazon site, and then generate an xlsx file with this data. However, I can’t get the xlsx file generated with this data.

I’m trying to do this with the excel4node module, but since I have little experience I’m having trouble fitting the data into the spreadsheet.

var request = require('request-promise');
var cheerio = require('cheerio');
var excel = require('excel4node');

var wb = new excel.Workbook();
var ws = wb.addWorksheet('AMAZON');

var url = 'http://localhost:81/index.html';
var newEncode = encodeURI(url);

const crawl ={

    uri: (newEncode),
    transform: function (body){
        return cheerio.load(body);
    }

}

request(crawl)
    .then(($) =>{
        const produtos = []
        $('div[class="sg-col-20-of-24 s-result-item sg-col-0-of-12 sg-col-28-of-32 sg-col-16-of-20 sg-col sg-col-32-of-36 sg-col-12-of-16 sg-col-24-of-28"]').each((i, item)=>{
            const produto = {
                nome: $(item).find('span[class="a-size-medium a-color-base a-text-normal"]').text(),
                preco: $(item).find('span[class="a-price-whole"], span[class="a-price-fraction"], span[class="a-color-base"]').text(),
            }

            // console.log(item);

            ws.cell().string(); // I don't know which parameters to pass here

            wb.write('PlanilhaAmazon.xlsx');

        });

    })
    .catch((err) => {
        console.log(err);
    })
  • Is the data coming in correctly? I need to know that before I put together an example.

  • Yes, Anderson. The data is coming in correctly.

1 answer



Based on the example, I first changed the global variables from var to const, to avoid using var. Next, ws.cell() takes the row number as its first parameter and the column number as its second. For the row I used the loop index + 1, because .each() counts from 0 and spreadsheet rows start at 1. In .string() you pass the text you want to write.

const request = require('request-promise');
const cheerio = require('cheerio');
const excel = require('excel4node');

const wb = new excel.Workbook();
const ws = wb.addWorksheet('AMAZON');

const url = 'http://localhost:81/index.html';
const newEncode = encodeURI(url);

const crawl ={

    uri: (newEncode),
    transform: function (body){
        return cheerio.load(body);
    }

}

request(crawl)
    .then(($) =>{
        const produtos = []
        $('div[class="sg-col-20-of-24 s-result-item sg-col-0-of-12 sg-col-28-of-32 sg-col-16-of-20 sg-col sg-col-32-of-36 sg-col-12-of-16 sg-col-24-of-28"]').each((i, item)=>{
            const produto = {
                nome: $(item).find('span[class="a-size-medium a-color-base a-text-normal"]').text(),
                preco: $(item).find('span[class="a-price-whole"], span[class="a-price-fraction"], span[class="a-color-base"]').text(),
            }

            // console.log(item);

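            // The first parameter is the spreadsheet row, the second is the column
            ws.cell(i + 1, 1).string(produto.nome);
            ws.cell(i + 1, 2).string(produto.preco);
        });

        // Write the file once, after every row has been filled in
        wb.write('PlanilhaAmazon.xlsx');
    })
    .catch((err) => {
        console.log(err);
    });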
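The row/column arithmetic the answer describes can also be pulled out of the .each() loop so it is easy to check. This is a minimal sketch under a few assumptions. toCells is a hypothetical helper and is not part of excel4node. The header row ("Nome", "Preço") is an addition that does not appear in the original code, so product rows start at row 2. The product values are placeholders. console.log stands in for the ws.cell(row, col).string(value) call.

```javascript
// Hypothetical helper (not an excel4node API): converts scraped products into
// 1-based { row, col, value } cells, reserving row 1 for a header (an assumption).
function toCells(produtos) {
  const cells = [
    { row: 1, col: 1, value: 'Nome' },
    { row: 1, col: 2, value: 'Preço' },
  ];
  produtos.forEach((produto, i) => {
    // Loop indices start at 0 and spreadsheet rows start at 1;
    // +2 instead of +1 because the header occupies row 1.
    const row = i + 2;
    // cheerio's .text() may include surrounding whitespace, so trim it.
    cells.push({ row, col: 1, value: produto.nome.trim() });
    cells.push({ row, col: 2, value: produto.preco.trim() });
  });
  return cells;
}

// Placeholder data standing in for what the crawler collects.
const produtos = [
  { nome: '  Produto 1 ', preco: '10,00' },
  { nome: 'Produto 2', preco: ' 20,00' },
];

const cells = toCells(produtos);
for (const { row, col, value } of cells) {
  // With excel4node this would be: ws.cell(row, col).string(value);
  console.log(row, col, value);
}
```

Building the full list of cells first also makes it natural to call wb.write only once, after all the cells are set. The alternative, calling it once per product inside .each(), rewrites the file on every iteration.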
  • It worked very well! Thank you very much!

  • There’s just one more thing: did you notice that I’m making the request to my local server? It was supposed to run against the Amazon site, but the program doesn’t accept that URL. For some reason it doesn’t return the page body to search through, and the response comes back as random characters. Do you know how to solve this?

  • What do you mean by random characters?

  • If I run console.log(body), instead of the body of the request I get some very random special characters that aren’t even on the keyboard (I don’t know what they’re called, haha).

  • Is it the same code with only the URL changed, or did you modify something else?

  • It’s the same code. It just doesn’t accept the site’s URL.
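The thread ends without a diagnosis. One common cause of this symptom, offered here as an assumption rather than a confirmed answer, is that the server sends the HTML gzip-compressed, so printing the raw body shows binary bytes. The sketch below uses only Node's built-in zlib module. isGzip and decodeBody are hypothetical helpers, and the HTML is a placeholder.

```javascript
const zlib = require('zlib');

// gzip data always begins with the two "magic" bytes 0x1f 0x8b.
function isGzip(buffer) {
  return buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b;
}

// Decompress only when the payload really is gzip; otherwise decode it as UTF-8 text.
function decodeBody(buffer) {
  const raw = isGzip(buffer) ? zlib.gunzipSync(buffer) : buffer;
  return raw.toString('utf8');
}

// Simulated response: placeholder HTML compressed the way a server might send it.
const html = '<div><span class="a-size-medium">Produto 1</span></div>';
const compressed = zlib.gzipSync(Buffer.from(html, 'utf8'));

// Printing the compressed bytes as text gives unreadable characters.
console.log(compressed.toString('utf8'));

console.log(decodeBody(compressed) === html); // true
console.log(decodeBody(Buffer.from(html, 'utf8')) === html); // true
```

If compression turns out to be the cause, the request library (which request-promise wraps) has a gzip: true option. That option asks the server for compressed content and decodes the response automatically, so adding gzip: true to the crawl options object, next to uri and transform, may be all that is needed.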
