Book query by ISBN and save result in txt via console.log


Context:
I have a list of about 100 ISBNs (International Standard Book Number), all for Brazilian books, and I would like a faster way to get the information about each book. It does not have to be in real time.

To look up a book by ISBN, I go to the search page of the ISBN Brasil site, enter the ISBN, get the data back, and then fill in a spreadsheet. Very basic.

To make the search a little faster, I only need to type the captcha once; for the rest I use the URL below, changing only the ISBN:

http://www.isbn.bn.br/website/consulta/cadastro/isbn/9788566250299

Need:
Based on that, my idea is to enter the captcha once and then look up any book by changing the ISBN in the URL. But I want to do it automatically and save the field values to a TXT file, even if they are just separated by spaces.

With console.log I can get the information I need. It is very simplistic, but it works:

String.prototype.trim = function() {
    return this.replace(/^\s+|\s+$/g, '');
};

var livro = '"' + document.getElementsByClassName("conteudo")[0].getElementsByTagName('div')[5].childNodes[3].nodeValue.trim() + 
            '" "' + document.getElementsByClassName("conteudo")[0].getElementsByTagName('div')[6].childNodes[3].nodeValue.trim() + 
            '" "' + document.getElementsByClassName("conteudo")[0].getElementsByTagName('div')[7].childNodes[3].nodeValue.trim() + 
            '" "' + document.getElementsByClassName("conteudo")[0].getElementsByTagName('div')[8].childNodes[3].nodeValue.trim() + 
            '" "' + document.getElementsByClassName("conteudo")[0].getElementsByTagName('div')[10].childNodes[3].nodeValue.trim() + 
            '" "' + document.getElementsByClassName("conteudo")[0].getElementsByTagName('div')[12].childNodes[3].nodeValue.trim() + 
            ';' + document.getElementsByClassName("conteudo")[0].getElementsByTagName('div')[12].childNodes[5].nodeValue.trim() + '"';

console.log(livro);

Output:

"978-85-66250-29-9" "Começando com o linux: comando, serviços e administração" "1" "2013" "135" "Adriano Henrique de Almeida (Organizador);Paulo Eduardo Azevedo Silveira (Organizador);Daniel Romero ( Autor);"

Problem:
I have to do this book by book, and I’m sick of it :/ I would like to know if there is a simple way to automate it, perhaps by looping over an array like the example below, and returning the information, even if only as a simple TXT with space-separated values like the output above.

Thank you

var isbn = ["9788566250299", "9788555191459", "9788555191039"];

Note:
With the Google Books API some books do not return a result, as shown in the question "Search book details with google-Books-api-in-php", which is why I would like to do it the way described above. With the API, the result would come from this URL:

https://www.googleapis.com/books/v1/volumes?q=isbn:9788566250299

But that returns nothing for this book, whereas the ISBN Brasil site does have it.
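For reference, a rough sketch of how the "no result" case could be detected in code. It assumes the Google Books response is JSON with a numeric `totalItems` and, when there are matches, an `items` array whose entries carry a `volumeInfo` object; the helper names `googleBooksUrl` and `firstTitle` are made up for this example, and no request is actually sent here.

```javascript
// Build the lookup URL shown above for a given ISBN.
function googleBooksUrl(isbn) {
    return 'https://www.googleapis.com/books/v1/volumes?q=isbn:' + encodeURIComponent(isbn);
}

// Assumption: the parsed response has "totalItems" and, when non-empty,
// an "items" array whose entries hold a "volumeInfo" object with a "title".
// Returns the first title, or null when the API found nothing.
function firstTitle(response) {
    if (!response || !response.totalItems || !response.items) return null;
    var info = response.items[0].volumeInfo || {};
    return info.title || null;
}

console.log(googleBooksUrl('9788566250299'));
// An empty answer, which is the situation described above, yields null.
console.log(firstTitle({ kind: 'books#volumes', totalItems: 0 }));
```

A null here would be the signal to fall back to the ISBN Brasil lookup for that book.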

  • If I want to test this site, do I need to register? And do you always have to enter a captcha for each book?

  • No need to register. The link was wrong and I have fixed it. Just do one search entering the captcha; after that, the link http://www.isbn.bn.br/website/consulta/cadastro/isbn/ + ISBN returns the book data for a new ISBN without typing the captcha again.

  • Okay, I managed to do a search, but does it ask for a new captcha on every search?

  • That is the thing: the site does not check the captcha if you use the URL above and just put in the ISBN of the new search.

  • @Sergio, can you suggest any terms I could search for to solve the problem? Or any hint, anything at all. Thank you.

  • If they do not have an API to talk to, the only way is one by one, filling in the captcha...

  • But you do not need to type a new captcha for each query, as long as the query goes through the URL and not through the search page. That is, after the first search the session seems to stay active, so by using the URL http://www.isbn.bn.br/website/consulta/cadastro/isbn/9788566250299 and swapping the ISBN I get the information for a new book.

  • David, this is much more interesting... I’ll take a look later, but this way you can have a crawler that fetches all the pages.

  • Do you have a list of all the ISBNs you want?


1 answer

Here is a suggestion :)

First do one search manually, to pass the captcha and keep your IP in an active session.

Then put this script on the console:

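var isbns = ["9788566250299", "9788555191459", "9788555191039"];

function trim(str) {
    return str.replace(/^\s+|\s+$/g, '');
}

// Requests every ISBN in parallel and calls done() once all have answered.
function iterator(numbers, done) {
    var ISBNdata = [];
    var calls = numbers.length;
    numbers.forEach(function(isbn, i) {
        fetch(isbn, i, function(data, index) {
            calls--;
            ISBNdata[index] = data; // keep the results in input order
            if (calls === 0) done(ISBNdata);
        });
    });
}

// Note: defined in the console, this replaces the browser's built-in fetch
// on this page. $.ajax relies on the jQuery already loaded by the site.
function fetch(nr, index, cb) {
    $.ajax('/website/consulta/cadastro/isbn/' + nr).done(function(raw) {
        var data = process(raw);
        console.log('Received index', index);
        cb(data, index);
    }).fail(function() {
        // Still report back, otherwise done() would never be called.
        cb(null, index);
    });
}

// Extracts the fields from the returned HTML.
// Returns undefined when the page does not have the expected structure.
function process(raw) {
    var body = raw.match(/<body>([\s\S]+)<\/body>/);
    if (!body) return;
    var proxy = document.createElement('div');
    proxy.innerHTML = body[1];
    var content = proxy.querySelector('.conteudo');
    if (!content) return;
    var elements = content.getElementsByTagName('div');
    var data = [5, 6, 7, 8, 10, 12].map(function(i) {
        return elements[i].childNodes[3].nodeValue;
    });
    data.push(elements[12].childNodes[5].nodeValue);
    return data.map(trim);
}

iterator(isbns, function(data) {
    console.log('---------------');
    console.log(JSON.stringify(data));
});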

and the result is this:

[
    ["978-85-66250-29-9", "Começando com o linux: comando, serviços e administração", "1", "2013", "135", "Adriano Henrique de Almeida (Organizador)", "Paulo Eduardo Azevedo Silveira (Organizador)"],
    ["978-85-5519-145-9", "Componentes reutilizáveis em Java com reflexão e anotações", "1", "2014", "378", "Eduardo Guerra ( Autor)", "Vivian Matsui (Editor)"],
    ["978-85-5519-103-9", "Containers com Docker: do desenvolvimento à produção", "1", "2015", "127", "Daniel Romero ( Autor)", "Vivian Matsui (Editor)"]
]
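The question also asked for a TXT with the fields separated by spaces. Below is a minimal sketch of one way to get there from an array like the one above. The names `toLine`, `toTxt`, `downloadTxt` and the file name `livros.txt` are made up for this example; joining the contributors with `;` imitates the output in the question, and doubling embedded quotes is just an assumption about how you want them escaped.

```javascript
// Sample row in the shape shown above (one array of field values per book).
var rows = [
    ["978-85-66250-29-9", "Começando com o linux: comando, serviços e administração", "1", "2013", "135",
     "Adriano Henrique de Almeida (Organizador)", "Paulo Eduardo Azevedo Silveira (Organizador)"]
];

// Turn one row into a line of quoted, space-separated fields.
// Everything after the first five fields is treated as a contributor
// and joined with ';' into a single field.
function toLine(row) {
    var fields = row.slice(0, 5);
    fields.push(row.slice(5).join(';'));
    return fields.map(function (value) {
        // Assumption: embedded double quotes are escaped by doubling them.
        return '"' + String(value).replace(/"/g, '""') + '"';
    }).join(' ');
}

// One line per book; entries that are not arrays (failed lookups) are skipped.
function toTxt(rows) {
    return rows.filter(Array.isArray).map(toLine).join('\n');
}

// Browser only: offer the text as a file download.
function downloadTxt(text, filename) {
    var blob = new Blob([text], { type: 'text/plain;charset=utf-8' });
    var link = document.createElement('a');
    link.href = URL.createObjectURL(blob);
    link.download = filename;
    document.body.appendChild(link);
    link.click();
    document.body.removeChild(link);
    // Release the object URL once the click has been handled.
    setTimeout(function () { URL.revokeObjectURL(link.href); }, 0);
}

console.log(toTxt(rows));
```

In the console, the last step of the script could then be `iterator(isbns, function (data) { downloadTxt(toTxt(data), 'livros.txt'); });` instead of logging the JSON.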

About the script:

The iterator function turns each ISBN into its own function call, and from there everything happens asynchronously. Each number is passed to fetch, which makes an ajax request and, when the response arrives, asks process to extract the fields. fetch then invokes its callback, and once all the callbacks have been called, the "mother callback" (the one we passed to iterator at the start) is called with the data all neatly organized :)
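The same "start everything, collect when all are done" idea can also be written with promises. This is only a sketch of the pattern, not what the script above does: `fakeLoad` is a stand-in I made up so the example runs anywhere, and `loadAll` is a hypothetical name. In the browser, the loader would wrap the ajax request and process instead.

```javascript
// Stand-in for the real request: resolves with dummy data after a short,
// random delay, so the answers arrive out of order like real requests.
function fakeLoad(isbn) {
    return new Promise(function (resolve) {
        setTimeout(function () {
            resolve(['ISBN ' + isbn]);
        }, Math.random() * 20);
    });
}

// Start every lookup at once. Promise.all resolves when all have finished
// and keeps the results in input order, whatever order they completed in.
function loadAll(isbns, load) {
    return Promise.all(isbns.map(function (isbn) {
        // A failed lookup becomes null instead of rejecting the whole batch.
        return load(isbn).catch(function () { return null; });
    }));
}

loadAll(['9788566250299', '9788555191459', '9788555191039'], fakeLoad)
    .then(function (data) {
        console.log('---------------');
        console.log(JSON.stringify(data));
    });
```

Promise.all takes over the job of the `calls` counter, and the per-item catch plays the role of reporting back even when one request fails.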


Note: use this technique only if the website's terms of use allow its content to be used in the way you intend.

  • Woohoo, it worked! @Sergio, my ultimate goal was a list of all the books from Casa do Código. To get all their ISBNs, I went to the search page and searched by publisher, which returned the list. With the algorithm you proposed, plus an array with all the ISBNs, I built a table with all of Casa do Código's books.
