Which JSON structure to use for large data volume without loss of performance?

I am thinking about using JSON in a project because it is widely adopted and there are many libraries that encode and decode it to and from other structures (arrays, for example), but one thing worries me.

Suppose a Web Service returns rows from a database table called clientes. Something like this:

"clientes":[
    {"nome":"João", "sobrenome":"Silva"}, 
    {"nome":"José", "sobrenome":"Barbosa"}, 
    {"nome":"Maria", "sobrenome":"Joana"}
]

Now suppose the table has 10 million rows and I need the Web Service to return a JSON object with all of them.

The size of the object would be huge. Therefore, it would take a lot of bandwidth to transfer the information.

I could compress the JSON object using gzip, but this would generate another problem: a high processing cost to compress the object.
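
To get a rough sense of that cost, it can simply be measured. The following is a minimal sketch, assuming Node.js and its built-in zlib module; the sample records and the record count are made up for illustration, and real timings depend on the data and the machine:

```javascript
// Rough sketch: compare raw vs gzip-compressed size of a repetitive JSON payload.
const zlib = require("zlib");

// Hypothetical sample data shaped like the "clientes" example above.
const clientes = Array.from({ length: 10000 }, (_, i) => ({
  nome: "Nome" + i,
  sobrenome: "Sobrenome" + i,
}));

const raw = Buffer.from(JSON.stringify({ clientes }), "utf8");

// Time a single synchronous gzip pass over the whole payload.
const start = process.hrtime.bigint();
const compressed = zlib.gzipSync(raw);
const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log("raw bytes:  ", raw.length);
console.log("gzip bytes: ", compressed.length);
console.log("gzip time (ms):", elapsedMs.toFixed(1));
```

Because the field names repeat in every record, this kind of payload tends to compress well, so the numbers are worth checking before ruling gzip out.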

I could invent a compact format and just use it. But I would lose all the conveniences offered by libraries that deal with JSON objects. In addition, it would be a non-standard design, which would make it hard to maintain.

Is there a solution to this dilemma?

I was thinking... maybe there is a variant of the JSON format that libraries accept and that is meant for cases where the field names are constant.

For example, something like this:

"clientes":[
    {$"nome", "sobrenome"$}, 
    {"João", "Silva"}, 
    {"José", "Barbosa"}, 
    {"Maria", "Joana"}
]

Is there such a thing?

If not, what would be the best solution? I need to return large objects and would also like to use a common format to transfer them.

3 answers

9


If the system is very well documented, nothing prevents you from stripping out everything that is not strictly necessary and sending just this:

[
    ["João", "Silva"], 
    ["José", "Barbosa"],
    ["Maria", "Joana"]
]

(line breaks are only for easy reading).

If you want something a little more complete, keeping all the data but without the redundancy:

{
   "clientes": {
      "nomes": [ "João", "José", "Anônimo", "Maria" ],
      "sobrenomes": [ "Silva", "Barbosa", "", "Joana" ]
   }
}

Just do not skip any empty value. Leave the "" in the missing position, so that the names do not get out of sync with the surnames.
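
As an illustration of that rule, here is a minimal sketch in JavaScript that builds the parallel arrays from ordinary records and rebuilds them afterwards. The helper names toColumns and fromColumns are hypothetical, not part of any library, and the sketch assumes "" is an acceptable placeholder for a missing value:

```javascript
// Convert an array of records into parallel column arrays.
// Missing values become "" so every column keeps the same length.
function toColumns(records, fields) {
  const columns = {};
  for (const field of fields) {
    columns[field] = records.map((r) => (r[field] == null ? "" : r[field]));
  }
  return columns;
}

// Rebuild the records by reading the same index from every column.
function fromColumns(columns) {
  const fields = Object.keys(columns);
  const length = fields.length ? columns[fields[0]].length : 0;
  const records = [];
  for (let i = 0; i < length; i++) {
    const record = {};
    for (const field of fields) record[field] = columns[field][i];
    records.push(record);
  }
  return records;
}

const clientes = [
  { nome: "João", sobrenome: "Silva" },
  { nome: "José", sobrenome: "Barbosa" },
  { nome: "Anônimo" }, // no surname: stored as "" to keep positions aligned
  { nome: "Maria", sobrenome: "Joana" },
];

const columns = toColumns(clientes, ["nome", "sobrenome"]);
const rebuilt = fromColumns(columns);
console.log(JSON.stringify(columns));
```

Note that after the round trip the third record carries sobrenome: "" rather than no surname at all, which is the price of keeping the positions aligned.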


Still, if you need to keep the field names available separately for lookup, it could be something like this:

{
   "campos":
   [
       "nome", "sobrenome"
   ],
   "clientes":
   [
       ["João", "Silva"], 
       ["José", "Barbosa"],
       ["Maria", "Joana"]
   ]
}


I could split it a thousand other ways, but I believe that with these as a starting point it is easy to solve the problem.
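
For the last layout (field names sent once, then one positional array per record), the conversion in both directions could look roughly like this. It is only a sketch: pack and unpack are hypothetical helper names, and using null for an absent field is an assumption made here for illustration:

```javascript
// Pack: one shared list of field names plus one positional array per record.
function pack(records, campos) {
  return {
    campos,
    clientes: records.map((r) => campos.map((c) => (c in r ? r[c] : null))),
  };
}

// Unpack: pair each row with the field names to rebuild ordinary objects.
function unpack(packed) {
  return packed.clientes.map((row) =>
    Object.fromEntries(packed.campos.map((c, i) => [c, row[i]]))
  );
}

const clientes = [
  { nome: "João", sobrenome: "Silva" },
  { nome: "José", sobrenome: "Barbosa" },
  { nome: "Maria", sobrenome: "Joana" },
];

const packed = pack(clientes, ["nome", "sobrenome"]);
const restored = unpack(packed);

console.log(JSON.stringify(packed));
console.log(JSON.stringify(restored));
```

The result is still plain JSON, so any standard library can parse it; only the small pack/unpack step has to be documented and shared between client and server.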

7

First of all, JSON is a very compact format; think of it as a leaner alternative to XML. You can include a great many records without worrying too much about it. You also do not have to receive a string containing the JSON and parse it by hand to turn it into an object: most languages and libraries today hand you the parsed object directly, so do not worry about that.

As for the structure you want, lighter but without losing context, I recommend this:

{

    "sobrenome":{
        "cliente":[
            "Silva",
            "Barbosa",
            "Joana"
        ]
    },
    "nome":{
        "cliente":[
            "João",
            "José",
            "Maria"
        ]
    }

}

But it is very important to highlight the words of @Bacco:

I could split it in other ways, but it can be dangerous and cause confusion when some value is missing (I would have to leave the "" in the missing position so that things do not get out of sync):

{
   "nomes": [ "João", "José", "Anônimo", "Maria" ],
   "sobrenomes": [ "Silva", "Barbosa", "", "Joana" ]
}

But why should I use this structure?

Because this way there are only arrays of data, rather than an array that carries labels together with the data (which duplicates the labels in every record), such as this array with labels and data:

{

    "cliente":[
        {
            "nome":"João",
            "sobrenome":"Silva"
        },
        {
            "nome":"José",
            "sobrenome":"Barbosa"
        },
        {
            "nome":"Maria",
            "sobrenome":"Joana"
        }
    ]

}

Note that "nome" and "sobrenome" are repeated in every record. In a huge object like yours, this repetition makes the payload considerably larger and hurts performance.
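
To put a number on that difference, here is a small sketch that serializes the same made-up records in both shapes and compares the byte counts. It assumes Node.js (for Buffer.byteLength); the record count and the generated names are arbitrary:

```javascript
const n = 1000; // hypothetical number of records

// Same data in two shapes: labels repeated per record vs. arrays of data only.
const asObjects = { cliente: [] };
const asColumns = { nome: { cliente: [] }, sobrenome: { cliente: [] } };

for (let i = 0; i < n; i++) {
  asObjects.cliente.push({ nome: "Nome" + i, sobrenome: "Sobrenome" + i });
  asColumns.nome.cliente.push("Nome" + i);
  asColumns.sobrenome.cliente.push("Sobrenome" + i);
}

// Size of the serialized JSON in UTF-8 bytes.
const bytes = (value) => Buffer.byteLength(JSON.stringify(value), "utf8");

const objectBytes = bytes(asObjects);
const columnBytes = bytes(asColumns);

console.log("labels in every record:", objectBytes, "bytes");
console.log("arrays of data only:   ", columnBytes, "bytes");
console.log("saved per record:      ", ((objectBytes - columnBytes) / n).toFixed(1), "bytes");
```

The saving per record is roughly the length of the repeated labels and braces, so it grows with the number of fields and with the length of their names.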

Additional:

Since you mentioned a Web Service, I believe you could store your JSON in a file with the .json extension containing nothing but the JSON. I also imagine you are able to work with JavaScript, so I recommend jQuery's $.ajax(), which is simply a request to the server that can ask for JSON directly, for example:

$.ajax({
  dataType: "json",
  url: "arquivoJSON.json"
}).done(function(){
  alert( "sucesso" );
}).fail(function(){
  alert( "erro" );
});

There is also a jQuery shorthand method, $.getJSON, which makes the request even simpler:

$.getJSON("arquivoJSON.json", function( json ) {
  console.log("Nome: "      + json.nome.cliente[0]);      //João
  console.log("Sobrenome: " + json.sobrenome.cliente[0]); //Silva
});

Conclusion:

JSON is one of the lightest ways to retrieve structured data from a server. It was designed for exactly this purpose and replaces XML in most cases. Because it is used on the Web, where applications depend on the speed of the user's internet connection, it performs well there. Whatever language you use, if you retrieve the JSON object directly and give it a well thought-out structure, you should not run into major performance problems.

4

For high-performance systems that need to exchange data through a recognized yet flexible protocol, JSON is not the best solution. Note that I am not talking about a web system that accesses server data via JavaScript.

There are several solutions for message exchange in formats that, in relation to JSON, consume less time for serialization/deserialization and less bandwidth.

One of them is the BSON format, which is simply Binary JSON. Its goal is to be lighter and more efficient than JSON while keeping the same flexibility and compatibility. There are implementations for virtually all languages.

Although BSON is efficient, there are some scenarios where this format takes up more storage or memory space than JSON. See this answer on the English-language Stack Overflow for more details.

Another alternative that is more compact and faster than JSON is the MessagePack format. It also has implementations for most languages.

Personally, I would run some tests with MessagePack before deciding anything.

  • 1

    +1 for debunking the idea that JSON is the most compact format, among other things.
