Saving a JSON in Database vs Relationship?

Asked

Viewed 1,630 times

5

When I didn’t know much about the question of database structure and relationship between tables, I used to use JSON or comma-separated data in a given field of any table in the database.

In fact, I remember a specific case where I needed to relate several images to a particular post. These images I saved a list on JSON in a certain field called imagens.

The structure was more or less like this

-----------------------------------------------
tabela
-----------------------------------------------
id | imagens                      | usuario_id
-----------------------------------------------
1  | ["image1.png", "image2.png"] |   55
2  | NULL                         |   56

However, at a certain point where changes in this system began to appear, I was harshly criticized by a programmer who knew more about databases than I (and was also mocked) for having done so.

He explained to me that the data should look something like this:

--------------------------
tabela
--------------------------
id | usuario_id
--------------------------
1  |  55
2  |  56


----------------------------
imagens
----------------------------
post_id | url_imagem
 1      | image1.png
 1      | image2.png

Considering these two cases (VS serialization relationship saved in a field), we can state that, structurally speaking, using the JSON is a bad practice?

I was criticized for using data in JSON in a table field instead of a relationship, but there is some case where I should/can use a serialization and save it in a database field, without this being considered a bad programming practice?

  • I believe that if you have unrelated data that can be saved to a database, such as a user’s custom settings for example, you can choose to json unless you will need details about this update.

  • Using the first example you posted would it be possible to define date and time of inclusion of each of the images? This may be important information for some systems, but if these details make no difference to these saved data, I see no harm in saving json

  • 1

    Yeah, the problem is that my boss later talked about making "photo album" with these images. Then it would have to have the image date yes :(. These future implementations detonate the bank’s code and structure!

  • Okay, ultimately it depends on how this data will be used. If I save for example a json with the theme settings for a system made by a user and I just need to know what the final configuration of the change is like, json by usiário can facilitate a lot of things.

  • Therefore, it is only bad practice if the structure created impairs the performance or development of its solution.

  • I disagree in the sense that the evolution of the comic book scheme is a boring thing, yes, but necessary: trying to escape it using "shortcuts" has a good chance to bring headache after... An exception is when the "variable" data will not be used in the database at all, and there is not much prediction of what structure they will have. A real example that occurred in a system of mine are the Preferences of the User Interface - Each user has their JSON with their preferences, and whenever the UI changes the set of preferences it can change. But since none of this has anything to do with the BD, a JSON is enough.

Show 1 more comment

1 answer

11


A relational database, normalized, assumes that each attribute (value of a column in a table) is atomic, that is, it cannot be broken into smaller pieces from the point of view of the relational model. From the point of view of the relational model, It’s worth noting: no one would suggest that you store each digit of a number in a separate column, or each character of a string. If a data is not used to form relationships between tables, this data can be considered for all atomic effects.

In your example above this is not necessarily true: you have a list of images, okay, but can this list of images be considered atomic? Each image belongs to a single table row, there is no image overlay (i.e. the same image belongs to several posts distinct), you will always be interested in the full list and nothing else (i.e. it is not possible for you to solve, for example, just grab the first image from the list, to make a thumbnail or cover image for example)?

If the answer is a definite "yes", there is no chance that you will parts from this list of images to anything, or is the entire list or is nothing, so no, it is not bad practice to store the list this way, whether using JSON, comma-separated strings, Blobs, etc. In the same way that your image is in an external file, and that file is seen by the relational model as an atomic thing (you don’t reference "pieces" of that image in your model, only the entire image), the list of images - or even something more complex and structured - can rather be seen as an atomic entity, and therefore saved in a single attribute.

However, most of the time the answer is not a "yes": it is perfectly possible to conceive use cases where an image is shared by more than one post, or that you want to recover only a subset of post for anything, or even if you want to associate with each additional meta-data image (such as type, size, creation date, etc.). In that case, a representation normalized is usually ideal, and this alternative suggestion would correspond to a normal form.

As a rule, I would say to, in doubt, adopt the normalized form: it is easier to denormalize a table later (in case of performance problems, for example) than to normalize one that started denormalized. Unless you have good reasons to save a JSON in a table avoid doing this. There are cases where a JSON is the best representation for a data set, but in this case a non-relational database - commonly referred to as Nosql - may be a preferable option.

  • Very good @mgibsonbr. Interestingly, if I had to sort the images by size (as I needed to do), I would have to do a huge gambit in the case of JSON. I believe that in that case the best solution would be to apply the "yes"

  • I think you mean "apply no", don’t you? Anyway, what did you mean by "sort by size"? It is necessary that the images are stored in the comic book in order of size? Or a particular query has to bring them in that order? (and another query may want in alphabetical order, for example) For small ensembles, ordering is no drama, but this is a perfect example of task that the BD could do for you if the model is normalized, but you would have to do by hand on the application layer if it is not.

  • It’s true, it’s the "no". In this case, to sort by size, it would be better to save an image table related to the post, with its respective size. Better than using PHP, json_decode and still have to figure out the size of the image with filesize and then have to make a sort in the array.

  • 1

    @mgibsonbr, speaking in JSON, relational and non-relational BD, I remembered this article.

  • @Tobymosque Damn it! Very good you have passed this article, I was in the process of using Mongodb in a future project side by side with Postgres, but apparently it is more business to study these new features of Postgres and keep everything in one bank. I’ll look at it more carefully. Thanks!

  • @mgibsonbr, just check that the Postgresql version is at least 9.3, although the ideal is to use at least 9.4. Then the Postgresql version may be an impediment for you.

Show 1 more comment

Browser other questions tagged

You are not signed in. Login or sign up in order to post.