17
I am working on two systems that have a common feature: entities whose attributes are dynamic (i.e. cannot be "plastered" - hardcoded), need to be consulted in search operations (result filter), but otherwise are decoupled from the rest of the system (i.e. there is nothing referencing them, nor are they used in any specific calculation). I’m having a hard time finding a proper way to represent them.
So far, I have used the (anti-)standard EAV. There is a table Entity
identifying the entity, a table Tag
(NxN
with Entity
) which says the "type" - or "class" - of the entity, a table Attribute
(Nx1
with Tag
) which identifies the property, its type, and whether or not it may be absent, and ultimately several tables XValue
(TextValue
, IntValue
, DateValue
etc) which say "the entity E has for the attribute A the value V".
Contextualizing, an entity could be a product for sale, its tag the type of product (e.g., "car"), the attributes being the characteristics of a car (e.g., "brand", "model", "kilometers wheeled") and the values the application of those characteristics to the product for sale (ex.: "Fiat", "Palio", "20000"). Product types can be created, changed and excluded (dynamic), one can refer to a product type by its characteristics (e.g., filter cars by the "Fiat" brand), but there is no foreign key to a specific attribute, and never there will be any "non-generic" calculation involving product characteristics.
And, as I said, I currently implement this in the form of EAV models - in a Postgresql database, cross-Platform at first but more towards Linux.
Normally the use of EAV is a "potential WTF alert", but I believe that this would be one of its legitimate uses (the data structures modeled in this way are actually dice, and not meta-data). The vast majority of the bank is formalized, only a few entities were modeled this way (and I took steps to ensure that it would be possible to formalize any model that might be coupled to the rest of the system).
Meanwhile, I’m looking for alternatives to this model - since it is laborious to deal with and full of pitfalls. I thought of representing each entity by a simple XML, but there is the requirement of the filtered search - what I do not know how to do [efficiently]. I have no experience with nonrelational Dbms ("Nosql"), so I don’t know if your performance characteristics would be acceptable (I’m predicting a high number of readings - filtered searches - simultaneous as well as simultaneous writing - although not in the same entity). Other ideas came to mind, but they seem too "crazy" to use in practice (e.g.: dynamically change the BD scheme whenever a "type" is modified).
I ask that the answers be based on prior experience involving systems with similar requirements, not only opinion and/or theoretical basis, if possible. Suggestions for specific technologies are welcome, but what I seek is one more solution strategy, and not a software recommendation.
Updating: For those who are not familiar with the "Entity-Attribute-Value" (unfortunately I did not find any material in Portuguese on the subject), here is a concrete example:
In a traditional (formalized) modeling, a "Car" entity could be represented like this:
Carros
ID Marca Modelo Km
----------------------------------
1 Fiat Palio 20000
2 Honda Fit 10000
In EAV modeling, this is how it’s done:
Entidades Atributos Valores
ID Tipo ID Nome Entidade Atributo Valor
---------- ------------- -----------------------
1 Carro 1 Marca 1 1 Fiat
2 Carro 2 Modelo 1 2 Palio
3 Km 1 3 20000
2 1 Honda
2 2 Fit
2 3 10000
The advantage of this representation is that the type "Car" is not plastered (hardcoded): if I want to add a "Color" attribute, I don’t need to mess with the table structure, create darlings new to handle this attribute, etc - just create an extra row in the table Atributos
and for each car one more row in the table Valores
. If in addition to cars I want to represent something else (e.g., real estate), just create rows in the table Entidades
with this new type, and give it attributes in the same way as the "Cars".
The drawbacks are... well, all possible and imaginable! Generally speaking, this is a standard to be avoided at all costs (i.e. an "anti-standard"). However, even the biggest critics admit that - in certain cases - it is inevitable. My question is: 1) does this apply to my case? or is there a formalized way to meet my requirements? 2) if it is inevitable, if there are good alternatives that have acceptable performance.
I couldn’t understand it, ... ?
– user6026
@Harrypotter What part? If something is not clear in the question, tell me what it is and I add more details.
– mgibsonbr
I’m trying to understand the skein, in relation to such a dynamic configuration, I’ve read a few times and every time I read I get more confused, I really liked the question itself, but if you can put an example that comes in your head I want to try to help, because, it is the part of programming that I like most !!! I don’t know if I can make it, but who knows!
– user6026
@Harrypotter I put an example, I hope it’s clear. Anyway, these days when I didn’t get an answer I’ve been looking for alternatives. If until I reach a solution no one posts any answer, I document here what I discovered...
– mgibsonbr
Are you not looking for the egg hair? By the description the EAV model works perfectly for your case. You don’t have to be so afraid of him. I have experience with a system that uses a derivation of this model taken to the extreme, and despite Wtfs (which in your case seem well under control) I still see several positive points.
– bfavaretto
@mgibsonbr, I will give a analyzed, I found very interesting!
– user6026