Alternatives to the (anti-)standard Entity-Attribute-Value

Asked

Viewed 655 times

17

I am working on two systems that have a common feature: entities whose attributes are dynamic (i.e. cannot be "plastered" - hardcoded), need to be consulted in search operations (result filter), but otherwise are decoupled from the rest of the system (i.e. there is nothing referencing them, nor are they used in any specific calculation). I’m having a hard time finding a proper way to represent them.

So far, I have used the (anti-)standard EAV. There is a table Entity identifying the entity, a table Tag (NxN with Entity) which says the "type" - or "class" - of the entity, a table Attribute (Nx1 with Tag) which identifies the property, its type, and whether or not it may be absent, and ultimately several tables XValue (TextValue, IntValue, DateValue etc) which say "the entity E has for the attribute A the value V".

Contextualizing, an entity could be a product for sale, its tag the type of product (e.g., "car"), the attributes being the characteristics of a car (e.g., "brand", "model", "kilometers wheeled") and the values the application of those characteristics to the product for sale (ex.: "Fiat", "Palio", "20000"). Product types can be created, changed and excluded (dynamic), one can refer to a product type by its characteristics (e.g., filter cars by the "Fiat" brand), but there is no foreign key to a specific attribute, and never there will be any "non-generic" calculation involving product characteristics.

And, as I said, I currently implement this in the form of EAV models - in a Postgresql database, cross-Platform at first but more towards Linux.

Normally the use of EAV is a "potential WTF alert", but I believe that this would be one of its legitimate uses (the data structures modeled in this way are actually dice, and not meta-data). The vast majority of the bank is formalized, only a few entities were modeled this way (and I took steps to ensure that it would be possible to formalize any model that might be coupled to the rest of the system).

Meanwhile, I’m looking for alternatives to this model - since it is laborious to deal with and full of pitfalls. I thought of representing each entity by a simple XML, but there is the requirement of the filtered search - what I do not know how to do [efficiently]. I have no experience with nonrelational Dbms ("Nosql"), so I don’t know if your performance characteristics would be acceptable (I’m predicting a high number of readings - filtered searches - simultaneous as well as simultaneous writing - although not in the same entity). Other ideas came to mind, but they seem too "crazy" to use in practice (e.g.: dynamically change the BD scheme whenever a "type" is modified).

I ask that the answers be based on prior experience involving systems with similar requirements, not only opinion and/or theoretical basis, if possible. Suggestions for specific technologies are welcome, but what I seek is one more solution strategy, and not a software recommendation.


Updating: For those who are not familiar with the "Entity-Attribute-Value" (unfortunately I did not find any material in Portuguese on the subject), here is a concrete example:

In a traditional (formalized) modeling, a "Car" entity could be represented like this:

Carros
ID      Marca     Modelo     Km
----------------------------------
1       Fiat      Palio      20000
2       Honda     Fit        10000

In EAV modeling, this is how it’s done:

Entidades         Atributos            Valores
ID   Tipo         ID    Nome           Entidade Atributo Valor
----------        -------------        -----------------------
1    Carro        1     Marca          1        1        Fiat
2    Carro        2     Modelo         1        2        Palio
                  3     Km             1        3        20000
                                       2        1        Honda
                                       2        2        Fit
                                       2        3        10000

The advantage of this representation is that the type "Car" is not plastered (hardcoded): if I want to add a "Color" attribute, I don’t need to mess with the table structure, create darlings new to handle this attribute, etc - just create an extra row in the table Atributos and for each car one more row in the table Valores. If in addition to cars I want to represent something else (e.g., real estate), just create rows in the table Entidades with this new type, and give it attributes in the same way as the "Cars".

The drawbacks are... well, all possible and imaginable! Generally speaking, this is a standard to be avoided at all costs (i.e. an "anti-standard"). However, even the biggest critics admit that - in certain cases - it is inevitable. My question is: 1) does this apply to my case? or is there a formalized way to meet my requirements? 2) if it is inevitable, if there are good alternatives that have acceptable performance.

  • I couldn’t understand it, ... ?

  • 1

    @Harrypotter What part? If something is not clear in the question, tell me what it is and I add more details.

  • I’m trying to understand the skein, in relation to such a dynamic configuration, I’ve read a few times and every time I read I get more confused, I really liked the question itself, but if you can put an example that comes in your head I want to try to help, because, it is the part of programming that I like most !!! I don’t know if I can make it, but who knows!

  • 1

    @Harrypotter I put an example, I hope it’s clear. Anyway, these days when I didn’t get an answer I’ve been looking for alternatives. If until I reach a solution no one posts any answer, I document here what I discovered...

  • 1

    Are you not looking for the egg hair? By the description the EAV model works perfectly for your case. You don’t have to be so afraid of him. I have experience with a system that uses a derivation of this model taken to the extreme, and despite Wtfs (which in your case seem well under control) I still see several positive points.

  • @mgibsonbr, I will give a analyzed, I found very interesting!

Show 1 more comment

1 answer

12


This is a typical use case for Nosql tools, especially for family tools document-oriented databases (like the Mongodb). Other possible / complementary alternatives are tools Enterprise search (like the Elasticsearch).

The key word here is schema-Less, a feature with both advantages and disadvantages (see that presentation by Martin Fowler on the subject).

I had good experience with great systems of:

  • E-commerce ⇒ They usually have very complicated relational models to represent attributes / attribute groups / category hierarchy / category attributes. etc..
  • Vacancies and Resumes They usually deal with unstructured textual information. Algorithms to determine relevance are extremely important.
  • Telco Store various configurations that are difficult to structure in a relational model.

In fact, from a certain volume of data as well as for certain reading and writing characteristics, Enterprise search, Nosql or both are practically necessary to keep the system responsive.

How it works

In both cases it is possible to create documents with various attributes (e.g., in JSON format), including complex attributes (arrays, nested types, etc.). Documents with different attributes can be grouped in the same category (i.e., the type of a document does not define its attributes).

When you search for a particular document (for example, id), receives all the information that persisted about that document (i.e., all attributes).

When you make a query using a certain attribute as Engines only search documents containing that attribute.

In practice your application ends up knowing and defining the rules for the attributes (e.g., a car will always have id, modelo and ano, but only imported cars have the country of origin attribute) and you make "healthy" consultations on top of those rules. That’s what Martin Fowlwer calls schemas implicit.

Characteristics

Performance: The correct use of these solutions is much more efficient than standard EVA in relational databases. These solutions are made to scale and quickly return results of queries in data sets giants.

Consultation: Type searches, ranges, logical searches, sort, scoring, full text search, Anyway, every kind of thing can be done. Of course the tools have their own characteristics. It takes a while for you to understand how to structure information efficiently, as well as lose bias in aggregate functions, clauses having and things like that (although even this kind of consultation can be done with Mapreduce).

ACID / BASE: Just like in relational databases, Nosql tools can usually be configured to support different consistency/availability/fault tolerance features (see Theorem of CAP). Each tool also has its transaction and serialization mechanisms* .

Practicality: Persist your documents and make every kind of query (as well as updates, déletes, updates partial, etc). Everything is quite simple (the learning curve of Relational DBMS, in my view, is much more accentuated).

* Specific to the mentioned tools: You will have no problem with competing readings or writing. I just recommend keeping at least two knots in the air at all times (especially for Mongodb). The intuition here is not (just) ensuring high availability, but rather extracting the most from the tools. I recommend doing this even for the development environment... This task is much simpler than it looks (I made local settings with Vms on my laptop, I lost no more than an hour setting up Mongodb and Elasticsearch).

Browser other questions tagged

You are not signed in. Login or sign up in order to post.