When and why to use data dictionaries?

7

I had already read this answer (below) and today I read it again, and on both occasions the highlighted passage caught my attention:

DRY is to avoid redundancies, right? - Stack Overflow

That’s why I have used data dictionaries in an extended form (not only in the classical definition) for many years. In the application’s dictionary I have all the facts of the application in one place. By changing them there, I change everything I need. Depending on the technology used, even the code may live in this dictionary, but most technologies can’t do this. There are specific techniques to link the facts with the code.

I confess that I am not very familiar with the concept of data dictionaries and I am not in the habit of using them in practice, so the question arose of when and why we should use them.

But what caught my attention the most was the part that says: "extended form (not only in the classical definition)". In what ways can data dictionaries be applied?

  • 1

    Whoa, now? The answer is long; I need to see if I can manage it :D I think I’ll write a short one :D

1 answer

9


What I’m talking about here

Let’s establish that almost everyone who speaks of a data dictionary is referring to the database schema or something similar, i.e., how you model the database, plus the set of metadata that helps the database system operate on that model. That metadata defines what can and cannot be done there and says what the tables, indexes, columns, keys, data types, constraints, triggers, etc. are. For those who don’t know the term, that is the well-known view of the data dictionary: a mechanism that holds all the definitions and rules of your data.

The extended use I talk about in that answer is a concept that goes beyond the database: you use the data dictionary for your entire application. The term here is about software development, about programming, and not just about data modeling. We are still talking about data, so the name is not used for nothing, but we treat all the objects or artifacts (or assets) of development as development data.

This usage, as far as I know, took off in ERPs in the 1970s. It seems to me that the first to do this, still discreetly, was IBM’s COPICS (Communications Oriented Production Information and Control System). They took the idea of the DBMS data dictionary and put in some things that the ERP needed. From there many ERPs started copying it (incidentally, the Permanent Hourglass System at its inception was an almost exact copy of COPICS). Each one kept adding more things, until it turned into something else, but the name stayed. I learned the concept under that name.

On second thought, this is no longer a pure data dictionary. At the least it is an application data dictionary. And since it’s not just about data, it is more accurate and simpler to call it an application dictionary (AD). Well, that’s what I’m talking about here.

A few years ago it was easy to find information about this on the Internet; now it takes much more effort, because the noise increases a lot when a term is used in more than one context and one of them overshadows the other. There is no formal, universally accepted term for the subject, so you have to search for "data dictionary" and manually filter what is about databases from what is about applications.

It is possible to have a corporate data dictionary; after all, the concept applies to any kind of data or process, not only in IT. But here I will only talk about its use in software development.

Both the database concept and the corporate one are related to what I’m talking about here, but they have different goals and different assets.

What it is and what it’s for

It serves to manage complexity while giving flexibility, in an extremely productive way, giving great power to the programmer and even to the user (which can be questionable).

The initial cost of creating a proper dictionary is high, and when it is built by those who do not understand it, it may not give the expected results. For applications that will change little there is not much advantage; the gain comes over time, as changes can be made much more reliably and quickly. The DD in English, or DDD in Portuguese (yet another DDD :P), is there to give you productivity and robustness.

In some implementations it can really help productivity; in others it can hurt. Do not use this technique in trivial or niche systems that do not have a large volume of objects or do not need constant changes in business rules (and I am not talking about occasional changes, as happens in most non-LOB software).

For me the main advantage is precisely what is in the original answer: it is about DRY, which I consider the most important software development principle there is. Everything that is said about complexity management, maintainability, and other concepts preached in software engineering comes from DRY. Some "modern" techniques being sold around preach giving it up, which is one of the reasons I’m critical of them. They preach increasing complexity in order to manage complexity. They are new, unproven, and go against what has been proven for decades.

I will not deny that some consider DD a complex technique, and it is. But if done properly this extra complexity can become transparent to the system. In fact a platform is being created, make no mistake. And it should be clear by now that it is not suitable for small or simple systems. But it becomes more important in the systems people develop today, which have excessive complexity and, above all, excessive redundancy. Where there are layers there is excessive complexity, and to get back to DRY only the data dictionary can save you.

The AD (application dictionary) is about keeping all the information about the application in one place, all of it, even the documentation. DD is very Agile, but go see whether any proponent of Agile has heard of it. You keep the documentation inside the application and "guarantee" that the documentation changes along with the application. It:

  • Focuses on people’s needs and not on the software development process.
  • Gets quality software with a reduced number of tools.
  • Lets the user participate actively or passively, but without technical noise that they do not understand.
  • Responds to change in an agile way, in iterations that are short by nature; after all, if a change is easier to make, the iteration gets shorter and more predictable, and therefore more manageable.

Project management

And it possibly decreases the number of people involved, addressing the problem described in The Mythical Man-Month, including by involving the customer more directly.

Much of what I’m talking about here is in that book, which is the canonical reference on project management. I don’t invent anything; I just organize and interpret things that are well established. Some points I don’t mention explicitly, but the DD helps with almost every point the book touches. I recommend reading it and all the classics of our field. It’s enough to make you feel sorry for the new people entering the field who will never even hear the names of these books because they’re only concerned with the technology of the day.

It:

  • helps to track the progress of the project in a natural and almost transparent way, and lets you see the whole history (if well done, as I always say),
  • helps decrease the need for highly competent programming professionals (although you do need them to create the DD),
  • reduces the need for tests, reviews, and even thinking too hard about architecture on each change that is made, making it easier to ensure conceptual integrity, something that few people talk about and that is one of the most important things in software development.

AD is a concept, but it ends up being a tool. It is usually managed through a framework, or even an SDK, because it is a library tightly integrated with your application and/or external tools that help manage the dictionary. It is a platform. But is it any different from the other frameworks people use so much?

Would it be so obscure and rejected (even by default) if a big player released something like this? For example, if Visual Studio came with such a tool and .NET supported it, wouldn’t DD have massive adoption, even where it shouldn’t? (In fact they did create one, Visual Studio LightSwitch, made by people who did not quite understand what needed to be done, so it did not work.)

It is no different from using an ORM; in fact it replaces an ORM with advantages, since the intelligence now lives in the dictionary. It ends the dichotomy between code-first and model-first: it is dictionary-first and unique, a single source of truth (which in some cases may become a single version of the truth). Be sure to also read about the system of record, one of the foundations of DD, here applied to development.

Anyway, it’s a real find, and those who know it and can afford it never want to let go.
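To make the "dictionary-first" idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the `Field` class, the `CUSTOMER` entry, and the two derivation functions are hypothetical names, not part of any real AD product. The point is that the table definition and the form labels are both derived from one entry, so a fact is changed in a single place.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    """One fact about the application, stated once (hypothetical structure)."""
    name: str
    sql_type: str
    label: str
    required: bool = True

# Hypothetical dictionary entry: the only place the "customer" facts live.
CUSTOMER = [
    Field("id", "INTEGER", "ID"),
    Field("name", "TEXT", "Name"),
    Field("email", "TEXT", "E-mail", required=False),
]

def to_ddl(table, fields):
    """Derive the database schema from the dictionary entry."""
    cols = ", ".join(
        f"{f.name} {f.sql_type}{' NOT NULL' if f.required else ''}" for f in fields
    )
    return f"CREATE TABLE {table} ({cols});"

def to_form_labels(fields):
    """Derive the form labels from the same entry; required fields get a '*'."""
    return [f.label + ("*" if f.required else "") for f in fields]
```

Making `email` required would change both the DDL and the form at once, which is what an ORM alone does not give you.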

Criticism

But of course it is not free from criticism. The criticism I hear the most (and I hear little, because people do not know the application data dictionary and do not want to; they just want to learn what is fashionable, what everyone is talking about, which is not the case here), and it is quite valid, is that it creates the so-called second system or inner platform. That really is not good, but it is a price you have to pay for a lot of benefits. And make no mistake, people do such complex things nowadays that they end up creating the same complexity by accident, without as many benefits. At least with AD you know you’re creating complexity. People keep creating these aberrations without realizing it, and this is especially true in "web applications".

I suggest reading the Wikipedia article about the inner-platform effect, because much of what is said about good practice, and the architectures and design patterns preached nowadays, ends up falling into it. Pay attention! The problem is that people don’t realize this.

Among other things I mentioned in the text, I will single out NoSQL here, but there can be more specific things, like the wrong use of exceptions. There is also controversy when the article says that a virtual machine is always a good choice and an acceptable inner platform. If that is true, the AD certainly is too, because it brings many more benefits and comes close to being a silver bullet, such is the gain (once the initial cost is amortized, so it would be better to have something ready-made, which is not easy to build so that it meets all demands, which by the way is one of the problems SQL has and is why it gave birth to NoSQL).

Those who have no experience will fall into a series of traps, which I have already fallen into myself. For example, running into Greenspun’s Tenth Rule. I learned about DD (now called AD) in a large project at a large company in the ERP business, and it never worked well, but it was always very useful anyway. The problem is that it was built without knowing what an AD should be, without thinking about the future, and it was used more as a marketing tool than as an engineering tool.

I see the same in products like Microsoft Dynamics, just to name one example (and an above-average one) of those who built the AD without understanding what it is (others did worse, or don’t even have anything similar), without planning, to meet a need in the middle of the project and not as an initial requirement, and calling it DD, which is already a misconception from the start.

I do not recommend doing it in production before you understand the subject well, unless a third party makes a well-thought-out product for you to consume. I don’t build one myself because I don’t know how to finance or market it (all the more since today this kind of thing would only succeed as open source); that is the problem of being an engineer. One thing I know for sure: trying to build it the way almost every product is built today will produce a very bad result. It cannot be born as an MVP; that was the mistake of the current ERPs. And I’m afraid of the monster it might turn into in order to meet everyone’s needs, so I’d have to think of a solution, which would probably become a third system, which can be very bad.

Of course all these criticisms apply to many things that are used today without thinking. Take for example the law of software involvement: if taken seriously, we should only use Assembly.

One difficulty I can cite is that it requires good programmers to implement it. It also requires a minimum of competence to use, but that is an almost universal requirement. Perhaps it demands a little more at a certain level of use, because the person needs to understand how it works and the relationships between everything, but I can’t see how that is any different from normal code. In fact it may allow less experienced programmers to take care of some more isolated assets, which previously would not have been possible.

It is necessary to assemble what is called a surgical team, where each person has a specialty. There is the chief surgeon (the AD architect), the other surgeons (engineers of the specific domains), and the assistants who take care of only one aspect of the surgery (programmers who code the most detailed assets), those who perform less relevant things, or things that are easy to do but still need to be done well.

I do not know whether it pays off in every kind of software (well, "every" is already an exaggeration; some are obviously ruled out). But do you know? How? I am speaking from experience, which rules out unfounded opinions where the person simply thinks it is of no use. I have experience doing this on an ERP platform and in a dynamic language. I have some experience outside of that, but limited. Does it work well in C# in a company’s internal ERP? Does it work outside an ERP? I would like to try at least the first; I think my experience would make it a success, and if someone wants to hire me for this type of project I am available :D.

Obviously you can’t be afraid to become a platform when the goal is to be a platform. But does it pay to have a platform when that wasn’t the goal?

What problem it solves

Today it’s common to have documentation, possibly in multiple instances; the database, and increasingly we see several of them for the same thing; and a few application layers, which can be the model, the controller, and the view, first on the server and then the same thing again on the client, possibly in different languages. There can be other layers, services, contexts, or other needs, and there can be GUI, web, "API", or even CLI clients. Every new field means analyzing and potentially changing a huge number of places, and you can’t forget any of them. And it may be that other departments, or even the user when given the privilege, have used it somewhere you don’t even know about. Without control of all the assets and where they are used, there is no way this can work.

In the era of microservices and DDD, just to name two new plagues that were invented, the use of the data dictionary should be mandatory. I’ve said before here on SOpt that a better implementation of DDD can make it more palatable and viable in many cases. And that probably goes through AD. I’m not talking about something that denies modernity, but something that facilitates its adoption, if it really is still necessary.

AD makes OOP "obsolete". Not that you can’t use that way of thinking in specific mechanisms of the software, which, incidentally, is where OOP shines, but using it to define all of the software does not make much sense, not least because the application dictionary tends to follow a more relational model of doing things (it resembles the database concept, but it is not the same thing). AD can eliminate the need for OOP and all the design patterns normally associated with that model, keeping complexity much more controlled.

In fact, OOP is a simplified and naive way of making an AD, one that focuses on the object and not on the relationships. Think carefully: which is more important and more likely to cause problems, the object or the relationships the objects have with each other? DD focuses on the relationship, although it has everything OO has as well. Ah, and it encapsulates and abstracts much more.

Types of dictionaries:

  • Active application dictionary

    Generally implemented in so-called dynamic languages, or in so-called static languages that have powerful reflection mechanisms.

    This way allows the user to configure in the application itself what he wants to change and everything is reflected there, at the time, during the execution. Sounds great, right? Not in my experience.

    Users tend to get it wrong, and this freedom creates a situation where we practically return to managing data and complex processes through spreadsheets (anyone who has seen this knows what a problem it is). And this apparent freedom imposes some limits on what can be done in the dictionary. I think the biggest mistake of ADs is precisely wanting to be a user tool and, in a way, a marketing tool. It makes everyone’s eyes shine, and no one sees the headache it will be.

    There is a relatively high performance cost, especially if you do all the checks needed to make it robust.

    People overestimate the need to change software behavior at runtime; in fact the ERPs I know that use AD need, in practice, to restart for a lot of reasons, both the client (even the web one) and the server.

  • Passive application dictionary

    Generally implemented as code generators. Today I like this one more, perhaps because my bias is now toward static languages.

    It is used as a mechanism for the developer, not the user. You have a catalog of your system’s objects to give an overview and control change. When you change something in it, you need to regenerate the application that will be sent to the user. It is even possible to generate different versions depending on the user.

    This way the user stops being part of the engineering of the product, and changes are left in the hands of those who are, in theory, better equipped to think them through. The user can participate in the change passively.

    Another very big gain is performance, since much of what would be decided at runtime is resolved when the code is generated. Not to mention simplification, since the complexity is abstracted by the code generator, which is not very different from many solutions you may already use without even noticing.

    This is a form of scaffolding on steroids.
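The difference between the two types can be sketched in a few lines of Python. This is a toy under stated assumptions: the `AGE` entry and its keys are hypothetical, and `exec` merely stands in for "regenerate and ship the application". In the active style the dictionary is consulted on every call; in the passive style the rule is turned into ordinary code once, and the dictionary is no longer needed at runtime.

```python
# Hypothetical dictionary entry for one field; every key is an assumption.
AGE = {"name": "age", "min": 0, "max": 150}

def validate_active(entry, value):
    """Active style: consult the dictionary at runtime, on every call."""
    if not isinstance(value, int):
        return False
    return entry["min"] <= value <= entry["max"]

def generate_passive(entry):
    """Passive style: emit plain source code with the rule baked in."""
    return (
        f"def validate_{entry['name']}(value):\n"
        f"    return isinstance(value, int) and "
        f"{entry['min']} <= value <= {entry['max']}\n"
    )

# Generate once, then load the result like any other code.
source = generate_passive(AGE)
namespace = {}
exec(source, namespace)  # stands in for the real build and deploy step
validate_age = namespace["validate_age"]
```

Changing `AGE["max"]` affects `validate_active` immediately, while `validate_age` keeps the old limit until the code is regenerated, which is exactly the trade-off described in the two bullets above.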

How it works

Since it never got traction in the market, there are no formal definitions of what to call things. I will give an informal summary here; there is no room for a manual, nor is my own foundation solid enough for this to become canonical.

It closely resembles the data dictionary of a database, but it covers all of the software.

In essence we have a catalog of the objects that will be used in the application. Some people prefer to call them artifacts, but they probably need a different name, because they are neither one thing nor the other; at the least, the name should not be confused with concepts used in programming or software engineering (you can see the term is already ambiguous there), and even less with business architecture. Maybe it resembles the usage in UML. So I’m going to call them assets, a term used in games, which I think is appropriate in this context.

By the way, there are similarities between UML and AD, and maybe AD should be what UML promised and did not deliver. Because of excessive bureaucracy and the lack of a concrete result to deliver, UML remains just another layer in development, which goes against the basis of Agile, something I have criticized before, pointing out its inconsistencies.

One of the reasons I think a universal AD tool can’t work out is that it would become something close to UML, which has proved to be more of a nuisance than a solution, as some people have already realized.

The AD needs to be created with certain realities in mind, to serve more or less similar scenarios. It is possible to make an AD that suits most ERPs, or LOB applications in general, but not all types of software: it would be too generic, it would take a lot of work to deal with the specifics, and it would be too complex to stay flexible across so many scenarios, because each new asset type may require its own specifics.

The ideal would be a programming language and an environment designed to handle an application dictionary, but I don’t think that’s going to happen. I have an idea for something like this, but it will never get off the ground because it takes a lot of resources to execute. Nothing that a reasonable community or a large company couldn’t afford, but getting that involvement is a long road.

I’ve been challenged on the grounds that this language solution only works in the IDE and doesn’t have a traditional compiler and workflow. What the critic failed to understand, and I explain, is that the language serves the application dictionary, while the traditional language, the kind that parses the normal code you already know, takes care only of the algorithms, which are simple. The data structure, which is the complicated part, is the responsibility of the AD, which only makes sense inside the IDE, and the texts (code) of the algorithms are attached as software assets linked to other assets.

In the catalog you have every kind of information that serves the software, all of it in an advanced organization, which requires a good understanding of taxonomy and probably of ontology (less to organize than to define things better), not to mention the use of dialectics to model correctly, but that has to do with the organization of the catalog and of the project as a whole.

There are virtual assets, which serve the development process more, and concrete ones, which will somehow end up in the software itself and which the user will deal with directly.

You can have all the repository controls (in an advanced and natural way), issues of all kinds, including PRs in the AD, attached documents that back decisions and support the workflow, or some of the things that are typical of UML. In short, whatever is useful to development can be put there in an organized way so that it "jumps out at you" whenever convenient, even when you are not particularly looking for it. Connecting these assets, even after the fact, is fundamental to fully succeeding with AD. What goes into the AD SDK depends on the methodology adopted.

It may have mechanisms that are not data but behaviors in the process. You can put agreements there, for example, or tests (which may still be necessary, but in a different form; I’m leaving the term wide open here).

There will also be assets that are packages, namespaces, modules, and types (classes, enumerations, structures, etc.), including rules of use that are normally not present in normal code. These types can be of various natures and can represent data in the database, in the application, in reports, files, the network, and other mechanisms, in addition, of course, to the business rules themselves. In short, everything in the software must be precisely catalogued.

Without a code generator, or without the code itself making the necessary adaptations at runtime, you will have duplication of effort and the consequent loss of DRY, which causes all the problems we have today without AD. It doesn’t mean the AD becomes completely useless, but it’s hard to see the advantage.

You may be thinking that this is inflexible, but if it’s done well the flexibility is equal to what you can produce with direct code, and in fact everything that is an algorithm continues to be normal code. What changes is the data structure.

The way it is implemented varies according to the technologies used to back the data dictionary. Will you use C# or Python? SQL or NoSQL? Etc.

It is easy to see how everything is more organized, unique, allows several compositions, and everything is "more at hand". Everything is there, you do not forget anything.

All changes can be propagated just by answering whether you want the inclusion in the other assets (the field is already a more granular asset), but it may need some manual work, aided and tracked by the AD so that you don’t get it wrong, or so that you justify why the field will not go into some place where it probably should. You automate decisions. Should the field go into every screen in the system that uses this entity? In what way in each one? And the reports? And other forms?
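The "where does this change have to go?" question is essentially a where-used lookup over the catalog. A minimal sketch, assuming a hypothetical `Catalog` class in which each asset has a kind and declares which other assets it uses; none of these names come from an existing tool:

```python
from collections import defaultdict

class Catalog:
    """Hypothetical asset catalog with 'uses' links between assets."""

    def __init__(self):
        self.kinds = {}                   # asset name -> asset kind
        self.used_by = defaultdict(set)   # asset name -> assets that use it

    def add(self, name, kind, uses=()):
        self.kinds[name] = kind
        for dep in uses:
            self.used_by[dep].add(name)

    def impact(self, name):
        """Return every asset that directly or indirectly uses `name`, sorted."""
        seen, stack = set(), [name]
        while stack:
            for user in self.used_by[stack.pop()]:
                if user not in seen:
                    seen.add(user)
                    stack.append(user)
        return sorted(seen)

catalog = Catalog()
catalog.add("customer", "entity")
catalog.add("customer_form", "form", uses=["customer"])
catalog.add("customer_report", "report", uses=["customer"])
catalog.add("sales_dashboard", "page", uses=["customer_report"])
```

Here `catalog.impact("customer")` lists the form, the report, and the dashboard that depends on the report. A real AD would then ask, for each of them, whether and how a new field should be included.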

If you start thinking about your applications with AD in mind, even if you don’t implement it, you’ll see how much it can change the way you make software. Many of the problems you face, what you think you could do differently, what you find repetitive and boring, error-prone, would probably be different with AD.

And the future can be better with artificial intelligence out there. AI only works when you have a very large and well-modeled database to make decisions, AD helps a lot in this.

It also makes it easier to document APIs and keep them stable, so that you don’t unintentionally change something you must not change.

One good way to start understanding AD is to see how data is stored in a database, then extend the assets to anything in the system and not just the database.

Another complementary idea is the semantic spectrum. Not that it helps define the AD better, but it gives some idea of what an AD model might look like.

It is common for AD assets to adopt cascade strategies. In some technologies this ends up avoiding inheritance even in mechanisms where it has always been the established approach.
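One possible reading of a cascade strategy, sketched in Python: an asset defines only what differs from a more general asset, and anything it does not define is resolved by walking up to its parent. The `ASSETS` table and the attribute names are hypothetical, chosen just for illustration.

```python
# Hypothetical assets: each may override attributes and name a parent to fall back to.
ASSETS = {
    "text":        {"parent": None,   "attrs": {"max_length": 255, "align": "left"}},
    "code":        {"parent": "text", "attrs": {"max_length": 20}},
    "postal_code": {"parent": "code", "attrs": {"mask": "00000-000"}},
}

def resolve(asset, attr):
    """Walk up the parent chain until some asset defines `attr`."""
    while asset is not None:
        node = ASSETS[asset]
        if attr in node["attrs"]:
            return node["attrs"][attr]
        asset = node["parent"]
    raise KeyError(attr)
```

With this, `postal_code` gets its mask from itself, its `max_length` from `code`, and its alignment from `text`, without any class inheritance being involved.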

Just to list some possible asset types:

  • databases
  • logical tables
  • physical tables
  • directories
  • columns
  • indexes
  • triggers
  • enumerations
  • default values
  • types
  • roles
  • restrictions
  • validations
  • relations
  • events
  • notifications
  • lookups
  • transactions
  • versions
  • families
  • activities
  • tasks
  • files
  • menus
  • pages
  • forms
  • tabs
  • fields
  • miscellaneous controls
  • reports
  • graphics
  • sessions
  • standard messages
  • scripts
  • functions
  • values
  • entities
  • scheduling
  • accounts
  • users
  • identities
  • assignments
  • elevations
  • parameterizations
  • questioning
  • selections
  • miscellaneous data tables
  • languages
  • formats
  • locations
  • globalization
  • miscellaneous metadata
  • privileges
  • audits
  • profiling
  • contexts
  • modules
  • subsystems
  • aggregators
  • policies
  • strategies
  • rules
  • exceptions (not about Exception, that is within types)
  • priorities
  • workflows
  • facades
  • factories
  • descriptive
  • tickets
  • to-dos
  • documents
  • customers
  • infra
  • layering
  • external input APIs
  • external output APIs
  • imports
  • exports
  • processes
  • etc.

In some of these you can go more granular, or create mappings and bindings; I only gave examples. You can see there is a lot from the DB, the GUI, common domains, design patterns, etc., all in a different form from what is coded today. I think a pattern of what can be placed there is noticeable by now. I only scratched the surface, and it depends on each case, which is why I think trying to make an AD that fits everything doesn’t go very well. If this seemed like too much, know that ERPs usually have many thousands of assets, even millions in some cases.

I have little experience making the AD take care even of the mechanisms, so I don’t know how much it remains useful in this part.

Real example

To show an example (I’m not endorsing it as a good implementation), here is RADICORE. There are some open source ERPs (I don’t like all of them) that use a data dictionary system and can be inspected, but they do everything very shallowly, and the gain is not so great. It seems to me that they all use it as marketing. ADempiere is an example. There are also modern, accidental attempts.

Conclusion

I would like to analyze how AD plays out in different scenarios, for example game development.

It is not a tool for all cases, nor one that can be used in all scenarios, but where it fits the gain can be substantial. I’m talking about cost, simplicity, ease of maintenance, reliability in what you’re doing, adherence to customer/user needs, and making everything more predictable.

Misuse can be tragic, as with any tool. Going too deep with it can start to bring more costs than benefits. It can also be complicated or costly to go deeper much later.

The paradigm shift is brutal, but that is the only way to get such a big gain. That’s why people are afraid of it.

More specific questions can be interesting now that you have a sense of what it is.

  • 2

    What I like about Stack Overflow is questions that make you think. This may have helped me define the term better, and I’m sorry if the term was used carelessly there; the fault is not entirely mine :) To improve it I would have to cut something; I hit the length limit :)

  • Wow, when you said the answer would be long you weren’t kidding! Hehe... It will take me a while to read all the external references and understand the concept better, but, thank you very much, it will be very useful! :-)
