Why validate even with a default value


When I’m preparing something, however simple it may be, the question always arises:

Should I trust my own code?

This doubt usually comes from the fact that "tomorrow" it may not be me maintaining the project, errors can happen, and the code must be prepared to deal with them.

Taking simple examples:

  • Parameters of a function

    function consultaGrupos($dbh, $limit=0) {
    
      if (intval($limit)>0) {
        // ...
      }
    
      //...
    }
    

    The function accepts two parameters: a database connection object and an integer indicating that the query should be limited to X records.

    The integer already defaults to 0, but when writing the check for a value >= 1 I end up verifying that the received value, converted to an integer, is greater than 0, ensuring that something like "batatas" (potatoes) does not get through.

  • A value in an array

    $moduleTitle = $this->vocabulary["text"][1];
    
    if (empty($moduleTitle)) {
      $moduleTitle = 'Um valor padrão...';
    }
    

    The entry in the array that holds the vocabulary exists and, per the "specification", must be filled in, but for some bizarre reason it might come empty, so we check for that.

  • A property of an object

    Normal code:

    $dbObj = consultaProduto($dbh, 1);
    
    if (!is_object($dbObj)) {
      // oops... could not fetch product #1
    }
    

    Code with extra validation:

    $dbObj = consultaProduto($dbh, 1);
    
    if (!is_object($dbObj)) {
      // oops... could not fetch product #1
    }
    else if (!property_exists($dbObj,'url') || empty($dbObj->url)) {
      // oops... could not determine the URL of product #1
    }
    

    The database query returns an object, and in this example that object always has a property named url. However, we check not only whether the query returned a result but also whether the property actually exists.

    Here, as in the example above, there is also the verification that the value of url is or is not empty.

Question

Am I being overprotective in validating something that is already assumed and has a default value, or should I in fact distrust and validate the values I receive?

  • Would this question be the same as this one? http://answall.com/q/42597/14584. My view: start from not validating (the default option) and validate when there is a clear need and a well-defined expectation about the outcome of the validation.

  • @Caffé They are really similar, although in the one you mention I do not see an answer to what I am asking... My example is a function, but the question aims to find out why one would validate something that is supposedly already defined and has a fallback! It is not limited to the example given. If relevant, I can edit and add other examples.

  • Unfortunately, PHP has no return typing, so if feature X depends on something returned by feature Y, then yes, you should validate even if you are the team’s most meticulous programmer. Now, in the case of a resource coming from a database, you can populate a predefined Model with the returned data; that way it will always exist and be in the correct form (otherwise it would not have been inserted, since it would have gone through the same Model).

  • @Brunoaugusto Good point of view, I invite you to leave an answer with the same ;)

  • Unfortunately I wouldn’t risk it, because my knowledge of databases is quite limited and my ventures into ORMs and related design patterns, although finished and working, may contain conceptual mistakes that could be harmful to the topic.

3 answers

7


A definitive answer is not possible. There are controversies about what is best.

I will repeat what I always say, and the OP knows it. The most important thing is to standardize what the team does and always follow the same pattern. What you can’t do is do things differently each time. Creating the expectation that something is in the code when it is not there is not good.

Default value

I don’t see a reason to treat a parameter differently depending on whether it has a default value. After all, if the function is called without an argument for this parameter, its value is certain to be correct. But what if a value does come in? Who guarantees that this value is correct?

I see any parameter in the same way. At this point I think there is no discussion.
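
As a minimal sketch of treating a defaulted parameter like any other, the check can live in a small helper. The name normalizaLimite is hypothetical, and filter_var with FILTER_VALIDATE_INT is just one possible way to do it, not necessarily what the project in the question uses:

```php
<?php
// Hypothetical helper (not from the original post): validates a $limit argument
// the same way whether it came from the default or from the caller.
function normalizaLimite($limit = 0)
{
    // FILTER_VALIDATE_INT accepts integers and integer-like strings such as "10";
    // min_range rejects negative values. It returns false when validation fails.
    $valor = filter_var($limit, FILTER_VALIDATE_INT, ['options' => ['min_range' => 0]]);

    if ($valor === false) {
        throw new InvalidArgumentException(
            'limit must be a non-negative integer, got ' . gettype($limit)
        );
    }

    return $valor;
}
```

Calling it with no argument still yields 0, while a value such as "batatas" fails loudly instead of being quietly converted.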

Verify results

It depends a little on what will be done next, but in general I think the ideal is to check whether everything is OK with the result. Because if it isn’t, you may get an unexpected outcome.

The question says that there is a specification saying that it cannot have an empty value. Everything that is specified must go to the code somehow.

Unless you have a clear and good reason not to.

The closer the check is to the specification, the better. And the more precise and localized the error identification, the better too.

The big question is whether it is possible to generate an unexpected and mostly unwanted result if the information is not exactly what you think you should get.

In robust code it is paramount that you do nothing with dubious data.

Also consider whether this check could be done inside the function that is being called and produces the result in question. It is not only parameters that we should check: the return value can also be checked before the function ends, so that a more appropriate action is taken when we know it would return something useless.

Of course, placing this verification within the function implies that it becomes more specific. It may be that you have both uses. Still consider having two versions of the function, one that allows, for example, returning empty and the other that prevents this return.
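
The idea of two versions can be sketched as follows, assuming an in-memory vocabulary array in place of the real data source. Both function names are hypothetical:

```php
<?php
// Hypothetical lenient version: returns null when the entry is missing.
function buscaTitulo(array $vocabulario, $chave)
{
    return isset($vocabulario[$chave]) ? $vocabulario[$chave] : null;
}

// Hypothetical strict version: checks its own return value before handing it
// back, so the caller never receives a missing or empty title.
function buscaTituloObrigatorio(array $vocabulario, $chave)
{
    $titulo = buscaTitulo($vocabulario, $chave);

    if ($titulo === null || $titulo === '') {
        throw new UnexpectedValueException("No title found for key '$chave'");
    }

    return $titulo;
}
```

Callers that can live with a missing value use the first function; callers that follow the "must be filled in" specification use the second and get the error at its source.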

Check arguments before passing them

It is useful for reporting more precisely where the error is. In general, though, doing this check beforehand adds little information. In theory the function itself should make it clear that there was a problem with the call.

But there is an exception where validation is fundamental. There are cases where an argument is valid for the parameter (the same goes for an array index, which is still an argument), but in that specific situation a certain value may not be appropriate and may bring unexpected results if used. It is something only the programmer writing that call knows can be a problem.

Obviously this is a case for checking beforehand whether the value is within what is needed.

Some may say that, deep down, this is a variation of the previous case, where the value is checked as soon as it is received as the return of a function (or of any operation that returns a value).
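
One illustration of an argument that is valid for the callee but wrong for the caller: array_slice accepts a negative offset (it counts from the end of the array), which is rarely what a pagination routine wants. The function fatiaPagina below is a hypothetical caller, a sketch rather than a recommended implementation:

```php
<?php
// Hypothetical caller of array_slice: checks the offset before passing it on,
// because a negative offset is valid for array_slice but wrong for pagination.
function fatiaPagina(array $itens, $numeroPagina, $porPagina)
{
    $offset = ($numeroPagina - 1) * $porPagina;

    if ($offset < 0) {
        // array_slice would not complain here; only this caller knows it is a mistake.
        throw new InvalidArgumentException("Page number $numeroPagina gives a negative offset");
    }

    return array_slice($itens, $offset, $porPagina);
}
```

Without the check, page 0 with two items per page would pass -2 to array_slice and return the last two items, with no error at all.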

What are you doing?

It depends a little on the type of code being produced.

I’ll rule out a simple, probably short-lived script, given the description in the question. I will also consider only the premises posed in the question. I will disregard the case of a single programmer on the project (for which I would answer differently).

Library

Are you making a generic library that will be used by other people? It is almost certain that it should check and give the best possible information for the function user to know what he did wrong.

I cannot see any other way to ensure the use of the library function correctly without doing so. Especially in PHP.

There is a performance cost in checking, but it is usually minimal. It hardly ever makes a difference, and if it does, it’s probably best to write that part in another language and expose it to PHP. So I won’t consider performance an argument against such an important check.

Application code

In a way, code that will be used by several people over a long time, even if it is application-specific, still behaves like library code.

So the decision whether to validate or not goes through the culture of the people involved in the project.

If you cannot guarantee who these people will be or know that they are not systematic and/or do not usually use other forms of code verification, the most obvious answer is to check.

You can only be sure that the function will be used correctly if you do the check.

External testing

Today there is the idea that the verification should be done by external tests. It’s a good idea, but this only ensures verification if someone does the test correctly in the consumer code.

Testing a function in isolation can only test what is inside it, what it has control over. So it is possible to test what it does and the default value it receives (you have control over this value).

Function tests can only verify simulated situations of use of it. It is possible to test the behavior of the function in certain situations, in certain values received.

There is the understanding that you will rarely test 100% of situations. This alone should question whether the test alone is an effective solution.

But let’s consider that the test covered 100% of the situations. This just shows what the function will perform in case of misuse. It does not prevent misuse.

To ensure that you have no misuse you will have to test the use of the function. And these tests should be done in the function consumer code based on its documentation.

Will there be enough discipline and full understanding for this to always be done with a satisfactory result?

Even if it’s possible, is it worth it?

Testing all consumer code takes a lot more work than checking for the desired result once, in the producer code.

What’s more, a check inside the function is guaranteed to always run correctly (provided it was written correctly). Tests don’t guarantee anything.

Of course perfect tests can guarantee it, but who guarantees that they are perfect?

At the moment, relying only on tests seems to me a little naive and not at all productive.

Is it necessary to check the parameters in every function?

I think it is a matter of consistency. But I understand those who think some cases are unnecessary.

If the parameter immediately serves as an argument for another function that will perform a check it probably won’t bring any more problems. It is a case where the verification code saving can be justified.

Of course you may end up in a huge sequence of functions that do not check parameters by delegating to the next called function.

The only major drawback is that the error will surface at another level of the call stack, pointing to a location that may be far from the true source of the problem, forcing the programmer to walk through the whole stack trace and read the documentation of every function in the call chain to find where the error is.

With checks in all functions you always get the information at the real origin of the problem, which simplifies debugging, whether by eye or with a tool, and saves the function user time.

Although PHP’s functions are documented, programmers keep passing wrong data to them. Imagine how much worse it would be if they had no checks and accepted any kind of value. In fact some functions do this, and it is one of the most criticized things in PHP.

Let’s say you give the parameter an expressive name to show what can and cannot be passed (not that this guarantees anything). What name would you choose to indicate that it needs to be a numerical value that cannot be negative? What if it is a stricter range of values? What if there are several ranges? Should the parameter name contain information about its rule? What if you change the rule? This would be the worst case of Hungarian notation I have ever seen.

Of course, you can rely on the written documentation and trust that it will be followed. That doesn’t give robustness, it just doesn’t. You can trust it or not; this is your decision and obviously you don’t want anyone to decide for you. Here is just some information to help you decide.

What to do

A function must prevent its operation or solve a problem whenever a parameter is unsuitable for use.

What can go wrong?

If there is no check, no perfect test that detects a wrongly passed argument, the language has no way to verify everything that’s needed, and the consumer code calls the function with a wrong argument, it will run wrong, possibly producing an unexpected result.

The "possibly" may seem like a mitigating factor but is actually an aggravating one. It can give a false sense of reliability in some situations.

Luckily in some cases the problem will be so serious and so constant that the programmer will soon realize the error.

You want to count on luck in your code?

Still he may not have the best possible information about the problem.

But what often happens is that the function performs normally without presenting apparent errors. It only gives a different result than expected. And it can stay that way, without anyone noticing for years, causing damage.
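
A small sketch of this silent failure, modeled on the intval check from the question. The function limitaGrupos is a hypothetical stand-in that limits an in-memory array instead of querying a database:

```php
<?php
// Hypothetical stand-in for the question's consultaGrupos: applies the limit
// to an in-memory list instead of a database query.
function limitaGrupos(array $grupos, $limit = 0)
{
    if (intval($limit) > 0) {
        return array_slice($grupos, 0, intval($limit));
    }

    // intval('batatas') is 0, so a nonsense limit ends up here:
    // no error is raised and every group is returned.
    return $grupos;
}

$todos = limitaGrupos(['a', 'b', 'c'], 'batatas'); // all three groups, no warning
```

The caller asked for a limit, got the full list, and nothing in the program signals that anything went wrong.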

The same goes for the continuity of codes that depend on function returns.

I haven’t seen the rest of the code in the posted examples, but most likely some subtle error will happen if the checks are removed. And you and the other people working with it will probably waste considerable time figuring out what’s going on.

Again, if you’re lucky the mistake will be catastrophic right away.

If you need to remove that check, what to do?

If it is essential that the check be removed from the code, probably for performance reasons, document very well how the function should be used and how to test its use. If possible, create a tool to help test it.

In this case you can delegate the check to the consumer code, where it is done at a point that does not affect performance.

Conclusion

In particular, in the situation described (writing for future programmers), it is no exaggeration to make every possible kind of check. When in doubt, check. It is never useless to prevent the programmer from using something wrong.

A good recommendation would be: if you can prove that bad data will not cause any problem, do not check.

In this situation one of the greatest dangers is to think that programmers will always use everything correctly and will never make mistakes that are hard to detect. Right here at SOpt we see programmers make silly mistakes every day and often fail to find the problem because the code doesn’t make it obvious. Conventions don’t help to find mistakes.

Haven’t you learned that you should always validate what the user does? However absurd the data may be, it can always happen. Well, we are talking about validating what a user is doing. In this case the user is the programmer who uses the code you wrote. Why should we treat programmers as superhumans who don’t make mistakes?

In PHP any parameter can receive anything, no matter whether it has a default value or not. So you need to check what comes in.

That’s what I usually say in several of my answers. Rules serve as a guide, but there are many rules. Knowing when to apply the right one is not always easy. It takes experience and, above all, not clinging tooth and nail to any one of them. Understand the context before applying a rule. Don’t blindly believe what you read in a book, because the book doesn’t know your context.

Other languages

Only in languages with more compile-time control over what can be passed to a function is it possible to relax some checks, but not all. Some languages may remove the need to check types or whether a value is null. More sophisticated ones may even have a signature that is analyzed at compile time to decide whether a value’s characteristics are acceptable. But note that, deep down, you are still writing a check, even if the syntax simplifies it.
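
The point that a type signature removes some checks but not all can be sketched in PHP itself, assuming PHP 7.0 or later (scalar and return type declarations, enforced at run time rather than at compile time). The function name limitaResultados is hypothetical:

```php
<?php
declare(strict_types=1);

// Hypothetical function: the declared types replace the manual type check,
// but the rule "must not be negative" still has to be written by hand.
function limitaResultados(array $itens, int $limite = 0): array
{
    if ($limite < 0) {
        throw new InvalidArgumentException('limite must not be negative');
    }

    // 0 means "no limit", mirroring the default used in the question.
    return $limite === 0 ? $itens : array_slice($itens, 0, $limite);
}
```

Passing "batatas" as the second argument now raises a TypeError at the call, while -1 is still only caught by the explicit range check.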

When to avoid

You can skip the check in cases where you are (really) sure that wrong data will still produce a proper result. There are cases, for example, where returning an empty value has adequate semantics. The rest of the code will know how to handle it, and it may even have a desirable side effect.

Another possibility is wanting to take the risk: trusting that everyone will read the documentation and write perfect tests, and that when this does not happen it will not cause a big problem. It’s not a recommendation, just a statement.

And of course if the situation is different, if you have complete control over the application, if the code is not made for other people to use or maintain, then you can relax a little more. The problem is knowing how much can be relaxed without doing harm.

Additional reference

I talk more about it in that answer, especially about type checking, which is usually less necessary in some languages. Type errors tend to cause a very visible problem. Of course, a language with ill-conceived automatic casting increases the risk.

  • 2

    I keep reading a lot of text in favor of generalized validation of parameter types, and even with so much text the basics do not appear: what are the benefits of generalized validation? Can you give some examples of what might happen without generalized validation and how you’d be better off with it?

  • 1

    I understand the pro-validation recommendation, but after reading your two answers I still do not understand whether you are in favor of always validating the types of the arguments (in the case of these questions, in PHP and JS). Because if one is always going to validate types in these languages, wouldn’t it be better to move at once to a language with stricter typing?

  • 1

    Checking types is, I think, not always worthwhile. But it depends on what you’re doing. If it’s a library, I think you should always do it. It would not be better to "opt" for a static language in every situation, for at least two reasons: 1) how would you opt for a static language to run in the browser? If you only know one language or can only use that one, you have no choice; most people do not choose PHP. 2) A static language does not allow several flexible constructs that a dynamic one allows; it is not just a matter of checking types at compile time or not.

  • But you’re a little right. The point is that dynamic languages are used when the programmer has control over everything he is doing, has a very cohesive team, there is a lot of discipline in programming, or when robustness is not required. But if you want something robust, you have to check everything. Like I said, in some cases you can even leave it to the last function called to check the type. That does not give accurate information about the error but does not cause major problems. Your case is different from the question here. You have control over all the software.

2

We need more meaningful code instead of more code

When the code is very expressive, it’s easier for programmers to know what to expect from it. This lessens the fear that leads us to add more and more code in search of robustness - which is usually a thankless task because as much as we validate, we will never validate enough.

When C# (as we know, a language that was born statically typed) introduced var, there was a wide discussion: for many, using it in place of an explicit type name would decrease the readability of the code. One of the people making this argument gave the example that

List<ContaaPagar> contasVencidas = contasaPagar.ObtemContasVencidas();

was more expressive than:

var contasVencidas = contasaPagar.ObtemContasVencidas();

Then a counter-argument asked: what else could contasaPagar.ObtemContasVencidas() return other than a list of overdue payables? A list of potato sacks? And, in the context of payables, what else could live in a variable named contasVencidas other than a list of overdue payables?

Of course, if the first argumentator had shown an example like this:

List<ContaaPagar> cts = ctPg.getVenc();

everyone would agree that, in this case, making the type of the variable explicit works very well (it just doesn’t work better than refactoring this ugly code).

This was just an example, brought from a language that was born strongly and statically typed, to demonstrate that the most important thing is that our code be expressive.

I will go into each of the examples in the question.

Example 1 - Validate parameter type

function consultaGrupos($dbh, $limit=0) {

    // se $limit contém outra coisa que não um inteiro,
    //  lança uma exceção.
}

The signature of this function, depending on the context and culture of the project, is already fairly expressive. But just to be sure, let’s rewrite it:

function obtemGrupos($contextoDb, $qtdMaximaResultados=0)

What else can this function return, and what else can I pass to it, beyond what the signature has already made obvious? What are the chances of someone passing "potatoes" as the second parameter? Is it worth throwing another exception on top of the standard exception that PHP will throw when it tries to use "potatoes" as if it were an integer?

Of course I’m considering that "Group" is a very well defined and unambiguous business term in context.

Example 2 - Validate the existence of an item in a dictionary

$moduleTitle = $this->vocabulary["text"][1];

if (empty($moduleTitle)) {
  $moduleTitle = 'Um valor padrão...';
}

If I understood the context, this is not just a validation: it is a business rule that accepts that not every required item may be present, and that when one is missing the application keeps working with a default value. Of course, you would probably want to encapsulate this in a function so that you can reuse the check instead of repeating the code every time. It would look something like this:

function obtemPrimeiroSinonimo($texto) {

    $primeiroSinonimo = $this->vocabulario[$texto][1];
    
    if (empty($primeiroSinonimo)) {
        $primeiroSinonimo = 'Um valor padrão...';
    }

    // return the synonym, or the default value when it is missing
    return $primeiroSinonimo;
}

See that once again I considered that there is some knowledge of the context on the part of the programmer, and that the code expresses this context well so that the code itself helps in the programmer’s learning.

Example 3 - Validate a property of an object received from a function

$dbObj = consultaProduto($dbh, 1);

if (!is_object($dbObj)) {
  // oops... could not fetch product #1
}
else if (!property_exists($dbObj,'url') || empty($dbObj->url)) {
  // oops... could not determine the URL of product #1
}

This second validation is correct depending on how the solution or the system as a whole is modeled. An example of when it is correct: url is not a mandatory attribute of the product, this query happens when trying to create a link to it, and the business expects that a product without a url simply stays without a link in this particular view. Note then that it is not just a validation but a flow foreseen by the business, and what happened is not a "could not determine" but rather a "this product simply has no url, which it is entitled to, so I checked".

The first validation is also correct if an automated test (I went a little further now) demonstrates that the expected behavior of the query is to return "nothing" when there is no product available. But once again, this is not a "could not get" but a "there is no product here for you". Finally, it would be better to return an empty list instead of "nothing", since the method signature suggests a list as the return value.

What is not correct here is the expressiveness of the code: dbObj is hardly a business term and this name has no value to a human.

Conclusion

  • We write validations to handle exceptions foreseen by the business. This is the best reason to write a validation.

  • We write validations when we can’t be clear enough in the API of our framework or library and we know that consumers, because they don’t understand how to use it, will use it wrong. Then the validation error message explains what the API failed to make self-explanatory.

  • We write validations when we don’t have automated tests to validate and explain the less obvious behaviors of our system.

  • We do not write generic validations without understanding the reason very well, that is, without understanding what could go wrong without the validation and how it helps to prevent the problem. Why don’t we do that? The main reason is to avoid the waste of writing potentially useless code that can even get in the way.

0

In my opinion, regardless of whether the parameter already has a default value in the function shown, I see its validation as something good. In fact, almost excellent, since we are talking about a programming language in which we do not care about variable types. Anyway, I don’t see this as excessive, but as a sign of great care.

  • 2

    And what are the benefits of this validation? I mean, can you list some positive practical effects?

  • The benefit is exactly what @bigown said above. PHP is a weakly typed language, meaning that absolutely anything can arrive in function parameters. That is, the biggest benefit is that you make sure your code works the way you expect it to. In addition, precisely by keeping these validations, other programmers will be able to keep their code more coherent; after all, it is known that a certain method or function will only work in a certain way. Finally, more validations, fewer exceptions.

  • 1

    The problem is that @bigown hasn’t listed any benefits yet either. How does generalized validation ensure that "your code works as you expect it to work"?

  • The benefit is exactly that: making sure your code works the way you expect it to. In the example you expect the value of the parameter to be an integer; if it is not an integer, it does not enter the condition and does nothing. Is it working the way you expect it to? No. It’s a generalized validation, but it supports you and other programmers. Like I said, more validations, fewer exceptions.

  • 2

    I improved the question with more examples so that we are not limited to the given example of a function, but rather focused on the question itself and on the reason why we should validate even though we know that a default value is already in place behind the scenes.
