Before getting to the question itself, a serious and widespread misunderstanding needs to be cleared up. That is why this answer starts with an introduction on configuring error logging.
Many inexperienced PHP programmers read request variables directly, without checking whether they exist:
$foo = $_POST['foo'];
The usual justification is that it "works" and does not cause an error.
Sooner or later they discover that it works in some environments and not in others.
Some environments display error messages and others hide them. That makes things harder still, because an inexperienced programmer does not know how to read the internal logs, let alone configure them, and does not verify that the environment is properly configured.
PHP lets you hide or ignore error messages that are not of the "fatal error" type. An undefined index does not cause a fatal error, so there is apparently no problem in accessing $foo = $_POST['foo']; directly.
That is where the problem lies.
Even if the error is not displayed, it still occurs internally.
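A minimal sketch of this point, assuming a local array `$post` as a stand-in for `$_POST` (so it also runs from the command line). Even with display turned off, PHP still raises the error, and a handler registered with `set_error_handler()` can observe it:

```php
<?php
// Hide errors from the output, as a badly configured server might.
ini_set('display_errors', '0');
error_reporting(E_ALL);

// Collect every error PHP raises internally.
$captured = [];
set_error_handler(function (int $severity, string $message) use (&$captured): bool {
    $captured[] = $message;
    return true; // mark the error as handled
});

$post = [];          // hypothetical stand-in for an empty $_POST
$foo = $post['foo']; // nothing is displayed, yet an error is raised here

restore_error_handler();

var_dump($foo);      // NULL
print_r($captured);  // one message about the undefined index/key
```

The exact wording and level of the message depend on the PHP version, which is why the sketch only counts the captured messages.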
First of all, the PHP programmer should properly configure the environment where the application will run.
In the development environment, the famous "localhost", it is recommended to enable all error messages and warnings, including strict-standards warnings and notices about deprecated syntax and functions.
Configuring the error log
Below is a simple, practical example of configuring error logging and warnings:
<?php
/**
The E_ALL constant means that every type of error and warning is reported.
*/
ini_set('error_reporting', E_ALL);
/**
true: enables error logging. Allows the use of 'error_log'. See further below.
false: disables it
*/
ini_set('log_errors', true);
/**
Sets the location where the logs are written.
An absolute path is recommended.
For example, avoid relative paths such as "../folder/log.txt". Use an absolute path such as "/path/absolute/folder/log.txt".
Some shared hosting providers do not allow this directive to be set at runtime.
In those cases, consult the provider's manual or support on how to read the error logs.
BASE_DIR is a constant that the application must define beforehand.
*/
ini_set('error_log', BASE_DIR . DIRECTORY_SEPARATOR . 'logs' . DIRECTORY_SEPARATOR . 'php' . DIRECTORY_SEPARATOR . 'PHP_errors-' . date('Ym') . '.log');
/**
This feature is normally not used in development, nor in production.
Keeping it as "false" is recommended for performance reasons.
true: enables it
false: disables it
*/
ini_set('html_errors', false);
/**
true: enables the display of errors and warnings
false: disables it
In production, keep it as "false". In a local environment, as "true".
Note that "display_errors" may be disabled by the hosting provider. Shared hosting providers normally set it to "false" by default and prevent it from being changed at runtime.
In those cases, consult the provider's manual or support on how to read the error logs.
*/
ini_set('display_errors', true);
The code above belongs in the application's bootstrap, that is, the initial file that holds everything that must run first. It must run before every other script and be included in every request.
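As a sketch of how a bootstrap file might switch between the two environments: the function name `configureErrorHandling()` and the `APP_ENV` environment variable are assumptions made for illustration, not part of the configuration above.

```php
<?php
// Hypothetical helper: name and parameter are illustrative only.
function configureErrorHandling(bool $isProduction): void
{
    error_reporting(E_ALL);     // report everything in every environment
    ini_set('log_errors', '1'); // always write errors to the log
    // Display errors locally, never in production.
    ini_set('display_errors', $isProduction ? '0' : '1');
}

// APP_ENV is an assumed variable name; adapt it to your deployment.
configureErrorHandling(getenv('APP_ENV') === 'production');
```

Keeping the decision in one function means every entry point gets the same settings.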
With error logging covered, let's return to the $_GET and $_POST variables.
Whether the environment is well or poorly configured, adopt a safe pattern that guarantees your application runs correctly.
To do that, adopt good programming practices.
Instead of accessing $foo = $_POST['foo']; directly, always check whether the index exists.
if (isset($_POST['foo']))
    echo $_POST['foo'];
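On PHP 7 and later, the same existence check can be written with the null coalescing operator `??`, which behaves like `isset()` with a fallback. A small sketch, using a local array `$post` as a stand-in for `$_POST`:

```php
<?php
// $post is a hypothetical stand-in for $_POST.
$post = ['bar' => 'hello'];

// Equivalent to: isset($post['foo']) ? $post['foo'] : null
$foo = $post['foo'] ?? null;
$bar = $post['bar'] ?? null;

var_dump($foo); // NULL, and no error is raised
var_dump($bar); // string(5) "hello"
```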
A very common mistake is to use the function empty() instead of isset():
if (!empty($_POST['foo']))
    echo $_POST['foo'];
This is bad practice because empty() answers a different question. It raises no error for a missing index, but it also reports existing values such as "" and "0" as empty, so a check for existence gets mixed up with a judgment about the value.
Because no error message ever appears, it gives the impression that there is no problem in using it this way.
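One concrete difference between the two functions, sketched with a stand-in array (the field names are made up for illustration): `empty()` also rejects values that exist but are falsy, such as the string "0".

```php
<?php
// Hypothetical form data: both fields were submitted.
$post = ['quantity' => '0', 'name' => ''];

$quantityIsSet    = isset($post['quantity']);  // true: the index exists
$quantityNotEmpty = !empty($post['quantity']); // false: "0" counts as empty
$nameIsSet        = isset($post['name']);      // true: the index exists
$nameNotEmpty     = !empty($post['name']);     // false: "" counts as empty
$missingIsSet     = isset($post['missing']);   // false: never submitted

var_dump($quantityIsSet, $quantityNotEmpty, $nameIsSet, $nameNotEmpty, $missingIsSet);
```

With `!empty()`, a legitimately submitted quantity of "0" would be treated as if the field had never been sent.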
Another common mistake is an error in execution logic:
$foo = $_POST['foo'];
if (isset($foo))
    echo $foo;
If the index foo of the global variable $_POST does not exist, this raises the undefined index error.
Once again we are back to the problem of a poorly configured environment. An environment that hides this level of error gives the false impression that nothing went wrong.
In this case the error is quite obvious. PHP runs the line $foo = $_POST['foo']; and, regardless of what the following lines contain, the error is triggered right there. Writing the check this way therefore makes no sense, in PHP or in any other language, because it is illogical.
Think of it as locking a door and leaving the key in the lock. It is illogical and usually the result of inattention.
Sanitization, Filtering and MVC
People often confuse "checking whether a $_POST value exists" with "filtering the received data".
The confusion arises because most beginners do not adopt a methodology such as MVC, in which layers and responsibilities are well defined.
To avoid misunderstandings and unnecessary discussion: the many variants of MVC are not the focus here, so the term MVC is used generically.
The letter M in MVC stands for "Model", which represents the business model.
A solid, well-built platform allows the business model to be modeled freely.
It was with this in mind that the frameworks we know emerged on the market, such as CakePHP, Symfony, Zend Framework, Laravel, CodeIgniter and hundreds more.
With a framework, we do not need to worry about the basic functions of an application's engine. Things like reading $_GET and $_POST are hidden behind the framework's machinery. That machinery is part of the letter C in MVC, the "Controller".
When it handles a request to read a $_GET or $_POST parameter, the Controller does not see the business model. It just retrieves the request data.
That is where you use isset(), array_key_exists() or some equivalent technique to avoid triggering errors.
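The two functions are not fully interchangeable. A short sketch of the difference, using a made-up array: `isset()` reports false when the key exists but holds `null`, while `array_key_exists()` only looks at the key.

```php
<?php
// Hypothetical parameters; 'foo' exists but holds null.
$params = ['foo' => null, 'bar' => ''];

$fooIsSet  = isset($params['foo']);            // false: value is null
$fooExists = array_key_exists('foo', $params); // true: the key is there
$barIsSet  = isset($params['bar']);            // true: "" is not null
$bazExists = array_key_exists('baz', $params); // false: no such key

var_dump($fooIsSet, $fooExists, $barIsSet, $bazExists);
```

Values parsed from an HTML form arrive as strings or arrays, so for `$_GET` and `$_POST` the distinction rarely matters; it does matter for arrays built elsewhere.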
The controller has no responsibility to sanitize or filter. It is not even responsible for deciding whether a missing parameter is valid or not. Validation is the job of the "Model".
Once the controller has retrieved the global variables, the "Model" accesses them and applies filters according to the previously defined business rules.
If a variable arrives as " 123abc " and the business model allows spaces, there is no reason to remove the spaces when the value is retrieved in the "Controller". That is why applying filters indiscriminately at retrieval time is considered bad practice.
To see this in practice:
if (isset($_POST['foo']))
    $foo = trim($_POST['foo']);
if (empty($foo))
    echo 'error: foo is empty';
The controller should never make this kind of decision, because sanitizing and validating are not the responsibility of this layer.
Another common error in assigning responsibility between MVC layers:
if (!isset($_POST['foo']))
    echo 'error: foo does not exist';
For the controller, whether a parameter exists or not is irrelevant. It is the "Model" that must validate and decide.
Many of the bad practices related to this subject can be eliminated by using MVC, OOP or simply good programming practices. Even in procedural scripts that do not use MVC, the code can be organized into generic functions that are reused across several business models.
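A procedural sketch of that separation. The function names `requestParam()` and `validateName()` and the rule "a name is required and may not be blank" are assumptions for illustration; a real business model defines its own rules.

```php
<?php
// "Controller" side: fetch the raw value. No trimming, no validation.
function requestParam(array $source, string $key)
{
    return array_key_exists($key, $source) ? $source[$key] : null;
}

// "Model" side: hypothetical business rule for a name field.
// Returns [true, cleanValue] on success or [false, errorMessage] on failure.
function validateName($value): array
{
    if (!is_string($value)) {
        return [false, 'name is required'];
    }
    $clean = trim($value); // sanitization happens here, not in the controller
    if ($clean === '') {
        return [false, 'name is empty'];
    }
    return [true, $clean];
}

$post = ['name' => '  Maria  ']; // stand-in for $_POST

[$ok, $result] = validateName(requestParam($post, 'name'));
[$missingOk, $missingError] = validateName(requestParam($post, 'surname'));

var_dump($ok, $result);                 // true, "Maria"
var_dump($missingOk, $missingError);    // false, "name is required"
```

The retrieval function can be reused unchanged for any field, while each business model supplies its own validation function.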
The specific question about validations
Text only: name, profession, etc.;
Whole numbers and nothing else: house number, days, age, etc.;
Numbers with dots and commas: percentages, monetary values, etc.;
Numbers with / and -: dates;
As mentioned above, this is the responsibility of the "Model".
There is no specific universal "standard" because it depends on the business model.
Take a name field as an example.
Some business models allow names that include numbers. Some have a single field where the user may or may not include both first name and surname. Others require exactly one first name and one surname.
So there is no specific answer to the question you asked:
But what's the right way to do it? In this case, when receiving a name, is isset + trim + empty the best way? Anything else(!)?
Take another example from the list above: date values.
It is the same situation: it depends on the business model. Does the model allow full dates? Does it allow only day and month, or only year and month (as on a credit card)? Can dates in the past be entered? What is the limit? 10,000 years ago? 1 million years ago? Are future dates allowed? 50 million years in the future? Which format is allowed? Only the ISO 8601 standard, or a format specific to the business model?
Will it be converted to a timestamp? Note that timestamps have limits.
Normally the user should not be allowed to type the date freely. It is safer for it to come from select fields.
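To make those questions concrete, here is a sketch of one possible rule set. The function name, the `Y-m-d` format, the ban on future dates and the 130-year limit are all assumed business rules chosen for illustration, not recommendations.

```php
<?php
// Hypothetical rule: a birth date in Y-m-d format, not in the future,
// and not more than 130 years before the reference date.
function isValidBirthDate(string $input, DateTimeImmutable $today): bool
{
    // '!' resets unparsed fields, so the time of day is midnight.
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $input);

    // Round-trip check: rejects overflow dates such as 1990-02-30,
    // which would otherwise silently become 1990-03-02.
    if ($date === false || $date->format('Y-m-d') !== $input) {
        return false;
    }

    $oldest = $today->modify('-130 years');
    return $date <= $today && $date >= $oldest;
}

$today = new DateTimeImmutable('2015-10-05'); // fixed reference date for the example

var_dump(isValidBirthDate('1990-02-28', $today)); // bool(true)
var_dump(isValidBirthDate('1990-02-30', $today)); // bool(false): no such day
var_dump(isValidBirthDate('2999-01-01', $today)); // bool(false): in the future
var_dump(isValidBirthDate('05/10/2015', $today)); // bool(false): wrong format
```

Passing the reference date as a parameter keeps the rule testable and makes the limits explicit.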
What about monetary values?
It's easy when the currency has no decimal subunit, right?
No! Wrong! Even currencies without a decimal subunit should be handled as decimals, although it depends heavily on the business model. A business model that treats a non-decimal currency as a whole number is nevertheless a flawed model.
And when handling monetary values in decimal format, what is the ideal number of decimal places? Two places? Seven places?
One should also keep the actual value distinct from the value formatted for user-friendly display, and one must never overwrite the other, since these are monetary values. A small difference of $0.0099 is significant in banking systems that move trillions per week. If the system does not allow 4 decimal places, the value becomes 0.00; that is, someone loses money.
Even for a system that is not a financial institution moving trillions, it does not matter. The system should be as solid, reliable and fair as possible.
All of this is defined in the business model and should in no way be handled in the controller.
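A worked illustration of the $0.0099 case, assuming the business model stores amounts as integers at a scale of 4 decimal places. The scale and the variable names are assumptions; the point is only that the stored value and the displayed value are kept apart.

```php
<?php
$scale = 10000;          // assumed rule: 4 decimal places, so 1.0000 is stored as 10000
$fee = 99;               // 0.0099 stored exactly as an integer
$transactions = 1000000;

// Exact total: the 4-decimal value is preserved for every transaction.
$exactTotal = $fee * $transactions; // 99000000 units, i.e. 9900.0000

// What happens if each fee is first cut down to 2 decimal places (whole cents).
$feeInCents = intdiv($fee, 100);                // 0 cents: the fee vanishes
$truncatedTotal = $feeInCents * $transactions;  // 0

// Formatting is for display only and never overwrites the stored value.
$display = number_format($exactTotal / $scale, 2, '.', ',');

var_dump($exactTotal);     // int(99000000)
var_dump($truncatedTotal); // int(0)
var_dump($display);        // string(8) "9,900.00"
```

Over a million transactions, the two-decimal system has lost 9,900.00 that the four-decimal system still accounts for.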
This also answers the question about using sanitization functions such as filter_input().
Functions such as trim(), filter_input(), strlen(), mb_strlen() and empty(), among others, should be applied in the "Model" layer to filter the received data. They should never be used in place of checking whether an index exists, which is the job of isset().
Requests with expected values
Basically, as far as input goes, I receive two types of data:
predefined by the system, e.g. select, radio; and entered by the user: text, textarea, dates. So I know, for example, that in the first case I can accept the data and use it in the script only if it is exactly what I expect.
Is this a form of "validation" that makes any function unnecessary, even isset? Because whether it is yes or no, of course it is set (or not? :-))...
if ($radio === "sim" || "não") {
    // do something
}
else {
    // error
}
When you request a value that is expected, you should obviously also check whether the parameter exists in the global variables.
You should also validate the received value, because user input must never be trusted. Even with radio or select fields, the user can manipulate the value.
The form presented is safe, but it contains 2 errors, one of them serious because it returns the wrong result:
$nome = isset($_POST['camporadio']) ? $_POST['camporadio'] : false;
if ($radio === "sim" || "não") {
    // do something
}
else {
    // error
}
The condition if ($radio === "sim" || "não") { should be if ($radio == "sim" || $radio == "não") {.
As written, if ($radio === "sim" || "não") { causes a logic error: the non-empty string "não" always evaluates to true, so the else branch never runs.
The other mistake is that $radio was never defined, because the variable that receives the $_POST value is a different one, $nome.
But I believe that was a mere slip while typing the question.
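The logic error can be checked directly. The sketch below uses a stand-in array and a manipulated value ('talvez' is made up) to compare the condition from the question with a whitelist check based on `in_array()` in strict mode, which is one way to write the corrected comparison:

```php
<?php
// Stand-in for $_POST carrying a value that is not one of the radio options.
$post = ['camporadio' => 'talvez'];
$radio = $post['camporadio'] ?? null;

// Condition from the question: "não" alone is a non-empty string, so the
// whole expression is true no matter what $radio contains.
$buggy = ($radio === "sim" || "não");

// Whitelist check: true only for exactly one of the expected values.
$correct = in_array($radio, ['sim', 'não'], true);

var_dump($buggy);   // bool(true): the manipulated value is accepted
var_dump($correct); // bool(false): the manipulated value is rejected
```

The whitelist form also scales to select fields with many options without growing the condition.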
A final remark: I thought it best not to go into the old "register globals". The feature was a nuisance for about 10 years, but it was removed in PHP 5.4 after spending a long time marked as DEPRECATED. Nowadays it is rare to find anyone who uses it, so I considered it irrelevant here. For reference: http://php.net/manual/en/security.globals.php
[Text added on 2015-10-05]
Difference between sanitization and validation
It is important to understand the difference between sanitization and validation.
In sanitization, unnecessary characters are removed automatically, without triggering error messages or warnings, as long as the final format passes validation.
Sanitization always precedes validation.
Common examples of sanitization:
The user types the name in lowercase or in mixed case, but the system must convert it to uppercase because the business model requires it:
José Maria -> JOSÉ MARIA
josé maria -> JOSÉ MARIA
josé Maria -> JOSÉ MARIA
The user enters data with spaces at the beginning or end. The system removes the unnecessary characters automatically, without triggering error or warning messages:
" foo" -> "foo"
" foo " -> "foo"
"foo " -> "foo"
The user types letters in a numeric field. The system removes them and keeps only the numbers.
In short, the situations are varied and depend on the business model.
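The three cases above can be sketched with standard PHP functions. This assumes UTF-8 input and that the mbstring extension is available for `mb_strtoupper()`; the variable names are illustrative.

```php
<?php
// Case conversion required by a hypothetical business rule.
$name = mb_strtoupper('josé Maria', 'UTF-8');

// Leading and trailing whitespace removed silently.
$text = trim('  foo  ');

// Everything that is not an ASCII digit removed from a numeric field.
$digits = preg_replace('/[^0-9]/', '', 'abc123');

var_dump($name);   // "JOSÉ MARIA"
var_dump($text);   // string(3) "foo"
var_dump($digits); // string(3) "123"
```

None of these steps reports an error to the user; deciding whether the result is acceptable is left to validation.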
Sanitization makes the system friendlier to use and also saves server resources, data transfer and time for both client and server.
If everything is handled directly in validation, without sanitization, the process becomes more bureaucratic.
Validation comes after sanitization. It is the last step in accepting or rejecting a given piece of data.
Let's look at a concrete example of sanitizing numeric input.
Here is a function that filters a string and returns only its numeric characters.
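function NumberOnly($str, $float = false)
{
    // Arrays are not processed; for them an empty string is returned.
    if (!is_array($str))
    {
        $a = '';
        if ($float)
        {
            // Allow the dot and normalize decimal commas to dots.
            $a = '.';
            $str = str_replace(',', $a, $str);
        }
        // Convert zenkaku digits to hankaku, then remove every character that is not allowed.
        return preg_replace('#[^0-9'.$a.']#', '', mb_convert_kana($str, 'n'));
    }
    return '';
}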
PHP programmers from countries that speak Latin or Anglo-Saxon languages typically pay no attention to Eastern languages such as Arabic, the languages of India, Chinese, etc.
The function above supports Japanese numerals in the "zenkaku" (full-width) format. https://en.wikipedia.org/wiki/Language_input_keys
The function mb_convert_kana() converts zenkaku characters to the hankaku (half-width) format.
The example below illustrates the difference between the formats:
zenkaku: １２３
hankaku: 123
zenkaku: ＡＢＣ
hankaku: ABC
If the function did not convert to the hankaku format, legitimate numeric input would be removed, inconveniencing the user.
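A small check of that behavior, assuming UTF-8 strings and the mbstring extension. The full-width digits are written as Unicode escapes (PHP 7+) so the snippet does not depend on the file's encoding:

```php
<?php
// Full-width (zenkaku) digits 1, 2, 3: U+FF11, U+FF12, U+FF13.
$zenkaku = "\u{FF11}\u{FF12}\u{FF13}";

// The 'n' option converts zenkaku numbers to hankaku.
$converted = mb_convert_kana($zenkaku, 'n', 'UTF-8');

// Without the conversion, a digits-only filter throws the input away.
$withoutConversion = preg_replace('/[^0-9]/', '', $zenkaku);
$withConversion    = preg_replace('/[^0-9]/', '', $converted);

var_dump($withoutConversion); // string(0) ""
var_dump($withConversion);    // string(3) "123"
```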
Many systems, including Japanese websites and especially older ones, do not sanitize; they explicitly ask the user to switch the input format using the keyboard.
That is really uncomfortable and adds bureaucracy that hinders accessibility.
The second parameter is just a flag: "true" means the dot character (.) must be allowed, for when decimal numbers are accepted.
Note that the function does not validate the number format. An invalid decimal number such as 123.4.5.6 can get through, but that is not the responsibility of sanitization. It is the responsibility of validation.
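A sketch of what the matching validation step might look like. The function name `isDecimal()` and the accepted format (digits, optionally followed by one dot and more digits) are assumptions; the actual format belongs to the business model.

```php
<?php
// Hypothetical validation rule applied after sanitization.
function isDecimal(string $value): bool
{
    // \z anchors at the very end, so a trailing newline is not accepted.
    return preg_match('/^\d+(\.\d+)?\z/', $value) === 1;
}

var_dump(isDecimal('123.45'));    // bool(true)
var_dump(isDecimal('123'));       // bool(true)
var_dump(isDecimal('123.4.5.6')); // bool(false)
var_dump(isDecimal('.5'));        // bool(false)
```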
So far, the NumberOnly() function fulfills its role as fully as possible: it lets digits or dots through and removes every other, unnecessary character.
The next step is validation.
/**
Prints 123
*/
echo NumberOnly('abc123');
In the "Model" there must be a specific rule for this input. For illustration, suppose the rule says the number must have at least 4 and at most 5 digits.
In this case validation would return an error to the user, because the sanitized input contains only 3 digits.
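The illustrative rule can be sketched as follows. The function name `hasValidLength()` is hypothetical, and a plain `preg_replace()` stands in for the sanitizer so that the snippet is self-contained:

```php
<?php
// Hypothetical "Model" rule: between 4 and 5 digits, nothing else.
function hasValidLength(string $digits): bool
{
    return preg_match('/^\d{4,5}\z/', $digits) === 1;
}

// Simplified stand-in for the sanitization step: keep ASCII digits only.
$sanitized = preg_replace('/[^0-9]/', '', 'abc123');

var_dump($sanitized);                 // string(3) "123"
var_dump(hasValidLength($sanitized)); // bool(false): only 3 digits
var_dump(hasValidLength('12345'));    // bool(true)
var_dump(hasValidLength('123456'));   // bool(false): 6 digits
```

Sanitization accepted the input silently; it is only at this step that the user receives an error.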
Related: Get external variable isset vs filter_input
– rray
@rray that topic looks really good, I didn't know it... I'm reading it now, thanks! (by the way: I don't think it's a duplicate)...
– gustavox
I don't think it's a duplicate either. If it doesn't make the question too long, you could include how to handle a field that receives HTML, for example text editors like TinyMCE.
– rray
Ah, well, I'm already hoping it won't be considered too broad, so I think it's best not to include anything else, haha. But feel free to edit if you want!
– gustavox
Actually, it's quite complex, and it's going to take some time to write the theoretical part if I decide to answer. I have the answers, but they're kind of hard to explain. The question is actually good.
– Edilson
@Edilson, no hurry, if you want to answer, take your time (I just hope it doesn't get closed). I know it's not that simple... I've even taken a -1, haha...
– gustavox
You deserve a +1 for the edit, but since that isn't possible, take my mental +1 @brasofilo (I was unsure about leaving that message and was already thinking of removing it, but when you edited to remove only that, everything became clear ^^)
– gustavox
Glad you approve :) The technical text is already very long, and I thought that meta introduction got in the way. The question came up for me in the close-vote review queue; since none show up here, it must have been a sign.
– brasofilo
It's a challenge to explain all this without complicating it too much, haha. I'll try to find time to answer, but it will take me at least 2 hours. A bigger problem is that we usually vote to close when a question is too broad. This question involves many different subjects, even though they are related.
– Daniel Omine
Use Respect\Validation :)
– Wallace Maxters
Added text on sanitization and validation.
– Daniel Omine