Storing representative values of a user’s 'sex' in databases

Asked

Viewed 3,406 times

10

I have some doubts about the representation of some fields in the database. How should I store fields such as "sex" in tables in the database? In the literal form "male" and "female" or through numbers representing each sex as "1" and "2", respectively? This doubt extends to other fields, for example, there is a need to store the weight classification of an object in "very light", "light", "medium", "heavy" and "very heavy". I should store a representative number of each category?

  • 2

    It depends on how you use it in the application. I always create a table of type Sex:[id, name]. So I have it represented with both number and string.

  • @Earendul this id would be given by me or this is an id type INTEGER NOT NULL AUTO_INCREMENT used to be primary key

  • @Earendul I think that your comment is worth an answer. : ) Because, storing directly in literal form generates difficulties with localization (presentation in different languages).

  • 1

    @Ricardohenrique Since there are probably only 2 records (’M' and 'F') you can define the number yourself, and it can be a primary key since there will never be two equal numbers and you can relate to other tables

  • Creating a table to let eternally with 2 records just wouldn’t be a bit pointless ? Isn’t it more practical, easy, functional to establish this via code ? Constant or Enum ???

  • You will spend a table on your DBMS, you will have to query (selects and joins), process the query and the result just to prove something you already know ? After all it was you who populated the table and it is not changed... PHP does not have the Enum structure, but we can easily implement it this way: http://stackoverflow.com/questions/254514/php-and-enumerations

Show 1 more comment

3 answers

8


Depends on each case.

What I never recommend, but there may be exceptions, is to use the full description. It will not be wrong to use the description if you have a good reason for it. If you do not use the description and there is one that should be used in some situations you will have to treat it in some way.

In many cases a code is enough. Even by external force, legislation for example, it may be that the correct is to put 1 or 2 and that is all that matters. In another you should put M or F and need no description other than this one. Then you should only use this. That is, the column is already self descriptive in this way.

But you may need an extra description. There are three solutions:

  1. You leave it to one stored Procedure solve this or for the application to treat since this description does not usually change. Many people will criticize this option because the database is not being used properly. I agree with the criticism, but I make the caveat that there may be reason, especially to let the application handle this when the description may change depending on the execution context. Probably won’t occur with sex but can occur with other types.
  2. Create an auxiliary table with the descriptions and make a relationship, a very common solution. It is highly recommended. Contrary to what many forget, it is still necessary another point of the database or application to know how to manipulate this information and properly if the interest is to receive the description of the data. Of course in this case a 1 or 2 will be a ID for the table referenced. This ID usually a number but it doesn’t have to be, it can be a character and use the M or F, seems a good natural key.
  3. Create an enumeration if the system used allows it. Some criticize its use because the resource may have limitations. But it can solve well in most cases. At the bottom is a standardized form of the database to use an auxiliary table. You will still have to do something in data recovery to get the description.

As I said, I find these forms more organized and flexible, especially number 2, so it’s the normalized form. But if you want to facilitate the consultation, give a little more performance, then play the direct description is the solution, so you already get what you want without using artifices in query.

But keep in mind that this could cause some headaches if you have any reason to change these descriptions. Nothing that cannot be solved in cases of very simple data as described. In more complex data you may never be able to disambiguate if there is a change. Don’t do this with city names, for example.

It just won’t use boolean. I’ve seen this when there were only two values and it’s wrong.

Has a ISO on the use of sex.

1

In the case of fixed information, as is the case. There are only 3 possibilities:

  • Masculine
  • Feminine
  • Undefined (in case your system allows the person not to define sex).

Therefore, this data is fixed and does not need a table in your database, this would be a complete waste of resources.

Create a class with constants and you’re done.

<?php
abstract class Sexo
{
    const Indefinido = 0;
    const Masculino = 1;
    const Feminino = 2;

   public getStringValue($value)
   {
       switch($value)
       {
           case self::Indefindo:
               return "Indefinido";
           case self::Masculino:
               return "Masculino";
           case self::Feminino:
               return "Feminino";
           default:
               return "Valor não é válido";
       }
   }
}
?>

When you need to use just do the following:

<?php
    $meuSexo = Sexo::Masculino;
    //var_dump($meuSexo) => 1

    $string = Sexo.getStringValue($meuSexo);
    //var_dump($string) => Masculino
?>
  • I believe that there are currently much more than just 3 possibilities, see the range of options on facebook.

-1

In order to curiosity:

In the case of weight of an object in "very light", "light", "medium", "heavy" and "very heavy". We can use some statistical methods to define a range.

So if we have 50 products, with the following weight listing:

1, 1, 3, 4, 4, 5, 5, 6, 6,
7, 7, 7, 7, 7, 8, 8, 8, 9, 9,
10, 10, 10, 11, 15, 16, 16, 16, 17,
18, 18, 18, 18 ,18, 40, 40, 40, 40, 40,
47, 47, 47, 49, 49, 50, 55, 58, 59, 90.

We have the highest value as 90 and the lowest as 1, so our sampling amplitude is AA = 89 (90 - 1).

Then we calculate i, which theoretically would be the amount of clusters we need to have: i = 1 + 3.3 log 50 = 6.6 ~ 7 (normal rounding).

Therefore the ideal would be to have 7 weight ratings, but for this example we will use the 5 already classified by you, so consider i = 5.

We now need to calculate what the intervals will be
AA / i = 89/5 = 17,8 ~ 18 (always rounded up).

Very Light: 1 |- 19
Light: 19 |- 37
Medium: 37 |- 55
Heavy: 55 |- 73
Very Heavy: 73 |- 91

<?php
abstract class Peso
{
   public getStringValue($value)
   {
       if ($value < 19){
          return "Muito Leve";
       }elseif($value < 37){
          return "Leve";
       }elseif($value < 55){
          return "Médio";
       }elseif($value < 73){
          return "Pesado";
       }else{
          return "Muito Pesado";
       }
   }
}
?>

Browser other questions tagged

You are not signed in. Login or sign up in order to post.