Can/should a bit mask be typed?

8

In most programming languages, when you want to create a bit mask you usually use an integer type and bitwise operations (and, or, xor, not, shift left, shift right...). However, although nothing prevents the programmer from assigning a specific value (say, 6: 110) to the mask, the usual practice is to create constants that represent each bit and to insist, as good practice and to avoid future incompatibilities, on using those constants rather than "magic values". This is not usually enforced, however.

Would there be any harm in creating an abstract "bit mask" type whose subtypes are particular applications of this technique, and having the compiler enforce the use of this type? For example, some languages that support enumerations (enums), like Java, allow you to create methods whose parameters must be of the enum type, so that the programmer has no choice but to use its members, even when each of them has one or more [unique] values associated. An enumeration may or may not be used to implement bit masks, but it also has other purposes[1].

My question is, specifically: is there any use case for bit masks in which the freedom to use integers instead of defined constants brings a significant advantage, and whose loss would compromise the expressiveness of the code? I think this is something only those with experience working with bit masks can answer, but an external reference dealing with the subject would also be quite useful. In my limited experience, the main cases of using a bit mask are:

  • Set multiple bits or just a particular bit (or clear a particular bit);
  • Check whether a particular bit (or bit set) is set;
  • Serialize/deserialize (i.e. save the data structure that contains the bit mask in a file or other binary/textual format).

I can’t think of any other.
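As a minimal sketch of those three use cases with a typed mask, assuming Python's `enum.IntFlag` as the "bit mask type" (the `Permission` type and its members are hypothetical, chosen only for illustration):

```python
from enum import IntFlag


class Permission(IntFlag):
    """Hypothetical typed bit mask; the names are illustrative only."""
    READ = 1
    WRITE = 2
    EXECUTE = 4


# 1. Set several bits at once, set one more bit, then clear one bit.
mask = Permission.READ | Permission.WRITE
mask |= Permission.EXECUTE
mask &= ~Permission.WRITE

# 2. Check whether a particular bit is set.
can_read = bool(mask & Permission.READ)
can_write = bool(mask & Permission.WRITE)

# 3. Serialize to a plain integer and restore the typed value from it.
stored = int(mask)
restored = Permission(stored)
```

Only step 3 touches the raw integer here, and it does so through an explicit conversion in each direction.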

[1]: By the way, contrary to the premise of that related question, I have good reasons to want to change the bit mask throughout the evolution of my products, both its individual values and its set of elements, but always with versioning, so as not to break old code. This restricts my particular case, but does not invalidate the question (I remain interested in knowing what is lost when using a specific type for a bit mask instead of a "generic integer").

  • 3

    If you have a good reason and you know what you are doing, everything is valid... :) I hope you get very good answers, since the question is a good one.

  • I believe there is only one aspect to the need for typing: improving the maintainability and readability of the code. In computational terms, what matters to the processor is the value of the byte itself, not the name of the variable. The advantage or disadvantage would be only in terms of code maintenance and ease of use of the API. You could sometimes benefit from IntelliSense, make testing easier, things like that, but I don’t know if it’s a big advantage. If it doesn’t generate a lot of maintenance in that area, good documentation would cover the need for a type.

  • The C# language allows you to combine enums with the | operator. That is, it can be typed. Whether it should be, I don’t know how to answer.

  • 1

    This article on MSDN can give you an idea of the arguments the engineers used for adding this feature to the language: link

1 answer

2

How they work:

The constants of a bit mask always have values that are powers of 2; the sequence is basically 1, 2, 4, 8... up to the maximum number of bits. Imagine, for example, that you had to choose your favorite color:

VERMELHO = 1;  // red   (001)
VERDE = 2;     // green (010)
AZUL = 4;      // blue  (100)
TODAS = 7;     // all   (111) = VERMELHO | VERDE | AZUL

Arbitrary values:

You cannot give just any value to the individual constants, such as the 6 (110) you mentioned, because a value like that occupies more than 1 bit. Where you do see such values, they are combinations: in the example, TODAS ("all") is actually the combination of all the values above.
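The distinction between single-bit constants and combinations can be checked mechanically. The sketch below reuses the constants from the example; the helpers `is_single_bit` and `contains` are hypothetical names introduced only for illustration.

```python
# Same constants as in the example, written as shifts so that
# "one constant = one bit" is visible.
VERMELHO = 1 << 0  # 0b001
VERDE = 1 << 1     # 0b010
AZUL = 1 << 2      # 0b100
TODAS = VERMELHO | VERDE | AZUL  # 0b111 == 7


def is_single_bit(value: int) -> bool:
    """True when exactly one bit is set, i.e. value is a power of two."""
    return value > 0 and (value & (value - 1)) == 0


def contains(mask: int, flag: int) -> bool:
    """True when every bit of flag is also set in mask."""
    return (mask & flag) == flag
```

Here `is_single_bit(6)` is false because 6 is the combination `VERDE | AZUL`, so it is a valid mask value but not a valid individual constant.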

This is the normal use of bit masks, and Windows does this a lot in its window-creation APIs, where you will see that WS_POPUPWINDOW is actually the combination of WS_POPUP, WS_BORDER and WS_SYSMENU.

Languages without enumeration support:

In languages without enum support, I usually create a class or a series of functions that validate the value received. Assuming I use only 4 bits, any value above 15 would be invalid.
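A minimal sketch of such a validating class, assuming exactly four flag bits as in the text (the class name `FourBitMask` and its `has` method are hypothetical):

```python
class FourBitMask:
    """Hypothetical wrapper that accepts only values that fit in 4 bits."""

    VALID_BITS = 0b1111  # 15: all four flags set

    def __init__(self, value: int) -> None:
        # Reject negative numbers and anything with a bit set above bit 3.
        if not 0 <= value <= self.VALID_BITS:
            raise ValueError(
                f"invalid mask {value}: expected 0..{self.VALID_BITS}"
            )
        self.value = value

    def has(self, flag: int) -> bool:
        """True when every bit of flag is set in this mask."""
        return (self.value & flag) == flag
```

With this, `FourBitMask(6)` is accepted and `FourBitMask(16)` raises `ValueError`, so the check happens once at the boundary instead of in every function that receives the mask.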

Change values throughout development:

This can happen, yes, but in that case keep the consequence in mind: MAYBE you will have to recompile all the programs that depend on your constants.

I say maybe because there will be cases where your functions depend only on 1 or 2 constants that you have not changed.

Should it be typed or not?

It’s up to you. Let’s assume you want to add constants without editing the old ones:

Free constants (as in the Windows example): just declare more constants; that won’t break old code.

Subclasses (languages that support classes but not enums): you don’t need to recompile old code, but you’ll have to create a subclass every time you need to add a constant if you don’t want to recompile everything.

Enums: in some languages, like C#, you would have to recompile everything that depends on the enum.

Note: of course, in languages like PHP you wouldn’t have to recompile, but you would still have to update all the servers that depend on your constants, so the headache would be similar.

My recommendation is: avoid bit masks whenever possible. It is easy to introduce bugs with them, and if you use a methodology like Test Driven Development you will see that it is also easy to forget to write tests for all of their combinations.

Another problem you may encounter is mixing them with a database: it makes queries very difficult and maintenance is horrible. I learned that the hard way.

Use constants or direct numbers:

Some languages convert implicitly between numbers and the constants used. The problem with using numbers directly is code maintenance: it is much harder for another programmer to understand, because it will look as if you are using made-up values. If you do use them, comment your code to communicate the intent.
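Python happens to offer both behaviours, which makes the trade-off easy to see. In this sketch, `enum.IntFlag` members are interchangeable with raw integers, while `enum.Flag` members require an explicit conversion; the class names `LooseColor` and `StrictColor` are hypothetical.

```python
from enum import Flag, IntFlag


class LooseColor(IntFlag):   # members are also ints
    VERMELHO = 1
    VERDE = 2
    AZUL = 4


class StrictColor(Flag):     # members are not ints
    VERMELHO = 1
    VERDE = 2
    AZUL = 4


# IntFlag: the raw number and the constants are interchangeable.
loose_equals_int = (LooseColor.VERDE | LooseColor.AZUL) == 6

# Flag: a combination never compares equal to a raw number...
strict = StrictColor.VERDE | StrictColor.AZUL
strict_equals_int = (strict == 6)

# ...and mixing a member with a raw number is rejected outright.
try:
    StrictColor.VERDE | 4
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# A raw number is still usable, but only through an explicit conversion.
converted = StrictColor(6)
```

The strict variant does not remove the ability to start from a number; it only forces the conversion to be written out, which is where a comment explaining the "magic value" naturally belongs.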

What will I lose by not being able to use numbers directly?

Converting integers to a group of constants is extremely useful if you want to specify a configuration using only numbers and they are short.

An example of this is changing file permissions on Linux:

chmod   000 ---------
chmod   400 r--------
chmod   444 r--r--r--
chmod   600 rw-------
chmod   620 rw--w----
chmod   640 rw-r-----
chmod   644 rw-r--r--
chmod   645 rw-r--r-x
chmod   646 rw-r--rw-
chmod   650 rw-r-x---
chmod   660 rw-rw----
chmod   661 rw-rw---x
chmod   662 rw-rw--w-
chmod   663 rw-rw--wx
chmod   664 rw-rw-r--
chmod   666 rw-rw-rw-
chmod   700 rwx------
chmod   750 rwxr-x---
chmod   755 rwxr-xr-x
chmod   777 rwxrwxrwx
etc...

It treats each digit as a 3-bit bit mask (read = 4, write = 2, execute = 1). If it had to be spelled out with constants, the command would be considerably longer, and remembering 3 digits is even easier.
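To make the mapping concrete, here is a small sketch that decodes a numeric mode the same way, taking each octal digit as three permission bits (read = 4, write = 2, execute = 1). The function name `describe_mode` is hypothetical, and special bits such as setuid are deliberately ignored.

```python
def describe_mode(mode: int) -> str:
    """Render the nine permission bits of a mode as an rwx string."""
    parts = []
    # Owner, group and others occupy bits 8-6, 5-3 and 2-0 respectively.
    for shift in (6, 3, 0):
        digit = (mode >> shift) & 0b111
        parts.append(
            ("r" if digit & 4 else "-")
            + ("w" if digit & 2 else "-")
            + ("x" if digit & 1 else "-")
        )
    return "".join(parts)
```

Called with an octal literal, `describe_mode(0o644)` gives `rw-r--r--` and `describe_mode(0o755)` gives `rwxr-xr-x`.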

  • Thanks for the answer, but I don’t think you understood the question. I’m talking about the API. In your example above, if I wanted to mix green with blue and no red, I could use VERDE | AZUL (following the API) or simply use 6 (not following the API, working directly on its internal representation). That is, if the language gives me the option to do both. But if the language doesn’t allow it, if it gives me only the API and nothing else, what do I lose? That is the crux of the question (this is a more conceptual question, with no focus on a specific language or application).

  • As for changes, of course if I change the representation I would have to recompile everything that uses it. Anyway, this is tangential to the question; I mentioned it only to give context (for more context: I don’t actually intend to alter a bit mask type, but to create new types without necessarily preserving compatibility with the old ones, and to use an adapter so that old code works with the new API).

  • I extended the answer beyond the question so that someone who has never worked with bit masks would also have examples. The last sections show better why you should allow using numbers.

  • Yes, this UNIX example is a valid argument, +1. It seems to me that with an extremely popular API, "magic numbers" are not a big problem, since everyone knows what they mean. In a more obscure API, however, the benefit would be smaller.

  • Yes, but in most cases people define constants, so it’s easy to deduce the structure. When in doubt, always document your functions so that future programmers won’t have a headache.
