How to correctly choose a data type


1

I used to program without paying much attention to this, always using int, float, double and so on. More recently I saw someone comment that float is now a useless type because it is 32 bits, and that double should be used instead because it is 64 bits. Then, reading a book on optimization in C++, I found that a division operation can take many cycles:

Multiplication and division take longer time. Integer multiplication takes 11 clock cycles on Pentium 4 processors, and 3 - 4 clock cycles on most other microprocessors. Integer division takes 40 - 80 clock cycles, depending on the microprocessor.

and that, to speed this up, a conversion to unsigned int should be used, as in this example:

// Example 7.4. Signed and unsigned integers
int a, b;
double c;
b = (unsigned int)a / 10; // Convert to unsigned for fast division
c = a * 2.5; // Use signed when converting to double

So how do I choose a data type the "correct way"?

  • 1

    If unsigned does not meet the requirements of the application, there is no reason to use it. The choice of data type should put the application's requirements first. Remember that premature optimization is the root of all evil.

2 answers

3


Paying attention to types is of the utmost importance in programming, especially in C++.

If float were useless it would not exist, right? It is very useful, and a C++ programmer in particular should not dismiss it. A double takes up twice as much space, and in many cases this makes a huge difference. Moreover, depending on the platform, a double may be slower because its operations have to be done in two parts. Of course double may be faster on some platforms, but this does not always happen, even on 64-bit ones. Talking about this generically is useless; what counts is a test proving that it is faster in the situation you need.
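To make the space argument concrete, here is a minimal sketch comparing the storage needed for the same number of float and double values. The helper name `bytes_for` and the sample count are made up for this illustration. The 4-byte float and 8-byte double found on mainstream desktop platforms are common but not guaranteed by the C++ standard, so the code relies only on sizeof.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed to store `count` values of type T
// in a contiguous array (container overhead is ignored).
template <typename T>
constexpr std::size_t bytes_for(std::size_t count) {
    return count * sizeof(T);
}

// One million samples stored as float versus double (arbitrary count).
constexpr std::size_t kSamples = 1000000;
constexpr std::size_t kFloatBytes = bytes_for<float>(kSamples);
constexpr std::size_t kDoubleBytes = bytes_for<double>(kSamples);
```

Where float is 4 bytes and double is 8 bytes, the double array needs twice the memory and twice the cache traffic, which is often where the real cost shows up.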

Whether unsigned int speeds up the division also depends on the platform where it is running. The ideal is to use the most suitable type, and int is almost always a good choice. Unsigned types are harder to reason about and behave unexpectedly in some situations, so they should be avoided. That does not mean they must never be used; they can be used without trouble, but only when really necessary (there are issues to watch for).
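One of those unexpected behaviors shows up in the very trick quoted in the question. The sketch below (function names are hypothetical) compares plain signed division with the cast-to-unsigned version. The two agree for non-negative values, but for a negative value the conversion wraps around and the quotient is unrelated to a / 10.

```cpp
#include <climits>

// Signed division truncates toward zero (guaranteed since C++11).
constexpr int signed_div10(int a) { return a / 10; }

// The cast from the question: convert to unsigned before dividing.
// A negative `a` wraps modulo 2^N, so the quotient is a large
// positive number rather than a / 10.
constexpr unsigned unsigned_div10(int a) {
    return static_cast<unsigned>(a) / 10u;
}

static_assert(signed_div10(-25) == -2, "truncates toward zero");
static_assert(unsigned_div10(25) == 2u, "same result for non-negative input");
static_assert(unsigned_div10(-25) == (UINT_MAX - 24u) / 10u,
              "negative input wraps before the division");
```

So the cast is only a valid optimization when the value is known never to be negative.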

If an API you use has this type, you have to work with it; you have no choice, at least not directly. In some cases you can take the value and convert it to int if that is desirable and useful (the conversion may have a cost).

Only do optimizations of this kind if you have a performance problem and know where to change the code. You should also measure to see whether you actually get better performance; in many cases you will not. And there are other ways to optimize this. Division really is slow, and there are effective techniques for optimizing it in many cases.

Making the right choice takes a thorough command of computing, the language, the compiler you use and the platform you run on, plus a lot of experience. There is no magic answer.

The use of auto is not meant to spare you from thinking about the type you are using. Quite the contrary: it says that the type does not matter in that case, which is not always what you want. auto is not there to save typing or to accept any type; it says that whatever type comes from the expression being assigned in the statement is acceptable, and that your code is still fine even if that type changes. You thought about the type and established that it does not matter.
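As one small example of such a technique, division of an unsigned value by a constant power of two can be written as a right shift. This is only a sketch with made-up function names. Optimizing compilers typically perform this rewrite on their own, which is one more reason to measure before changing code by hand.

```cpp
#include <cstdint>

// Plain division by a constant power of two.
constexpr std::uint32_t div8_plain(std::uint32_t x) { return x / 8u; }

// Same result via a right shift; equivalent for unsigned values only.
constexpr std::uint32_t div8_shift(std::uint32_t x) { return x >> 3; }

static_assert(div8_plain(1000u) == div8_shift(1000u), "shift matches division");
static_assert(div8_plain(0xFFFFFFFFu) == div8_shift(0xFFFFFFFFu),
              "also holds at the top of the range");
```

For signed types the shift and the division disagree on negative values, because division truncates toward zero, so this rewrite does not carry over unchanged.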
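A short sketch of what that means in practice: auto takes whatever type the initializing expression has. The function `average` is a hypothetical stand-in for any expression whose type might change later.

```cpp
#include <type_traits>

// Hypothetical function; if its return type changed to float one day,
// the `auto mean` below would silently follow it.
inline double average(double a, double b) { return (a + b) / 2.0; }

// Returns true when each deduced type matches its initializer's type.
inline bool auto_follows_expression() {
    auto mean = average(3.0, 4.0);  // deduced as double
    auto count = 0;                 // deduced as int
    auto ratio = 0.5f;              // deduced as float
    return std::is_same<decltype(mean), double>::value
        && std::is_same<decltype(count), int>::value
        && std::is_same<decltype(ratio), float>::value;
}
```

If the exact type matters, for instance because the value is stored in a file format or passed to an API, spelling the type out documents that decision better than auto does.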

0

With modern C++ you can opt for "auto" in most cases. As for float and double, double really performs better on 64-bit machines, because memory alignment happens naturally and this lets the processor do the work in fewer cycles. As for unsigned, it should never be used; there is a historical design problem and the current recommendation is to always use int.

"Unsigned integers are good for representing bitfields and modular arithmetic. Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point. The fact that unsigned arithmetic doesn't model the behavior of a simple integer, but is instead defined by the standard to model modular arithmetic (wrapping around on overflow/underflow), means that a significant class of bugs cannot be diagnosed by the compiler. In other cases, the defined behavior prevents optimization.

That said, mixing signedness of integer types is responsible for an equally large class of problems. The best advice we can provide: try to use iterators and containers rather than pointers and sizes, try not to mix signedness, and try to avoid unsigned types (except for representing bitfields or modular arithmetic). Do not use an unsigned type merely to assert that a variable is non-negative."
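Two of the bug classes described in the quote can be reproduced in a few lines. This is an illustrative sketch with hypothetical function names, not code from the quoted guide.

```cpp
#include <cstddef>
#include <vector>

// Mixed comparison: -1 is converted to unsigned and becomes a huge value,
// so the comparison is false. Compilers typically warn about this
// when -Wsign-compare is enabled.
inline bool minus_one_less_than_one_unsigned() {
    int s = -1;
    unsigned u = 1u;
    return s < u;
}

// Unsigned wraparound: size() - 1 on an empty container is not -1,
// it wraps to the largest std::size_t value.
inline std::size_t last_index(const std::vector<int>& v) {
    return v.size() - 1;
}
```

Calling `last_index` on an empty vector yields an enormous index rather than an obvious error, which is the kind of bug the compiler cannot diagnose because the wraparound is defined behavior.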

Here is an example of how you can measure the cycles taken by your function. Note: the cycle-counting part of the code was written by Frederico L. Pissarra.

#include <iostream>
#include <cmath>
#include <cstdint>


// Holds the initial timestamp.
uint64_t __local_tsc;

#ifdef __x86_64__
#define REGS1 "rbx","rcx"
#define REGS2 "rcx"
#else
#ifdef __i386__
#define REGS1 "ebx","ecx"
#define REGS2 "ecx"
#else
#error cycle counting will work only on x86-64 or i386 platforms!
#endif
#endif

// Macro: starts the measurement.
// CPUID is used here to serialize the processor.
#define BEGIN_TSC do { \
  uint32_t a, d; \
\
  __asm__ __volatile__ ( \
    "xorl %%eax,%%eax\n" \
    "cpuid\n" \
    "rdtsc\n" \
    : "=a" (a), "=d" (d) :: REGS1 \
  ); \
\
  __local_tsc = ((uint64_t)d << 32) | a; \
} while (0)

// Macro: ends the measurement.
// NOTE: The previous routine used temporary storage
//       to hold the count coming from rdtscp. Also,
//       CPUID was used for serialization (which seems to be innocuous!).
//       I got better results by removing the serialization and
//       returning the constraints to EAX and EDX, saving only ECX.
// PS:   rdtscp also serializes, making CPUID superfluous!
#define END_TSC(c) do { \
  uint32_t a, d; \
\
  __asm__ __volatile__ ( \
    "rdtscp\n" \
    : "=a" (a), "=d" (d) :: REGS2 \
  ); \
\
  (c) = (((uint64_t)d << 32) | a) - __local_tsc; \
} while (0)

// --------------------------------------------------------------------



template<typename T>
inline T length(T x, T y) {  // not constexpr: std::sqrt is not constexpr before C++26
  T len = x * x + y * y;
  return T(std::sqrt(len));
}

int main()
{
    const int ciclos = 900000;
    auto out = 0.0;  // deduced as double

    uint64_t t;
    uint64_t n = 0;

    for(int i = 0; i < ciclos; ++i)
    {
        BEGIN_TSC;
        out = length(3.4, 2.6);
        END_TSC(t);
        n += t;
    }
    std::cout << "Length: " << out << "\n";
    std::cout << "Average processor cycles: " << n / ciclos << "\n";

    return 0;
}
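The cycle counter above only works on x86. As a portable alternative, a rough comparison can be sketched with std::chrono. The names `TimedResult` and `time_lengths` are made up for this illustration, and wall-clock timings like this are noisy, so treat the numbers as a hint rather than a precise measurement.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

template <typename T>
T length(T x, T y) { return std::sqrt(x * x + y * y); }

// Hypothetical result type: the accumulated sum plus elapsed time.
template <typename T>
struct TimedResult {
    T sum;
    long long nanoseconds;
};

// Sums length(x, y) over the inputs and reports how long the loop took.
// Returning the sum keeps the compiler from discarding the work.
template <typename T>
TimedResult<T> time_lengths(const std::vector<T>& xs, const std::vector<T>& ys) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    T sum = T(0);
    for (std::size_t i = 0; i < xs.size() && i < ys.size(); ++i) {
        sum += length(xs[i], ys[i]);
    }
    const auto stop = clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return {sum, static_cast<long long>(elapsed.count())};
}
```

Calling `time_lengths` once with `std::vector<float>` inputs and once with `std::vector<double>` inputs of the same size, filled with data that is not known at compile time, gives a first impression of which type is faster on the machine at hand.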
  • This still doesn't answer the question of how I can figure out which type is the most suitable. For example, look at this Vulkan API function: vkEnumerateInstanceExtensionProperties. One of its parameters is a 32-bit unsigned int, which basically serves to return the number of extensions available. So why not simply use a traditional int instead of a uint32_t?

  • When you use an API like Vulkan you should use the format it specifies. If it asks for unsigned int, use that. Their choice is probably tied to performance; in the case of Vulkan this relates to the GPU, which works better with 32-bit data, since the main goal is usually performance. You had not mentioned this in the original question, so I assumed you were talking about the CPU.

  • Well I mean in general
