How can I reinterpret bytes as another type without undefined behavior?


In assembly, C, C++, C# (with unsafe) and other languages, it is possible to reinterpret the bytes at an address as a type different from the original one. For example, in C you can convert an int* to a float*: if it points to the valid integer 0x3F800000, that same bit pattern is also the floating-point value 1.0f.

Although this enables algorithms that need fine-grained control over bits, and even when the result of such a reinterpretation seems obvious, it is still considered UB (undefined behavior): there is no telling what the compiler or interpreter will do with it.

If I'm not mistaken, converting pointers is almost always considered UB, and I want to know why it is UB. For example, since the float format is well known, doesn't reading an integer as a float give a predictable result? What could turn out differently?

Here, Visual Studio optimizes so well that the disassembly even shows constants computed from conversions of known values. As far as I know, the compiler can at most:

  • use any one of several bit patterns that represent the same value (like a float NaN, which has several binary encodings, so the compiler may pick any of them); and

  • not preserve the order of reads and writes when optimizing code in more complicated situations (such as arrays traversed by index instead of simple variables).

Beyond that, I can't think of anything else, which is why I don't understand.

Even so, it seems to me that the compiler could avoid the second problem in an obvious way: by preserving the order of read and write instructions whenever it cannot guarantee that they access bytes at different addresses. In other words, the programmer expects that order, so it should change only when there is absolute certainty that the result will be the same. Still, I think this problem has already happened to me when programming in VC++. Why is that?

Questions

So the first question is: why is reinterpreting values in memory UB in such general terms? The second question is: when this is really needed, how do we ensure it is not UB and that the result is exactly what we expect? Preferably, of course, without disabling optimizations.

To be concrete: suppose I want two C++ functions that convert pointers generically with templates (one for "read and write" and the other for "read only"), like this:

template< typename DstDataType , typename SrcDataType >
inline DstDataType* RemeanPtrAs( SrcDataType *srcPtr ){
    return (DstDataType*)srcPtr ;
}

template< typename DstDataType , typename SrcDataType >
inline const DstDataType* RemeanPtrAs( const SrcDataType *srcPtr ){
    return (const DstDataType*)srcPtr ;
}

Why is this UB, and how can I write exactly these functions using non-UB techniques so that they do exactly what is expected of them?

Edit: Is the code below UB? If it is not UB by itself, does it still allow the inline call to be optimized? Is this the solution? Does replacing typecasts with memcpy and memmove always avoid UB?

# include <string.h>

template< typename DstDataType , typename SrcDataType >
inline DstDataType* RemeanPtrAs( SrcDataType *srcPtr ){
    DstDataType* dstPtr ;
    memmove( &dstPtr , &srcPtr , sizeof(void*) ) ;
    return dstPtr ;
}

template< typename DstDataType , typename SrcDataType >
inline const DstDataType* RemeanPtrAs( const SrcDataType *srcPtr ){
    const DstDataType* dstPtr ;
    memmove( &dstPtr , &srcPtr , sizeof(const void*) ) ;
    return dstPtr ;
}

Answer

Regarding the question you illustrated with this code:

template< typename DstDataType , typename SrcDataType >
inline DstDataType* RemeanPtrAs( SrcDataType *srcPtr ){
    return (DstDataType*)srcPtr ;
}

template< typename DstDataType , typename SrcDataType >
inline const DstDataType* RemeanPtrAs( const SrcDataType *srcPtr ){
    return (const DstDataType*)srcPtr ;
}

It can be replaced by interfaces that convert a non-const variable into a read-only one at the access points, and also by the const_cast expression:

#include <iostream>

int main() {
    int i = 3;                              // initial non-const variable
    const int* i_cptr = &i;                 // pointer to const
    int* i_ptr = const_cast<int*>(i_cptr);  // cast from pointer-to-const to pointer-to-non-const
    *i_ptr = 4;                             // change the value of the non-const variable
    std::cout << *i_ptr;                    // 4
}

Adding or removing const/volatile this way is trivial, but be aware that modifying an object that was defined as const, even through a pointer obtained with const_cast, is still UB. Nothing can change that. const_cast is useful for working with interfaces, but it transfers the responsibility for keeping const values unchanged to you, the programmer.

Why is reinterpreting values in memory UB in such general terms?

The C++ ecosystem is not quite like that. Besides several different kinds of pointers, there are other features that give direct access to memory, ranging from std::memcpy, reinterpret_cast, std::bit_cast and move semantics to atomic variables, where you can choose the memory ordering used in multithreaded code and optimize routines down to the level of single processor instructions.

When this is really needed, how do we ensure it is not UB and that the result is exactly what we expect?

The only way to ensure the correct behavior of any program is through in-depth study of the development environment. In your case, you need a better understanding of the C++ rules and of the tools available. Programming in C++ is quite different from programming in C, and these differences always exist for a reason. For example, the lack of type punning through unions in C++ that you mention comes from the way C++ handles object lifetime and aliasing. To stay consistent with these rules, a union has only one active member at a time; it never holds two types simultaneously.

Other features or situations are deliberately specified as UB in the language standard, both to allow more efficient compiler (or STL) implementations and to make other, more complex features possible.

Why is this UB, and how can I write exactly these functions using non-UB techniques so that they do exactly what is expected of them?

template <typename T>
T* to_non_const(const T* src) {
    return const_cast<T*>(src);
}

template <typename T>
const T* to_const(T* src) {
    return src;
}

Note that this implementation can be dangerous for unsuspecting users: because it hides the use of const_cast, they may assume that the responsibilities attached to const_cast no longer apply. Both functions are redundant, since all they do is mask the use of const_cast.

  • Aviana, notice that the first function I wrote converts SrcDataType* (a "read and write" pointer to the "source data type") to DstDataType* (a "read and write" pointer to the "destination data type"); that is, it does not add a write restriction, it only applies the new interpretation of the data.

  • Likewise, notice that the second one converts const SrcDataType* (a "read only" pointer to the "source data type") to const DstDataType* (a "read only" pointer to the "destination data type"); that is, it does not strip the write restriction, it only applies the new interpretation of the data.

  • And all along I was talking about reinterpreting the data with respect to the format of the data type, without changing the presence or absence of const. I mean, I worked hard to make it clear that I want to keep const as it is and change only the pointer's type.

  • First of all, sorry for the mistake; I was tired and, while reading, I hadn't seen it from this angle. The core of my answer doesn't change much. Casting with C notation, as in int var = (int) foo;, is not the recommended way in C++. The right way is through static_cast<T> and dynamic_cast<T>. Example: template <typename IN, typename OUT> OUT to_out(IN var) { return static_cast<OUT>(var); }

  • There's no need for anything more complicated than that. A template type parameter T deduced from the argument keeps its const/volatile qualifiers automatically, so there is no need to define a separate function for each possible combination. As I said before, unless there is a special reason or behavior that must happen before casting, this kind of function is redundant. You can just use static_cast, dynamic_cast or reinterpret_cast directly. If a conversion is not possible, static_cast does not let your program compile, and dynamic_cast on a pointer returns nullptr (on a reference it throws std::bad_cast).

  • Does that mean none of those C++ _cast operators cause UB?

  • const_cast and reinterpret_cast are the most dangerous and should be used carefully, but static_cast and dynamic_cast give you clear signals when something is wrong.

  • Hm, I know. I would like to know the "UB rules" for this, especially for reinterpret_cast, which is apparently what applies in this context.

  • https://en.cppreference.com/w/cpp/language/reinterpret_cast
