How to reinterpret bytes without undefined behavior?

Question

Details

In assembly, C, C++, C# (with unsafe) and other languages, it is possible to reinterpret the bytes at an address as a type different from the original one. For example, in C you can convert an int* to a float*: if it points to the valid integer 0x3F800000, then it also points to the floating-point value 1.0f.
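To make the 0x3F800000 example concrete, here is a minimal sketch of the same reinterpretation done without any pointer cast, by copying the bytes into a float. It assumes float is a 32-bit IEEE-754 type, and float_from_bits is a hypothetical name. In C++20, std::bit_cast<float>(bits) gives the same result.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets the 32 bits of `bits` as a float by copying the object
// representation. Assumes float is a 32-bit IEEE-754 binary32 type.
float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);  // copy bytes; no int* -> float* cast
    return f;
}
```

With that assumption, float_from_bits(0x3F800000u) yields 1.0f.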

Although this enables algorithms that need fine-grained control over bits, and even though the result of such a reinterpretation seems obvious, it is still considered U.B. (undefined behavior): there is no telling what the compiler or interpreter will do with it.

If I'm not mistaken, converting pointers is almost always U.B., and I want to know why. Why does doing this cause U.B.? For example, does reading an integer as a float fail to give the expected result, even though the float format is well known? What could turn out different?

Here, Visual Studio optimizes so well that the disassembly even shows constants computed from conversions of known values. As far as I know, the compiler can at most:

- use any one of several bit patterns that represent the same value (for example, a float NaN has many encodings, so the compiler may pick any of them); and
- not preserve the order of reads and writes when optimizing code in more complicated situations (for example, arrays traversed by index instead of simple variables).

Beyond that, I don't know what else could go wrong, which is why I don't understand the rule.

It seems to me that a compiler could avoid the second problem in an obvious way: keep the original order of read and write instructions unless it can guarantee that they access bytes at different addresses. In other words, if the programmer expects a given order, the compiler would change it only when it is absolutely certain the result will be the same. Still, I think I have already run into this problem while programming in VC++. Why is that?
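The reordering described above is usually explained by the strict aliasing rule ([basic.lval] in C++, 6.5p7 in C): the compiler may assume that pointers to unrelated types such as int and float never refer to the same object. A minimal sketch, with the hypothetical function store_and_reload, of code where that assumption lets the optimizer skip a reload:

```cpp
#include <cassert>

// Because an int object may not be accessed through a float lvalue, the
// compiler may assume the store through `fp` cannot change `*ip` and compile
// the final read as the constant 1. Passing two pointers to the same bytes
// is therefore U.B., not merely "surprising".
int store_and_reload(int* ip, float* fp) {
    *ip = 1;
    *fp = 0.0f;   // assumed not to modify *ip
    return *ip;   // may be emitted as "return 1" without a reload
}
```

With separate objects the result is always 1. GCC and Clang accept -fno-strict-aliasing to drop this assumption, at some cost in optimization.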

Questions

So my first question is: why is reinterpreting values in memory so generally U.B.? And the second: when I need to do this, how do I make sure it is not U.B. and that the result is guaranteed to be the same? Preferably, of course, without disabling optimizations.

To be concrete: suppose I want, in C++, two generic template functions that convert pointers (one for "read and write" and the other for "read only"), like this...

template< typename DstDataType , typename SrcDataType >
inline DstDataType* RemeanPtrAs( SrcDataType *srcPtr ){
    return (DstDataType*)srcPtr ;
}

template< typename DstDataType , typename SrcDataType >
inline const DstDataType* RemeanPtrAs( const SrcDataType *srcPtr ){
    return (const DstDataType*)srcPtr ;
}

Why is this U.B., and how do I write exactly these functions using non-U.B. procedures that do exactly what is expected of them?

Edit: Is the code below U.B.? If not by itself, does it still allow the inline call to be optimized? Is this the solution? Does replacing casts with memcpy or memmove always avoid U.B.?

#include <cstring>

template< typename DstDataType , typename SrcDataType >
inline DstDataType* RemeanPtrAs( SrcDataType *srcPtr ){
    DstDataType* dstPtr ;
    std::memmove( &dstPtr , &srcPtr , sizeof(void*) ) ;
    return dstPtr ;
}

template< typename DstDataType , typename SrcDataType >
inline const DstDataType* RemeanPtrAs( const SrcDataType *srcPtr ){
    const DstDataType* dstPtr ;
    std::memmove( &dstPtr , &srcPtr , sizeof(const void*) ) ;
    return dstPtr ;
}

1 answer

by aviana • **348** points · Answer 1 · 2020-12-04T03:12:40+00:00

About the question you illustrated with this code:

template< typename DstDataType , typename SrcDataType >
inline DstDataType* RemeanPtrAs( SrcDataType *srcPtr ){
    return (DstDataType*)srcPtr ;
}

template< typename DstDataType , typename SrcDataType >
inline const DstDataType* RemeanPtrAs( const SrcDataType *srcPtr ){
    return (const DstDataType*)srcPtr ;
}

It can be replaced by interfaces that perform the conversion from non-const to read-only (const) at the access points, and also by the const_cast expression:

#include <iostream>

int main() {
    int i = 3;                              // initial non-const variable
    const int* i_cptr = &i;                 // pointer to const
    int* i_ptr = const_cast<int*>(i_cptr);  // cast pointer-to-const to pointer-to-non-const
    *i_ptr = 4;                             // modify the (non-const) variable
    std::cout << *i_ptr;                    // prints 4
}

Adding or removing const/volatile is trivial and costs nothing at runtime, but be aware that modifying an object that was defined const, even through a pointer obtained with const_cast, is still UB. Nothing can change that. const_cast is useful for working with interfaces, but it transfers to you, the programmer, the responsibility of keeping constant values constant.
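A typical legitimate use of const_cast is calling an older interface whose parameter is not const-qualified even though it never writes. A minimal sketch; legacy_length and length_of are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical legacy function: its signature takes char*, but it only reads.
std::size_t legacy_length(char* s) {
    return std::strlen(s);
}

// Removing const is fine here because legacy_length never writes through s.
// If it did write, and s pointed to an object defined const, that write
// would be U.B. even though the const_cast itself compiles.
std::size_t length_of(const char* s) {
    return legacy_length(const_cast<char*>(s));
}
```

The responsibility mentioned above is exactly the comment on length_of: the cast is only correct as long as legacy_length keeps its promise not to write.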

Why is reinterpreting values in memory so generally U.B.?

C++ is not quite like that: besides several kinds of pointers, it has other features that give direct access to memory, from std::memcpy, reinterpret_cast, std::bit_cast and move semantics up to atomic variables, where you can choose the memory ordering used across threads and optimize routines down to individual processor instructions.
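One of those features is an explicit exception in the aliasing rule: any object may be read through unsigned char (or char, or std::byte in C++17). A minimal sketch, with the hypothetical helper bytes_of, of using reinterpret_cast within that exception to inspect an object's bytes:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reading an object through unsigned char is permitted by the aliasing
// rule, so this reinterpret_cast can be used to inspect the bytes.
template <typename T>
std::array<unsigned char, sizeof(T)> bytes_of(const T& value) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    std::array<unsigned char, sizeof(T)> out{};
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out[i] = p[i];
    }
    return out;
}
```

The order of the returned bytes depends on the platform's endianness; the reverse direction (treating a byte buffer as a T through a T*) is not covered by this exception.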

When I need to do this, how do I make sure it is not U.B. and that the result is guaranteed to be the same?

The only way to ensure the correct behavior of any program is an in-depth study of the development environment. In your case, you need a deeper understanding of the C++ rules and of the tools available. Programming in C++ is quite different from programming in C, and these differences always exist for a reason. For example, when you mention the lack of type punning through unions in C++ (which C permits), it follows from the way C++ handles memory alignment and aliasing. To stay consistent with those rules, a union has only one active member at a time; it never holds two types simultaneously, so reading a member other than the one last written is UB.
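Where C code would use a union or a pointer cast to read a value out of a byte buffer, the usual portable C++ tool is std::memcpy, which has no alignment requirement and does not violate the aliasing rule. A minimal sketch, with the hypothetical helpers load_u32 and store_u32:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a uint32_t starting at an arbitrary byte position. Casting p to
// const std::uint32_t* could yield a misaligned pointer and would also
// access bytes as a type they are not; memcpy avoids both problems.
std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Writes the bytes of v starting at an arbitrary byte position.
void store_u32(unsigned char* p, std::uint32_t v) {
    std::memcpy(p, &v, sizeof v);
}
```

The bytes are stored in the platform's native order, so a round trip through the same machine always returns the original value.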

Other features or situations are deliberately left as UB by the language standard, both to let compilers (and standard library implementations) be more efficient and to make other, more complex features possible.

Why is this U.B., and how do I write exactly these functions using non-U.B. procedures that do exactly what is expected of them?

template <typename T>
T* to_non_const(const T* src) {
    return const_cast<T*>(src);
}

template <typename T>
const T* to_const(T* src) {
    return src;
}

Note that this implementation can be dangerous for unsuspecting users: by hiding the use of const_cast, it may suggest that the responsibilities attached to it no longer apply. Both functions are also redundant, since to_non_const only hides a const_cast and to_const performs a conversion the language already does implicitly.
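The functions above cover only the const part of the question. For reinterpreting the bytes themselves, a common well-defined approach is to copy the value rather than the pointer. Note that the memmove version in the question copies only the pointer's bytes, so dereferencing its result is the same access as with the cast. A minimal sketch, assuming IEEE-754 float and double; remean_as is a hypothetical name and essentially what std::bit_cast (C++20) provides:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical value-based alternative to RemeanPtrAs: instead of producing
// a pointer of another type, it copies the bytes of `src` into a new Dst.
template <typename Dst, typename Src>
Dst remean_as(const Src& src) {
    static_assert(sizeof(Dst) == sizeof(Src), "sizes must match");
    static_assert(std::is_trivially_copyable<Dst>::value &&
                      std::is_trivially_copyable<Src>::value,
                  "both types must be trivially copyable");
    static_assert(std::is_default_constructible<Dst>::value,
                  "this sketch default-constructs Dst before copying");
    Dst dst;
    std::memcpy(&dst, &src, sizeof dst);  // copy the object representation
    return dst;
}
```

Modern optimizing compilers typically reduce such a fixed-size memcpy to a plain register move, so this need not cost anything compared with the cast.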