This advice is old. It exists precisely because compilers of the past did not optimize this kind of code well, which resulted in executables with questionable performance (the variable is copied, and copies can be expensive). Nowadays, returning a local object can be even better than using output parameters. Of course, never rely on popular sayings about optimization: always measure the performance of your own program before drawing any conclusion.
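As a rough illustration of what "measure it yourself" can look like, here is a minimal timing sketch. The names make_by_value, fill_out_param, time_ns and compare_timings are hypothetical and exist only for this example; a micro-benchmark like this is easily distorted by the optimizer, so treat the numbers as a hint and not as proof.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical function under test: returns a local object by value.
std::string make_by_value() {
    std::string s = "teste";
    s[0] = 'T';
    return s;
}

// Hypothetical alternative: writes the result into an output parameter.
void fill_out_param(std::string& out) {
    out = "teste";
    out[0] = 'T';
}

// Calls `fn` `iterations` times and returns the elapsed time in nanoseconds.
template <class Fn>
long long time_ns(Fn fn, int iterations) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Prints both timings; the values depend on compiler, flags and machine.
void compare_timings(int iterations = 1000000) {
    std::size_t sink = 0;  // accumulates sizes so the results stay observable
    long long by_value = time_ns([&] { sink += make_by_value().size(); }, iterations);
    long long by_out = time_ns([&] {
        std::string s;
        fill_out_param(s);
        sink += s.size();
    }, iterations);
    std::printf("by value: %lld ns, output parameter: %lld ns (sink=%zu)\n",
                by_value, by_out, sink);
}
```

Calling compare_timings() from main with the same flags you use for your real build gives a first impression; for serious measurements a dedicated benchmarking tool is a better choice.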
Two names are used for the optimizations that apply here:
- RVO: Return Value Optimization, and
- NRVO: Named Return Value Optimization, which is basically a variation of RVO for cases where the returned value has a name (i.e., it is a variable).
Both techniques are forms of copy elision. Since C++17, copy elision is part of what the standard guarantees: when a function returns a temporary (a prvalue), the copy must be omitted. Before that, the standard merely permitted compilers to omit such copies and never required it. NRVO remains optional.
With all this said, we can now observe the effects of RVO and NRVO:
When RVO is applied successfully, the copy that would otherwise be made of an object just created and returned by the function is omitted: the object is constructed directly in the storage of the object that receives the return value. To be clear, the following code:
#include <string>
std::string foo() { return "teste"; }
auto s = foo();
In effect, it is transformed into something like the following:
#include <string>
std::string s;
void foo() { s = "teste"; }
foo();
Notice how the optimization uses the storage of the outer variable s to hold the string literal "teste", instead of creating a new std::string and copying that object into s. Compiling with GCC 7.3 at optimization level 3, we get the following body for the function std::string foo():
foo[abi:cxx11]():
lea rdx, [rdi+16] # Computes the address of the buffer inside `s`
mov rax, rdi
mov DWORD PTR [rdi+16], 1953719668 # Writes "teste" into the buffer of `s`.
mov BYTE PTR [rdi+20], 101
mov QWORD PTR [rdi+8], 5 # Writes the size of the string.
mov QWORD PTR [rdi], rdx
mov BYTE PTR [rdi+21], 0 # Writes the string's null terminator.
ret
Instead of creating a new std::string object, the function foo simply assumes that the object's storage already exists (i.e., the caller has already allocated space for the object) and uses it.
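The effect described above can also be observed without reading assembly. The following is a minimal sketch that uses a hypothetical helper type, Tracer, which counts how many times it is copied or moved; Tracer, make_tracer and report_rvo are names invented for this example.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper type that counts copy and move constructions.
struct Tracer {
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

// Returns a temporary (a prvalue): this is the RVO case.
Tracer make_tracer() { return Tracer{}; }

// Resets the counters, receives the returned object and prints the counts.
void report_rvo() {
    Tracer::copies = Tracer::moves = 0;
    Tracer t = make_tracer();  // constructed directly in the storage of `t`
    (void)t;
    std::printf("copies=%d moves=%d\n", Tracer::copies, Tracer::moves);
}
```

Compiled as C++17 or later, calling report_rvo() prints copies=0 moves=0, because the elision is mandatory in this case.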
The NRVO variation does exactly the same thing, except that it is extended to variables. If we had the following code:
#include <string>
std::string foo()
{
    std::string s_local = "teste";
    s_local[0] = 'T';
    return s_local;
}
auto s = foo();
We would have exactly the same optimized output, with the single addition of a mov BYTE PTR [rdi+16], 84 at the end, which changes the first character of the string to a capital T. That is, s_local and s will have the same storage location after optimization.
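One way to check whether the local variable and the receiving variable really share storage is to compare their addresses. This is a minimal sketch, not a guarantee: NRVO is optional, so the result depends on the compiler and its flags. The names g_local_address, make_named and nrvo_applied are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Address of the local variable inside make_named(), stored as an integer
// so that it can still be compared after the local has gone out of scope.
std::uintptr_t g_local_address = 0;

std::string make_named() {
    std::string s_local = "teste";
    s_local[0] = 'T';
    g_local_address = reinterpret_cast<std::uintptr_t>(&s_local);
    return s_local;  // NRVO candidate: the operand is the name of a local
}

// Returns true when the caller's object occupies the same storage that
// s_local used, i.e. when the compiler applied NRVO.
bool nrvo_applied() {
    std::string s = make_named();
    return reinterpret_cast<std::uintptr_t>(&s) == g_local_address;
}
```

Printing the result of nrvo_applied() at different optimization levels, or with GCC's and Clang's -fno-elide-constructors flag, shows when the two variables coincide.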
There are some cases where NRVO cannot be applied easily. If we always return the same local variable, applying NRVO is trivial. If, on the other hand, different return paths return different variables, we have a difficult case for NRVO, and the optimization will probably not be performed. For example:
#include <string>
std::string foo(bool b)
{
    std::string s1 = "abc";
    std::string s2 = "def";
    return b ? s1 : s2;
}
Here the return expression is no longer just the name of a local variable, so NRVO in the strict sense does not apply and the chosen string is copied into the result. An optimizer may still manage to write "abc" or "def" directly, depending on the value of b, but once the code gets more complex the chances of that decrease. In contrast, if every return statement returns the same variable, the function can be as complex as you like and applying NRVO remains trivial.
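The difference between the two shapes of function can be made visible with a counting type. This is a minimal sketch using a hypothetical type Counted and hypothetical functions pick_conditional and pick_single; none of them come from the original question.

```cpp
#include <cassert>

// Hypothetical type that counts copy and move constructions.
struct Counted {
    static int copies;
    static int moves;
    int id;
    explicit Counted(int i) : id(i) {}
    Counted(const Counted& o) : id(o.id) { ++copies; }
    Counted(Counted&& o) noexcept : id(o.id) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Mirrors `return b ? s1 : s2;`: the operand is an lvalue expression and
// not the name of a single local, so the chosen object is copied.
Counted pick_conditional(bool b) {
    Counted c1(1);
    Counted c2(2);
    return b ? c1 : c2;
}

// Every return names the same local variable, so NRVO can apply; if the
// compiler does not apply it, the object is moved, never copied.
Counted pick_single(bool b) {
    Counted result(b ? 1 : 2);
    return result;
}
```

Resetting the counters and calling each function shows one copy for pick_conditional and no copies for pick_single; whether pick_single also avoids the move depends on the compiler applying NRVO.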
Finally, here is what some compilers produce for your (slightly modified) function, compiling as C++17 in every case.
#include <string>
#include <algorithm>
std::string foo()
{
    std::string s = "teste";
    std::transform(begin(s), end(s), begin(s),
                   [](char c) { return c - 32; });
    return s;
}
With GCC 7.3 and optimization level 3:
foo[abi:cxx11]():
lea rdx, [rdi+16] # Computes the start of the string that already exists outside the function
mov DWORD PTR [rdi+16], 1953719668 # Writes "teste"
mov BYTE PTR [rdi+20], 101
mov rax, rdi
mov QWORD PTR [rdi+8], 5
mov BYTE PTR [rdi+21], 0
mov QWORD PTR [rdi], rdx
sub BYTE PTR [rdi+16], 32 # Sequence of subtractions (to convert to uppercase)
sub BYTE PTR [rdi+17], 32 # that was unrolled from `std::transform`
sub BYTE PTR [rdi+18], 32
sub BYTE PTR [rdi+19], 32
sub BYTE PTR [rdi+20], 32
ret
With Clang 6.0.0 at optimization level 3, also compiling against libstdc++:
foo[abi:cxx11](): # @foo[abi:cxx11]()
lea rax, [rdi + 16]
mov qword ptr [rdi], rax
mov qword ptr [rdi + 8], 5
mov dword ptr [rdi + 16], 1414743380 # Clang managed to eliminate the `std::transform`
mov word ptr [rdi + 20], 69 # and wrote the string already in uppercase
mov rax, rdi
ret
You can experiment with compiler output yourself on Compiler Explorer (Godbolt).
I had never heard that; I've even marked your question as a favorite in case someone answers it. I believe it might be about saving memory, for example in Arduino projects, where we have to save memory because the hardware is more limited: if you pass the variable by reference it occupies only one place in memory, whereas returning another variable uses one more place in memory.
– Wictor Chaves
I see no sense in this statement, and unless you give the context in which you read it, I believe it will be impossible to assess anything.
– Woss