What happens when we call a function?

At the machine instruction level, what happens when a function is called?

1 answer

Introduction

You may already know this, but all C code is compiled to machine code that tells the processor what to do. Machine code is ideal for the computer but hard for a human to read, so we have higher-level languages built on concepts we understand better.

A function is a mathematical concept that we apply in programming. It does not exist directly in machine code. The instructions available are essentially:

  • simple numerical operations,
  • data movement between memory and registers,
  • simple execution flow control.

Call and return

A function call is actually a CALL instruction, which works like a GOTO (or JMP, as it is known in Assembly): it just diverts execution to the address where the function's code lives. The basic difference is that CALL also stores, on a stack, the address that execution should return to. When the processor later finds a RET instruction inside the function, it takes the address at the top of the stack and jumps to it, diverting execution back. In a way, we can say that a function is just this.
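To make the CALL/RET mechanism concrete, here is a minimal sketch of a toy machine in C. The opcodes, the `run` function and the `demo` program are all invented for illustration and are not a real instruction set; the only point is that CALL is a jump that first saves the return address, and RET is a jump to whatever address was saved last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy machine; not a real instruction set. */
enum op { OP_INC, OP_JMP, OP_CALL, OP_RET, OP_HALT };

struct insn { enum op op; size_t target; };

#define STACK_MAX 16

/* Runs the program and returns the final value of the single accumulator. */
int run(const struct insn *code)
{
    size_t stack[STACK_MAX];   /* holds return addresses only */
    size_t sp = 0;             /* number of entries currently on the stack */
    size_t pc = 0;             /* index of the next instruction */
    int acc = 0;

    for (;;) {
        struct insn in = code[pc++];   /* pc now points past this instruction */
        switch (in.op) {
        case OP_INC:
            acc++;
            break;
        case OP_JMP:
            pc = in.target;            /* plain GOTO */
            break;
        case OP_CALL:
            assert(sp < STACK_MAX);
            stack[sp++] = pc;          /* remember where to come back to */
            pc = in.target;            /* then jump, exactly like JMP */
            break;
        case OP_RET:
            assert(sp > 0);
            pc = stack[--sp];          /* jump back to the saved address */
            break;
        case OP_HALT:
            return acc;
        }
    }
}

/* Example program: calls the "function" at index 3 twice. */
const struct insn demo[] = {
    { OP_CALL, 3 },   /* 0 */
    { OP_CALL, 3 },   /* 1 */
    { OP_HALT, 0 },   /* 2 */
    { OP_INC,  0 },   /* 3: the function body starts here */
    { OP_INC,  0 },   /* 4 */
    { OP_RET,  0 },   /* 5 */
};
```

Calling `run(demo)` executes the two-instruction body twice, so the accumulator ends at 4. Replacing OP_CALL with OP_JMP would leave RET with no saved address to return to, which is exactly the difference between the two instructions.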

In practice, the machine code generated for a function also needs a header (prologue) and a footer (epilogue). You, the programmer, do not see them because they do not matter for understanding your own code.

All operations are performed on registers, and the data already in them cannot be lost: the code that was running probably still needs it and counts on it being there after the detour into the function. So the state of the registers has to be preserved.

So some instructions are placed before the machine code generated from your C code. They do this job by storing the register data on the stack.

At the end of your code, the state of the registers has to be restored by taking back what was put on the stack. This always happens before the RET instruction.
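The save and restore steps can be sketched in C with a simulated register file. Everything here (`regs`, `push`, `pop`, `double_r0`, `caller`) is hypothetical and only imitates what the compiler-generated prologue and epilogue do with real registers.

```c
#include <assert.h>

/* Hypothetical register file and stack, for illustration only. */
enum { R0, R1, R2, NUM_REGS };
int regs[NUM_REGS];
int stack[32];
int sp = 0;

static void push(int v) { assert(sp < 32); stack[sp++] = v; }
static int  pop(void)   { assert(sp > 0);  return stack[--sp]; }

/* Doubles the value in R0, using R1 as scratch space. */
void double_r0(void)
{
    push(regs[R1]);                  /* prologue: save the register we will overwrite */

    regs[R1] = regs[R0];             /* body: free to use R1 now */
    regs[R0] = regs[R0] + regs[R1];

    regs[R1] = pop();                /* epilogue: restore it before returning */
}

/* The caller keeps live data in R1 across the call. */
int caller(void)
{
    regs[R0] = 21;
    regs[R1] = 100;                  /* value the caller still needs afterwards */
    double_r0();
    return regs[R0] + regs[R1];      /* 42 + 100 */
}
```

Without the `push`/`pop` pair, `double_r0` would leave 21 in R1 and the caller would compute the wrong total.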

Normally the application has a global symbol that holds the address where the function's code is, so the calling code can use something like a variable (roughly speaking). This matters because the actual execution address is often not known until the code is loaded into memory. Note that by "actual" I do not even mean the physical address, since we are dealing with virtual memory. So the code has to refer to the symbol and not to a fixed address, at least for functions that can be called from anywhere in the code and cannot be optimized to use a direct address.
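C exposes a close relative of this idea through function pointers. The sketch below is only an analogy: `add_ptr`, `call_direct` and `call_indirect` are names made up for the example, and how the toolchain actually resolves the symbol depends on the linker and loader.

```c
#include <assert.h>
#include <stddef.h>

int add(int i, int j) { return i + j; }

/* A variable holding the address of add. The source only mentions the
   symbol; the numeric address is filled in by the linker/loader. */
int (*add_ptr)(int, int) = add;

/* Direct call: the compiler refers to the symbol add. */
int call_direct(void) { return add(1, 2); }

/* Indirect call: the address is read from a variable at run time. */
int call_indirect(void)
{
    assert(add_ptr != NULL);
    return add_ptr(1, 2);
}
```

Both functions return the same result; the difference is only in where the target address comes from.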

Arguments, parameters and return

Of course, a function can have arguments passed to its parameters. When it does, their values have to be transferred to the function, and this usually happens on the stack.

Since the final generated code is not really compartmentalized into functions and is all one thing, what it is actually doing is copying data from one variable to another, which is equivalent to a simple assignment.
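A small C sketch of that equivalence, with function names invented for the example: `increment` receives its argument the normal way, and `increment_inlined` shows roughly what remains once the function boundary is erased.

```c
#include <assert.h>

/* The parameter n is its own storage location, initialised from the argument. */
int increment(int n)
{
    n = n + 1;          /* modifies the callee's copy only */
    return n;
}

/* Roughly what the call amounts to without the function boundary. */
int increment_inlined(int x)
{
    int n = x;          /* passing the argument is a plain assignment */
    n = n + 1;
    return n;
}

int caller_value_unchanged(void)
{
    int x = 10;
    int r = increment(x);
    assert(x == 10);    /* the caller's variable was never touched */
    return r;
}
```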

But there are no variables in machine code either. What happens is that data is carried from one point in memory to another, passing through registers. As an optimization, it can go only from memory to a register, or even from register to register when that is possible.

Compilers try to move data only between registers because that is much faster, but it is not always possible: a function with many parameters may need more registers than are available, not every register can be used for this, and some data does not fit in a register. What gets copied is simple numeric data that fits in a register (remember that char is numeric), pointers (remember that an array is passed as just a pointer), and the data in structs, which usually do not go in registers.
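The difference between passing an array and passing a struct can be observed from C itself. This is a minimal sketch with made-up names (`struct pair`, `zero_first`, `sum_and_clobber`); it shows the language-level behavior, not which registers a particular compiler picks.

```c
#include <assert.h>
#include <stddef.h>

struct pair { int a; int b; };

/* An array argument arrives as a pointer to its first element,
   so writes through it are visible to the caller. */
void zero_first(int values[], size_t count)
{
    if (count > 0)
        values[0] = 0;
}

/* A struct is copied as a whole; the callee works on its own copy. */
int sum_and_clobber(struct pair p)
{
    int sum = p.a + p.b;
    p.a = -1;               /* affects the copy only */
    return sum;
}

int demo_array_vs_struct(void)
{
    int data[3] = { 7, 8, 9 };
    struct pair p = { 1, 2 };

    zero_first(data, 3);
    int sum = sum_and_clobber(p);

    assert(data[0] == 0);   /* changed through the pointer */
    assert(p.a == 1);       /* untouched: the callee had a copy */
    return sum;
}
```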

The same happens with the return value. In general one register is set aside for it. As with parameter passing, there is a convention for which register carries the returned data, and the calling code reads it from there when the function finishes (on x86 the convention is eax). Normally the return value stays out of a register only when it is too large. In that case it can be written directly to memory, at the location of the variable or temporary in the calling function.
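For the large-return case, the sketch below writes by hand what many calling conventions do behind the scenes: the caller reserves the memory and passes its address. The names `struct big`, `make_big`, `make_big_into` and `both_agree` are hypothetical, and whether a given compiler really uses a hidden pointer depends on the ABI.

```c
#include <assert.h>
#include <stddef.h>

#define BIG_LEN 64
struct big { int values[BIG_LEN]; };   /* too large for a return register */

/* What you write in C: return the struct by value. */
struct big make_big(int seed)
{
    struct big b;
    for (int i = 0; i < BIG_LEN; i++)
        b.values[i] = seed + i;
    return b;
}

/* Hand-written equivalent: the caller passes the destination address. */
void make_big_into(struct big *out, int seed)
{
    assert(out != NULL);
    for (int i = 0; i < BIG_LEN; i++)
        out->values[i] = seed + i;
}

/* A small result fits in a single register (eax on x86). */
int small_result(void) { return 42; }

/* Returns 1 when both ways of producing the struct agree. */
int both_agree(int seed)
{
    struct big a = make_big(seed);
    struct big b;
    make_big_into(&b, seed);
    for (int i = 0; i < BIG_LEN; i++)
        if (a.values[i] != b.values[i])
            return 0;
    return 1;
}
```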

Parameters can be seen as local variables of the function, so passing arguments just puts values into those variables. Everything therefore goes on the execution stack in memory, unless it can be optimized into registers.

Real example

Of course, what I have just described is the most common case; depending on the processor architecture, things can work a little differently. A C compiler may also do things slightly differently, as long as it complies with the language specification.

Think of a C function like this:

int add(int i, int j) {
    int p = i + j;
    return p;
}

The compiled code would look something like this:

.globl add
add:
    pushl %ebp          # prologue: save on the stack the register we are about to change
    movl %esp, %ebp     # copy the stack pointer into ebp
    subl $4, %esp       # allocate space (4 bytes) on the stack for the variable p
    movl 8(%ebp), %edx  # 8(%ebp) is i: just a load from the frame address + 8 bytes
    addl 12(%ebp), %edx # 12(%ebp) is j: the addition happens here, edx is the other operand
    movl %edx, -4(%ebp) # -4(%ebp) is p: assignment
    movl -4(%ebp), %eax # eax is the return register
    leave               # epilogue: same as movl %ebp, %esp; popl %ebp
    ret                 # return to the address saved by call

%esp is the register that holds the pointer to the current top of the stack. An operand such as 12(%ebp) means the address given by the value in register %ebp plus 12 bytes. %ebp marks where this function's stack frame is.

The call to this function will be something like this:

add(1, 2)

In Assembly:

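pushl $2      # push the second argument onto the stack
pushl $1      # push the first argument onto the stack
call add      # divert to the function; the result comes back in eax
addl $8, %esp # assuming cdecl: the caller discards the two arguments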
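If you want to follow the offsets by hand, the 32-bit listing can be emulated in C. This is only a sketch: the stack is a plain array, `esp`, `ebp`, `eax` and `edx` are ordinary variables, the return address is a placeholder value, and the helper names (`at`, `push32`, `pop32`, `add_body`, `simulate_add`) are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256
static int32_t mem[STACK_BYTES / 4];   /* simulated stack memory */
static uint32_t esp = STACK_BYTES;     /* the stack grows downwards */
static uint32_t ebp = 0;
static int32_t eax, edx;

/* Translates a simulated byte address into a slot of mem. */
static int32_t *at(uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return &mem[addr / 4];
}

static void push32(int32_t v) { esp -= 4; *at(esp) = v; }
static int32_t pop32(void)    { int32_t v = *at(esp); esp += 4; return v; }

/* Mirrors the body of add, one statement per instruction. */
static void add_body(void)
{
    push32((int32_t)ebp);        /* pushl %ebp */
    ebp = esp;                   /* movl %esp, %ebp */
    esp -= 4;                    /* subl $4, %esp */
    edx = *at(ebp + 8);          /* movl 8(%ebp), %edx    (i) */
    edx += *at(ebp + 12);        /* addl 12(%ebp), %edx   (i + j) */
    *at(ebp - 4) = edx;          /* movl %edx, -4(%ebp)   (p) */
    eax = *at(ebp - 4);          /* movl -4(%ebp), %eax */
    esp = ebp;                   /* leave: movl %ebp, %esp */
    ebp = (uint32_t)pop32();     /*        popl %ebp */
}

int simulate_add(int32_t i, int32_t j)
{
    push32(j);                   /* push the second argument */
    push32(i);                   /* push the first argument */
    push32(0);                   /* call: pushes the return address (placeholder) */
    add_body();
    (void)pop32();               /* ret: pops the return address */
    esp += 8;                    /* the caller discards the two arguments */
    return eax;
}
```

The reason i sits at 8(%ebp) and not at 0(%ebp) is visible here: between the arguments and the frame pointer there are two 4-byte slots, the return address and the saved %ebp.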

Try it yourself: write some simple code, compile it with GCC using the -S option, and look at the resulting Assembly. In Visual C the option is /FA.

The result will not always be the same; it depends somewhat on the compiler.
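As a starting point for that experiment, here is a small file you could compile with -S. The function `sum8` and the helper `use_both` are my own additions: `sum8` has eight integer parameters, which is more than the six integer argument registers of the x86-64 System V convention, so on that platform you should typically see some of the arguments travel on the stack. On other platforms or conventions the split will differ.

```c
#include <assert.h>

/* Two arguments: on 64-bit x86 (System V) both typically arrive in registers. */
int add(int i, int j)
{
    int p = i + j;
    return p;
}

/* Eight integer arguments: more than the six integer argument registers of
   that convention, so the last ones are typically passed on the stack. */
int sum8(int a, int b, int c, int d, int e, int f, int g, int h)
{
    return a + b + c + d + e + f + g + h;
}

/* Gives the compiler a call site for each function to generate code for. */
int use_both(void)
{
    int r = add(1, 2) + sum8(1, 2, 3, 4, 5, 6, 7, 8);
    assert(r == 39);
    return r;
}
```

Compile without optimization first; with optimization enabled the compiler may inline both calls and the call sequence disappears from the output.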

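There are also different assembly syntaxes. With GCC 7 and Intel syntax the generated function is as follows (note that this output is for 64-bit x86, where the two arguments arrive in the registers edi and esi instead of on the stack):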

push    rbp
mov     rbp, rsp
mov     DWORD PTR [rbp-20], edi
mov     DWORD PTR [rbp-24], esi
mov     edx, DWORD PTR [rbp-20]
mov     eax, DWORD PTR [rbp-24]
add     eax, edx
mov     DWORD PTR [rbp-4], eax
mov     eax, DWORD PTR [rbp-4]
pop     rbp
ret

See it on Compiler Explorer. I also put it on GitHub for future reference.

Conclusion

It seems complicated, but it becomes easy as you learn, because the simplest concepts are in fact the concrete ones. That is why I often say people should learn to program from the bottom up, with Assembly or at least C, understanding everything that happens in the compilation process and leaving abstractions for later.

That is the short version. Obviously I made simplifications; to really understand it you need to study deeper and more precise material. If you have specific doubts, ask new questions.
