At the machine instruction level, what will happen on the call?
In case you didn't know, all C code is compiled to machine code, which tells the processor what to do. Machine code is ideal for the computer but hard for humans to understand, so we have higher-level languages built on concepts we grasp more easily.
A function is a mathematical concept that we apply to programming. It does not exist directly in machine code; what exists are instructions.
A function call is actually a CALL instruction, which works like a GOTO (or JMP, as it is known in Assembly): it simply diverts execution to the address where the function's code lives. The basic difference is that CALL also pushes onto a stack the address to return to, so when the function reaches a RET instruction, the processor pops the address from the top of the stack and jumps to it. So in a way we can say that a function is just this.
In practice, the machine code generated for a function needs a header (prologue) and a footer (epilogue), which you as a programmer don't notice because they don't matter for understanding your own code.
All operations are performed on registers, and the data already in them cannot be lost: the code that made the call probably still needs it and counts on it being there after the detour into the function. So the state of the registers has to be preserved.
For that, some instructions are placed before the machine code generated from your C code, and they do the job of saving that data on the stack.
At the end of your code's execution, the state of the registers has to be restored by taking back what was put on the stack. This always happens before the RET instruction.
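The save and restore steps can be sketched in C with a simulated machine state. The `struct cpu`, its two registers and the `callee` function are hypothetical names made up for this example; which registers a real function must preserve depends on the calling convention in use.

```c
#include <stdint.h>

/* Hypothetical machine state for this sketch: two registers and a small stack. */
struct cpu {
    uint32_t eax;        /* return-value register */
    uint32_t ebx;        /* register the caller expects to survive a call */
    uint32_t stack[16];
    int top;             /* number of values currently on the stack */
};

static void push(struct cpu *c, uint32_t v) { c->stack[c->top++] = v; }
static uint32_t pop(struct cpu *c)          { return c->stack[--c->top]; }

/* A "function" that needs ebx as scratch space. */
void callee(struct cpu *c)
{
    push(c, c->ebx);      /* prologue: save the register we are about to clobber */

    c->ebx = 40;          /* body: use ebx freely */
    c->eax = c->ebx + 2;  /* leave the result in eax */

    c->ebx = pop(c);      /* epilogue: restore ebx just before returning */
}

/* Returns 1 if ebx survived the call and eax holds the result, else 0. */
int demo_register_preserved(void)
{
    struct cpu c = { .eax = 0, .ebx = 1234, .top = 0 };
    callee(&c);
    return c.ebx == 1234 && c.eax == 42 && c.top == 0;
}
```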
Normally the application has a global symbol that holds the address of the function's code, so the calling code can refer to it much like a variable (roughly speaking). This matters because the actual execution address is often not known until the code is loaded into memory (and by "actual" I don't mean the physical address, since we are in virtual memory anyway), so the call has to use the symbol rather than a fixed address. At least that is the case for functions that can be called from anywhere in the code and cannot be optimized to use a direct address.
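In C itself you can get a feel for this with a function pointer, which is just a variable holding the address that the symbol resolves to. This is only a rough analogy, and `add_symbol`, `call_through_symbol` and `address_of_add` are names I made up for the sketch.

```c
#include <stdint.h>

int add(int i, int j)
{
    return i + j;
}

/* The name `add` is a symbol; its numeric address is fixed only when the
   program is linked and loaded, so the source never spells it out. */
int (*const add_symbol)(int, int) = add;

int call_through_symbol(int i, int j)
{
    return add_symbol(i, j);   /* indirect call via the stored address */
}

/* The address is an ordinary number at run time, but it may differ
   from one run to the next, so we do not assume any particular value.
   The conversion to an integer is implementation-defined. */
uintptr_t address_of_add(void)
{
    return (uintptr_t)add;
}
```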
Of course, a function can receive arguments for its parameters. When it does, their values have to be transferred to the function, which usually happens on the stack.
Since the final generated code is not really compartmentalized into functions and is all one thing, what actually happens is that data is copied from one variable to another; it is equivalent to a simple assignment.
But there are no variables in machine code either. What you do is move data from one point in memory to another, passing through registers. As an optimization, the data can go from memory to a register, or even from register to register when possible.
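A small sketch of the "it is just an assignment" idea, with made-up function names: the parameter receives a copy of the argument, so changing it inside the function does not touch the caller's variable.

```c
/* The parameter `n` is the callee's own copy of the argument. */
int bump(int n)
{
    n = n + 1;      /* changes only the copy */
    return n;
}

/* Returns the caller's variable after the call, to show it is untouched. */
int caller_value_after_call(void)
{
    int x = 10;
    int y = bump(x);   /* behaves like: n = x; n = n + 1; y = n; */
    (void)y;           /* y is not needed here; silence unused warnings */
    return x;
}
```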
The compiler tries to use only register-to-register moves because they are much faster, but that is not always possible: with many parameters there are not enough registers, not all registers can be used for this, and some data does not fit in a register. What gets copied is simple numeric data that fits in a register (remember that char is numeric data), pointers (remember that an array is passed as just a pointer), and the data in structs, which usually does not go in registers.
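The three kinds of data mentioned above can be observed from C. This is a sketch with hypothetical names (`next_letter`, `size_seen_by_callee`, `struct big` and so on); whether a given value actually travels in a register or on the stack depends on the calling convention and is not visible from C.

```c
#include <stddef.h>

/* `char` is a small integer: arithmetic on it works like on any number. */
int next_letter(char c)
{
    return c + 1;
}

/* An array argument arrives as a pointer (writing `int values[8]` in the
   parameter list would mean the same), so sizeof reports the pointer's size. */
size_t size_seen_by_callee(int *values)
{
    return sizeof values;
}

/* Writing through the pointer reaches the caller's array: nothing was copied. */
void set_first(int *values, int v)
{
    values[0] = v;
}

struct big { int data[64]; };

/* A struct is copied as a whole; changing the copy leaves the original alone. */
int modify_copy(struct big b)
{
    b.data[0] = 99;
    return b.data[0];
}

/* Returns 1 if all the observations above hold, else 0. */
int demo_what_gets_copied(void)
{
    int numbers[8] = {0};
    struct big original = { {0} };

    set_first(numbers, 7);
    return numbers[0] == 7
        && size_seen_by_callee(numbers) == sizeof(int *)
        && modify_copy(original) == 99
        && original.data[0] == 0;
}
```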
The same happens with the return value. In general one register is set aside for it. Just as with parameter passing, a convention defines which register holds the returned data, and the calling code reads it from there when the function finishes (on x86 it is EAX). Normally the return value stays out of a register only when it is too large, in which case it can be written to memory directly, at the location of the variable or temporary storage in the calling function.
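For the "too large" case, here is a sketch of what the compiler conceptually does. `struct quad`, `make_quad` and `make_quad_explicit` are made-up names, and the exact size at which a return value stops fitting in registers is an assumption that varies by platform and calling convention.

```c
struct quad { long a, b, c, d; };

/* Small result: typically comes back in the return register. */
int sum(int i, int j)
{
    return i + j;
}

/* Large result, as you write it in C. */
struct quad make_quad(long base)
{
    struct quad q = { base, base + 1, base + 2, base + 3 };
    return q;
}

/* Roughly what many calling conventions turn the function above into:
   the caller reserves the storage and passes its address. */
void make_quad_explicit(struct quad *out, long base)
{
    out->a = base;
    out->b = base + 1;
    out->c = base + 2;
    out->d = base + 3;
}

/* Returns 1 if both forms produce the same data, else 0. */
int demo_same_result(void)
{
    struct quad by_value = make_quad(10);
    struct quad by_pointer;

    make_quad_explicit(&by_pointer, 10);
    return by_value.a == by_pointer.a && by_value.b == by_pointer.b
        && by_value.c == by_pointer.c && by_value.d == by_pointer.d;
}
```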
Parameters can be thought of as local variables of the function, so what you are doing is putting values into those variables. Everything therefore goes to the execution stack in memory, unless it can be optimized into registers.
Of course, what I just described is the most common case; depending on the processor architecture it can work a little differently. A given C compiler may also do things slightly differently, as long as it complies with the language specification.
Think of a C function like this:
int add(int i, int j) {
    int p = i + j;
    return p;
}
The compiled code would look something like this:
.globl add
add:
    pushl %ebp            //prologue: save on the stack the register that will be modified
    movl %esp, %ebp       //copy the stack pointer into ebp
    subl $4, %esp         //allocate space (4 bytes) on the stack for the variable p
    movl 8(%ebp), %edx    //8(%ebp) is i, just an assignment of what is on the stack at + 8 bytes
    addl 12(%ebp), %edx   //12(%ebp) is j, the addition happens here, edx is the other operand
    movl %edx, -4(%ebp)   //-4(%ebp) is p, assignment
    movl -4(%ebp), %eax   //eax is the return register
    leave                 //epilogue: the same as movl %ebp, %esp; popl %ebp
    ret                   //return to the caller
%esp is the register holding the pointer to the current top of the stack. An operand like 12(%ebp) means the address given by the value in register %ebp plus 12 bytes. %ebp points to the stack frame of this function.
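If the offsets look arbitrary, it may help to replay the listing by hand. The following is a minimal sketch in C that simulates a 32-bit stack with an array and runs the same steps; the helper names (`load`, `store`, `push`, `pop`, `simulate_add`) and the placeholder return address are assumptions of the sketch, not anything the compiler generates.

```c
#include <stdint.h>

#define STACK_WORDS 64

static uint32_t memory[STACK_WORDS];   /* simulated stack memory, 4-byte words */
static uint32_t esp, ebp, eax, edx;    /* simulated registers (byte addresses) */

static uint32_t load(uint32_t addr)              { return memory[addr / 4]; }
static void     store(uint32_t addr, uint32_t v) { memory[addr / 4] = v; }
static void     push(uint32_t v)                 { esp -= 4; store(esp, v); }
static uint32_t pop(void)                        { uint32_t v = load(esp); esp += 4; return v; }

/* Mirrors the body of `add`, one simulated instruction per line.
   Frame layout once ebp is set:
     ebp + 12 : j
     ebp +  8 : i
     ebp +  4 : return address
     ebp +  0 : caller's saved ebp
     ebp -  4 : p                                              */
static void add_body(void)
{
    push(ebp);                /* pushl %ebp            */
    ebp = esp;                /* movl  %esp, %ebp      */
    esp -= 4;                 /* subl  $4, %esp        */
    edx = load(ebp + 8);      /* movl  8(%ebp), %edx   */
    edx += load(ebp + 12);    /* addl  12(%ebp), %edx  */
    store(ebp - 4, edx);      /* movl  %edx, -4(%ebp)  */
    eax = load(ebp - 4);      /* movl  -4(%ebp), %eax  */
    esp = ebp;                /* leave: movl %ebp, %esp */
    ebp = pop();              /*        popl %ebp       */
}

uint32_t simulate_add(uint32_t i, uint32_t j)
{
    esp = STACK_WORDS * 4;    /* empty stack: esp just past the last word */
    ebp = 0;
    push(j);                  /* pushl: second argument */
    push(i);                  /* pushl: first argument  */
    push(0xDEADBEEF);         /* call: pushes the return address (placeholder value) */
    add_body();
    (void)pop();              /* ret: pops the return address */
    esp += 8;                 /* caller discards the two arguments */
    return eax;
}
```

The comment with the frame layout shows why i sits at offset 8 and not 0: the saved %ebp and the return address occupy the two words in between.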
The call to this function will be something like this:
add(1, 2)
In Assembly:
pushl $2 //push the second argument onto the stack
pushl $1 //push the first argument onto the stack
call add //jump to the function
Try it yourself: write simple programs, compile them with GCC using the -S option, and look at the resulting Assembly. In Visual C the option is /FA.
The result will not always be the same; it depends somewhat on the compiler.
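As a starting point, something like the file below should do. The file name `add.c` and the `call_add` wrapper are just my suggestion. Compiling it with `gcc -S -O0 add.c` should write the assembly to `add.s`; adding `-m32` should give 32-bit code closer to the first listing, and `-masm=intel` switches to Intel syntax. Expect the details to differ between compiler versions.

```c
/* add.c (hypothetical file name) */
int add(int i, int j)
{
    int p = i + j;
    return p;
}

int call_add(void)
{
    /* In the generated assembly, look for the argument setup
       followed by the call instruction for this line. */
    return add(1, 2);
}
```

With optimization enabled (for example `-O2`) the call will probably disappear, because the compiler can inline `add` and compute the result at compile time, so keep `-O0` while studying the calling sequence.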
There are also different assembly syntaxes. With GCC 7 targeting x86-64 and using Intel syntax, the generated function is:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-24], esi
mov edx, DWORD PTR [rbp-20]
mov eax, DWORD PTR [rbp-24]
add eax, edx
mov DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-4]
pop rbp
ret
See it on Compiler Explorer. I also put it on GitHub for future reference.
It seems complicated, but as you learn it becomes easy, because the simplest concepts are in fact the concrete ones. That's why I often say people should learn to program from the bottom up, with Assembly or at least C, understanding everything that happens in the compilation process and leaving abstractions for later.
In short, that is it. Obviously I made simplifications; if you really want to understand, you need to research something deeper and more precise, and if you have specific doubts, ask new questions.