Summary
- State (fields) and behavior (methods) are distinct things and live in different memory areas.
- A method is not part of the internal data layout of a class or struct.
- A method is just an abstract concept of object-oriented languages. In practice, methods are ordinary functions.
- Methods are code at a fixed memory position, independent of any instance.
- The method code is therefore shared by all instances, much like static data in a class. That is, there is only one copy, and it is available to the whole application.
- For those who know languages with functions and structs, such as C: the method is the function and the data is the struct. They live in different memory locations and have only an indirect relationship.
Details
Do you understand how a computer, its memory, and its processor work? Do you have any idea how an ordinary (native) program works? If not, you may need to ask other questions to understand everything here.
I'm going to simplify a few things to make this easier to follow. If you look closely, you will find small inaccuracies caused by this simplification, but they do not compromise anything unless you want to become an expert on the subject.
How the CLR works
First you need to understand a few points about how .NET works:
The entire memory of an application written to run on .NET is managed by the CLR. Nothing the application does in managed code can corrupt memory (by writing or reading), barring a bug in the CLR. Memory is managed by a garbage collector which, contrary to what the name implies, is not only used to deallocate memory but also to ensure the proper allocation of objects (objects in a broad sense, meaning a set of data that belongs to a single thing, not only in the object-oriented sense). There is actually an unmanaged area as well, but that is another subject.
The CLR has a JIT compiler (the "Jitter"). A program in any .NET-compatible language is compiled to an intermediate code called CIL. This code is a form of bytecode, and it has a textual representation in an assembly language specific to .NET, as can be seen in Wikipedia's article on CIL.
The Jitter usually runs when a program is started, or on demand in certain circumstances. It is a kind of compiler for this bytecode. It generates the native code, which is the code the processor really understands and the code that will actually run. At the end of this process the resulting code is very efficient, because it runs directly on the processor in the form the processor expects. If both the compiler of the language you are programming in and the Jitter are very efficient, it is possible to get code as fast as if it had been written in C or even in Assembly. In general, though, this does not happen in practice.
A method is a concept analogous to a function. Before the OOP paradigm we had only the concept of function, and the processor usually has facilities to handle functions. A method is nothing more than a function. There is no such thing as a concrete method; it is an abstract concept that helps programmers understand the organization of the application without worrying about the mechanism.
Methods internally
A method in a . NET language is generated more or less this way in CIL:
.method private hidebysig static int32 Add(int32 x, int32 y) cil managed
{
// Code size 9 (0x9)
.maxstack 2
.locals init ([0] int32 CS$1$0000)
IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldarg.1
IL_0003: add
IL_0004: stloc.0
IL_0005: br.s IL_0007
IL_0007: ldloc.0
IL_0008: ret
}
Note that this is a representation meant for humans. In practice it is just a sequence of bytes that the CLR interprets to produce the code that actually runs.
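For reference, C# source along the following lines would typically produce IL like the listing above in a non-optimized (debug) build. This is only a sketch: the class name Calculadora, the wrapper Somar, and the Main method are hypothetical and were added just to make it runnable.

```csharp
using System;

public static class Calculadora   // hypothetical container class
{
    // A plain static method: two ints in, one int out.
    // The body boils down to ldarg.0, ldarg.1, add, ret; a debug build
    // typically adds the nop/stloc/br.s/ldloc scaffolding seen in the listing.
    private static int Add(int x, int y)
    {
        return x + y;
    }

    // Hypothetical public wrapper so the private method can be exercised.
    public static int Somar(int x, int y) => Add(x, y);

    public static void Main()
    {
        Console.WriteLine(Somar(2, 3)); // prints 5
    }
}
```

The exact IL depends on the compiler version and on whether optimizations are enabled, so the output of a disassembler may differ slightly from the listing.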
Generating native code
When the Jitter generates the native code that will be executed, that code has to be placed in memory. This area of memory is controlled by the CLR, and your application has no direct access to it. Simply put, only the Jitter can write there and only the CLR can allow any read access to it. The reading obviously happens while the code is executing.
Once in native form, a method is just a sequence of bytes that the processor understands, sitting in a region of memory whose starting address is stored in a symbol table holding the names of all functions (remember that methods do not exist concretely) available to your application.
At this point it doesn't even matter whether the function (that address in the symbol table) is unmanaged code (although access to it is managed) written in C, C++, Assembly, or another language, including functions that are part of the operating system API. The symbols in this table must be unique: there cannot be two functions with the same name.
Functions are unique and immutable
The native code generated by the Jitter is immutable, that is, it is never changed while the program runs. An advanced technique does allow a method/function to have its code replaced during execution, but what actually happens is that the code address in the symbol table is changed to point to new code; the existing code itself is not modified. I won't go into detail about immutability here, but it applies to code too, not just to data.
Avoiding collision of names
There is a technique to ensure that functions that appear to have the same name are in fact different. You "decorate" the function names with other "words" that make them unique, creating "surnames" for the functions by adding qualifying information to the name.
Since a method is just a function with code to be executed by the processor (it contains no data, only processor instructions), it exists only once in memory.
Under no circumstances is it replicated into the instances of the class (the objects, in OOP terminology). You can have thousands or millions of objects of a class accessing the same method code, even simultaneously. Running it only involves reading this area of memory to feed the processor, and the area is immutable, so there is no concurrency problem.
Therefore a method int TotalizarInscritosDaLetra(string) of class Concurso, in namespace Aplicacao, contained in the DLL MinhaApp, would probably have an internal name something like MinhaApp_Aplicacao_Concurso_TotalizarInscritosDaLetra_int_string (this depends on the implementation; it is just an example).
So if you create the code:
var concurso1 = new Concurso();
var concurso2 = new Concurso();
both the variable concurso1 and the variable concurso2, when they call the method TotalizarInscritosDaLetra that takes a string argument (in C# the return type does not matter, since overload resolution ignores it, but the CLR does consider it, so the concrete native name includes it), will execute exactly the same code (physically), starting at the same memory address.
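A rough way to observe this from C# is to bind the method to a delegate for each object and compare what the two delegates point to. The sketch below assumes a hypothetical body for TotalizarInscritosDaLetra and hypothetical sample data; DemoMetodo and MesmoMetodo are illustrative names as well. It only shows that both delegates resolve to the same method, it does not inspect native addresses.

```csharp
using System;

public class Concurso
{
    // Hypothetical per-instance data: each object gets its own copy.
    private readonly string[] inscritos = { "Ana", "Alice", "Bruno" };

    // Hypothetical body: counts entrants whose name starts with the given letter.
    public int TotalizarInscritosDaLetra(string letra)
    {
        int total = 0;
        foreach (var nome in inscritos)
            if (nome.StartsWith(letra, StringComparison.Ordinal)) total++;
        return total;
    }
}

public static class DemoMetodo
{
    // Returns true when delegates bound to two different objects
    // refer to one and the same method.
    public static bool MesmoMetodo()
    {
        var concurso1 = new Concurso();
        var concurso2 = new Concurso();

        Func<string, int> d1 = concurso1.TotalizarInscritosDaLetra;
        Func<string, int> d2 = concurso2.TotalizarInscritosDaLetra;

        // Different targets (the instances, i.e. the data) ...
        bool instanciasDiferentes = !ReferenceEquals(d1.Target, d2.Target);
        // ... but a single method handle behind both delegates (the code).
        bool mesmoMetodo = d1.Method.MethodHandle.Value == d2.Method.MethodHandle.Value;

        return instanciasDiferentes && mesmoMetodo;
    }

    public static void Main()
    {
        Console.WriteLine(MesmoMetodo()); // prints True
    }
}
```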
If you want to see the internal names in CIL, you can use the ILDASM utility (the IL Disassembler) that comes with the .NET SDK, or more sophisticated third-party software that inspects the code through reflection, such as .NET Reflector and dotPeek. It is also possible to look at the functions at a lower level, but I have never tried it, so I will refrain from suggesting how.
Generic methods
Note that generic programming complicates things a bit more: a method void Add(T) of class List<T> can generate several different functions in memory (not in CIL). If your application uses an object of class List<int>, you will have an Add function with a name similar to mscorlib_System_Collections_Generic_List_Add_void_int (note the int at the end).
This goes for any value type. All reference types, however, share a single version of these functions, since the actual value the functions have to deal with is just the reference to the object, and all references have the same size and semantics. .NET uses this trick to save memory and avoid generating a version of each function for every type used with the generic class.
Properties
Properties, which are nothing more than methods with a different syntax, use name variations as well. A property usually has two generated functions, one for get and another for set (look at the CIL to see what they end up like).
There are several other cases in which the name of the internal function turns out to be slightly different from what you see in your code.
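The accessor functions behind a property can be seen through reflection. In this minimal sketch the class Pessoa, its property Nome, and the helper DemoPropriedade are hypothetical names chosen for illustration.

```csharp
using System;
using System.Reflection;

public class Pessoa   // hypothetical class
{
    public string Nome { get; set; }
}

public static class DemoPropriedade
{
    // Returns the names of the accessor methods generated for Pessoa.Nome.
    public static string[] NomesDosAcessores()
    {
        PropertyInfo prop = typeof(Pessoa).GetProperty("Nome");
        MethodInfo getter = prop.GetGetMethod();
        MethodInfo setter = prop.GetSetMethod();
        return new[] { getter.Name, setter.Name };
    }

    public static void Main()
    {
        foreach (var nome in NomesDosAcessores())
            Console.WriteLine(nome); // prints get_Nome, then set_Nome
    }
}
```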
Delegates
Delegates are still normal methods like any other; they just have some characteristics of their own. Their code exists for the entire execution of the application (or at least while it is loaded in the Application Domain; I won't go into detail about loading and unloading "modules").
Delegates are often associated with anonymous functions. Abstractly such a function has no name, but internally a concrete name is generated for it so it can be placed in the symbol table, something like dll_ns_classeDoDelegateExemplo_InternalMethod_etc_etc (the real naming pattern is different; this is just an example to make it easier to understand).
A delegate method is encapsulated in a class, even if it doesn't look like it, because the internal mechanism isn't really important when we are writing our applications. This class can hold references to data external to the delegate, in which case the delegate is using the concept of a closure.
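As a small illustration of a delegate carrying captured data, consider the sketch below. DemoClosure and CriarContador are hypothetical names. The delegate's Target is the hidden object that holds the captured variable, while its Method is the generated function that holds the code; the generated name depends on the compiler, so no specific name is assumed here.

```csharp
using System;

public static class DemoClosure
{
    // Builds a delegate that captures the local variable 'contador'.
    public static Func<int> CriarContador()
    {
        int contador = 0;          // captured local: kept alive in a compiler-generated object
        return () => ++contador;   // anonymous function: receives a generated internal name
    }

    public static void Main()
    {
        Func<int> proximo = CriarContador();
        Console.WriteLine(proximo());  // prints 1
        Console.WriteLine(proximo());  // prints 2

        // The data (closure object) and the code (generated method) are separate things.
        Console.WriteLine(proximo.Target != null);  // prints True
        Console.WriteLine(proximo.Method.Name);     // compiler-chosen name, varies by compiler
    }
}
```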
An extra note: delegates are about as fast as virtual methods. Essentially, the only real overhead a delegate has compared with a non-virtual method is the pointer indirection, just as with virtual methods.
References to delegates may be destroyed when they are no longer in use, but the function code of the delegate is never destroyed (except when the AppDomain is unloaded). It sits in an immutable area that your application cannot write to.
Methods and data
So methods are stored in memory, but not as data, and there are profound differences in how each is stored.
The data you find inside a method (the local variables) is, as a rule, on the stack, a mutable area of memory organized as a stack, as the name suggests, where data is pushed at the start of each new execution scope (not to be confused with lexical scope) and popped at its end.
For reference types, the reference held inside the method is on the stack, while the data of the referenced object lives in the memory area called the heap, which is managed directly by the garbage collector.
Static methods vs. instance methods
Understand the difference between static methods and instance methods of a class (remembering that static classes have only static methods).
For the purposes of your question, internally the functions are generated identically in both cases. Instance methods are practically syntactic sugar for static methods. We can say that, concretely, only static methods exist.
Abstractly we have instance methods, whose only real difference is an extra parameter that you don't see in the declaration. For example, an instance method int Calcular() of class Imposto actually has the real signature int Calcular(Imposto this). When you use this, implicitly or explicitly, inside a method, it is really like accessing a local variable called this that holds a reference to an instance of the class.
This explains concretely why you cannot access instance members from within a static method: it does not have this "variable".
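The hidden parameter can be made visible with an open instance delegate, which binds an instance method to a delegate type whose first parameter is the instance itself. The following is a sketch under assumptions: the field Valor, the method body, the hand-written CalcularEstatico, and the DemoThis helper are all hypothetical and only serve to illustrate the idea.

```csharp
using System;
using System.Reflection;

public class Imposto
{
    public int Valor = 100;   // hypothetical field

    // Instance method: 'this' is passed implicitly.
    public int Calcular()
    {
        return this.Valor / 10;
    }

    // Hand-written static equivalent: the instance becomes an explicit parameter.
    public static int CalcularEstatico(Imposto self)
    {
        return self.Valor / 10;
    }
}

public static class DemoThis
{
    // Binds the instance method to a delegate that takes the instance
    // as its first argument, so the caller has to supply 'this'.
    public static Func<Imposto, int> ComoFuncao()
    {
        MethodInfo metodo = typeof(Imposto).GetMethod("Calcular");
        return (Func<Imposto, int>)Delegate.CreateDelegate(typeof(Func<Imposto, int>), metodo);
    }

    public static void Main()
    {
        var imposto = new Imposto { Valor = 250 };
        Func<Imposto, int> calcular = ComoFuncao();

        Console.WriteLine(imposto.Calcular());                 // prints 25
        Console.WriteLine(Imposto.CalcularEstatico(imposto));  // prints 25
        Console.WriteLine(calcular(imposto));                  // prints 25
    }
}
```

All three calls run against the same object; only the way the instance reaches the function differs.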
Conclusion
There are other implications and more advanced situations on this subject, but I think this gives an overview. Want to know more, with more technical accuracy? See the article in MSDN Magazine.
I should point out that most of what I said here, despite being in rather generic terms, is implementation detail. It is not part of any specification, and the way it works may change in the future.
Jeffrey Richter's book CLR via C# is very good for going deeper into the subject (it is a little outdated but still helps). See also the Book of the Runtime.
See more on the subject of memory management (including the mistaken claim that value types are stored on the stack) in another question.