How does method management work in memory in C#?

28

In C# there is a clear distinction between value types (structs) and reference types (classes), which basically comes down to how the CLR manages instances of each kind. Value type instances are placed directly on the stack, while reference type instances are placed on the heap, with a reference to them kept on the stack.

This is easy to understand with primitive types and simple classes that contain only fields. In those cases we are talking about storing data in memory, which is simple to grasp.

The problem is that methods are also stored in memory. Classes can contain methods, so these methods must be stored in memory somehow when we create an instance.

Also, delegates let us point to methods, which again suggests that methods are stored in memory just like data.

But this is strange: what would it mean to store a method in memory? A method does not contain "data" that can be saved; it can contain multiple variables, multiple statements, etc. I cannot understand how the CLR manages this.

3 answers

21


Summary

  • State (fields) and behavior (methods) are distinct things and are in different memory areas.
  • A method is not part of the internal data composition of a class or structure.
  • A method is just an abstract concept of object-oriented languages. In practice, methods are ordinary functions.
  • Methods are code at a fixed memory position and are independent of their instances.
  • The method code is therefore shared by all instances, in a similar way to static data in a class. That is, it exists only once and is available to the whole application.
  • For those who know languages that have some form of functions and structs, such as C: the method is the function and the data is the struct. They live in different memory locations and have only an indirect relationship.

Details

Do you understand how a computer, its memory and its processor work? Do you have any idea how a common (native) program works? If not, other questions may be needed to understand everything.

I’m going to simplify a few things to make this easier to understand. If you look closely you will find some small inaccuracies caused by this simplification, but they do not compromise anything unless you want to become an expert on the subject.

Functioning of CLR

First you need to understand a few points about how .NET works:

  • The entire memory of an application written to run on .NET is managed by the CLR. There is nothing this application can do to corrupt memory (by writing or reading), except through some bug in the CLR. Memory is managed by a garbage collector which, contrary to what the name implies, is used not only to deallocate memory but also to ensure the proper allocation of objects (objects in a broad sense, meaning a set of data that belongs to a single thing, not only in the sense of the object-oriented paradigm). There is actually an unmanaged area too, but that is another subject.

  • The CLR has a JIT compiler (the "Jitter"). That is, a program in any .NET-compatible language is compiled into an intermediate code called CIL. This code is a form of bytecode which, in general, is produced through an assembly-like language specific to .NET, as can be seen in Wikipedia’s English article on CIL.

    The Jitter usually runs when a program is started, or on demand in certain circumstances. It is a kind of compiler for this bytecode. It generates the native code, which is the code the processor really understands and the code that will actually run. At the end of this process the resulting code is very efficient, because it runs directly on the processor in the form the processor expects. If the compiler of the language you are programming in and the Jitter are both very efficient, it is possible to get code as fast as if it had been written in C or even in assembly. In general, though, this does not happen in practice.

  • A method is a concept analogous to a function. Before the OOP paradigm we had only the concept of a function, and the processor usually has facilities to handle functions. A method is nothing more than a function. There is no such thing as a concrete method; it is an abstract concept that helps programmers understand the organization of the application without worrying about the mechanism.

Methods internally

A method in a .NET language is generated more or less this way in CIL:

.method private hidebysig static int32 Add(int32 x, int32 y) cil managed 
{ 
 // Code size 9 (0x9) 
 .maxstack 2 
 .locals init ([0] int32 CS$1$0000) 
 IL_0000: nop 
 IL_0001: ldarg.0 
 IL_0002: ldarg.1 
 IL_0003: add 
 IL_0004: stloc.0 
 IL_0005: br.s IL_0007 
 IL_0007: ldloc.0 
 IL_0008: ret 
}

Note that this is a representation meant for humans; in practice there are only bytes, which the CLR interprets to determine the effective code.
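For orientation, the listing above is roughly what a debug build tends to emit for a trivial addition method. The C# below is only a guess at the kind of source behind it: the class name Calculadora and the Demo helper are hypothetical and do not come from the original listing.

```csharp
using System;

public static class Calculadora
{
    // Assumed source: a private static method matching the CIL signature
    // "private hidebysig static int32 Add(int32 x, int32 y)".
    private static int Add(int x, int y)
    {
        return x + y;
    }

    // Hypothetical helper so the private method can be exercised from outside.
    public static int Demo()
    {
        return Add(2, 3);
    }

    public static void Main()
    {
        Console.WriteLine(Demo()); // prints 5
    }
}
```

The extra nop, stloc.0 and br.s instructions in the CIL are typical of unoptimized builds; an optimized build would usually be shorter.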

Generating native code

When the Jitter generates the native code that will be executed, that code needs to be put into memory. This area of memory is controlled by the CLR; your application does not have direct access to it. Simply put, only the Jitter can write there and only the CLR can allow any read access to it. Reading obviously occurs while execution is in progress.

Once a method is in its native form, it is just a set of bytes that the processor understands, located in a part of memory. Its starting address is stored in a symbol table that holds the names of all functions (remember that methods do not exist concretely) available to your application.

At this point it doesn’t even matter whether the function (this address in the symbol table) is unmanaged code (although access to it is managed) written in C, C++, assembly or another language, including functions that are part of the operating system API. The symbols in this table must be unique. There cannot be two functions with the same name.
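As a rough illustration of "a method is code at an address", reflection can report an entry-point address for a method. This is only a sketch: the class EnderecoDeMetodo is hypothetical, and depending on the runtime the pointer returned may be a small stub that jumps to the compiled code rather than the code itself.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class EnderecoDeMetodo
{
    public static int Add(int x, int y) => x + y;

    // Returns the entry-point address the runtime reports for Add.
    public static IntPtr GetAddAddress()
    {
        MethodInfo method = typeof(EnderecoDeMetodo).GetMethod(nameof(Add));
        // Ask the runtime to compile the method now instead of on first call.
        RuntimeHelpers.PrepareMethod(method.MethodHandle);
        return method.MethodHandle.GetFunctionPointer();
    }

    public static void Main()
    {
        // The exact value is implementation-specific and varies per run.
        Console.WriteLine($"0x{GetAddAddress().ToInt64():X}");
    }
}
```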

Functions are unique and immutable

The native code generated by the Jitter is immutable, that is, it is never changed during the execution of the program. An advanced technique does make it possible to change the code of a method/function during execution, but what actually happens is that the code address in the symbol table is changed to point to new code; the existing code itself is not modified. I won’t go into detail about immutability here, but it applies to code too, not just to data.

Avoiding collision of names

There is a technique to ensure that functions that appear to have the same name are actually different. You "decorate" the name of each function with other "words" that make it unique, creating "surnames" for the functions by adding further names to it, as in the example further below.

Since a method is just a function with code to be executed by the processor (it contains no data, only processor instructions), it exists only once in memory.

Under no circumstances is it replicated into the instances of the class (the objects, in OOP terminology). You can have thousands or millions of objects of a class accessing the same method code, even simultaneously. Running it only involves reading this area of memory to feed the processor, and the area is immutable, so there is no concurrency problem.

Therefore a method int TotalizarInscritosDaLetra( string ) of the class Concurso, in the namespace Aplicacao and contained in the DLL MinhaApp, would probably have an internal name something like MinhaApp_Aplicacao_Concurso_TotalizarInscritosDaLetra_int_string (it depends on the implementation; this is just an example).

So if you create the code:

var concurso1 = new Concurso();
var concurso2 = new Concurso();

then both concurso1 and concurso2, when they call the method TotalizarInscritosDaLetra that takes a string argument, will execute exactly the same code (physically), starting at the same memory address. (In C# the return type does not matter, since overload resolution does not consider it, but the CLR does, so the concrete native name includes it.)
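A minimal sketch of this point, assuming a made-up body for Concurso (the constructor, the field and the counting logic are hypothetical). Two delegates bound to different instances report the same method handle, while their targets differ:

```csharp
using System;

public class Concurso
{
    private readonly string[] inscritos;

    public Concurso(params string[] inscritos)
    {
        this.inscritos = inscritos;
    }

    // Hypothetical implementation: counts names starting with the given letter.
    public int TotalizarInscritosDaLetra(string letra)
    {
        int total = 0;
        foreach (string nome in inscritos)
        {
            if (nome.StartsWith(letra, StringComparison.OrdinalIgnoreCase)) total++;
        }
        return total;
    }
}

public static class Programa
{
    // True when both delegates point at the same method.
    public static bool SameCode(Func<string, int> a, Func<string, int> b)
    {
        return a.Method.MethodHandle.Value == b.Method.MethodHandle.Value;
    }

    public static void Main()
    {
        var concurso1 = new Concurso("Ana", "Bruno");
        var concurso2 = new Concurso("Alice", "Amanda", "Carlos");
        Func<string, int> d1 = concurso1.TotalizarInscritosDaLetra;
        Func<string, int> d2 = concurso2.TotalizarInscritosDaLetra;

        Console.WriteLine(SameCode(d1, d2));                        // True: one method
        Console.WriteLine(ReferenceEquals(d1.Target, d2.Target));   // False: two objects
        Console.WriteLine(d1("A"));                                 // 1
        Console.WriteLine(d2("A"));                                 // 2
    }
}
```

The same code produces different results only because each call receives a different instance to work on.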

If you want to see the internal names in CIL, you can use the ILDASM utility that comes with the .NET SDK, or more sophisticated third-party tools that inspect and decompile code, such as .NET Reflector and dotPeek. It is also possible to see the functions at a lower level, but I have never tried it, so I will refrain from suggesting how.

Generic methods

Note that generic programming complicates things a bit more: a method void Add( T ) of the class List<T> can generate several different functions in memory (not in CIL). If your application uses an object of the class List<int>, you will have an Add function with a name similar to mscorlib_System_Collections_Generic_List_Add_void_int.

This goes for every value type. All reference types, however, share a single version of these functions, since the effective value the functions have to deal with is just the reference to the object. And all references have the same size and semantics. .NET uses this trick to save memory and avoid generating a version of each function for every type used with the generic class.

Properties

Properties, which are nothing more than methods with a different syntax, use name variations as well. A property usually has two generated functions, one for get and another for set (look at how they appear in CIL).
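The accessor names can be checked through reflection. In this small sketch, the class Pessoa and its property Nome are hypothetical:

```csharp
using System;
using System.Reflection;

public class Pessoa
{
    public string Nome { get; set; }
}

public static class PropriedadesDemo
{
    // Name of the method generated for the getter.
    public static string GetterName()
    {
        return typeof(Pessoa).GetProperty("Nome").GetGetMethod().Name;
    }

    // Name of the method generated for the setter.
    public static string SetterName()
    {
        return typeof(Pessoa).GetProperty("Nome").GetSetMethod().Name;
    }

    public static void Main()
    {
        Console.WriteLine(GetterName()); // get_Nome
        Console.WriteLine(SetterName()); // set_Nome
    }
}
```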

There are several other cases in which the name of the internal function turns out to be slightly different from what you see in your code.

Delegates

Delegates are still normal methods like the others, with just a few characteristics of their own. They exist during the entire execution of the application (or at least while it is loaded in the Application Domain; I won’t go into detail about loading and unloading "modules").

Delegates are also called anonymous functions. So although abstractly there is no name, internally a concrete name is generated to be placed in the symbol table. Something like dll_ns_classeDoDelegateExemplo_InternalMethod_etc_etc (the real naming pattern is different; this is just an example to make it easier to understand).

A delegate's method is encapsulated in a class, even if it doesn’t look like it, because the internal mechanism isn’t really important when we are writing our applications. This class can hold references to data external to the delegate, in which case the delegate uses the concept of a closure.

An extra note: delegates are about as fast as virtual methods. Essentially the only real overhead a delegate has compared with a non-virtual method is the pointer indirection, just as with virtual methods.

References to delegates may be destroyed when they are no longer in use, but the code of the delegate's function is never destroyed (except when the AppDomain is unloaded); it lives in an immutable area that your application cannot write to.
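The split between shared code and per-delegate data can be sketched with a closure. The names ClosureDemo and MakeCounter are hypothetical. Each call to MakeCounter produces a new delegate with its own captured state, while both delegates point at the same compiler-generated method:

```csharp
using System;

public static class ClosureDemo
{
    public static Func<int> MakeCounter()
    {
        int count = 0;          // captured local: kept in a heap object, not on the stack
        return () => ++count;   // compiled into a method of a compiler-generated class
    }

    public static void Main()
    {
        Func<int> first = MakeCounter();
        Func<int> second = MakeCounter();

        Console.WriteLine(first());   // 1
        Console.WriteLine(first());   // 2
        Console.WriteLine(second());  // 1: separate captured data

        // Same code behind both delegates, different closure objects.
        Console.WriteLine(first.Method.MethodHandle.Value == second.Method.MethodHandle.Value); // True
        Console.WriteLine(ReferenceEquals(first.Target, second.Target));                         // False
    }
}
```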

Methods and data

So methods are stored in memory, but not as data; there are profound differences in the way they are stored.

The data you find inside a method (the local variables) is, as a rule, on the stack. This is a mutable area of memory organized as a stack, as its name suggests, where data is pushed at the start of each new execution scope (not to be confused with lexical scope) and popped at its end.

For reference types, the objects referenced from within methods (through references held on the stack) live in the memory area called the heap, which is managed directly by the garbage collector.

Static methods vs. instance methods

It helps to understand the difference between static methods and instance methods of a class (remembering that static classes have only static methods).

For the purpose of your question, internally the functions are generated identically in both cases. Instance methods are practically syntactic sugar for static methods. We can say that, concretely, only static methods exist.

Abstractly we have instance methods, whose only real difference is an extra parameter that you don’t see in the declaration. For example, an instance method int Calcular() of the class Imposto actually has the real signature int Calcular( Imposto this ). When you use this implicitly or explicitly within a method, it is actually like accessing a local variable called this that holds a reference to an instance of the class.

This explains concretely why you cannot access instance members from within a static method: it does not have this "variable".
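The hidden parameter can be made visible in a sketch like the one below, where the Base field and the tax rule are invented for illustration. It compares the instance method with a hand-written static equivalent, and also creates an "open instance" delegate through Delegate.CreateDelegate, which calls the instance method while passing the object explicitly as the first argument:

```csharp
using System;
using System.Reflection;

public class Imposto
{
    public int Base;

    // Instance method: "this" is passed implicitly.
    public int Calcular()
    {
        return this.Base / 10;
    }
}

public static class ThisDemo
{
    // Hand-written static equivalent with the instance as an explicit parameter.
    public static int Calcular(Imposto self)
    {
        return self.Base / 10;
    }

    // Open instance delegate: the instance method seen as Func<Imposto, int>.
    public static Func<Imposto, int> OpenInstance()
    {
        MethodInfo method = typeof(Imposto).GetMethod("Calcular");
        return (Func<Imposto, int>)Delegate.CreateDelegate(typeof(Func<Imposto, int>), method);
    }

    public static void Main()
    {
        var imposto = new Imposto { Base = 200 };
        Console.WriteLine(imposto.Calcular());        // 20
        Console.WriteLine(Calcular(imposto));         // 20
        Console.WriteLine(OpenInstance()(imposto));   // 20
    }
}
```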

Conclusion

There are other implications and more advanced situations on the subject, but I think this gives an overview. Want to know more, with full technical accuracy? See the article in MSDN Magazine (in English).

I should point out that most of what I said here, despite being in rather generic terms, is implementation detail. It is not part of any specification, and the way it works may change in the future.

The book CLR via C# by Jeffrey Richter is very good for going deeper into the subject (it is a little outdated, but still helps); see also the Book of the Runtime.

See more on the subject of memory management (including the mistaken claim that value types are stored on the stack) in another question.

9

Some of your assumptions are wrong.

  1. Structs do not necessarily live on the stack. A struct is a set of data that is stored as a value, rather than as a reference that points elsewhere. It is stored directly in the context it is associated with or in which it will be used (the CLR decides this):
  • if it is a local variable, it is placed directly on the stack, and ceases to exist automatically when the context of the method ceases to exist, that is, when an exception occurs or the method returns.

  • if it is an instance field of a class, it is placed directly in the body of the object, which is on the "heap", allocated inside the object's structure at a certain offset from its beginning.

  • if it is a static field, it is stored statically, and does not appear in the memory layout of the object's instances.

  • if a struct is assigned to a variable of a reference type, such as object, Enum or any interface the struct implements, it is stored on the "heap" through an operation called boxing, which consists of allocating space on the managed heap and copying the value there.

  • optimizations can mean that a local variable of a struct type never even exists on the stack, for example an int used as the counter of a for loop.

  • a local variable may end up on the heap if it is captured in a closure or needs to live longer than the local context (the CLR decides this).

  2. Methods are not part of an object instance. Compiled code is stored statically, in a fixed memory position. When the method is called for the first time, the JIT compiler goes into action and compiles the method into a fixed place, and from then on everything that references the method points to a fixed memory address that contains the compiled code.
  • since the method code is static, it learns which object it is being applied to through a special parameter, affectionately called this in C# or Me in VB.NET.

  • All compiled code is associated with an Application Domain. The only way to discard compiled methods is to unload the Application Domain.

  3. Delegates do point to compiled methods, but those are static, so when a delegate is destroyed the compiled method continues to exist. In addition to pointing to a method, a delegate may contain data, for example when a closure is created.
  • Note: delegates are classes, and they reference the data captured in a closure. Therefore, for the garbage collector to reclaim the captured data, all references to the delegate must be dropped.
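The boxing case from the first point above can be sketched as follows. The struct Ponto is hypothetical. Assigning the struct to an object variable copies the value to the heap, so later changes to the local do not affect the boxed copy:

```csharp
using System;

public struct Ponto
{
    public int X;
}

public static class BoxingDemo
{
    public static int BoxedValueAfterChange()
    {
        Ponto p = new Ponto { X = 1 };  // local struct value
        object boxed = p;               // boxing: the value is copied to the managed heap
        p.X = 2;                        // changes only the local copy
        return ((Ponto)boxed).X;        // unboxing reads the heap copy
    }

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterChange()); // 1
    }
}
```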

References and more study material: unfortunately all in English

5

I cannot speak specifically about the CLR, but the reasoning should be the same: every time you define a class, that class is represented in memory in some way. There is a single representation for the class, unlike its instances (objects), of which there can be several. All instances of a class therefore share the reference to that class:

Foo a = new Foo();
Foo b = new Foo();
bool sameType = a.GetType() == b.GetType(); // true

It is there, next to the class, that the in-memory representation of your methods lives. When you call a method on an object:

a.bar(10, 20, 30);

What happens is that the method defined in the class of a is called, with a itself and the values 10, 20 and 30 as arguments.

Okay, but what about references to methods? You say that 'a method does not contain a "data" that can be saved', and often that is true. However, if you want to call it, you need a reference to it. This reference is roughly a pointer to the place in the class definition where the method is represented. Using this reference, you can call the method without a hardcoded value in the source code.

Furthermore, there are situations where there is data to be saved: in the case of a method "tied" to a specific object (a bound method), or in the case of a function that is a closure of another function (i.e. defined inside the other, and thus with access to the local variables of the "outer" function). Again, I’m not familiar with .NET and I may have oversimplified or taken some liberties, so I suggest you wait for a more precise answer. But what I wrote should give a basic idea of the logic behind it.
