Memory allocation in C# - Value types and reference types

51

In C#, the CLR allocates memory differently for reference types (classes) and value types (structs). From what I have always heard, the difference is that value types are stored on the stack and reference types on the heap. However, value types can also end up on the heap, either through boxing or as members of a reference type.

So where each one is stored is not really what distinguishes value types from reference types. What, then, is the more accurate way to tell them apart? How does the CLR actually manage memory allocation for instances of classes and of structs? And why does the distinction exist at all? After all, classes and structs, despite their differences, are very similar.

2 answers

39


Where memory is allocated is determined only by the lifetime of the data, not by its type.

  • In C# there are two forms of values: instances of value types (the value itself) and references to other instances.
  • There is the concept of "storage locations", which can store values.
  • Every value manipulated by a program is in a "storage location".
  • Every reference, except the null reference, points to a "storage location".
  • Every "storage location" has a lifetime during which its content is valid.
  • The code of a method may require a "storage location".
  • If the "storage location" is needed only during the activation of the method (simply put, the period in which the method is running), it is called "short-lived". If it is needed for longer than that, it is called "long-lived".

Reference types

Reference types really do keep their data (the object itself) on the heap, the memory shared by the whole application and managed by the garbage collector. This is an implementation detail, but it is how things work in practice. These types also have a reference (a pointer, a memory address) that is stored in some "storage location".

The reference itself (not the object it points to) is also a value and behaves like a value type.

When the compiler or the JIT compiler cannot determine the lifetime of a "storage location" with certainty, it takes the safest route and uses the heap.

Value types

Value types should be immutable. Any transfer of their value is done by a full copy of their members, unless it is explicitly stated that it must be done by reference (notably with the ref modifier on method arguments and, from C# 7, on return values).
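The difference between passing a copy and passing by reference can be seen in a small sketch. The Point struct and the method names are hypothetical, introduced only for illustration, and the struct is deliberately mutable so that the effect is visible:

```csharp
using System;

// Hypothetical value type, mutable on purpose for this demonstration.
public struct Point
{
    public int X;
    public Point(int x) { X = x; }
}

public static class CopyDemo
{
    // Receives a full copy: changing it does not affect the caller's variable.
    public static void ChangeCopy(Point p) { p.X = 99; }

    // Receives a reference to the caller's storage location.
    public static void ChangeByRef(ref Point p) { p.X = 99; }

    public static int AfterCopy()
    {
        var original = new Point(1);
        ChangeCopy(original);
        return original.X; // unchanged
    }

    public static int AfterRef()
    {
        var original = new Point(1);
        ChangeByRef(ref original);
        return original.X; // changed through the reference
    }

    public static void Main()
    {
        Console.WriteLine(AfterCopy()); // 1
        Console.WriteLine(AfterRef());  // 99
    }
}
```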

A curiosity:

struct Textos {
    private string Texto1;
    private string Texto2;
    // ... constructors and methods/properties to access and change the members go here, guaranteeing immutability ...
}

string is a reference type. Do you think there is an error or a "bad practice" in creating this structure, which is a value type?

There is no problem. The structure is small (8 bytes on 32-bit or 16 bytes on 64-bit), is immutable, and possibly has other characteristics that are desirable in a value type (since the example is incomplete, we can only imagine them).

This structure holds only two references to strings, nothing more. The texts that are the values of these strings are stored on the heap (this is actually a little more complicated because of interning). For all intents and purposes, Textos is a structure that holds just two references (pointers).
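A minimal sketch of this point, assuming one possible completion of the structure (the constructor and the read-only properties are my assumptions, since the original leaves them out). Copying the struct copies the two references, so both copies point at the same string objects:

```csharp
using System;

// One possible completion of Textos; constructor and properties are assumed.
public readonly struct Textos
{
    public string Texto1 { get; }
    public string Texto2 { get; }

    public Textos(string texto1, string texto2)
    {
        Texto1 = texto1;
        Texto2 = texto2;
    }
}

public static class TextosDemo
{
    public static bool CopySharesStrings()
    {
        var a = new Textos("first", "second");
        var b = a; // copies two references, not the characters
        return object.ReferenceEquals(a.Texto1, b.Texto1)
            && object.ReferenceEquals(a.Texto2, b.Texto2);
    }

    public static void Main()
    {
        Console.WriteLine(CopySharesStrings()); // True
    }
}
```

Because string is immutable, sharing the referenced objects between copies is harmless here.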

Memory usage

A structure has no memory overhead. The size of a structure is always the sum of the sizes of its members (remembering that in the example above the size of each member is the size of a reference), also taking data alignment into account.

Values of value types can be stored in several places.

Reliable sources show that the allocation of these types is a little more complicated than most .NET programmers realize.

  • The value (which is the object itself) may be in a register, as an optimization by the JIT compiler. The CLR knows how to handle this. Enumerators are usually put in registers to optimize loops.
  • The value can be on the stack, as everyone imagines; this is very common. This is data directly tied to methods (you access it through local variables).
  • The value may be on the heap because it is enclosed in some other type, such as a class or an array.

The last point deserves more details.

  • If a value type is a member of a reference type, where will its value be stored?

    Well, if a class (a reference type) is laid out as a sequence of members of other types, the actual content of this class (the object, not its reference) is on the heap.

  • If a value type is part of another type that is certainly stored on the heap, how could it be stored on the stack?

    Simple: it is not. See:

     public class Carro {
         public string Nome;
         public int Status;
         public bool EhNovo;
         public DateTime DataDaVenda;
         public Decimal ValorDaVenda;
     }
    

    The example has flaws, but what matters is that each instance of this class stores (on a 32-bit architecture):

    • 4 bytes for the reference in the member Nome;
    • 4 bytes for the member Status;
    • 1 byte (in theory, disregarding member alignment) for EhNovo;
    • 8 bytes for the member DataDaVenda;
    • 16 bytes for the member ValorDaVenda.

    All of this space holds its values on the heap, where the instance of the class Carro is stored. The member Nome may have yet another part stored on the heap, provided it is not a null reference, but that part may be anywhere else in memory and not next to the Carro object; only by coincidence would the two be contiguous.
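The arithmetic above can be checked with a rough sketch. The sizeof operator yields compile-time constants for built-in types such as int, bool and decimal; the reference size (4) and the DateTime size (8) are taken from the text as assumptions, and the object header and alignment padding are ignored:

```csharp
using System;

public static class FieldSizes
{
    // Sizes the language itself guarantees: 4 + 1 + 16.
    public static int BuiltInFieldBytes()
    {
        return sizeof(int) + sizeof(bool) + sizeof(decimal);
    }

    // Adds the two sizes assumed from the text above (32-bit reference, DateTime).
    public static int TotalOn32Bit()
    {
        const int referenceSize = 4; // assumption: 32-bit architecture
        const int dateTimeSize = 8;  // assumption: size quoted in the text
        return referenceSize + BuiltInFieldBytes() + dateTimeSize;
    }

    public static void Main()
    {
        Console.WriteLine(BuiltInFieldBytes()); // 21
        Console.WriteLine(TotalOn32Bit());      // 33
    }
}
```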

  • What if the class is a list of integers? Something like:

     var notas = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    
  • These integers, which are value types, are part of the collection's content. Where will they be stored?

    On the heap as well. Internally, all elements of this list are ultimately stored in an array, which is just a sequence of data of the given type. List is a class, therefore a reference type, and all of its data is on the heap.

  • And what if I capture a local variable of a method, such as an int, in a delegate that will be exposed to other parts of the application? Where will this local int be stored?

    On the heap as well. It is the only way to keep alive a piece of data that was local and outlived the method through a delegate. When this local data is captured by a delegate, forming a closure, it becomes part of the class that encapsulates the delegate, through a reference, so it is part of something that lives on the heap and its lifetime is controlled by the garbage collector (it may be collected once the delegate is no longer referenced by the application).
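A small sketch of the closure case; the names are hypothetical. The local variable count keeps its state between calls even though the method that declared it has already returned, which is only possible because it no longer lives in that method's stack frame:

```csharp
using System;

public static class ClosureDemo
{
    // Returns a delegate that captures the local variable 'count'.
    public static Func<int> MakeCounter()
    {
        int count = 0;        // looks like a short-lived local...
        return () => ++count; // ...but the lambda captures it, extending its lifetime
    }

    public static void Main()
    {
        Func<int> next = MakeCounter(); // MakeCounter has returned by now
        Console.WriteLine(next()); // 1
        Console.WriteLine(next()); // 2
    }
}
```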

As you can see, there are several ways in which a value of a value type can end up on the heap.
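Boxing, mentioned in the question, is one more of these ways. A minimal sketch (the names are illustrative): converting an int to object copies the value into a heap object, so later changes to the local do not affect the boxed copy.

```csharp
using System;

public static class BoxingDemo
{
    public static int BoxedValueAfterChange()
    {
        int local = 1;
        object boxed = local; // boxing: the value is copied into a heap object
        local = 2;            // changes only the local copy
        return (int)boxed;    // unboxing copies the boxed value back out
    }

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterChange()); // 1
    }
}
```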

References to objects (to the concrete data) are values as well. I think it is clear by now that a reference can also be in any of these three places.

For example, when you have a list of strings (List<string>), what is stored in the list are the references to each of the strings, that is, to each of the texts.

  • If you want to estimate the size of a List<string> with 1000 null elements, roughly speaking (ignoring the other members of the List object and the overhead that every reference-type object has), how much space would this list occupy on the heap on a 32-bit architecture?

    Each reference occupies 4 bytes (32 bits == 4 bytes) per element in the list. 4 x 1000 elements = 4000 bytes. That's it.

    And the string data itself? If all elements are null, nothing more is consumed. But if you initialize all 1000 strings with some text, overall memory consumption will certainly increase, because 1000 objects of type string must be allocated. The consumption of the List object itself, however, does not change at all. The references continue to occupy the same space. Of course, the contents of these references, which were 0 (the address conventionally used for null), now hold the memory addresses where each string was allocated (some could even have been allocated already by interning, but that is another story).
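The estimate above as a tiny calculation, under the same simplifications (no object header, no other List members). The helper name is hypothetical; IntPtr.Size gives the pointer size of the current process, so the second line prints 4000 or 8000 depending on the platform:

```csharp
using System;

public static class ListSizeDemo
{
    // Bytes occupied by the references alone: one pointer per element.
    public static long ReferenceBytes(int elements, int pointerSize)
    {
        return (long)elements * pointerSize;
    }

    public static void Main()
    {
        Console.WriteLine(ReferenceBytes(1000, 4));           // 4000 on 32-bit
        Console.WriteLine(ReferenceBytes(1000, IntPtr.Size)); // depends on the running process
    }
}
```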

We can conclude that values of value types that outlive the execution of a method are stored on the heap. This is the correct conclusion. It is amazing how the myth of a relationship between where an object is allocated and its form of representation survives.

Virtual memory

Things get a little more complicated because any piece of data might not even be in RAM; it might be in mass storage. Another common misconception is that everything we allocate in memory ends up in physical memory. It goes into virtual memory, which may physically be on a hard drive, for example. But that is another matter.

Conclusion

Everyone "knows" where data is allocated, and this is usually irrelevant. Knowing that what matters is the lifetime of the object is not always part of a programmer's knowledge.

Other strategies are possible. I stress that all of this is a detail of the current CLR implementation.

Always remember that the shape or type of a value and its "storage location" are distinct concepts.

See this answer to better understand the stack and the heap.

References:

Part 1 and part 2 of Eric Lippert’s article.

  • 1

    "Knowing that what matters is the lifetime of the object is not always part of a programmer's knowledge": I believe that. I have asked other programmers some questions related to this and they could not answer me =/

8

The accepted answer is very good, but here is a deeper dive into a few points.

Edited: I removed the recommendation about the string in the struct; it didn't make sense.

So how can value types and reference types be differentiated more correctly?

Value types are considered literals, or types that implement deep copy. The number 3 and the string "foo" are values: they cannot be changed, and there is no need to create a new representation for the value 3, since there is only one. So when it is assigned, a copy is made, producing another memory space.

The string has a peculiarity: it is immutable. People forget that there is only one representation of it in memory, so to avoid wasting time, assigning a string copies its memory address rather than making a new copy of it. Note that "cat" + "dog" yields "catdog", which is a completely different representation (another piece of memory), making the concatenation of many strings in sequence an expensive operation.

C# brought the struct over from C++ and gave it the same semantics as value types. That is, it is a shallow copy that is intended to behave as a deep copy: the programmer is responsible for ensuring that no reference is copied in a shallow way, and the compiler is responsible for copying the struct on assignment, creating a new memory space.

Rule of thumb: structs are composed only of value types, and copying a struct produces a deep copy of it and its members.

Structs are usually used for purely value types. Using references inside a struct is considered a bad practice, because a copy of a struct is expected to produce new memory addresses and not references to the original.
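The shallow-copy behaviour described here can be seen in a short sketch. The Holder struct is hypothetical and holds a reference to a mutable array: the int field is copied independently, while the array is shared between the original and the copy.

```csharp
using System;

// Hypothetical struct holding a reference to a mutable array.
public struct Holder
{
    public int Number;
    public int[] Items;
}

public static class ShallowCopyDemo
{
    // Mutates a copy and reports what the original sees afterwards.
    public static (int number, int firstItem) MutateCopy()
    {
        var a = new Holder { Number = 1, Items = new[] { 10 } };
        var b = a;       // copies Number and the reference stored in Items
        b.Number = 2;    // independent: a.Number is untouched
        b.Items[0] = 20; // shared array: the change is visible through a
        return (a.Number, a.Items[0]);
    }

    public static void Main()
    {
        Console.WriteLine(MutateCopy()); // (1, 20)
    }
}
```

With an immutable referenced type such as string, this sharing cannot be observed through mutation, which is the situation the comments below this answer point to.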

Otherwise, use a class to represent your entities.

How does the CLR actually manage memory allocation for instances of classes and of structs?

Beware: C# is highly influenced by C++, but without some features that were considered unsafe or impractical.

Notice the use of a struct without new; the fact that it is a valuetype shows up in its class signature:

using System;
using System.Reflection;
using System.Linq;
using System.Collections.Generic;

namespace Linq
{
    struct Teste
    {
        public int t;
    }

    class Xpto
    {
        public int t;
    }

    class Program
    {
        static void Main(string[] args)
        {
            Teste t;
            Teste x = new Teste();
            Xpto xpto = new Xpto();

            // OK: new initialized the fields for us.
            Console.WriteLine(x.t);

            // Compile-time error CS0170 if uncommented: the field is used before being assigned.
            // Console.WriteLine(t.t);
        }
    }   
}
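As a complement to the listing above, here is a sketch of how a struct local declared without new becomes usable: the C# definite-assignment rules accept it once every field has been assigned. The class and method names are hypothetical.

```csharp
using System;

public struct Teste
{
    public int t;
}

public static class AssignDemo
{
    public static int AssignWithoutNew()
    {
        Teste t;  // no 'new': the compiler treats the fields as unassigned
        t.t = 5;  // assigning every field makes the variable definitely assigned
        return t.t;
    }

    public static void Main()
    {
        Console.WriteLine(AssignWithoutNew()); // 5
    }
}
```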

Notice how you can use the struct on the stack without creating a new instance, although in that case the compiler does not initialize the fields. Most programmers prefer new, which still places the value on the stack (in this context). Now for the icing on the cake, the CIL.

.method private hidebysig static 
    void Main (
        string[] args
    ) cil managed 
{
    // Method begins at RVA 0x2058
    // Code size 27 (0x1b)
    .maxstack 1
    .entrypoint
    .locals init (
        [0] valuetype Linq.Teste x
    )

    IL_0000: ldloca.s x
    IL_0002: initobj Linq.Teste
    IL_0008: newobj instance void Linq.Xpto::.ctor()
    IL_000d: pop
    IL_000e: ldloca.s x
    IL_0010: ldfld int32 Linq.Teste::t
    IL_0015: call void [mscorlib]System.Console::WriteLine(int32)
    IL_001a: ret
} // end of method Program::Main

Here new did not produce a newobj as it does for classes; this was a design decision of C#. If we look carefully at the signature of structs, they do not derive directly from Object but from ValueType. Even though ValueType derives from Object, new semantics were introduced to set structs apart from objects without taking the behavior of Object away from them.

.class private sequential ansi sealed beforefieldinit Linq.Teste
    extends [mscorlib]System.ValueType
{
    // Fields
    .field public int32 t

} // end of class Linq.Teste

The problem arises when the maxim "all value types live on the stack" is invoked. I won't belabor the point; read the moderator's answer. In C# the compiler is allowed to decide what it prefers, the language does not use raw pointers, and where something was allocated does not matter to the programmer. Interestingly, CIL is a stack-based language, but the C# compiler can box and unbox value types, and this should be avoided for performance reasons. Fields are part of the entity and obviously stay together with it on the heap.

A good compiler, when creating a value with local scope, will always allocate it on the stack unless some overriding factor exists, but that really is not the C# programmer's concern. The garbage collector relieves the programmer of the fear of losing a stack value at the end of its scope, a problem that exists in C++. C# is compiled to CIL; in CIL, values are moved from one stack to another, and memory is collected in due time.

Besides, why is there such a distinction? After all, classes and structures, despite their differences, are very similar to one another.

Similar, yes; equal, no! Structs are very useful when we think in terms of functional programming and immutability. When you want a copy of values to be a real copy and not just a pointer to the same memory, use a struct, but respect the points above to preserve the expected behavior.

By comparison, in C++ they are almost equal; C# gave the struct a more distinct role. At bottom, the struct is a C legacy carried into C++ for compatibility and incorporated into C# to represent an aggregation of values.

  • 1

    Your answer is good but has some problems. It treats string as a value type, when it is merely immutable; it is a reference type (as the answer itself later shows). Value types can be, and very commonly are, allocated on the heap. Even after showing that this is possible, the answer claims otherwise, which is false. The lifetime and storage location of data matter to the programmer who wants to understand and use the language's features correctly without buying into myths.

  • The concept of CIL being stack-based is not related to where values are stored; the same word is used for different things (it would take too long to explain here). There is a misconception about how the stack works and how it relates to the GC and CIL, but this is already getting too long. Just to be clear (you are not saying otherwise, but people do not always interpret the text correctly): classes can be used in functional and immutable programming.

  • 2

    The answer says it is bad practice to put reference types inside structs. Bad practice is doing something you don't know how to do. One of the members of the .NET/C# team does not consider it bad practice if you know what you are doing: http://stackoverflow.com/a/945681/221800. .NET itself does this all the time. This very website is built on it. Why is there no such recommendation in C++, while some people recommend against it in C#? Because the C++ programmer is expected to know what he is doing. It is commonly assumed that the C# programmer does not.
