What is Garbage Collector and how does it work?

Question

Asked 8 years, 1 month ago

Viewed 3,969 times

21

What is the garbage collector, and how does it work?

When should we care about it?

1

Related: When is it necessary to force garbage collection in C# for better application performance?

– vinibrsl

2017/11/16 at 10:59

2 answers

11

I will answer in general terms, using the CLR GC as the basis. The second question has already been answered.

Memory management is very difficult.

There is a saying that only three things have revolutionized software development: high-level languages, modularization, and automatic memory management.

As long as we allocate on the stack, everything is easy and the language can manage memory automatically. But there are many situations in which an object's lifetime requires it to live on the heap. There, the application is responsible for releasing the allocation.

It is common for the programmer to forget to release this memory, or to release it at the wrong time, partly because in some situations it is hard to determine in code when release is possible. The result can be a memory leak, or the release of something that is still in use.

Several techniques can automate the release of memory. Strictly speaking, all of them can be called garbage collection, since they free the memory of something that no longer needs to be used and has therefore turned into garbage.

Some people consider that not everything that is released automatically counts as collection; they reserve the term for collection performed some time after the object is no longer needed. That is the kind of collection discussed here. I am not going to cover collection done by smart pointers, which manage lifetime totally or partially through unique ownership or reference counting. They are useful, widely applicable, and have advantages, but they are not the subject here, and there are situations where they are not suitable.

The garbage collector that everyone talks about is a complex memory management mechanism responsible for both allocating and releasing memory. It decides where to place objects in the heap, and it also decides when and how to release the memory.

The mechanism works by inspecting the state of memory to identify what is still in use and what is not. This way, there is no risk of something being left behind.

This is a huge simplification for programmers, because they can make whatever mess they want and not have to worry about memory (or almost).

This is what we call managed memory, the basis of the .NET philosophy. You can't corrupt memory and, as long as you take certain precautions, you don't have leaks.

Of course, this has a cost; there are disadvantages.

This model wastes some memory. It frees memory only from time to time, and it usually does not return all of that memory, or any of it, to the operating system.

It runs non-deterministically: you don't know when it will run. Some implementations offer ways to exert some control, but these are usually misused and never give full control.

Because of this, the release of resources external to the application can happen late unless you have another mechanism to control it.

It almost always causes pauses, and some can be long. Of course, this depends on the quality of the mechanism.

There are other, more specific details that make it undesirable in some situations. It keeps improving, though.

But in an environment with several threads, with exceptions, with objects circulating through the application without a clearly defined lifetime, with abstractions that hide certain allocation effects, and with data structures containing circular references, it is very difficult to work without this kind of management.

The way it allocates places objects close together, reducing memory fragmentation, which is a huge problem in manual management and in directly automated management. Objects that are close together give locality of reference, so the cache is used more efficiently, which improves performance in most cases. Trying to do something similar by hand takes more work than writing a GC. So it is possible to get these gains without a GC, but it is almost always impractical.

You work with memory as if it were infinite.

Collection starts from what we call roots: the processor registers, the static data area, and the application's stack (or stacks, if there are other threads). From there, the collector builds a graph of referenced objects. Each object it encounters may hold references to other objects, so it goes recursively through the heap. This is called the mark phase.
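The traversal described above can be illustrated with a toy model. This is only a sketch: `Node`, `References` and `MarkPhase` are made-up names for illustration, and a real collector walks raw memory using type information rather than a `List<Node>`.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a heap object: a name plus outgoing references.
public sealed class Node
{
    public string Name { get; }
    public List<Node> References { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class MarkPhase
{
    // Returns the set of objects reachable from the given roots.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!marked.Add(current)) continue; // already visited (this handles cycles)
            foreach (Node next in current.References) pending.Push(next);
        }
        return marked;
    }

    public static void Main()
    {
        var a = new Node("a"); var b = new Node("b");
        var c = new Node("c"); var d = new Node("d");
        a.References.Add(b);
        b.References.Add(a); // cycle a <-> b, reachable from the root
        c.References.Add(d);
        d.References.Add(c); // cycle c <-> d, not reachable from any root

        HashSet<Node> live = Mark(new[] { a });
        Console.WriteLine(live.Contains(b)); // True
        Console.WriteLine(live.Contains(c)); // False
    }
}
```

Note that the unreachable cycle `c <-> d` is not marked, which is why a tracing collector copes with circular references where plain reference counting does not.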

Then comes the phase that frees memory. There are several techniques for this. Sweep releases every object that was not marked as live. Copy copies whatever is still live to another area and discards everything in the old one. Compact does the copying in a specialized way. This is what .NET uses; more precisely, it uses a generational compacting collector.
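To make the difference between sweeping and compacting concrete, here is a small sketch over a toy heap represented as an array of slots. The names `ReclaimPhase`, `Sweep` and `Compact` are hypothetical, and the `live` set is assumed to come from a previous mark phase.

```csharp
using System;
using System.Collections.Generic;

public static class ReclaimPhase
{
    // Sweep: dead slots become free holes (null); survivors stay where they are.
    public static string[] Sweep(string[] heap, ISet<string> live)
    {
        var result = new string[heap.Length];
        for (int i = 0; i < heap.Length; i++)
            result[i] = heap[i] != null && live.Contains(heap[i]) ? heap[i] : null;
        return result;
    }

    // Compact: survivors slide to the start, so the free space
    // ends up as one contiguous block at the end.
    public static string[] Compact(string[] heap, ISet<string> live)
    {
        var result = new string[heap.Length];
        int next = 0;
        foreach (string obj in heap)
            if (obj != null && live.Contains(obj)) result[next++] = obj;
        return result;
    }

    public static void Main()
    {
        string[] heap = { "a", "b", "c", "d", "e" };
        var live = new HashSet<string> { "a", "c", "e" };
        Console.WriteLine(string.Join(",", Sweep(heap, live)));   // a,,c,,e
        Console.WriteLine(string.Join(",", Compact(heap, live))); // a,c,e,,
    }
}
```

After a sweep the free space is fragmented into holes; after compaction it is contiguous, which is what makes sequential allocation possible afterwards.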

With generations, pauses can be reduced and allocation can become very efficient, much more so than with manual or automated memory management. Generations also allow a different collection strategy for each generation, producing the best result.

A perfect garbage collector could be more efficient than manual or automated management in almost every case. Of course, we can adopt manual strategies that are very efficient, but in practice that would create a mechanism even more complex than the so-called tracing garbage collectors.

.NET addresses the non-determinism problem with the dispose pattern. This allows a resource to be released before the GC runs, as soon as the object is no longer needed. Of course, if this does not happen, the release will be done by the GC, which makes the pause longer and leaves the resource held for longer than it should be. A file can stay open, for example.
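A minimal sketch of deterministic release with a `using` statement follows. `TrackedResource` is a hypothetical class that stands in for a file or a connection; it only records what happens so the order of events is visible.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource wrapper; it stands in for a file or a connection.
public sealed class TrackedResource : IDisposable
{
    public static readonly List<string> Log = new List<string>();
    private readonly string _name;

    public TrackedResource(string name) { _name = name; Log.Add("open " + name); }
    public void Dispose() { Log.Add("close " + _name); }
}

public static class DisposeDemo
{
    public static void Run()
    {
        using (var resource = new TrackedResource("report.txt"))
        {
            TrackedResource.Log.Add("work");
        } // Dispose() runs here, even if the body throws
        TrackedResource.Log.Add("after");
    }

    public static void Main()
    {
        Run();
        Console.WriteLine(string.Join(", ", TrackedResource.Log));
        // prints: open report.txt, work, close report.txt, after
    }
}
```

The resource is closed at a known point in the code, without waiting for a collection.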

.NET addresses the copy overhead by keeping a separate area for large objects, so only the smaller ones are copied.
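As a rough way to observe this, the sketch below asks the runtime which generation a small and a large array belong to. It assumes the default threshold of 85,000 bytes for the large object heap; `GC.GetGeneration` usually reports the large array in the oldest generation, since that heap is collected together with generation 2, but the exact numbers are a runtime detail.

```csharp
using System;

public static class LargeObjectDemo
{
    public static void Main()
    {
        byte[] small = new byte[1_000];
        byte[] large = new byte[100_000]; // above the default 85,000-byte threshold

        Console.WriteLine(GC.GetGeneration(small));
        Console.WriteLine(GC.GetGeneration(large));

        // Keep both arrays referenced until after the calls above.
        GC.KeepAlive(small);
        GC.KeepAlive(large);
    }
}
```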

But generations cause another problem. If the GC is invoked too often, it tends to promote short-lived objects to later generations, which is far from ideal, since each later generation tends to have longer pauses and more overhead. That's why you shouldn't call it manually.

Obviously, the specific way the GC works is an implementation detail. The GC gives some guarantees; beyond those, the programmer cannot rely on these implementation details.
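The promotion effect can be observed with a small experiment, sketched below. A typical run shows the generation number of `data` rising after each forced collection, although the runtime does not guarantee exact values.

```csharp
using System;

public static class PromotionDemo
{
    public static void Main()
    {
        var data = new object();
        Console.WriteLine(GC.GetGeneration(data)); // freshly allocated

        GC.Collect(); // forced collection: 'data' survives because it is still referenced
        Console.WriteLine(GC.GetGeneration(data));

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(data));

        GC.KeepAlive(data); // keep the reference alive until this point
    }
}
```

An object that would have died young in generation 0 ends up in an older generation just because a collection happened to run while it was still referenced.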

One of the cool things about a GC is that memory allocation performs as well as, or close to, allocation on the stack. On the stack, allocating only increments a pointer that usually stays in a register, which is very fast. In generation 0 it can be done the same way: allocate sequentially, just incrementing a pointer.
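Allocation by incrementing a pointer can be sketched as follows. This is a toy model with illustrative names (`BumpAllocator`, `Allocate`); it hands out offsets inside an imaginary region instead of real memory.

```csharp
using System;

// Toy bump-pointer allocator over a fixed-size region (names are illustrative).
public sealed class BumpAllocator
{
    private readonly int _capacity;
    private int _next; // offset of the first free byte

    public BumpAllocator(int capacity) { _capacity = capacity; }

    // Returns the offset of the new block, or -1 when the region is full
    // (the point at which a real collector would trigger a collection).
    public int Allocate(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (size > _capacity - _next) return -1;
        int address = _next;
        _next += size; // the whole allocation is this increment
        return address;
    }
}

public static class BumpDemo
{
    public static void Main()
    {
        var gen0 = new BumpAllocator(64);
        Console.WriteLine(gen0.Allocate(16)); // 0
        Console.WriteLine(gen0.Allocate(24)); // 16
        Console.WriteLine(gen0.Allocate(32)); // -1 (only 24 bytes left)
    }
}
```

There is no search for a free block and no free list to maintain, which is where the speed comes from.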

Contrast this with manual allocation (automated allocation is still manual allocation, only more abstract), where allocating has a non-trivial cost. The allocator needs to find a location. There are efficient algorithms, but they pay another price: memory waste, which can be worse than with a GC, or a very high cost to release memory. Optimizations are possible, but they are a lot of work. It is much worse if the allocator always has to ask the operating system for the memory it will hand out. It also gets much worse with several threads, because allocation has to be locked, which adds a heavy overhead.

The .NET GC has an area for each processor, so an allocation never happens concurrently and can be naturally atomic without locks. It is very efficient.

Generally, each of these Gen0 areas starts at 256 KB. But this can be adjusted as the runtime finds that another size would be more efficient, reducing pause time or the number of pauses according to the pattern of garbage generated.

When this area fills up, a collection is triggered. The mark phase runs, and everything that survived in this area is copied to generation 1. Java has a strategy of copying to an auxiliary area still within Gen0 first, giving objects more time in that generation. This is important because Java produces much more garbage.

If all goes well, very little is copied. It is very common for objects to have very short lives.

When an object is copied, all references to it need to be updated to the object's new address.

This is quite an overhead, and it hurts the cache, since the collector has to access data that the application is not actually using.
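The reference update can be sketched with a forwarding table that maps each old address to its new one. This is a toy model: the heap is a dictionary from an address to the addresses that object refers to, `Relocate` is a hypothetical name, and the survivors are assumed to refer only to other survivors, which holds when the survivor set comes from a mark phase. A real collector would also rewrite the roots.

```csharp
using System;
using System.Collections.Generic;

public static class RelocationDemo
{
    // 'heap' maps an address to the addresses that object refers to.
    // Returns a new heap in which survivors are packed from address 0
    // and every reference has been rewritten through the forwarding table.
    public static Dictionary<int, int[]> Relocate(
        Dictionary<int, int[]> heap, IEnumerable<int> survivors)
    {
        var forwarding = new Dictionary<int, int>(); // old address -> new address
        int next = 0;
        foreach (int oldAddress in survivors) forwarding[oldAddress] = next++;

        var moved = new Dictionary<int, int[]>();
        foreach (var entry in forwarding)
        {
            int[] oldRefs = heap[entry.Key];
            var newRefs = new int[oldRefs.Length];
            for (int i = 0; i < oldRefs.Length; i++)
                newRefs[i] = forwarding[oldRefs[i]]; // update the reference
            moved[entry.Value] = newRefs;
        }
        return moved;
    }

    public static void Main()
    {
        var heap = new Dictionary<int, int[]>
        {
            [10] = new[] { 30 }, // object at 10 refers to the object at 30
            [20] = new int[0],   // garbage
            [30] = new[] { 10 }, // object at 30 refers back to 10
        };
        var moved = Relocate(heap, new[] { 10, 30 });
        Console.WriteLine(moved[0][0]); // 1
        Console.WriteLine(moved[1][0]); // 0
    }
}
```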

In Gen1...

For the GC to work well, it needs help from the compiler: there must be structures with additional information. Without them, it could only be what is called a conservative GC, which releases memory only when it is sure that nothing is a reference to the object, and it cannot always know, so a lot of memory leaks. In practice it cannot be used.

Other languages

.NET is usually better than current Java because it encourages using the stack more than the heap, and is doing so more and more. It seems there is an intention for the JIT compiler, or even the compiler itself, to optimize by placing some objects on the stack when it identifies that the object is small and its lifetime is guaranteed to be confined to the stack. Java's GC and JIT compiler are smarter, because Java leans on the heap much more heavily.

One thing said about C++ is that it doesn't need a garbage collector because it generates very little garbage. This is not entirely true: it generates less, but not that little. It takes effort to avoid generating so much garbage, and in many cases there is a GC; it just isn't a tracing one, and often it isn't efficient.

Conclusion

Make no mistake: I'm not saying a GC is better than other approaches, but it's not as bad as people say. There are cases where it is better and cases where it is worse; the difference is not that large, and it almost always makes no difference.

To learn more, there is our dear tag, especially the one about GC in C#.

Obviously, there are more specific questions about the various types, techniques, and particularities of garbage collection.

Much of what is still missing can be read in other answers, such as How to identify and avoid memory leaks in .NET?

This is going to be one of those long answers, but I'm going to write it gradually. The links will come later. Bear with me, I will still organize the text.

Excellent explanation

– Caique Romero

2017/11/16 at 13:05
"This will be one of those long answers, but I’ll do it slowly. The links will come later. Calm down I’ll still organize the text." - I guess you don’t need that warning anymore, huh?

– Victor Stafusa

2017/12/15 at 06:34
@Victorstafusa I forgot about it; it's still only half done. I'll try to finish it today or over the weekend, thank you :)

– Maniero

2017/12/15 at 09:56

Browse other questions tagged c# .net memory memory-management garbage-collector

by Luiz Santos • **3,162** points · Answer 1 · 2017-11-16T11:05:17+00:00

Authorship note
The content below is mostly composed of excerpts from an article originally published by Macoratti on his website. Reproduction authorized by the author.

The garbage collector of the .NET Framework manages the allocation and release of memory for your application. Each time you create a new object, the Common Language Runtime allocates memory for the object from the managed heap. As long as address space is available in the managed heap, the runtime continues to allocate space for new objects.

However, memory is not infinite. Eventually, the garbage collector must perform a collection in order to free up memory. The garbage collector's optimizing engine determines the best time to perform a collection, based on the allocations being made.

When the garbage collector performs a collection, it checks for objects in the managed heap that are no longer being used by the application and performs the operations necessary to reclaim their memory.

Thus, garbage collection is a process that automatically frees the memory of objects that are no longer in use. The decision to invoke the destruction process is made by a special program known as the garbage collector. However, when an object goes out of scope at the end of the Main() method, the destruction process is not necessarily invoked.

Thus, you cannot determine when the destructor will be called. The garbage collector also identifies objects that are no longer referenced in the program and frees the memory allocated to them. You cannot destroy an object explicitly in code. In fact, this is the garbage collector's prerogative: it destroys objects on the programmer's behalf. The garbage collection process happens automatically. It ensures that:

Objects are destroyed: it does not specify when an object will be destroyed.
Only unused objects are destroyed: an object is never destroyed while another object still holds a reference to it.

C# provides special methods that are used to release an instance of a class from memory: Finalize() and Dispose().

Finalize()

The destructor method Finalize() is a special method that is called from the class it belongs to or from derived classes. It is called after the last reference to an object is released from memory.

Dispose()

The Dispose() method is called to release resources, such as a database connection, as soon as the object using the resource is no longer needed. Unlike Finalize(), Dispose() is not called automatically; you need to call it explicitly from client code when an object is no longer needed. The IDisposable interface contains the Dispose() method, so for this method to be called the class needs to implement this interface.
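The two methods are often combined so that the finalizer acts as a fallback when Dispose() is never called. Below is a minimal sketch of that arrangement; `ResourceHolder` and `IsReleased` are hypothetical names, and the actual cleanup is left as comments.

```csharp
using System;

// Hypothetical class that owns a resource; 'IsReleased' stands in for the real cleanup.
public class ResourceHolder : IDisposable
{
    public bool IsReleased { get; private set; }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // cleanup already done; the finalizer is not needed
    }

    protected virtual void Dispose(bool disposing)
    {
        if (IsReleased) return; // safe to call more than once
        // if (disposing) { release managed resources here }
        // release unmanaged resources here
        IsReleased = true;
    }

    // Finalizer (compiled to Finalize()): a fallback that the GC runs
    // at some unspecified time if Dispose() was never called.
    ~ResourceHolder()
    {
        Dispose(false);
    }
}

public static class Program
{
    public static void Main()
    {
        var holder = new ResourceHolder();
        Console.WriteLine(holder.IsReleased); // False
        holder.Dispose();
        Console.WriteLine(holder.IsReleased); // True
        holder.Dispose();                     // second call is a no-op
    }
}
```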

Comparison:

Source