How does CPU cache performance work?


Asked 5 years, 9 months ago

Viewed 74 times


Recently I discovered that it is possible to get a huge performance gain by making good use of the CPU cache. One example I saw was a program whose runtime dropped from 10 seconds to 200 milliseconds just by applying this concept.

How does this performance gain work?

1 answer


by Maniero • **444,682** points · Answer 1 · 2019-10-06T18:04:06+00:00

For the programmer this matters little, all the more so in languages that do not give fine control over memory. Even languages that do give that control cannot use the cache directly; managing it is the processor's prerogative. What you can do is lay out and access your data so that it is more likely to be in the cache, which is difficult to do and does not pay off in most applications.

For data to be used quickly it needs to be in a register, but there are few registers, so not everything fits there. So there is a memory close by that is also fast to access, although fetching data from it still has a cost. Not everything can live in one place, because distance alone adds to the time the data takes to arrive: the path to travel is longer. If everything sat at that greater distance, everything would be slower, so the processor divides memory into areas according to physical distance.

Modern processors usually have several such levels (commonly called L1, L2 and L3). Next to that first memory there is a slightly larger one, a little farther away and therefore slower. Next to it there is another level, farther still, with more capacity and a little slower again. There can be yet another level, but it usually does not pay off; manufacturers have tried it and given up.

Then there is RAM, which is no longer inside the processor, and there can be other intermediate forms. The closer the memory is to where the processing happens, and the simpler its mechanism, the faster it is. RAM is itself a cache too, but the question focuses on the processor.

In the processor all of this is transparent: it keeps closer to the registers what is used most and what is most likely to be used at that moment. Instead of going to a slower memory it can read from a faster one. That is the cache, and that is where the extra speed comes from.
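To put rough numbers on why serving most accesses from the faster levels pays off, here is a minimal sketch of the usual expected-access-time calculation. The function name `average_access_time` and every latency and hit rate below are assumptions chosen for illustration, not measurements of any real processor.

```python
def average_access_time(levels, memory_latency):
    """Expected cost of one memory access, in cycles.

    levels: list of (latency, hit_rate) pairs, fastest cache first.
    memory_latency: cost of going all the way to RAM.
    Model: each level is tried in order; a miss adds the cost of the next one.
    """
    expected = memory_latency
    # Fold from the slowest level back to the fastest.
    for latency, hit_rate in reversed(levels):
        miss_rate = 1.0 - hit_rate
        expected = latency + miss_rate * expected
    return expected


if __name__ == "__main__":
    # Hypothetical figures: (latency in cycles, hit rate) per cache level.
    hypothetical_levels = [(4, 0.90), (12, 0.80), (40, 0.50)]
    ram_latency = 200  # hypothetical

    with_cache = average_access_time(hypothetical_levels, ram_latency)
    without_cache = average_access_time([], ram_latency)
    print(f"with caches:    {with_cache:.1f} cycles per access")
    print(f"without caches: {without_cache:.1f} cycles per access")
```

Under this simple model the average cost is dominated by how often the first level hits, which is why access patterns that raise the hit rate can change the runtime so much.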

The processor uses several techniques to make this work, and they are not always intuitive, so there is no simple and reliable recipe. In general the ideal is to have better locality of reference (a topic that no one has given a good answer on yet; if that does not happen soon, I will post something).
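As an illustration of locality of reference, the sketch below sums the same matrix twice: once walking memory in the order it is stored, and once jumping a whole row ahead on every step. The names `sum_row_major` and `sum_col_major` and the matrix size are assumptions for the example. In Python the interpreter overhead hides much of the difference; the same pattern in a compiled language tends to show a far larger gap, and the actual ratio depends on the machine.

```python
import time
from array import array


def sum_row_major(data, rows, cols):
    """Visit elements in storage order: consecutive addresses, good locality."""
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += data[r * cols + c]
    return total


def sum_col_major(data, rows, cols):
    """Visit elements column by column: each step jumps `cols` elements ahead."""
    total = 0.0
    for c in range(cols):
        for r in range(rows):
            total += data[r * cols + c]
    return total


if __name__ == "__main__":
    rows = cols = 2000  # hypothetical size, about 32 MB of doubles
    data = array("d", [1.0]) * (rows * cols)  # contiguous row-major storage

    for func in (sum_row_major, sum_col_major):
        start = time.perf_counter()
        result = func(data, rows, cols)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__}: sum={result:.0f} in {elapsed:.3f} s")
```

Both functions do the same arithmetic and return the same sum; only the order of memory accesses changes, so any timing difference comes from how well each order matches what the cache already holds.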

The term CPU is not ideal here, because the CPU itself has no cache; only the processor does.

May be useful: What makes cache invalidation a difficult solution?