What are Zero Cost Abstractions?
It is the ability to create an abstraction, gaining more expressiveness in what you are doing, without incurring any cost because of it.
In general, when you create an abstraction and hide an implementation detail, you rely on some mechanism that imposes a memory or processing cost. That cost is not always noticeable, but if you measure it, it is there, and when many such costs accumulate they can make the application noticeably less efficient.
It’s a noble goal and I like it, but it is rarely achieved in practice.
Origin of the term
It is very rare for a language to offer many good abstractions at no cost. What usually happens is that the cost is reduced, or made obvious, or paid only if you use the feature, while there remains a way, even if more complex, of doing the same thing without the cost.
What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.
--Bjarne Stroustrup
The first sentence of the quote leaves plenty of room for many things to be accepted as ZCA. The second describes the part that is hardest to achieve.
The term gained traction through C++ (as the quote above from the creator of the language shows), and it is part of the language’s philosophy. And of its marketing. I’m not saying that C++ isn’t efficient or that it doesn’t often achieve the goal of zero cost abstraction, but it hits the mark less often than it seems. There are even articles and talks on the subject: not everything is free, even when comparing only with C.
There are several C++ mechanisms that impose a cost even though they were announced as free of charge. In general they are very efficient, but calling them cost-free is usually an exaggeration.
Take C, for example: it has many features that we can call zero cost abstractions. This happens whenever the code produces exactly the same assembly you would write by hand, optimized as far as possible. If it cannot, if it adds an extra instruction, takes up some extra space, has to choose a more expensive instruction, or orders instructions in a way that does not favour processor optimizations at run time, then the cost of the abstraction was not zero.
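To make the "same assembly as hand-written code" test concrete, here is a minimal Rust sketch that writes the same computation twice, once as an explicit loop and once through the iterator abstraction. The function names are mine, chosen only for illustration. The code shows only that both forms compute the same result. Whether they compile to the same instructions is something you would have to check yourself for your compiler and optimization level.

```rust
/// Hand-written loop: indexing and accumulation spelled out,
/// roughly as one would write it in C.
pub fn sum_of_squares_loop(values: &[u32]) -> u32 {
    let mut total = 0u32;
    let mut i = 0;
    while i < values.len() {
        total = total.wrapping_add(values[i].wrapping_mul(values[i]));
        i += 1;
    }
    total
}

/// Iterator version: this is the abstraction being evaluated.
pub fn sum_of_squares_iter(values: &[u32]) -> u32 {
    values
        .iter()
        .map(|v| v.wrapping_mul(*v))
        .fold(0u32, |acc, sq| acc.wrapping_add(sq))
}
```

Compiling with `rustc -O --emit asm` and comparing the two functions is one way to see whether the abstraction was really free in a given build.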
I’ve seen a definition that says the meaning would be:
costs are well defined, predictable and more or less controllable
This is a fallacy. Even if you accept that merely being able to see the cost you will pay already counts as ZCA, it is still very complicated. I will cite examples where the costs are not obvious unless you know the implementation details, which kills part of the advantage of having an abstraction. A lot of people treat the term as a cake recipe rather than as a questionable claim. The term is not a scientific standard, it is a marketing expression.
If that were true then almost every language would have ZCA. It is largely true of C (I will not talk about Assembly because it has no abstractions that add an extra layer of anything), but not 100% true. All other languages score much lower, although some do better than others, as is the case with C++ and Rust, and even they do not manage it in all mechanisms.
If the term meant what that last quote says, it should be called Controllable Cost Abstraction or something like that.
C compared to C++ or Rust
Some will say that you couldn’t do anything more efficient while achieving the same goal, since the goal is to have something easier to use. That may be true, but it is still an abstraction and it has a cost.
You can say that you don’t have to use it, that you only pay if you use it, and that there is another, more efficient way. That is also true, but then you cannot say that the mechanism is cost-free. You can still say that the language can be cost-free, but it is only really free if you program as in C.
Even this is an exaggeration, because a C++ compiler generates slightly heavier code even when only C mechanisms are used. It’s minimal, but it’s not zero.
It is the same with Rust, although in some respects it is closer to C than C++ is. But it also has less powerful abstractions: Rust leaks more of its abstractions than C++ does (in C++ you can leak too if you want, and doing so tends to bring more efficiency in many cases).
That’s why language benchmarks don’t help much: you can write very efficient code in Rust without using abstractions and less efficient code in C++ using abstractions.
Examples
string
An example is the C++ string: very powerful and easy to use, but you can’t say it is always free of extra costs compared to what is used in C. It often makes a dynamic allocation, which is very expensive, without you even knowing (it doesn’t always do so). And there is an optimization that increases the cost elsewhere.
On 64-bit platforms a C++ string object typically takes up 24 or 32 bytes, depending on the standard library implementation, no matter what you store in it. Small strings up to a certain size fit right there and everything is great, except that the code always has to check whether the string is short before doing anything. If it is big, the object holds a pointer to the place on the heap where the string actually lives. That sounds good and brings interesting gains, but it has costs. It’s a good abstraction, but it’s not zero cost. It’s odd: the type may behave as a value or as a reference. If you have several similar short texts you will have several instances, but if they are long it may be the same object. A simple reference can have 24 bytes? WTF! Put this in an array and see the damage it does. Neither Java nor C# nor scripting languages tend to be so inefficient at this particular point.
And it’s used a lot in places where you need to calculate a hash. You don’t pay anything if you don’t use it, but if you do you will pay a higher price, because the hash has to be calculated every time it is needed, which is indecent for a language that prides itself on performance.
At least that’s how it is in the standard implementation. If you don’t use it, none of this costs anything, and you can write your own without the unwanted costs, but then you will have trouble using it with the whole standard library, which expects the standard string.
Have you ever heard that in computing, as in life, (almost) everything is a tradeoff?
64-bit C# also occupies at least 24 bytes for a string (not counting the pointer held by each reference to the object), and it grows quickly as characters are added. It’s a great abstraction, but it has a cost: it always requires a dynamic allocation. On the other hand, when the short-text optimization does not apply, it is more efficient than C++.
Java has at times needed 40 bytes on 32 bits. I do not know how it is today, but this is absurd, since in many cases it is just to store 1 or 2 characters (and it only grows from there).
C has none of that cost. You decide where you want to allocate, although some say you should never allocate on the stack, or even inside a struct (some compilers don’t even allow it). All great, but it takes work and invites errors, and the programmer does not always get the best efficiency. For example, he may store the text as a value and at some point have to copy data that can be very large, or, to compensate, use a reference where it causes a mess. Not to mention that you have to pass the size along with the text to do it right and efficiently (which is not what we see in practice: people use strlen() everywhere, and the lack of abstraction can generate an absurd extra cost). The concreteness of C does not free it from having bad costs.
Although no abstraction was needed to solve this, with the abstraction it becomes more pleasant and less error-prone. That is, it is not the abstraction C++ created that made it more efficient, it is the right mechanism; the abstraction is just a cover that helps with something else.
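The point about passing the size along with the text can be sketched in Rust, where a string slice is a pointer plus a length. This is only an illustration under my own assumptions: `scan_len` is a hypothetical stand-in that imitates what strlen() does, not a binding to the C function.

```rust
use std::mem::size_of;

/// Imitates C's strlen(): walks the bytes until the first NUL,
/// so every call costs time proportional to the text length.
pub fn scan_len(bytes: &[u8]) -> usize {
    bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len())
}

/// A string slice already carries its length, so asking for it
/// is a field read rather than a scan.
pub fn stored_len(text: &str) -> usize {
    text.len()
}

/// Size of a `&str` reference measured in machine words
/// (one word for the pointer, one for the length).
pub fn str_ref_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}
```

The mechanism doing the work here is "carry the length with the pointer". The slice type only makes that mechanism hard to get wrong, and it is paid for with the second machine word in every reference.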
As far as I’m concerned, strings are wrong in every language I know (and I know many). This is a type used in many scenarios, each with its own needs. If you want to keep the cost low with the best abstraction, string should always be a concrete type underneath an abstract type, and there should be several implementations. People generally consider having several string types a defect of a language, but only because the language lacks the general abstract type that defines the basic contract of what a string should do (there could even be a hierarchy of these abstract types), so that code which does not need a specific feature accepts any string (most cases).
This is not cost-free either, because it would need to be either a polymorphic type, which adds an extra indirection, or a parameterized type, which generates multiple versions of the methods for the different string implementations.
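The two options, polymorphic versus parameterized, can be sketched in Rust. Everything here is hypothetical and exists only for illustration: the `Text` trait stands in for the "basic contract", and `HeapText` and `InlineText` are two toy implementations (the inline one assumes `len` is at most 8).

```rust
/// Hypothetical contract that every string implementation satisfies.
pub trait Text {
    fn as_text(&self) -> &str;
}

/// Toy implementation backed by a heap allocation.
pub struct HeapText(pub String);

/// Toy implementation that stores up to 8 bytes inline.
pub struct InlineText {
    pub bytes: [u8; 8],
    pub len: usize, // assumed to be <= 8
}

impl Text for HeapText {
    fn as_text(&self) -> &str {
        &self.0
    }
}

impl Text for InlineText {
    fn as_text(&self) -> &str {
        // Falls back to an empty string if the bytes are not valid UTF-8.
        std::str::from_utf8(&self.bytes[..self.len]).unwrap_or("")
    }
}

/// Polymorphic version: a single compiled copy, and the call to
/// `as_text` normally goes through a vtable (the extra indirection).
pub fn shout_dyn(text: &dyn Text) -> String {
    text.as_text().to_uppercase()
}

/// Parameterized version: the compiler emits one copy of this
/// function per concrete type it is used with (more code).
pub fn shout_generic<T: Text>(text: &T) -> String {
    text.as_text().to_uppercase()
}
```

Both functions accept any string that honours the contract. The difference is where the bill arrives: at run time in the call for the first, in code size for the second.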
Code generation
Code generation in some form (template, macro, preprocessor, external tool, "optimizers") is usually a double-edged sword: it tends to produce faster code by resolving many things at compile time, but the code usually gets bigger, which in some cases can mean a loss of performance of another kind.
Without the code generation mechanisms of C++ or Rust, achieving certain efficiencies requires writing much more code, much of it redundant, which can cause several problems. So, for lack of an abstraction, people prefer to pay a price in efficiency.
Dependencies
In general, a language’s libraries have many internal dependencies, so if you use one function you can end up pulling in a long chain of other functions without even noticing. You pay a load cost for them even without actually using them, just because of an unused dependency.
Even C has this. Few people know it, but if you use one function from a library, the whole thing goes in. It doesn’t have to be that way, but all compilers organize objects so that this is what actually happens. If you never call the rest you pay no processing, but you pay for space in the executable, which may or may not translate into memory cost (it depends on how it is organized and on virtual memory). Separating function by function brings another cost, so some people prefer to build a single source file (amalgamation) and avoid the additional costs as far as possible.
Abstract type without cost
Another example would be creating a type that stores and manipulates a temperature. In general it only encapsulates a concrete type and imposes no additional cost; in terms of efficiency the result is the same as if a primitive type had been used.
Or almost, because when you call some methods of this type they may do something you would not do by hand in a specific case. It is difficult to make abstractions without cost even in the most basic form. Sometimes it is good to use one, sometimes something cleaner and more direct is better. A simple check for a valid argument already changes the cost. Without the abstraction you don’t need to pay that cost every time: in some cases you can go straight ahead without validating anything because you are sure of the validity. But when you do need to validate, you have to write that code by hand before calling the desired function.
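A minimal sketch of such a temperature type in Rust, assuming a hypothetical `Celsius` wrapper around `f64`. It has two constructors so the validation cost is visible: one pays for a comparison on every call, the other skips it and trusts the caller.

```rust
use std::mem::size_of;

/// Hypothetical temperature type; `repr(transparent)` guarantees
/// it has the same layout as the `f64` it wraps.
#[repr(transparent)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Celsius(f64);

/// Absolute zero in degrees Celsius.
pub const ABSOLUTE_ZERO_C: f64 = -273.15;

impl Celsius {
    /// Checked constructor: pays for one comparison on every call.
    pub fn new(degrees: f64) -> Option<Celsius> {
        if degrees >= ABSOLUTE_ZERO_C {
            Some(Celsius(degrees))
        } else {
            None
        }
    }

    /// Unchecked constructor: no validation, the caller vouches
    /// for the value.
    pub fn new_unchecked(degrees: f64) -> Celsius {
        Celsius(degrees)
    }

    /// Converts to degrees Fahrenheit.
    pub fn to_fahrenheit(self) -> f64 {
        self.0 * 9.0 / 5.0 + 32.0
    }
}

/// True when the wrapper occupies exactly as much memory as the primitive.
pub fn wrapper_is_same_size() -> bool {
    size_of::<Celsius>() == size_of::<f64>()
}
```

The storage really is free here. The check in `new` is where the abstraction starts to charge, and offering `new_unchecked` is a way of letting that cost leak back out to the caller.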
Function cost
A function call by itself has an extra cost. Of course the optimizer may eliminate it, leaving no cost. But compiler optimization cannot always eliminate 100% of the cost. And in some cases the optimization doesn’t pay off, so you will pay the price; it will be minimal, negligible overall, but it won’t be zero.
And there are optimizations that can cost even more, so a good compiler is recommended when you want maximum performance. Nowadays it is foolish to try to optimize everything as much as possible. The cost of just trying is usually high and does not pay off. You will live with some inefficiencies. And of course, it is rare to have a problem that needs the last mile of performance.
Indirection solves all computing problems, right? But never at no cost: indirection has an inherent cost. There are cases where indirection is indispensable, and even Assembly optimized as far as possible has to use it. But there are cases where it is only there as a convenience, and those are abstractions whose cost is not zero. The function is only one of them. And avoiding indirection can hinder some other optimization, so it is difficult to evaluate.
LINQ
C# has LINQ. It’s cool and many languages have copied it, but it has a hefty built-in abstraction cost. There are languages that can do something like LINQ more efficiently, without abstraction cost, but only for the simplest methods (it is not impossible for the others, but it needs an absurd amount of analysis, maybe even AI).
In Rust, for example, (0..1000).sum() is typically folded into a constant by the optimizer and really has zero cost; in C or Assembly you would have to give up the abstraction and do the optimization by hand. In C# you will normally pay the cost at run time. In theory something this simple could be optimized easily, but in C it is not part of the philosophy of the language to go that far in a case like this, unless you use a macro for it, and that would not have the same functionality LINQ has.
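For reference, here is the range expression as a complete function next to the by-hand closed form it should reduce to. The function names are mine, and the element type is written out as `u32` so the snippet compiles on its own. The code only shows that the two agree on the value. Whether the first one becomes a constant depends on the build (optimizations on) and can be checked in the generated assembly.

```rust
/// The range abstraction from the text, with the element type made explicit.
pub fn sum_with_iterator() -> u32 {
    (0u32..1000).sum()
}

/// The same value computed by hand with the closed form n * (n - 1) / 2.
pub fn sum_by_hand() -> u32 {
    1000 * 999 / 2
}
```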
Garbage Collector
As a counterpoint, this is a case where you almost always pay even without using it. More or less. In certain languages that really is always true; a few allow you to turn it off and leave you on your own, which is for rare scenarios, but there are ways. In C++ or Rust it is opt-in: you only pay if you ask for it, and you tend to pay only for what you use. In Rust you only pay if you use Box, Rc, or Arc, each with its own cost.
But, as I said elsewhere, this always comes at the cost of leaking the abstraction.
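A small sketch of what opt-in looks like in Rust, using only the standard library types named above. The helper functions are hypothetical and exist only to put the four choices side by side. The comments describe the usual costs of each type.

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Plain value: stored inline, no heap allocation, no counter.
pub fn plain() -> u64 {
    42
}

/// Box: one heap allocation with a single owner, freed when the
/// owner goes out of scope.
pub fn boxed() -> Box<u64> {
    Box::new(42)
}

/// Rc: heap allocation plus a non-atomic reference count that is
/// updated on every clone and drop.
pub fn shared_rc() -> (Rc<u64>, Rc<u64>) {
    let first = Rc::new(42);
    let second = Rc::clone(&first);
    (first, second)
}

/// Arc: same idea, but the count is atomic so the value can be
/// shared across threads.
pub fn shared_arc() -> (Arc<u64>, Arc<u64>) {
    let first = Arc::new(42);
    let second = Arc::clone(&first);
    (first, second)
}
```

Here you choose the cost by choosing the type. Every signature now says how the value is managed, which is the leak the text refers to.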
Comparing high level languages
Scripting languages don’t care much about typing, about where data is allocated, or whether you access it by value or by reference, so they are easier. Everything is abstracted and you pay for it, and you have no choice; having choices makes things harder.
And this generates more costs than people realize: every data access is a potential hazard and needs complex processing just to fetch the data (in some cases there are optimizations). For example, these languages usually do not have objects but rather dictionary structures that pass for objects, which has a cost in several respects.
Java chose a path that gives you more control over types and over where you allocate, but only superficially. C# is different: especially in newer versions the control is much greater, and you can avoid certain costs by using the most efficient mechanism.
C++ and Rust go further and give more freedom of choice, eliminating much of the cost, and since they are compatible with C, choosing that subset of the language can cost less still, losing only to Assembly, and only in some cases. Not every decision is the one you would want for a given scenario.
Being able to give up an abstraction doesn’t mean it costs nothing, but some people think it does.
Is it something the programmer needs to know about in order to use it, or does it make no difference?
Yes and no. A good programmer who cares about this should know about it, and ideally should investigate whether what was promised is really being delivered. As I said, there is a lot of marketing in this.
If you really need to squeeze out everything you can, you need to understand all the costs of all the mechanisms you are using, and you may have to do things by hand in a better way to reach the ultimate goal of maximum efficiency. Not everything you do needs this.
But the ones who have to worry most are those who write libraries for others to use in an environment that values zero cost abstraction. If your library is not zero cost, then no application that uses it, nor any other part of the library that depends on it, will be zero cost: it is contagious.
Is it really a 100% positive point?
As far as I know, in itself, yes. Provided, of course, that it is done right, that it is real and not an illusion, and that it does not exist at one point while harming another.
Being ZCA at one point can prevent it at another; see the example of the C++ string I mentioned. Note that other languages accept some costs to avoid certain others. You can always claim that you can do it as in C and have zero cost, or write a type of your own where you pay only for what you actually use. True, but that goes for any language: in C# you can create your own string type with the compromises you want. The difference is that the normal type comes along even if you don’t use it, although that will be true in almost all languages; you can only give up such a built-in type if you give up everything the standard library has.
Is there no undesirable effect on the use of the language because it has this feature?
In general, clearly not, but the language tends to be a little more complex to achieve this. I do not know whether that is a bad thing, and even if it is, it may still be necessary for other reasons.
Building the library is often much more complex, and the internal code is often very confusing because of this. If the code is not that confusing, either the language is too good to be true, or it is not zero cost at all.
There are compilers that make special cases during compilation when they know something needs to work at zero cost, or at least at a lower cost than normal. This is not always good, but it offers an interesting advantage: it is a way of not adding complexity to the language to handle very specific cases. A typical case is the string, a widely used type that calls for out-of-the-ordinary optimizations, which become easier because the compiler knows in depth what it is dealing with.
Much more could be written on the subject; almost every sentence opens the door to new questions.
It’s the opposite of what’s trending in multiple layers, which is Zica (Zillion cost abstraction™). Examples: Docker where you don’t need it, indiscriminate use of Microservices, "desktop" applications with Electron, web "ERP", PDO, OOP in PHP, Laravel, RoR, jQuery for every little thing, and the list goes on...
– Bacco