What is the complexity of an algorithm? And what are the ways to measure it? (Big O, Big Theta...)
The complexity of an algorithm has to do with how much time and memory this algorithm takes according to the size of its input. For example, we want to answer questions like "if my algorithm takes 1 minute to process a 1000 byte input, how many minutes will it take to process a 2000 byte input?"
A very direct way to calculate complexity would be to find some formula that gives the exact number of operations done by the algorithm to arrive at the result, depending on the size of the input. For example, in the algorithm
for(i=0; i<N; i++){
print(i);
}
we could say that the time spent is
T(N) =
N*(time spent on a comparison between i and N) +
N*(time spent incrementing i) +
N*(time spent on a print)
However, making a really precise accounting like this takes a lot of work and is usually not worth it. For example, suppose that, after much effort, we have discovered that a certain algorithm takes time
T(N) = 10*N² + 137*N + 15
In this case the quadratic term 10*N² is more important than the others because, for virtually any value of N, it dominates the total sum. From N = 14 on, the quadratic term is already responsible for most of the run time, and for N in the thousands it is responsible for more than 99%. For estimation purposes we could simplify the formula to T(N) = 10*N² without losing much.
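A quick numerical check of that claim (a small Python sketch, not from the original text):

def T(N):
    return 10*N**2 + 137*N + 15

for N in (10, 14, 100, 1000, 10000):
    share = (10*N**2) / T(N)
    print(f"N = {N}: the quadratic term is {share:.1%} of T(N)")
# The quadratic term's share keeps climbing toward 100% as N grows,
# which is why dropping the other terms barely changes the estimate.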
Another point where we can simplify our formula is the constant factor multiplying the N². To predict how fast the run time grows with the input, it does not matter whether T(N) = 10*N² or T(N) = 1000*N²; in both cases, doubling N quadruples the run time.
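The same point in a tiny sketch (the constants 10 and 1000 are arbitrary, used only for illustration):

for c in (10, 1000):
    t1 = c * 100**2   # run time model for N = 100
    t2 = c * 200**2   # run time model for N = 200, i.e. double the input
    print(f"c = {c}: T(2N)/T(N) = {t2/t1}")   # prints 4.0 in both cases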
Because of all this, the most popular way to work with time and space complexity in algorithm analysis is asymptotic complexity, which ignores these constant factors and the more slowly growing terms. This is where big O, big Θ and big Ω enter the story: they are notations we use to characterize an asymptotic complexity.
Let’s start with big O, which is a way to give an upper bound on the time spent by an algorithm. O(g) describes the class of functions that grow at most as fast as the function g, and when we say f ∈ O(g) we mean that g grows at least as fast as f. Formally:
Given two functions f and g, we say that f ∈ O(g) if there are constants x0 and c such that for all x > x0 it holds that f(x) < c*g(x)
In this definition, the constant c gives us room to ignore constant factors (which allows us to say that 10*N is O(N)), and the constant x0 says that we only care about the behavior of f and g when N is large and the faster-growing terms dominate the total value.
For a concrete example, consider the time function f(N) = 10*N² + 137*N + 15 from before. We can say that its growth is quadratic:
We can say that f is O(N²), since for c = 11 and N > 137
10*N² + 137*N + 15 < c*N²
We could choose other values for c and x0, for example c = 1000 and N > 1000 (which make the inequality very obvious). The exact value of these pairs does not matter; what matters is being able to convince yourself that at least one of them exists.
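If you prefer to convince yourself numerically, a brute-force check like this (a Python sketch using the constants chosen above) works:

def f(N):
    return 10*N**2 + 137*N + 15

c, x0 = 11, 137
# Verify f(N) < c*N**2 for a large range of N beyond x0.
assert all(f(N) < c * N**2 for N in range(x0 + 1, 100000))
print("f(N) < 11*N^2 held for every N tested above 137")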
In the opposite direction from big O we have big Ω, which is for lower bounds. When we say that f is Ω(g), we are saying that f grows at least as fast as g. In our running example, we can also say that f grows at least as fast as a quadratic function:
We can say that f is Ω(N²), since for c = 1 and N > 0 it holds that
10*N² + 137*N + 15 > c*N²
Finally, big Θ has to do with tight approximations, when f and g grow at the same pace (up to a possible constant factor). The difference between big Θ and big O or big Ω is that the latter two admit loose approximations (for example, N² is O(N³)), in which one of the functions grows much faster than the other.
Among these three notations, the most common one to see is big O. Usually complexity analyses are concerned only with the run time in the worst case, so the upper bound given by big O is sufficient.
A tip for a great book on the subject: Cormen, Thomas H. Introduction to Algorithms. MIT Press, 2009.
To tell you the truth, you should find this discussion of big O in any good algorithm analysis book. I personally am not such a fan of Cormen; I find it kind of boring to read.
If you want more theoretical, more precise information on how this works, hugomg's answer is more appropriate.
Take this answer as an alternative for those who have difficulty with, or lack patience for, the theory, which is important if your goal is to get accurate and correct results.
But it is important to point out that the more theoretical definition is rarely used as such in the "real world". Even if someone insists that this is a mistake, it is still a fact that people interpret these concepts the way I show below, and that this works for most cases. I am answering because the subject is important to everyone and also needs to be more accessible. There is no point in correct information if a person cannot understand it.
Take the opportunity to bookmark a table of complexities.
Algorithm complexity is the amount of work required to perform a task. It is measured in terms of the fundamental operations the algorithm performs (access, insertion and removal are the most common examples in computing), the volume of data (the number of elements to process) and, of course, the way it arrives at the result.
This amount of work has to do with time but can also refer to the amount of memory needed to ensure its execution.
You can compare different quantities. You establish complexity by measuring how the algorithm behaves when it needs to manipulate, for example, 10, 100 and 1000 elements. If it can do the job with a single operation in all three cases, the algorithm is constant; if each case needs 10 times more operations than the previous one, it is linear; and if it needs the square of the number of elements, it is obviously quadratic. Remember, these are just illustrative examples. It is not hard to see how this can make a huge performance difference when you have large volumes of data. With algorithms of very high complexity, even small volumes can perform quite badly.
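A sketch of that measurement idea (the operation counts below are illustrative, not measured):

def constant_ops(n):
    return 1        # e.g. reading one position of an array

def linear_ops(n):
    return n        # e.g. visiting every element once

def quadratic_ops(n):
    return n * n    # e.g. comparing every element with every other

for n in (10, 100, 1000):
    print(n, constant_ops(n), linear_ops(n), quadratic_ops(n))
# 10    1   10     100
# 100   1   100    10000
# 1000  1   1000   1000000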
I found an answer on SO (Stack Overflow) that I thought was very accurate and explains well what matters most on the subject. It shows the more correct terms and also changes the perception of what upper bound and lower bound mean: they are effectively not just synonyms for worst and best case.
In theory any formula may be within the Big O or Big Omega or Big Theta notation, but some are more common.
Big O is the most used, even if naively. See on Wikipedia the most common formulas. I would highlight as the main ones: constant (an if is an example of a constant algorithm; an array and a hash or dictionary are examples of data structures with constant access), logarithmic (binary search is the best example), linear (a simple for, or a linked list in most implementations, are examples) and quadratic (nested fors) or exponential.
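To make those classes concrete, a few sketches using only the Python standard library (the data and sizes are arbitrary):

from bisect import bisect_left

data = list(range(100000))
table = {x: x * x for x in data}

value = data[50000]              # O(1): indexing an array
square = table[50000]            # O(1): dict/hash access (typical case)

pos = bisect_left(data, 12345)   # O(log n): binary search on a sorted list

total = 0
for x in data:                   # O(n): a simple for over all elements
    total += x

small = data[:500]
pairs = sum(1 for a in small for b in small if a < b)   # O(n^2): nested fors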
You cannot measure different things and compare them. You cannot compare a numeric addition operation with the sorting of a list.
We usually refer to Big O as the worst case, more rarely to Big Omega as the best case (usually not relevant information), and to Big Theta when the two coincide.
This section should not be taken too literally; it does not have a strong theoretical basis and is only a way to facilitate understanding for those who have no formal study of the subject and want something practical to hold on to. I am making simplifications; there is no room to explain all the details and variations. I have only brushed over a few aspects.
It is very common to call something Big O when the information is actually about the average execution, even if this is not the most accurate definition for it. It is also common to make approximations or to disregard exceptional cases. For example, a hash usually has O(n) as its worst case, but in general it is said to have O(1) because that is what usually happens; O(n) occurs only in extreme cases.
In many cases it is better to stick to the average case, although it would be important to know beforehand whether the data really is well distributed enough for the average to occur in practice. So the most common approach is to use the typical case as the reference, not the worst case.
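A rough way to see this typical O(1) behavior with the built-in dict (a sketch; the sizes and timings are illustrative only):

import timeit

for size in (1000, 100000, 1000000):
    d = {i: i for i in range(size)}
    t = timeit.timeit(lambda: d[size // 2], number=100000)
    print(f"{size} keys: {t:.4f} s for 100k lookups")
# The lookup time stays roughly flat as the dict grows; that is the
# average case usually quoted as O(1), not a worst-case guarantee.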
Another important point is that this is not a measure of the actual speed of the algorithm; that depends on practical factors which may vary in ways that cannot be formally specified. This variation can occur depending on the specific implementation of the algorithm and/or the architecture it runs on (see a real case report in mgibsonbr's comment below): the processor's instruction set and capacity, branch prediction, out-of-order execution, cache, locality, the bookkeeping cost of other operations, and other more "esoteric" factors (hard to determine beforehand, such as swapping to disk, contention and failures) can make an algorithm behave more or less efficiently in terms of actual speed. In theory these things do not exist. In practice...
So, in theory and in some cases in practice, an O(1) may be slower than an O(n) or even an O(n*m), although in the latter case it would be quite rare. If the single operation required by the constant algorithm takes 100 ns and the individual operation of the algorithm with O(n*m) complexity takes 10 ns, with N and M both worth 2, it is clear that the second algorithm will run faster in this rare case. The magnitude of the data keeps the execution curve from growing enough to make a difference.
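The arithmetic spelled out (using the hypothetical costs above):

constant_cost = 1 * 100        # O(1): a single operation of 100 ns
nm_cost = (2 * 2) * 10         # O(n*m) with n = m = 2: four operations of 10 ns
print(constant_cost, nm_cost)  # 100 vs 40: here the "worse" complexity finishes first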
But it is more common for an O(log n) to be faster than an O(n), even for a larger data volume; after all, the logarithm stays close to small numbers. Of course, none of this may hold because of the factors above: an O(log n) algorithm often has a locality-of-reference problem, so the O(n) turns out to be faster in many cases.
Looking at another aspect, an O(1) may be slower in some cases than an O(log n) or even greater complexities such as O(n). It may seem strange that an O(1) can be slower, but it is normal, because this notation indicates the number of operations needed to achieve a result; it says nothing about the size of each operation, how long the individual operation takes to execute. hugomg objected to this in the comments. I agree with him that this is wrong, but everyone needs to be warned about it: people make approximations, since they are enough to understand where they will end up in most cases. An O(1) can take 100 ns to find an item in a data structure, while an O(n) where N is 5 can take only 50 ns if each operation takes 10 ns. Is that wrong? Maybe, but that is the way people use it in practice; that is how algorithms are usually presented to the general audience. And yes, there is always someone contesting those numbers because they are not quite what the theory defines.
There is another variable. Imagine that to calculate the hash value of a string you have to go through all the characters of that string. It is obvious that computing an individual hash probably has O(n) complexity, even though the hash collection it will be used in as a key can locate an item with O(1) complexity.
In the case of a collection of hashes there is no guarantee that access will happen in O(1). If there is a collision between these hashes, at least partially the collection starts to have O(n) lookup complexity. But it is an exaggeration to say that the collection has O(n) complexity because of this; in practice it will never reach this complexity as a whole. And some specific optimization may even make this search better than O(n).
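A minimal chained hash table sketch (hypothetical code, only to illustrate the two points above: hashing a string visits every character, and collisions turn a bucket lookup into a linear scan):

class TinyHashMap:
    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _hash(self, key):
        # Hashing the string touches every character: O(len(key)).
        h = 0
        for ch in key:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def put(self, key, value):
        bucket = self.buckets[self._hash(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # replace existing key
                return
        bucket.append((key, value))

    def get(self, key):
        # Typically the bucket holds one entry: O(1) on average.
        # If many keys collide into it, this loop degrades to O(n).
        for k, v in self.buckets[self._hash(key)]:
            if k == key:
                return v
        raise KeyError(key)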
See how complex it is to determine complexity? (Pun intended... or not)
To understand how bad it can be in plain language:
O(1) = O(yeah)
O(log n) = O(nice)
O(n) = O(k)
O(n log n) = O(k-ish)
O(n^2) = O(my)
O(2^n) = O(no)
O(n^n) = O(fuck)
O(n!) = O(mg!)
big theta does not mean average case; it means that the upper bound big O and the lower bound big Omega are the same. For example, we can say things like "the average time spent by the algorithm is Ω(N)" or "the time spent in the worst case is Θ(N²)"
@hugomg it's true, I got confused in the rush. I'm going to fix this right now, but I'll still need to improve some things later (volleyball :) )
also be careful when you say that the complexity of such-and-such is O(such-and-such): what can be O of something are always functions. So you have to say that the running time of the algorithm is given by a function f that depends on the size of the input, and that f is O of something. And besides, saying f = O(g) is an abuse of notation, since it should be a membership symbol (f ∈ O(g)). But this is completely irrelevant in practice :P
"This is not a measure of the real speed of the algorithm."When I was studying parallel computing, I did a job that consisted of parallelizing an algorithm into matrices (I don’t remember which, but it was O(n^3)
). In the end, the parallel version became more sluggish that the sequential! Investigating, I saw that both algorithms continued O(n^3)
, but although the constant factor of n^3
had actually decreased, the constant factor of n^2
grew enormously. For although we had taken into account the complexity in time and space, we forgot to consider in the account the exchange of data between the processors...
@Lucasvirgili I withdrew that part, since it was never the focus of my answer and since I do not master the subject that deeply.
@mgibsonbr I quoted your example. I say it is important to "ignore" the theory a little in these cases because it proves irrelevant and can lead us to incorrect conclusions. That is why, when we are developing software, we need to measure reality and not rely only on the assumptions (even good, correct ones) that theory gives us.
@mgibsonbr You have this work published somewhere?
@utluiz This was just a college assignment, in the undergraduate course; there was nothing to publish. However... I found it in the middle of my old files! :P Dropbox link if you really want to see it. It is about the LU decomposition method for solving systems of linear equations. It was tested on a "supercomputer" with 16 processors (by the standards of 2001 it was still good... I think). P.S.: the code is not mine.
@mgibsonbr Thank you very much! I have a parallel computing assignment in my master's to implement Jacobi's method for solving linear equations. It will be very interesting to have another approach to compare! :)
@utluiz I hope it helps you, then! Unfortunately, I don't have the final report, only the part I wrote (so the file is incomplete). By the way, to be clear (because this has not been mentioned so far): my implementation went wrong because my group decided to divide the matrix by rows, with each processor receiving an equal number of rows to work on; if we had divided it by columns, the overhead of data exchange between processors would be on the order of n, not n^2, and in fact the groups that did this achieved the desired results.
Sorry bigown, but asymptotic complexity is a well-defined subject, and the last revision of your answer says many things that can end up increasing people's confusion and giving the impression that the theory is useless. 1) Additions and multiplications are O(1) in the common case where we multiply integers with a fixed number of bits.
@hugomg I was very clear about the theory being important. And I was clear that excessive theory also doesn’t help people understand the problem. In fact, there were people who told me they didn’t understand a word you said. Of course, you can claim that they should try harder, that they’re limited, that... but still, your response to them didn’t help at all. I am not saying that it is bad, I found it very good and it is the most correct. I emphasize this in my reply. I will see your specific objections and if I find any problem I will edit the answer.
@hugomg I improved some things that were confusing, fixed a silly mistake I had not noticed, tried to clarify the intention in some cases, kept what I see being used in practice (even if in the wrong way; the text has the caveat that it is not exact), and left aside what you said that I did not mention at any point (I do not talk about insertions). I am reading your comments if you still have something to say, but understand that my goal is to pass on some information to those who cannot absorb the full theoretical basis, so that they can learn something, even if imprecisely.
I really agree with you @hugomg. Sometimes we need to make it as simple as possible for laypeople to understand something complex.
Certainly, this topic will end up in the "gallery of topics to read forever".
– emanuelsn