Why learn different algorithms that solve the same problem?

I don’t have any training in computer science. For example, whenever I want to sort a numeric vector x in one of the programming languages I use, I just run sort(x) and the problem is solved.

However, the people I know who studied computer science had to learn how sorting algorithms work. I imagine this is an important subject in that field, to the point that there is at least one YouTube channel that published videos of folk dance companies illustrating how different algorithms work in practice.

Taking a ready-made list from Wikipedia, 14 different sorting algorithms can be found:

  • Insertion Sort
  • Selection Sort
  • Bubble Sort
  • Comb Sort
  • Merge Sort
  • Heapsort
  • Shell Sort
  • Radix Sort
  • Gnome Sort
  • Counting Sort
  • Bucket Sort
  • Cocktail Sort
  • Timsort
  • Quick Sort

That said, my questions are:

  1. Is there a sorting algorithm that is the best of all, that can be used in most cases and gives the best results? Is there any study on this?

  2. Are some specific sorting problems best solved by a particular algorithm? What examples can be given?

  3. I understand that one way to choose a good algorithm is through its complexity, represented by O(f(x)). But this is a theoretical measure which, as far as I know, does not take into account the physical limitations of the machine’s memory and disk. Is there some measure of algorithm complexity that takes into account not only the number of operations performed, but also the memory used?

  4. Is there still research being carried out in this area? For example, is it possible that some of the 14 sorting algorithms I listed above are not suitable for large volumes of data? Has something more advanced already been developed, or is it being developed?

1 answer

Because they solve the same problem in different ways. Each one makes a different trade-off: some use more memory, others are slower depending on the circumstances. We can see these trade-offs in the table from the Big-O Cheat Sheet:

[Image: Big-O Cheat Sheet complexity table]

Is there a sorting algorithm that is the best of all, that can be used in most cases and gives the best results? Is there any study on this?

It depends. If you read the question carefully, it is open to interpretation, and "best" would need to be defined. Without pinning down what "best" means, any of several candidates could be singled out.

In terms of performance many consider it to be Heapsort, which has excellent results in all cases and uses a minimum of memory. But it depends on the hardware it runs on: on modern hardware it does not do so well because of its poor locality of reference, although that obviously depends on the implementation. And it does not offer stability.
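
To make the "minimum of memory" point concrete, here is a minimal in-place heapsort sketch in Python (my own illustration, not code from the answer; the names are arbitrary). Note how sift_down keeps jumping between index i and 2*i + 1, which is exactly the poor locality of reference mentioned above:

```python
def heapsort(a):
    """Sort the list `a` in place: O(n log n) in every case, O(1) extra memory."""
    n = len(a)

    def sift_down(root, end):
        # Push a[root] down until the max-heap property holds in a[:end].
        # The jumps from index i to 2*i + 1 are what hurt cache locality.
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child] < a[child + 1]:
                child += 1                      # take the larger of the two children
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                return

    for start in range(n // 2 - 1, -1, -1):     # build the heap bottom-up
        sift_down(start, n)
    for end in range(n - 1, 0, -1):             # move the current maximum to the end
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)

nums = [5, 2, 9, 1, 5, 6]
heapsort(nums)
print(nums)                                     # [1, 2, 5, 5, 6, 9]
```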

Many consider it to be Quicksort; after all, it is highly memory-efficient, close to the ideal, and has the best possible performance in the best case and on average, and although it can be bad in the worst case, that rarely happens in practice, especially on large volumes (I am excluding from this analysis the algorithms that place restrictions on what data they can sort and how). It is probably the most used because it is easy to implement.
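
A rough sketch of the idea (again my own illustration, using a random pivot, which is one common way of making the bad worst case even rarer):

```python
import random

def quicksort(a, lo=0, hi=None):
    """In-place quicksort: O(n log n) on average, O(n^2) only in the rare worst case."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    k = random.randint(lo, hi)                  # random pivot to avoid pathological inputs
    a[k], a[hi] = a[hi], a[k]
    pivot, i = a[hi], lo
    for j in range(lo, hi):                     # Lomuto partition scheme
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]                   # pivot lands in its final position i
    quicksort(a, lo, i - 1)                     # only the recursion stack uses extra memory,
    quicksort(a, i + 1, hi)                     # about O(log n) on average

data = [3, 7, 1, 9, 2]
quicksort(data)
print(data)                                     # [1, 2, 3, 7, 9]
```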

Some people prefer Introsort, which combines the previous two. Incidentally, it is common for the real internal implementations of good sort functions to choose the best algorithm from a set of at least two or three options.
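
A toy sketch of that dispatch idea (the thresholds and the heapq-based fallback are simplifications of mine, not how real libraries implement it):

```python
import heapq
import math

def introsort(a):
    """Introsort-style dispatch: quicksort, but switch strategy when it struggles."""
    if a:
        _sort(a, 0, len(a) - 1, depth_limit=2 * int(math.log2(len(a))) + 1)

def _sort(a, lo, hi, depth_limit):
    if hi - lo + 1 <= 16:                       # tiny slices: insertion sort is cheapest
        for i in range(lo + 1, hi + 1):
            x, j = a[i], i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x
        return
    if depth_limit == 0:                        # recursion too deep: fall back to a heap,
        heap = a[lo:hi + 1]                     # which guarantees O(n log n)
        heapq.heapify(heap)
        for k in range(lo, hi + 1):
            a[k] = heapq.heappop(heap)
        return
    pivot, i = a[hi], lo                        # otherwise, a plain quicksort partition
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    _sort(a, lo, i - 1, depth_limit - 1)
    _sort(a, i + 1, hi, depth_limit - 1)

data = list(range(1000, 0, -1))                 # reversed input forces the heap fallback
introsort(data)
print(data[:5], data[-5:])                      # [1, 2, 3, 4, 5] [996, 997, 998, 999, 1000]
```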

For a better performance guarantee it is customary to adopt Mergesort, as long as memory is not a big problem. It is easier to parallelize, which can make it the fastest (which is different from being the most efficient) if it is well implemented to take advantage of this feature.
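
A minimal sketch (mine, not from the answer) of why it is stable and why the work splits so naturally:

```python
def mergesort(a):
    """Stable mergesort: O(n log n) in every case, but needs O(n) extra memory."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = mergesort(a[:mid])                   # the two halves are fully independent,
    right = mergesort(a[mid:])                  # which is what makes parallelizing easy
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:                 # `<=` keeps equal elements in their
            merged.append(left[i])              # original order, i.e. the sort is stable
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(mergesort([4, 1, 3, 1, 2]))               # [1, 1, 2, 3, 4]
```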

More recently Timsort has been used, because it is a Mergesort with more intelligence and can show significant gains in some cases without significantly compromising the others. It is more complex, and there are cases where there is no gain.
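
CPython's built-in sorted() and list.sort() are based on Timsort, so its behavior is easy to observe without implementing anything. A quick sketch of the kind of case where it wins (the sizes and timings here are illustrative only):

```python
import random
import timeit

# Timsort detects already-ordered "runs", so nearly sorted data is much cheaper
# to sort than fully shuffled data of the same size.
n = 1_000_000
already_sorted = list(range(n))
shuffled = already_sorted[:]
random.shuffle(shuffled)

print("already sorted:", timeit.timeit(lambda: sorted(already_sorted), number=5))
print("shuffled:      ", timeit.timeit(lambda: sorted(shuffled), number=5))
```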

I do not believe the others can even be considered for "all" cases with a minimum of efficiency.

Are some specific sorting problems best solved by a particular algorithm? What examples can be given?

Yes, there are several specific situations where one can do better than another, and that can depend on many things:

  • the total amount of memory available for its use
  • the available cache
  • the number of processors available
  • whether the sorting will be done entirely in RAM or not
  • whether the data tends to be completely shuffled or should already have a certain order
  • whether stability is needed or not
  • whether the data has many repeated values or is guaranteed to have none
  • or even whether a small margin of error in the final ordering is acceptable
  • or whether the data follows a specific pattern that allows sorting without comparisons; the best example is Radix sort (see the sketch after this list).
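
The last item is easiest to see with counting sort, the simplest member of that family (Radix sort applies the same idea digit by digit). A minimal sketch of mine, assuming small non-negative integer keys:

```python
def counting_sort(a, max_value):
    """Sort non-negative integers <= max_value without any comparisons: O(n + k)."""
    counts = [0] * (max_value + 1)
    for x in a:                                 # tally how many times each value appears
        counts[x] += 1
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)             # emit each value as often as it was seen
    return out

print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], max_value=9))   # [1, 1, 2, 3, 4, 5, 6, 9]
```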

If you want to better visualize the performance in some chosen scenarios, there is a website that does this (be careful, it can be a little misleading if you do not understand the limitations of this kind of analysis). Note how it shows that the implementation of the same algorithm makes a difference, and it would matter even more if there were other options, some of them running in parallel.

I understand that one way to choose a good algorithm is through its complexity, represented by O(f(x)). But this is a theoretical measure which, as far as I know, does not take into account the physical limitations of the machine’s memory and disk. Is there some measure of algorithm complexity that takes into account not only the number of operations performed, but also the memory used?

There is. As the OP himself showed, complexity is given by a function; make it complex enough, taking all the variables into account (you cannot forget any), and you will have what you want. In general "nobody" does this, because it rarely pays off, and simulating with real data should give better results (closer to reality) with less work.
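
A sketch of what "simulating with real data" can look like in practice, measuring both time and peak extra memory (the helper and the dataset here are placeholders of mine):

```python
import random
import time
import tracemalloc

def profile_sort(sort_fn, data):
    """Measure wall-clock time and peak extra memory of a sort on a concrete dataset."""
    sample = list(data)                         # copy, so every algorithm sees the same input
    tracemalloc.start()
    start = time.perf_counter()
    sort_fn(sample)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak

data = [random.random() for _ in range(100_000)]
elapsed, peak = profile_sort(sorted, data)
print(f"sorted(): {elapsed:.3f} s, peak extra memory {peak / 1024:.0f} KiB")
```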

Is there still research being carried out in this area? For example, is it possible that some of the 14 sorting algorithms I listed above are not suitable for large volumes of data? Has something more advanced already been developed, or is it being developed?

Yes, absolutely. There are more algorithms than these, and "every day" a new one appears, until it is shown to be no better than the others, or only better under very specific circumstances, or more complicated than what already exists without a significant gain. In general they are improvements to existing algorithms or are meant for a specific niche of data.

In conclusion

Selection sort is a dud that is hard to beat in inefficiency, except by Bogosort.

Remember that for small volumes it makes little difference which one you use.

Note that in the original (Portuguese) I used the term classificação ("classification"); ordenação ("ordering") would be the wrong term.
