Doubts binary search

Asked

Viewed 2,284 times

3

Assuming that the binary search works only with array of ordered integers, if I had to search for an integer in an ordered vector the speed of the search would be much faster than a sequential search, but my question is, if my vector is not ordered how much would it influence the speed in relation to sequential search if I had to sort it? and if we can go deeper think of a binary search on a String vector the process would be:

  1. Convert each vector item to a decimal number
  2. Sort the vector
  3. Perform the binary search

In these two examples how much would influence the speed in relation to sequential search processes before binary search?

  • It depends on the amount of searches you will perform after sorting. If there are few searches it is not worth it.

3 answers

4


if my vector is not ordered how much it would influence the speed in relation to sequential search in case I have to sort it?

If you have to sort an array to then do a search for sure the process will be slower, it will only compensate order if you have to do multiple searches.

a binary search on a String vector

The class String (in Java) implements the interface Comparable which allows you to use the method binarySearch() directly on your Strings vector, without the need to transform into integers (however you plan to do so).

That is, you can sort the vector simply by doing so:

String[] sa = {"um", "dois", "tres", "quatro"};
Arrays.sort(sa);
System.out.println(Arrays.binarySearch(sa, "um")); //imprime "3" que é a posição do
                                                   //elemento "um" no vetor ordenado

how much would influence the speed in relation to sequential search processes before binary search?

It depends on the size of your vector and how many times you will do a search on it after ordered.

Just know the method sort() has complexity O(n log(n)).

References:

Arrays.Sort() - Java SE7 method

String Class - Java SE7

Interface Comparable - Java SE7

  • Thank you @Math your reply was quite constructive , as I do a binary search on a string array without needing to convert them to integer as does the Java String class?

  • I didn’t understand your question. The code I put in the answer does just that, with Arrays.Sort() and Arrays.binarySearch(). If it’s not that, I could clear up your doubt?

  • My question is about the inner workings of the binarySearch method, in your answer you said that it does binary search without having to convert to integers, how it does this?

  • Every class that wants to be ordered must implement the interface Comparable and overwrite the method compareTo(). The String class does this.charAt(k)-anotherString.charAt(k) to say which string is before another when ordered, this code means that the comparison of two strings is the result of the comparison based on the Unicode value from character to character of each of them. See the compareTo() of String.

3

Well, the speed with which the processes needed to make binary search viable are performed is directly associated with number of times that they are executed.

In this case, if the vector is not ordered and you do not know a "clue" about a way to sort it in a faster and more effective way (based on the vector’s own values), a sequential search will always be the best alternative.

In short, it’s only interesting to do a binary search instead of the sequential one when:

  1. The vector in question is ordered (as already pointed out in your question);
  2. The number of tasks performed to sort a vector from an algorithm already known is minor than the number of value comparisons per value.

Think of the following situation (I will use C# to demonstrate):

  • You want to search for a number (38 for example) within a 40 position vector that is not ordered:

    int valorDesejado = 38;
    int[] vetor = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
                      22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 39};
    

    You have a clue or, in this case, you explicitly know that your vector has the last two inverted values.

  • Comparing the two types of searches (considering processing expense in sum, division, disjunction and comparison operations in an equivalent way), sequentially you would perform 38 comparison tests (taking the comparison of the loop itself) and increment your counter variable i the same number of times:

    for (int i = 0; i < vetor.Length; i++)
        if (valorDesejado == vetor[i])
            Console.WriteLine("Valor encontrado na posicao " + i);
    
    /* Total:
     * 38 Operações
     * 38 Comparações
     */
    

    On the other hand, before carrying out the binary search, you only need three operations to change the position values (for this I will use the XOR exchange algorithm), adding to the total two operations and one comparison per cycle:

    // Usando XOR Swap: 3 operações
    vetor[38] ^= vetor[39];
    vetor[39] ^= vetor[38];
    vetor[38] ^= vetor[39];
    
    // Busca binária
    int esq, meio, dir;
    esq = -1;
    dir = vetor.Length;
    
    while (esq < dir - 1) // Para buscar pelo 38, o ciclo só se repetirá 6 vezes
    {
        meio = (esq + dir) / 2; // Duas operações (soma e divisão)
        if (vetor[meio] < valorDesejado) // Uma comparação
            esq = meio;
        else
            dir = meio;
    }
    Console.WriteLine("Valor encontrado" + dir);
    
    /* Total:
     * 6 * 2 + 3 = 15 Operações
     * 6 * 1 = 6 Comparações
     */
    

3

The efficiencies would be:

  • n. log(n) to order
  • log(n) to search when ordered binarysearch
  • n to look for disorderly

Let’s say we have 10K elements

To sort and seek:

10K.log(10K)+log(10K) = 44K de eficiência

To look for disorderly would just:

10K = 10k eficiência

The disorderly search is quicker than sorting and searching. The problem with sorting the list is that it costs too much. Then it would never be more efficient to order before seeking to only 1 search.

If I had to do more searching then the cost of this initial ordering would be gone.

For example for this search, if it were done 10 times we would have:

Ordenado: 44+40 = 88
Desordenado: 10*10 = 100

Recalling that efficiency O(1) is the most efficient, so 88 is better than 100.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.