Most efficient algorithm to find indices of the 2 elements of a list whose sum is equal to a certain value

Asked

Viewed 114 times

-2

The Challenge is simple, find two numbers from a given list that if summed result in a specific number and return the index of those numbers.

That I solved, the problem is that in addition the performance of the code also counts.

Performance test with a large list of Numbers: Time limit exceeded

What I tried wasn’t fast enough, using two loops for where the first took a number as a base and the second went through all the numbers that were in front of it, which was not fast enough.

So I thought I’d use the method index with the difference between the requested sum and the current number, but also not fast enough. If anyone knows of any faster method or what I have to research on to understand.

   def find_two_sum(numbers, target_sum):
        result = ()
        for i in range(len(numbers) - 1):
            diff = target_sum - numbers[i]
            try:
                j = numbers.index(diff)
                return (i, j)
            except ValueError:
                continue
    if __name__ == "__main__":
        print(find_two_sum([3, 1, 5, 7, 5, 9], 10))
  • maybe returning value directly with Return (i,j) instead of setting to a variable and then giving Return

  • @Fourzerofive This changes practically nothing, the difference is irrelevant, because what interferes with the algorithm are other factors (how to call index all the time, for example)

2 answers

2


The problem is that index always scrolls through the list from the beginning to search for the element index, and you do this search several times (and as in many cases the difference is not in the list, it ends up going through to the end and launching the exception, which should still add a overhead the most).

Instead of checking if the difference is on the list, an alternative is to store the numbers I’ve already visited in a set (which is a structure that does not allow repeated elements and is optimized for searches - see here that the search in set's has constant time, already in lists has linear time).

def find_two_sum(numbers, target_sum):
    visitados = set()
    for i in range(len(numbers)):
        diff = target_sum - numbers[i]
        if diff in visitados:
            return numbers.index(diff), i
        visitados.add(numbers[i])

For example, for the list [3, 1, 5, 7, 5, 9] and the sum 10, first I see the 3 and the difference to 10 (which is 7). I see if 3 is in the set, and as it is not, I add it there. When I arrive at the 7, I’ll see the difference is 3, which will be in the set, and so I know that 3 and 7 are the pair that add up to 10. Then just return the indexes.

The difference to your code is that I call index only if the elements which have been added are equal to target_sum, what is better than searching for all.

Another option is to use a dictionary to store the element index:

def find_two_sum_dict(numbers, target_sum):
    visitados = set()
    indices = {}
    for i in range(len(numbers)):
        diff = target_sum - numbers[i]
        if diff in visitados:
            return indices[diff], i
        visitados.add(numbers[i])
        if numbers[i] not in indices:
            indices[numbers[i]] = i

This avoids calling index at the end, at the cost of maintaining another additional structure (the dictionary of indexes).


I ran some tests on the module timeit:

def find_two_sum_index(numbers, target_sum):    
    for i in range(len(numbers)):
        diff = target_sum - numbers[i]
        try:
            j = numbers.index(diff)
            return (i, j)
        except ValueError:
            continue

def find_two_sum_set(numbers, target_sum):
    visitados = set()
    for i in range(len(numbers)):
        diff = target_sum - numbers[i]
        if diff in visitados:
            return numbers.index(diff), i
        visitados.add(numbers[i])

def find_two_sum_dict(numbers, target_sum):
    visitados = set()
    indices = {}
    for i in range(len(numbers)):
        diff = target_sum - numbers[i]
        if diff in visitados:
            return indices[diff], i
        visitados.add(numbers[i])
        if numbers[i] not in indices:
            indices[numbers[i]] = i

from timeit import timeit
from random import sample, shuffle

# executa 100 vezes cada função
params = { 'number': 100, 'globals': globals() }

i = s = d = 0 # computa quantas vezes cada um foi pior
for _ in range(10):
    # gera um array com mil números
    numbers = list(sample(range(10000), 1000))
    # escolhe 2 aleatoriamente e soma
    soma = sum(sample(numbers, 2))
    # embaralha o array
    shuffle(numbers)
    t_i = timeit('find_two_sum_index(numbers, soma)', **params)
    t_s = timeit('find_two_sum_set(numbers, soma)', **params)
    t_d = timeit('find_two_sum_dict(numbers, soma)', **params)
    if t_i > t_s and t_i > t_d: # index demorou mais
        i += 1
    elif t_s > t_i and t_s > t_d: # set demorou mais
        s += 1
    elif t_d > t_s and t_d > t_i: # dicionário demorou mais
        d += 1
print(i, s, d)

Since the list of numbers I generated is random, the results vary. But in general, most often the solution with set (the first option of this answer) was faster. In some cases the solution with the index dictionary was slower, but in most cases its solution was slower (the variable was removed result of its code, which was not being used, because in the end it did not interfere much in the result, what slowed were the calls to index even).

If you want you can print the values of t_i, t_s and t_d (they indicate how many seconds it took each test), to get an idea (remembering that times can vary greatly from one machine and/or execution to another, because they depend on various factors, such as hardware, whether there were other processes running on the machine, etc). But in general, on my machine, compared to the solution that uses the set, its solution was between 3 and 15 times slower (depending on the generated list), and the solution with the index dictionary was between 1 and 3 times slower.

Since you seem to be submitting the code to one of these "online challenges" and they don’t usually disclose all the test cases, I’m not sure that these solutions will pass. But at least it seems there’s been some improvement over your code.


Note: I also tested with the code of another answer and he’s as slow or as slow as his code, since he does a search on the list, which as already mentioned, has linear time and so is slower than using set.

-4

You should put the list in order, which allows you to do a binary search that performs better than linear search. So you go through the list, taking a portion at a time. With this portion you can calculate the difference, which is the value of the next installment, and then through binary search check if it is in the list.

def find_two_sum(numbers, target_sum):
numbers.sort()
for parcela in numbers:
    diferença = target_sum - parcela
    if diferença in numbers:
        return (parcela, diferença)
return None

if __name__ == "__main__":
    print(find_two_sum([3, 1, 5, 7, 5, 9], 10))

Browser other questions tagged

You are not signed in. Login or sign up in order to post.