What is a better way to parallelize this function?

I am trying to parallelize a function that calculates cosine similarity.

Here is my code:

import numpy as np

def cos_sim(a, b):
    # Cosine similarity between two vectors; returns 0 if either vector is all zeros
    dot_product = np.dot(a, b)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0
    else:
        return dot_product / (norm_a * norm_b)

def newsimilarityitem(matriz):
    # Start with an n x n matrix of zeros
    cs = []
    for i in range(len(matriz)):
        cs.append([0] * len(matriz))

    for i in range(len(matriz) - 1):            # HERE: loop to parallelize
        for l in range(i + 1, len(matriz)):     # HERE: loop to parallelize
            a = np.array(matriz[i])
            b = np.array(matriz[l])
            r = cos_sim(a, b)
            cs[i][l] = r
            cs[l][i] = r
    return cs

What the code does:

matriz = [[4,3,0,0,5,0],
          [5,0,4,0,4,0],
          [4,0,5,3,4,0],
          [0,3,0,0,0,5],
          [0,4,0,0,0,4],
          [0,0,2,4,0,5]]

Given a matrix (not necessarily square) where the rows represent items, the columns represent users, and the cells are ratings, I want to calculate the cosine similarity between the items. The function is called like this:

matriz_simi = newsimilarityitem(matriz)

Inside the function, the matrix cs (necessarily square) holds the similarities, i.e. given the index of one item i and the index of another item l, their similarity is cs[i][l] or cs[l][i]. The function cos_sim(a, b) takes two NumPy arrays and calculates the similarity between them.
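To make the structure of cs explicit, the same values can be written as one matrix product of the row-normalized matrix with its transpose. This is only a sketch (the name cosine_matrix is mine), reproducing the zero rows and zero diagonal of the loop version:

import numpy as np

def cosine_matrix(matriz):
    # Sketch: entry (i, l) is the dot product of the normalized rows i and l,
    # which is exactly cos_sim(matriz[i], matriz[l])
    m = np.array(matriz, dtype=float)
    norms = np.linalg.norm(m, axis=1)
    safe = np.where(norms == 0, 1, norms)   # avoid dividing by zero
    normalized = m / safe[:, None]
    cs = normalized @ normalized.T
    cs[norms == 0, :] = 0                   # all-zero rows get similarity 0,
    cs[:, norms == 0] = 0                   # matching the special case in cos_sim
    np.fill_diagonal(cs, 0)                 # the loops leave the diagonal at 0
    return cs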

I’m trying to parallelize the two loops marked above. Right now the work is about n²/2 pairwise comparisons (I suppose), so parallelizing would save a lot of time, since in recommendation settings I can have thousands of users and items.

Currently my machine has 4 cores and I am using multiprocessing, but I am open to any kind of library that can make this task easier.
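For example, one direction I have been considering (only a sketch; fila is a helper name I made up, and it assumes matriz and cos_sim are defined at module level as above) is to hand each value of i from the outer loop to a worker and rebuild the symmetric matrix from the returned rows of the upper triangle:

from multiprocessing import Pool
import numpy as np

def fila(i):
    # Hypothetical helper: similarities between item i and every item after it,
    # i.e. one row of the upper triangle of cs
    a = np.array(matriz[i])
    return [cos_sim(a, np.array(matriz[l])) for l in range(i + 1, len(matriz))]

if __name__ == "__main__":
    n = len(matriz)
    with Pool(processes=4) as pool:          # one worker per core
        triangulo = pool.map(fila, range(n - 1))

    # Rebuild the symmetric matrix cs from the upper-triangle rows
    cs = [[0] * n for _ in range(n)]
    for i, linha in enumerate(triangulo):
        for offset, r in enumerate(linha):
            cs[i][i + 1 + offset] = r
            cs[i + 1 + offset][i] = r

One thing I am unsure about is load balancing: early values of i carry many more pairs than late ones, so the chunks are uneven.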
