How to subtract two large vectors?

I want to calculate the difference between two large vectors. For a single row, I can subtract the list of arrays from the array and square the result:

(train["quest_emb"][0] - train["sent_emb"][0])**2

but I cannot generalize it to the whole columns of the data table:

train["quest_emb"] - train["sent_emb"]

because it freezes my computer.

Array analysis

Here is an example of their contents.

>>> print((train["quest_emb"][2]))
[[0.03949683 0.04509903 0.01808935 ... 0.04610749 0.0416535  0.02240689]]

>>> print((train["sent_emb"][2]))
[array([0.03037658, 0.04433101, 0.08135635, ..., 0.06764812, 0.04971079,
       0.02240689], dtype=float32), array([0.05260669, 0.04548098, 0.0382337 , ..., 0.04823414, 0.07656007,
       0.03501297], dtype=float32), array([0.0502927 , 0.04480611, 0.02038252, ..., 0.03942193, 0.03132772,
       0.04595207], dtype=float32), array([0.06769167, 0.03393815, 0.0625218 , ..., 0.05555448, 0.03059104,
       0.03422254], dtype=float32)]

There seems to be a difference in size:

>>> print(len(train["quest_emb"][0]))
1
>>> print(len(train["sent_emb"][0]))
4

Here is what the first array looks like:

>>> print((train["quest_emb"][2][0]))
[0.03949683 0.04509903 0.01808935 ... 0.04610749 0.0416535  0.02240689]

>>> print((train["sent_emb"][2][0]))
[0.03037658 0.04433101 0.08135635 ... 0.06764812 0.04971079 0.02240689]

The length of train["quest_emb"] is the same as that of train["sent_emb"]: 130318

Here are the data types:

>>> print(type(train["quest_emb"][2]))
<class 'numpy.ndarray'>

>>> print(type(train["sent_emb"][2]))
<class 'list'>
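
The outputs above suggest that each row pairs one question embedding of shape (1, d) with a plain list of sentence embeddings of length d. Here is a minimal sketch with synthetic data of how NumPy broadcasting handles that pairing, which would explain why the single-row expression works. The dimension d = 5 and the random values are assumptions, not the real embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # assumed embedding size; the real one is elided in the printed output

# One row shaped like the data in the question:
quest = rng.random((1, d), dtype=np.float32)                 # ndarray, shape (1, d)
sents = [rng.random(d, dtype=np.float32) for _ in range(4)]  # list of 4 arrays

# np.stack turns the list into a single (4, d) array and keeps float32.
sent_matrix = np.stack(sents)

# Broadcasting stretches the (1, d) question across the 4 sentence rows.
sq_diff = (quest - sent_matrix) ** 2

print(sq_diff.shape, sq_diff.dtype)
```

Each row of sq_diff holds the squared element-wise difference between the question and one sentence.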

**Is there any way to make this difference computable on a computer with 8 GB of RAM? Or, if not, an approximate way?**
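
One possible direction, sketched under assumptions rather than tested on the real data: process one row at a time and keep only a reduced result, for example the squared Euclidean distance between the question and each sentence, so the full table of element-wise differences never has to sit in memory at once. The function name squared_distances, the reduction to a sum, and the synthetic data are all illustrative assumptions. If the element-wise squared differences themselves are needed, this reduction does not apply.

```python
import numpy as np

def squared_distances(quest_col, sent_col):
    """Yield, for each row, the squared Euclidean distance between the
    question embedding and every sentence embedding in that row."""
    for quest, sents in zip(quest_col, sent_col):
        q = np.asarray(quest, dtype=np.float32).reshape(1, -1)  # (1, d)
        s = np.stack(sents).astype(np.float32, copy=False)      # (n_sentences, d)
        # Only one row's (n_sentences, d) difference exists at a time.
        yield ((q - s) ** 2).sum(axis=1)

# Synthetic stand-in for train["quest_emb"] and train["sent_emb"]
rng = np.random.default_rng(0)
d = 8  # assumed embedding size
quest_col = [rng.random((1, d), dtype=np.float32) for _ in range(3)]
sent_col = [[rng.random(d, dtype=np.float32) for _ in range(4)] for _ in range(3)]

distances = list(squared_distances(quest_col, sent_col))
print(len(distances), distances[0].shape)
```

With the real data the call would presumably be squared_distances(train["quest_emb"], train["sent_emb"]), assuming both columns iterate row by row as the indexing in the question suggests.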

Attempt with Theano

I tried to subtract with Theano:

import theano.tensor as T
from theano import function
x = T.dscalar('x')
y = T.dscalar('y')
z = x - y
f = function([x, y], z)   
f(train["quest_emb"],train["sent_emb"])

But it raised an error:

ValueError: Bad input argument with name "quest_emb" to theano function with name "<ipython-input-41-c53eb459cbc4>:6" at index 0 (0-based).
  • Hi, you could convert train["sent_emb"] to a numpy.ndarray. It will speed up the iterations and possibly solve your problem.

  • @Davimello I tried and it didn’t work. The problem is efficiency with such a large number of rows (130318). I tried with less than half the data and it computes the results without freezing. But I am not being asked to do only half the job.

  • The variable "sent_emb" has 4 vectors. Do you want to subtract "quest_emb" from each of them?

  • Is this a college assignment, or something you can post to GitHub? Without the data, it’s hard to really help.

No answers
