I want to calculate the vector difference between two large vectors. For a single row, I can subtract a list of arrays from a matrix and square the result:
(train["quest_emb"][0] - train["sent_emb"][0])**2
but I cannot generalize this to a whole column of the data table:
train["quest_emb"] - train["sent_emb"]
because it freezes my computer.
Array analysis
Here is an example of their contents:
>>> print((train["quest_emb"][2]))
[[0.03949683 0.04509903 0.01808935 ... 0.04610749 0.0416535 0.02240689]]
>>> print((train["sent_emb"][2]))
[array([0.03037658, 0.04433101, 0.08135635, ..., 0.06764812, 0.04971079,
0.02240689], dtype=float32), array([0.05260669, 0.04548098, 0.0382337 , ..., 0.04823414, 0.07656007,
0.03501297], dtype=float32), array([0.0502927 , 0.04480611, 0.02038252, ..., 0.03942193, 0.03132772,
0.04595207], dtype=float32), array([0.06769167, 0.03393815, 0.0625218 , ..., 0.05555448, 0.03059104,
0.03422254], dtype=float32)]
There seems to be a difference in size:
>>> print(len(train["quest_emb"][0]))
1
>>> print(len(train["sent_emb"][0]))
4
Here is what the first array looks like:
>>> print((train["quest_emb"][2][0]))
[0.03949683 0.04509903 0.01808935 ... 0.04610749 0.0416535 0.02240689]
>>> print((train["sent_emb"][2][0]))
[0.03037658 0.04433101 0.08135635 ... 0.06764812 0.04971079 0.02240689]
The length of train["quest_emb"] is the same as that of train["sent_emb"]: 130318
Here are the data types:
>>> print(type(train["quest_emb"][2]))
<class 'numpy.ndarray'>
>>> print(type(train["sent_emb"][2]))
<class 'list'>
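To make the single-row case explicit, here is a minimal sketch of how one row can be subtracted with NumPy broadcasting. It assumes, based on the output above, that each "quest_emb" entry has shape (1, d) and each "sent_emb" entry is a list of k arrays of shape (d,). The function name `squared_diff` and the toy values are hypothetical stand-ins, not the real data.

```python
import numpy as np

def squared_diff(quest_row, sent_row):
    """Squared element-wise difference for ONE row.

    quest_row: array of shape (1, d) (assumed layout of a "quest_emb" entry)
    sent_row:  list of k arrays of shape (d,) (assumed layout of a "sent_emb" entry)
    """
    q = np.asarray(quest_row, dtype=np.float32)   # shape (1, d)
    s = np.stack(sent_row).astype(np.float32)     # list of k vectors -> shape (k, d)
    return (q - s) ** 2                           # broadcasting gives shape (k, d)

# Toy stand-in data with d = 3 and k = 2 (not the real embeddings)
quest_row = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)
sent_row = [
    np.array([1.0, 0.0, 3.0], dtype=np.float32),
    np.array([0.0, 2.0, 5.0], dtype=np.float32),
]
result = squared_diff(quest_row, sent_row)
print(result.shape)
```

Stacking the list into one array first is what lets the (1, d) question embedding broadcast against all k sentence embeddings at once.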
**Is there any way to make this difference computable on a computer with 8 GB of RAM? Or, if not, is there an approximate way?**
Attempt with Theano
I tried to subtract with Theano:
import theano.tensor as T
from theano import function
x = T.dscalar('x')
y = T.dscalar('y')
z = x - y
f = function([x, y], z)
f(train["quest_emb"],train["sent_emb"])
But it raised an error:
ValueError: Bad input argument with name "quest_emb" to theano function with name "<ipython-input-41-c53eb459cbc4>:6" at index 0 (0-based).
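For context, `T.dscalar` declares a single float64 scalar, so passing a whole column to `f` would be expected to fail regardless of memory. As a possible alternative, here is a minimal NumPy sketch that walks the two columns row by row and keeps only one number per sentence. It assumes that the squared Euclidean distance per sentence is what is ultimately needed, which the question does not state; the function name `squared_distances` and the toy data are hypothetical.

```python
import numpy as np

def squared_distances(quest_col, sent_col):
    """Yield, row by row, the squared Euclidean distance between the
    question embedding and each sentence embedding of that row.

    Only one row's difference matrix is held in memory at a time.
    """
    for q, sents in zip(quest_col, sent_col):
        q = np.asarray(q, dtype=np.float32).reshape(1, -1)  # shape (1, d)
        s = np.stack(sents).astype(np.float32)              # shape (k, d)
        yield ((q - s) ** 2).sum(axis=1)                     # shape (k,)

# Toy stand-in columns (assumption: same layout as the real data)
quest_col = [np.array([[1.0, 2.0]]), np.array([[0.0, 0.0]])]
sent_col = [
    [np.array([1.0, 0.0]), np.array([3.0, 2.0])],
    [np.array([3.0, 4.0])],
]
distances = list(squared_distances(quest_col, sent_col))
```

On the real data this would be called as `squared_distances(train["quest_emb"], train["sent_emb"])`. Because it is a generator, the results can be consumed or written out incrementally instead of materializing every full difference array at once. Whether that fits in 8 GB depends on the embedding dimension, which is not shown here.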
Hi, you could convert train["sent_emb"] to numpy.ndarray. It will speed up iterations and possibly solve your problem.
– Davi Mello
@Davimello I tried and it didn't work; the problem is efficiency with such a large number of rows (130318). I tried with less than half the data and it computes, so far without storing the results. But I am not being asked to do only half the job.
– Revolucion for Monica
The variable "sent_emb" has 4 vectors. Do you want to subtract "quest_emb" from each of them?
– Davi Mello
Is this a college assignment or something you can post to GitHub? Without the data, it's hard to really help.
– Davi Mello