How to get the X coordinate if the Y condition is met with Numpy?

I wrote a function that returns the array below, in which each row corresponds to a point:

array([[0.57946528, 2.        ],
      [0.35226154, 0.        ],
      [0.26088698, 0.        ],
      [0.56560726, 1.        ],
      [0.41680759, 1.        ],
      [0.55771505, 0.        ],
      [0.8501109 , 0.        ],
      [0.76229916, 1.        ],
      [0.50357436, 0.        ],
      [0.40875861, 1.        ]])

I want to group the points by the unique values of the Y coordinate (np.unique(y)) and compute the mean of X for each Y value (mean of x for y = 0, mean of x for y = 1, mean of x for y = 2).

The new array would therefore have only three points (3 rows and 2 columns):

array([[mean(x, y = 2), 2.        ],
      [mean(x, y = 0), 0.        ],
      [mean(x, y = 1), 1.        ]])

I considered converting this array to a pandas DataFrame, but I would like to do this with NumPy.

2 answers

1

I don't believe NumPy has a single built-in method for this. It would certainly be easier with pandas.

But since you want to do it with NumPy, I believe this is one way:

>>> import numpy as np

>>> arr = np.array([[0.57946528, 2.        ],
...       [0.35226154, 0.        ],
...       [0.26088698, 0.        ],
...       [0.56560726, 1.        ],
...       [0.41680759, 1.        ],
...       [0.55771505, 0.        ],
...       [0.8501109 , 0.        ],
...       [0.76229916, 1.        ],
...       [0.50357436, 0.        ],
...       [0.40875861, 1.        ]])

>>> n = np.unique(arr[:,1])

>>> n
array([0., 1., 2.])

>>> ga = np.array( [ [i, list(arr[arr[:,1]==i,0])] for i in n], dtype=object )
>>> ga
array([[0.0,
        list([0.35226154, 0.26088698, 0.55771505, 0.8501109, 0.50357436])],
       [1.0, list([0.56560726, 0.41680759, 0.76229916, 0.40875861])],
       [2.0, list([0.57946528])]], dtype=object)

>>> final = np.array( [[i, np.mean(j) ] for i, j in ga] )
>>> final
array([[0.        , 0.50490977],
       [1.        , 0.53836816],
       [2.        , 0.57946528]])
>>>
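The Python-level loop above can also be avoided. The following is only a sketch of one possible vectorized variant, not a replacement for the approach above: `np.unique(..., return_inverse=True)` assigns each row a group index, and `np.bincount` with `weights` sums X per group. The names `groups`, `inverse`, and `result` are illustrative, and the columns are arranged as [mean of X, Y] to follow the layout asked for in the question.

```python
import numpy as np

arr = np.array([[0.57946528, 2.0],
                [0.35226154, 0.0],
                [0.26088698, 0.0],
                [0.56560726, 1.0],
                [0.41680759, 1.0],
                [0.55771505, 0.0],
                [0.8501109, 0.0],
                [0.76229916, 1.0],
                [0.50357436, 0.0],
                [0.40875861, 1.0]])

# groups: sorted unique Y values; inverse: for each row, the index of its Y in groups
groups, inverse = np.unique(arr[:, 1], return_inverse=True)

# Sum of X per group divided by the number of rows in that group
sums = np.bincount(inverse, weights=arr[:, 0])
counts = np.bincount(inverse)
means = sums / counts

# One row per group: [mean of X, Y]
result = np.column_stack((means, groups))
print(result)
```

The rows come out sorted by Y, because `np.unique` returns its values in sorted order.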

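Note that `final` holds the Y value in the first column and the mean of X in the second, which is the reverse of the layout shown in the question. I hope it helps.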

1


The values:

a =  np.array([
      [0.57946528, 2.        ],
      [0.35226154, 0.        ],
      [0.26088698, 0.        ],
      [0.56560726, 1.        ],
      [0.41680759, 1.        ],
      [0.55771505, 0.        ],
      [0.8501109 , 0.        ],
      [0.76229916, 1.        ],
      [0.50357436, 0.        ],
      [0.40875861, 1.        ]])

An alternative with numpy:

import numpy as np

X = a[:, 0]  # X values
Y = a[:, 1]  # Y values
Y_unic = np.unique(Y)  # unique Y values

b = [(X[Y == y], y) for y in Y_unic]  # list of (X values in the group, Y value) pairs
np.array([[np.mean(xs), y] for xs, y in b])  # mean of X for each group

Output:

array([[0.50490977, 0.        ],
       [0.53836816, 1.        ],
       [0.57946528, 2.        ]])
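The question lists the groups in the order y = 2, y = 0, y = 1, which is the order in which each Y value first appears in the data, whereas `np.unique` returns them sorted. If that order matters, a minimal sketch (assuming order of first appearance is what is wanted; `first_idx`, `order`, and `result` are illustrative names) could use `return_index`:

```python
import numpy as np

a = np.array([[0.57946528, 2.0],
              [0.35226154, 0.0],
              [0.26088698, 0.0],
              [0.56560726, 1.0],
              [0.41680759, 1.0],
              [0.55771505, 0.0],
              [0.8501109, 0.0],
              [0.76229916, 1.0],
              [0.50357436, 0.0],
              [0.40875861, 1.0]])

X = a[:, 0]
Y = a[:, 1]

# first_idx[i] is the row where the i-th unique Y value first occurs
Y_unic, first_idx = np.unique(Y, return_index=True)

# Reorder the unique values by the position of their first occurrence
order = np.argsort(first_idx)
Y_ordered = Y_unic[order]

# One row per group: [mean of X, Y], in order of first appearance
result = np.array([[X[Y == y].mean(), y] for y in Y_ordered])
print(result)
```

With the data above, the groups appear in the order 2, 0, 1, as in the question.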

Pandas:

import pandas as pd

df = pd.DataFrame({'X': a[:, 0], 'Y': a[:, 1]})
# Mean of X per Y, columns reordered to [X, Y], converted back to a NumPy array
result = df.groupby('Y').mean().reset_index().reindex(columns=['X', 'Y']).to_numpy()
result

Output:

array([[0.50490977, 0.        ],
       [0.53836816, 1.        ],
       [0.57946528, 2.        ]])
