I'm running a regression with 3 numeric parameters and one column of categories.
Since sklearn does not handle categories directly, I turn them into dummies (one column per category, filled with 1 if the row belongs to that category and 0 otherwise):
from sklearn import preprocessing
myEncoder = preprocessing.OneHotEncoder()
myEncoder.fit(df_c_f[['segment_id']])
dummies = myEncoder.transform(df_c_f[['segment_id']]).toarray()
So my matrix, which initially had n rows and 4 columns, now has 3 parameter columns plus c dummy columns.
My question is how to interact (multiply) each of my first 3 columns with every dummy column, so that I end up with n rows and 3 * c columns.
I wrote the following code to do this, but it only works for small matrices; with anything slightly larger the code hangs:
import numpy as np

matrix = []

def itera_parametros_e_dummies(matrix1, matrix2):
    print(len(matrix1))
    if len(matrix1) != len(matrix2):
        print("matrices have different sizes")
    else:
        # Process one row at a time and append each result to the global list
        for i in range(len(matrix1)):
            matrix.append(np.dot(matrix1[i:i+1], (matrix2[i:i+1]))[0])
    return matrix

itera_parametros_e_dummies(log_orgc_traf, df_dummies)
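For reference, the n rows by 3 * c columns result described above can probably be built without a Python loop, using NumPy broadcasting. This is only a minimal sketch under assumptions: the function name `interact_features_with_dummies` and the toy arrays are made up for illustration, and it assumes both inputs can be converted to dense numeric arrays with the same number of rows.

```python
import numpy as np

def interact_features_with_dummies(features, dummies):
    """Multiply every feature column by every dummy column, row by row.

    features: array-like of shape (n, p) or (n,)
    dummies:  array-like of shape (n, c)
    returns:  ndarray of shape (n, p * c)
    """
    features = np.asarray(features, dtype=float)
    dummies = np.asarray(dummies, dtype=float)
    if features.ndim == 1:
        # Treat a single column as shape (n, 1)
        features = features[:, None]
    if features.shape[0] != dummies.shape[0]:
        raise ValueError("features and dummies must have the same number of rows")
    # (n, p, 1) * (n, 1, c) broadcasts to (n, p, c): one outer product per row
    interactions = features[:, :, None] * dummies[:, None, :]
    # Flatten the last two axes so each row holds its p * c interaction terms
    return interactions.reshape(features.shape[0], -1)

# Toy data (hypothetical): 2 rows, 3 parameters, 2 categories
toy_features = np.array([[1.0, 2.0, 3.0],
                         [4.0, 5.0, 6.0]])
toy_dummies = np.array([[1.0, 0.0],
                        [0.0, 1.0]])
toy_result = interact_features_with_dummies(toy_features, toy_dummies)
print(toy_result.shape)  # (2, 6)
```

In this layout the columns are grouped by parameter: the first c columns are parameter 1 times each dummy, the next c columns are parameter 2 times each dummy, and so on. The result is dense, so memory use grows as n * 3 * c.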
I didn't quite understand what you want to do. What should the final structure of the DataFrame look like?
– Guilherme Marthe