I have a database (an Excel spreadsheet) on the health of elderly people with about 112 columns, and I would like to know the best algorithm for selecting a subset of these columns while preserving the variability of the data and keeping track of the names of the selected columns. Is this possible?
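One simple name-preserving option is a variance filter. The sketch below assumes scikit-learn's `VarianceThreshold` applied to the numeric columns of a pandas DataFrame; the column names and values are hypothetical stand-ins for the spreadsheet (which could be loaded with `pd.read_excel`). Note that raw variance depends on each column's units, so a threshold above 0 is only meaningful when the columns are on comparable scales.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold


def select_by_variance(df: pd.DataFrame, threshold: float = 0.0) -> pd.DataFrame:
    """Keep numeric columns whose variance exceeds `threshold`, preserving their names."""
    numeric = df.select_dtypes(include="number")
    selector = VarianceThreshold(threshold=threshold)
    selector.fit(numeric)
    # get_support() returns a boolean mask aligned with the input columns
    kept = numeric.columns[selector.get_support()]
    return numeric[kept]


# Hypothetical toy data standing in for the real spreadsheet
df = pd.DataFrame({
    "age": [70, 82, 75, 90],
    "constant_flag": [1, 1, 1, 1],
    "systolic_bp": [130, 145, 120, 160],
})
selected = select_by_variance(df, threshold=0.0)
print(list(selected.columns))  # ['age', 'systolic_bp']
```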
In previous tests I used PCA, but the resulting components do not have meaningful names.
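PCA components are linear combinations of all the original columns, so they have no single name, but their loadings can be labelled with the column names to show which variables drive each component. This is a minimal sketch assuming scikit-learn's `PCA` and `StandardScaler`; the helper `pca_loadings` and the toy columns are hypothetical.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_loadings(df: pd.DataFrame, n_components: int = 2) -> pd.DataFrame:
    """Fit PCA on standardized columns and return loadings indexed by column name."""
    scaled = StandardScaler().fit_transform(df)
    pca = PCA(n_components=n_components).fit(scaled)
    # components_ has shape (n_components, n_features); transpose so rows are columns
    return pd.DataFrame(
        pca.components_.T,
        index=df.columns,
        columns=[f"PC{i + 1}" for i in range(n_components)],
    )


# Hypothetical toy data standing in for the real spreadsheet
df = pd.DataFrame({
    "age": [70, 82, 75, 90, 68],
    "systolic_bp": [130, 145, 120, 160, 118],
    "grip_strength": [30, 22, 28, 15, 33],
})
loadings = pca_loadings(df, n_components=2)
# Original columns ranked by their absolute contribution to the first component
print(loadings["PC1"].abs().sort_values(ascending=False))
```

Columns with large absolute loadings on a component are the ones it mostly summarizes, which gives a way to interpret (though not rename) each component.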
To put it in context, the main idea is to use an algorithm that selects columns from my database so as to eliminate the strong correlation between them, and then use some clustering algorithm (K-Means, DBSCAN...) to group each person into categories (healthy, unhealthy, among others...).
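The two-step idea above (drop strongly correlated columns, then cluster) can be sketched as follows. This assumes pandas' `DataFrame.corr` (Pearson, so only linear relationships are detected) and scikit-learn's `KMeans`; the 0.9 cut-off, the helper `drop_correlated`, and the toy columns are hypothetical choices, not a recommended setting. The resulting cluster labels are arbitrary ids: deciding which cluster means "healthy" or "unhealthy" still requires inspecting the clusters.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column of each pair whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Mask everything except the upper triangle (k=1 excludes the diagonal),
    # so each pair of columns is examined only once
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Hypothetical toy data; age_months is 12 * age, so it duplicates age exactly
df = pd.DataFrame({
    "age": [70, 75, 80, 85, 90, 95],
    "age_months": [840, 900, 960, 1020, 1080, 1140],
    "systolic_bp": [140, 125, 120, 120, 125, 140],
    "grip_strength": [30, 28, 31, 27, 29, 26],
})
reduced = drop_correlated(df, threshold=0.9)
print(reduced.columns.tolist())  # ['age', 'systolic_bp', 'grip_strength']

# Standardize the remaining columns, then group people into 2 clusters
scaled = StandardScaler().fit_transform(reduced)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(labels)
```

Because the kept columns are still the original ones, their names survive into the clustering step, unlike PCA components.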
I'm using the scikit-learn library at the moment.
Wouldn't it be easier to write a query against the database that selects just the fields you need?
– Nicolas Pereira
I'm actually working with an Excel spreadsheet! At first I selected only a few columns, but when I talked with a teacher who specializes in machine learning, I was told that choosing the columns arbitrarily is not a good approach and that the correct way is to use an algorithm for this.
– Igor Carvalho