With only what you described in the question, it is hard to give you a precise answer. Next time, or if this answer is not satisfactory, I suggest explaining a little about what the database describes.
The first step is pre-processing. What it involves depends on the type of algorithm you want to use, but for an algorithm like K-Means, identifying outliers and doing some sort of imputation of missing values is almost essential. After all, its centroids are averages, and averages are easily distorted by extreme or missing values.
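As a rough illustration of that pre-processing step, here is a minimal sketch in R. The vector price and its values are made up, and median imputation and the 1.5 * IQR rule are just one common choice each, not the only option.

```r
# Hypothetical numeric column with one missing value and one extreme value.
price <- c(10.5, 12.0, NA, 11.2, 250.0, 9.8, 10.9)

# Impute missing values with the median, which is less sensitive
# to extreme values than the mean.
price_imputed <- ifelse(is.na(price), median(price, na.rm = TRUE), price)

# Flag outliers with the 1.5 * IQR rule (a convention, not a law).
q <- quantile(price_imputed, c(0.25, 0.75))
iqr <- q[2] - q[1]
is_outlier <- price_imputed < q[1] - 1.5 * iqr |
  price_imputed > q[2] + 1.5 * iqr

# Compare the mean with and without the flagged values
# to see how much a single extreme value moves it.
print(mean(price_imputed))
print(mean(price_imputed[!is_outlier]))
```

Whether flagged values should be removed, capped, or kept depends on what they mean in your data.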
The k-Means algorithm is one of the top 10 algorithms used in data mining, and it has been around for a long time. Knowing that, I think a great start would be to use this algorithm to group time series. Time series are data whose values are recorded as a function of time. In your case, one idea is to try to group together weeks that behave similarly. You can do this with different kinds of attributes: for example, you can see which weeks behave similarly in terms of order_status, price, etc.
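The idea of grouping weeks could be sketched in R roughly as follows. Everything about the data here is an assumption: the data frame dados is randomly generated, the column names date, price and order_status are guesses at what the database contains, and k = 3 is arbitrary.

```r
set.seed(42)  # kmeans uses random starts; fix the seed for reproducibility

# Hypothetical order data; column names are assumptions.
n <- 600
dados <- data.frame(
  date = as.Date("2018-01-01") + sample(0:181, n, replace = TRUE),
  price = round(runif(n, 10, 200), 2),
  order_status = sample(c("delivered", "canceled"), n,
                        replace = TRUE, prob = c(0.9, 0.1))
)

# One row per week: mean price, number of orders, share of canceled orders.
dados$week <- format(dados$date, "%Y-%U")
weekly <- do.call(rbind, lapply(split(dados, dados$week), function(w) {
  data.frame(
    mean_price = mean(w$price),
    n_orders = nrow(w),
    canceled_share = mean(w$order_status == "canceled")
  )
}))

# Standardize so that no attribute dominates the Euclidean distance.
weekly_scaled <- scale(weekly)

# k = 3 is an arbitrary choice for illustration.
fit <- kmeans(weekly_scaled, centers = 3, nstart = 25)
weekly$cluster <- fit$cluster
print(head(weekly))
```

Weeks that end up in the same cluster have similar values for the chosen attributes; which attributes make sense is something only the data can tell you.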
Another good algorithm for those starting out is DBSCAN, which is a density-based algorithm (k-Means is prototype-based). It is very simple, and you do not even have to worry much about outliers, since they are very likely to be labeled as noise and left out of the clusters. I leave it to you to decide where to apply it, since you have a better sense of where this database came from.
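To show how DBSCAN treats an isolated point, here is a small sketch that assumes the CRAN package dbscan is installed. The data are synthetic, and the values of eps and minPts are illustrative only and would need tuning on real data.

```r
# Assumes the CRAN package 'dbscan' is available: install.packages("dbscan")
if (requireNamespace("dbscan", quietly = TRUE)) {
  set.seed(1)

  # Hypothetical data: two dense groups plus one isolated point (row 101).
  x <- rbind(
    matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
    matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2),
    c(20, 20)
  )

  # eps (neighborhood radius) and minPts (minimum neighbors) are
  # illustrative values, not recommendations.
  fit_db <- dbscan::dbscan(x, eps = 1, minPts = 5)

  # Cluster label 0 marks the points treated as noise.
  print(table(fit_db$cluster))
}
```

Unlike k-Means, the number of clusters is not given in advance; it follows from the density parameters.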
Good luck.
Start by looking at the function hclust. And please don't post data that way: never as an image file. Post the output of dput(dados) instead.
– Rui Barradas
Oops, thanks Rui. This was my first post...
– Leonardo Ferreira
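The two suggestions in the comments, hclust and dput(dados), could look roughly like this. The name dados comes from the comment; its columns and values are invented for illustration, and the linkage method and k = 2 are arbitrary choices.

```r
# 'dados' is the name used in the comment; the columns are hypothetical.
dados <- data.frame(
  price = c(10.5, 12.0, 11.2, 55.0, 58.3, 9.8),
  freight = c(2.1, 2.5, 2.2, 9.0, 9.4, 1.9)
)

# Share data as code that others can paste into R, not as an image.
dput(dados)

# Hierarchical clustering on the standardized Euclidean distance matrix.
hc <- hclust(dist(scale(dados)), method = "complete")

# Cut the tree into k groups; k = 2 is an illustrative choice.
groups <- cutree(hc, k = 2)
print(groups)
```

For a large data set, dput(head(dados, 20)) keeps the posted sample short.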