I am creating a predictive model in R using the caret library. When I run it in R it takes a long time and still throws some errors. By comparison, when I run the same dataset in Weka I get the result in just a few minutes.
I have already converted the variables to integer, and it didn't help much.
I have also tried running it in parallel, but that didn't help much either.
I was wondering what the performance depends on in this case. What are the factors that most influence poor performance when building a predictive model?
caret does parameter tuning by default... Are you sure that's not it? Sometimes it is training 30 models instead of the 1 you might be expecting, unlike Weka. It also depends on which model you are fitting. For example, for random forest, caret can use the randomForest package as well as ranger (and others), but one of them is always faster. – Daniel Falbel
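To illustrate the point above, here is a minimal sketch of fitting a single model with no tuning grid, using ranger as the faster random forest backend. It assumes the caret and ranger packages are installed; the iris data and the parameter values are only illustrative, not taken from the question:

```r
library(caret)

# method = "none" disables the tuning grid: train() fits exactly
# one model instead of one per grid point per CV fold, which is
# closer to what Weka does by default.
ctrl <- trainControl(method = "none")

model <- train(Species ~ ., data = iris,
               method = "ranger",       # ranger: fast random forest
               trControl = ctrl,
               # with method = "none", tuneGrid must have exactly one row
               tuneGrid = data.frame(mtry = 2,
                                     splitrule = "gini",
                                     min.node.size = 1))
```

With tuning disabled this way, the runtime difference against Weka comes down to the underlying implementation rather than the number of models trained.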
How does the tuning work? I ran it with tuning = 10 and 20, but it didn't get any better. I started with kNN because it is faster, but I also tested randomForest and ran into the same problem: long delays and no result. When I run it on a sample of the dataset it does work. Watching memory usage, I believe it is training several models, even though I set it up for just one?
– Ricardo Corassa
We would have to see your code... You need to set the tuneGrid argument. You should also adjust trControl, because caret also does CV by default. How to do it is described here: https://topepo.github.io/caret/model-training-and-tuning.html#basic-Parameter-tuning If you put a minimal reproducible example in your question it is easier to answer. The way your question stands, the answer would have to be too long. – Daniel Falbel
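A minimal sketch of the two arguments mentioned in the comment above, assuming the caret and randomForest packages are installed (the iris data and the mtry value are illustrative only):

```r
library(caret)

# trainControl: 5-fold cross-validation instead of the default
ctrl <- trainControl(method = "cv", number = 5)

# tuneGrid with a single row: one candidate model per fold,
# instead of caret searching over several mtry values
grid <- expand.grid(mtry = 2)

model <- train(Species ~ ., data = iris,
               method = "rf",
               trControl = ctrl,
               tuneGrid = grid)
```

Shrinking the grid to one row reduces the total number of model fits from (grid size × folds) down to just the number of folds, plus the final fit.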
The number parameter is the number of cross-validation folds. In other words, 10 means that for each element of the parameter grid the model will be fitted 10 times, so the error can be evaluated on the held-out part of the data each time. – Daniel Falbel
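The arithmetic behind the comment above can be made explicit. This is a hypothetical illustration (the grid size of 3 matches caret's default tuneLength for random forest, which is an assumption about the setup, not something stated in the thread):

```r
# Total model fits during tuning:
#   one fit per grid point per CV fold, plus the final
#   fit on the full training set with the winning parameters.
grid_size  <- 3   # e.g. 3 candidate mtry values
folds      <- 10  # trainControl(method = "cv", number = 10)
total_fits <- grid_size * folds + 1
total_fits
# 31
```

So a 10-fold CV over even a small grid multiplies the work by an order of magnitude compared to fitting a single model, which explains much of the gap versus a single Weka run.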
library(caret)

set.seed(234)

# 70/30 train/test split, stratified by class
inTrain <- createDataPartition(y = make.names(rf$class), p = 0.7, list = FALSE)

training <- rf[inTrain, ]
teste    <- rf[-inTrain, ]

# 10-fold cross-validation
train_control <- trainControl(method = "cv", number = 10)

model <- train(as.factor(class) ~ .,
               data = training,
               trControl = train_control,
               method = "rf")
I am running this code on a table with roughly 50k rows and 60 columns – Ricardo Corassa
Right, but when I run it in Weka I also do 10-fold CV.
– Ricardo Corassa
After about an hour of running, it failed again: Error in train.default(x, y, weights = w, ...) : Stopping. In addition: There were 50 or more warnings (use warnings() to see the first 50)
– Ricardo Corassa