By default, caret tunes some of the hyperparameters of each model. It tries to do this in a smart way, but its choice is not always the right one for your problem. rpart, on the other hand, fits the model exactly as you specified it.
caret is not very clear about this, which sometimes causes confusion...
In this case, for rpart, caret tunes the hyperparameter cp (complexity parameter). It chooses the grid of values to test with the following function:
> getModelInfo("rpart")[[1]]$grid
function (x, y, len = NULL, search = "grid")
{
    dat <- if (is.data.frame(x))
        x
    else as.data.frame(x)
    dat$.outcome <- y
    initialFit <- rpart(.outcome ~ ., data = dat, control = rpart.control(cp = 0))$cptable
    initialFit <- initialFit[order(-initialFit[, "CP"]), , drop = FALSE]
    if (search == "grid") {
        if (nrow(initialFit) < len) {
            tuneSeq <- data.frame(cp = seq(min(initialFit[, "CP"]),
                max(initialFit[, "CP"]), length = len))
        }
        else tuneSeq <- data.frame(cp = initialFit[1:len, "CP"])
        colnames(tuneSeq) <- "cp"
    }
    else {
        tuneSeq <- data.frame(cp = unique(sample(initialFit[,
            "CP"], size = len, replace = TRUE)))
    }
    tuneSeq
}
This function basically:
- fits a model with every parameter at the rpart default except cp (complexity), for which it uses cp = 0.
- extracts the cptable element of the fitted model, which the rpart documentation defines as:
  cptable: a matrix of information on the optimal prunings based on a complexity parameter.
- returns a sequence of cp values taken from that table, whose length is set by the tuneLength argument of the train function (passed in as len); train then fits one model for each value.
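The steps above can be reproduced outside caret with rpart alone. This is only a minimal sketch of the search = "grid" branch: the iris data set and the value of len are assumptions chosen for illustration, not something caret prescribes.

```r
library(rpart)

# Hypothetical stand-in for the tuneLength argument of train.
len <- 3

# Step 1: fit with the rpart defaults except cp = 0.
fit <- rpart(Species ~ ., data = iris, control = rpart.control(cp = 0))

# Step 2: take the cptable and sort it by decreasing CP.
cp_table <- fit$cptable
cp_table <- cp_table[order(-cp_table[, "CP"]), , drop = FALSE]

# Step 3: build the candidate grid the same way the function above does.
if (nrow(cp_table) < len) {
  # Fewer rows than requested: spread len values between min and max CP.
  tune_seq <- data.frame(cp = seq(min(cp_table[, "CP"]),
                                  max(cp_table[, "CP"]), length.out = len))
} else {
  # Otherwise keep the len largest CP values from the table.
  tune_seq <- data.frame(cp = unname(cp_table[1:len, "CP"]))
}
print(tune_seq)
```

The printed data frame holds the cp candidates that train would evaluate for this data and this len.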
This behavior can be changed. Read here for more information: http://topepo.github.io/caret/model-training-and-tuning.html#customizing-the-tuning-process
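As one way of changing it, you can pass your own grid through the tuneGrid argument of train. The sketch below assumes the iris data set, a hand-picked set of cp values, and 5-fold cross-validation; all three are illustrative choices rather than recommendations.

```r
library(caret)

set.seed(1)

# Hypothetical grid of cp values, chosen by hand instead of caret's default grid.
my_grid <- data.frame(cp = c(0.001, 0.01, 0.1))

fit <- train(
  Species ~ ., data = iris,
  method    = "rpart",
  tuneGrid  = my_grid,                                 # replaces the default grid
  trControl = trainControl(method = "cv", number = 5)  # 5-fold cross-validation
)

print(fit$results[, c("cp", "Accuracy")])  # one row per cp value tried
print(fit$bestTune)                        # the cp that train selected
```

If you want caret to fit a single model with a fixed cp and no resampling, a one-row tuneGrid combined with trainControl(method = "none") should behave much like calling rpart directly.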