Different results using rpart and caret

Question

Asked 7 years, 11 months ago

Hello,

I'm testing some regression models and there is one thing I don't quite understand. I fitted a model with rpart from the rpart package, and then with train using the rpart method from the caret package:

resultRPart <- rpart(EVADIU ~ ., data = data.rose)
resultCaret <- train(EVADIU ~ ., data = data.rose, method = "rpart")

I expected the two to give the same results (precision, recall, etc.), but that is not what happened.

The first gave:

Precision: 0.599
Recall: 0.412

The second gave:

Precision: 0.1439
Recall: 0.6759

Is that normal or am I comparing oranges to bananas here?

1 answer

by Daniel Falbel • **12,504** points · Answer 1 · 2017-08-21T16:14:47+00:00

By default, caret tunes some of the hyperparameters of each model. It tries to do this in a sensible way, but the result is not always the right one for your problem. rpart, by contrast, fits the model exactly as you specified it.

caret is not very explicit about this, which sometimes causes confusion.
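To see what "exactly as you specified it" means for a plain rpart call, it can help to look at the defaults rpart falls back on when no control is given. This is a minimal sketch that uses the built-in iris data as a stand-in for data.rose, which is not available here:

```r
library(rpart)

# Defaults that a plain rpart() call uses when no control argument is supplied
defaults <- rpart.control()
defaults$cp        # complexity parameter, 0.01 by default
defaults$minsplit  # minimum number of observations in a node to attempt a split

# iris is only a stand-in dataset for illustration
fit <- rpart(Species ~ ., data = iris)
fit$control$cp     # the cp value this particular fit was grown with
```

Unless caret is told otherwise, there is no guarantee that the cp it ends up with matches this value.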

In this case, with method = "rpart", caret tunes the hyperparameter cp (the complexity parameter). It chooses the grid of values to test with the following function:

> getModelInfo("rpart")[[1]]$grid
function (x, y, len = NULL, search = "grid") 
{
    dat <- if (is.data.frame(x)) 
        x
    else as.data.frame(x)
    dat$.outcome <- y
    initialFit <- rpart(.outcome ~ ., data = dat, control = rpart.control(cp = 0))$cptable
    initialFit <- initialFit[order(-initialFit[, "CP"]), , drop = FALSE]
    if (search == "grid") {
        if (nrow(initialFit) < len) {
            tuneSeq <- data.frame(cp = seq(min(initialFit[, "CP"]), 
                max(initialFit[, "CP"]), length = len))
        }
        else tuneSeq <- data.frame(cp = initialFit[1:len, "CP"])
        colnames(tuneSeq) <- "cp"
    }
    else {
        tuneSeq <- data.frame(cp = unique(sample(initialFit[, 
            "CP"], size = len, replace = TRUE)))
    }
    tuneSeq
}

This function basically:

1. fits a model with all parameters at the rpart defaults except cp (the complexity parameter), which is set to 0;
2. takes the cptable element of that fit, which the rpart documentation defines as:

cptable: a matrix of information on the optimal prunings based on a complexity parameter.

3. builds the sequence of cp values to try from that table, with its length given by the tuneLength argument of train. train then fits a model for each value in the sequence.
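The steps above can be reproduced by hand with rpart alone. The following is a rough sketch of the grid branch of that function, not caret's own code path. The iris data and the value len = 3 are assumptions chosen for illustration, with len playing the role of tuneLength:

```r
library(rpart)

len <- 3  # stands in for tuneLength (assumed value for this sketch)

# Step 1: grow a tree with no complexity penalty (cp = 0)
full_tree <- rpart(Species ~ ., data = iris,
                   control = rpart.control(cp = 0))

# Step 2: take the cp table and sort it from largest to smallest CP
cp_table <- full_tree$cptable
cp_table <- cp_table[order(-cp_table[, "CP"]), , drop = FALSE]

# Step 3: keep `len` candidate values, mirroring the grid branch above
if (nrow(cp_table) < len) {
  # Too few rows in the table: spread `len` values between min and max CP
  candidates <- seq(min(cp_table[, "CP"]), max(cp_table[, "CP"]),
                    length.out = len)
} else {
  # Otherwise take the `len` largest CP values
  candidates <- unname(cp_table[1:len, "CP"])
}

candidates  # the cp values that would be compared by resampling
```

None of these candidates has to be 0.01, the default of a plain rpart call, so the selected model can differ from the one rpart fits directly.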

This behavior can be changed. Read here for more information: http://topepo.github.io/caret/model-training-and-tuning.html#customizing-the-tuning-process
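As one possible way to change it, the sketch below first inspects which cp a default train call selected, then asks caret for a single model at cp = 0.01 with resampling switched off. It uses iris as a stand-in for data.rose and assumes the caret and rpart packages are installed. The fixed-cp fit is meant to be comparable to a plain rpart call, though the two are not guaranteed to be identical on every dataset, for example when the formula interface of train expands factor predictors into dummy variables.

```r
library(caret)
library(rpart)

set.seed(1)  # the default train() call uses random resampling

# Default behaviour: caret compares several cp values and keeps the best one
tuned <- train(Species ~ ., data = iris, method = "rpart")
tuned$bestTune  # the cp value caret selected

# Fixed cp and no resampling: a single model at rpart's default cp
fixed <- train(
  Species ~ ., data = iris, method = "rpart",
  tuneGrid  = data.frame(cp = 0.01),
  trControl = trainControl(method = "none")
)

# Plain rpart fit for comparison
plain <- rpart(Species ~ ., data = iris)

# Share of rows on which the two models predict the same class
agreement <- mean(predict(fixed, iris) == predict(plain, iris, type = "class"))
agreement
```

Comparing precision and recall only makes sense once both models were built with the same cp, so checking bestTune is a reasonable first step when the numbers disagree.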