I'm learning machine learning techniques to predict leaf size (numerical) from multiple numerical predictors. However, leaf size depends on life form (trees or grasses), and the two classes are not balanced. At the moment, I filter the data by class, partition each subset using the leaf size values (the variable I want to predict), and fit a separate model for each class.

My question is: do I need to create a separate model for each class, or can I split the data into training and test sets within the existing classes and fit a single model that predicts leaf size while taking the class (forma_vida) into account? Also, if anyone has a tip, for someone who has never dealt with ML before, on how to handle the fact that the classes are not balanced, I would appreciate it.
library(caret)
library(dplyr)  # provides %>% and filter()
# Sample of the data
# Output of dput(head(df)), assigned to df so the example can be run;
# .Names now matches the column names used in the code below
df <- structure(list(tam_folha = c(4L, 5L, 3L, 1L, 2L),
    forma_vida = structure(c(1L, 2L, 1L, 2L, 1L), .Label = c("arvore", "grama"), class = "factor"),
    X1036 = c(0.349, 0.342, 0.383, 0.325, 0.309),
    X1037 = c(0.349, 0.342, 0.383, 0.325, 0.309),
    X1038 = c(0.349, 0.342, 0.383, 0.325, 0.309),
    X1039 = c(0.349, 0.342, 0.383, 0.325, 0.309),
    X1040 = c(0.349, 0.342, 0.383, 0.325, 0.31),
    X1041 = c(0.349, 0.342, 0.383, 0.326, 0.31)),
    .Names = c("tam_folha", "forma_vida", "X1036", "X1037", "X1038", "X1039", "X1040", "X1041"),
    row.names = c(NA, 5L), class = "data.frame")
# Filter by class (trees only)
arvores <- df %>% dplyr::filter(forma_vida == "arvore")
# Data partition
index <- createDataPartition(arvores$tam_folha, p = 0.7, list = FALSE)
train_data <- arvores[index, ]
test_data <- arvores[-index, ]
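# `repeat` is a reserved word in R; the argument is `repeats`, and it only applies to "repeatedcv"
controle <- trainControl(method = "repeatedcv", number = 10, repeats = 5, selectionFunction = "oneSE")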
mod1 <- train(tam_folha ~ ., data = train_data,
              method = "pls",
              metric = "RMSE",
              tuneLength = 4,
              trControl = controle)
## repeat for the other factor level: grama
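
For reference, the single-model alternative I am asking about could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a tested solution for my real data. It uses simulated data (`df_sim`, with made-up values and an 80/20 class imbalance), and the names `mod_unico` and `rmse_por_classe` are hypothetical. The idea is to pass the factor `forma_vida` to `createDataPartition()`, which samples within each factor level so both classes keep their proportions in the training and test sets, and then to leave `forma_vida` in the formula as a predictor.

```r
library(caret)

set.seed(42)

# Simulated stand-in for the real data (hypothetical values, unbalanced classes)
n_arvore <- 80
n_grama  <- 20
n_total  <- n_arvore + n_grama
df_sim <- data.frame(
  forma_vida = factor(rep(c("arvore", "grama"), times = c(n_arvore, n_grama))),
  X1036 = runif(n_total),
  X1037 = runif(n_total),
  X1038 = runif(n_total)
)
# Assumed relationship, for illustration only: leaf size shifts with life form
df_sim$tam_folha <- 2 + 3 * df_sim$X1036 +
  ifelse(df_sim$forma_vida == "grama", -1, 1) +
  rnorm(n_total, sd = 0.3)

# Stratify the split on the class: with a factor, sampling is done within each level
index <- createDataPartition(df_sim$forma_vida, p = 0.7, list = FALSE)
train_data <- df_sim[index, ]
test_data  <- df_sim[-index, ]

controle <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                         selectionFunction = "oneSE")

# Single model: forma_vida stays in the formula and is dummy-encoded by train()
mod_unico <- train(tam_folha ~ ., data = train_data,
                   method = "pls",
                   metric = "RMSE",
                   tuneLength = 4,
                   trControl = controle)

# Check the error separately per class, since an overall RMSE can hide
# poor performance on the minority class
pred <- predict(mod_unico, newdata = test_data)
resultados <- data.frame(obs = test_data$tam_folha, pred = pred)
rmse_por_classe <- sapply(split(resultados, test_data$forma_vida),
                          function(d) RMSE(d$pred, d$obs))
print(rmse_por_classe)
```

Reporting the RMSE per class, rather than only overall, is one simple way to see whether the smaller class is being predicted worse. Whether a single model with `forma_vida` as a predictor is preferable to two separate models is exactly what I am unsure about.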