How to manipulate two data sets at the same time?

I would like to learn how to manipulate two variables at the same time. For example, I have a training set and a test set for machine learning. How could I apply the function factor to both at the same time (since the code is identical and only the data set changes), instead of applying it to each one separately? Here is an example of the code:

base_teste$sex<-factor(base_teste$sex, levels = c(' Female', ' Male'), labels = c(0, 1))
base_treinamento$sex<-factor(base_treinamento$sex, levels = c(' Female', ' Male'), labels = c(0, 1))

Another example: apply the function abs() to both at the same time, instead of:

base_teste<-abs(base_teste)
base_treinamento<-abs(base_treinamento)

I tried using the primitive function c(), without success.

2 answers

It is not possible to "manipulate two variables at the same time", as the question asks. But you can manipulate one variable at a time in a simpler way.

The best way is to define functions that perform the transformation in a standardized way. This simplifies the function calls.

In the first case, define a function with predefined values for levels and labels.

sex2factor <- function(x, levels = c(' Female', ' Male'), labels = c(0, 1)){
  y <- as.character(x)
  factor(y, levels = levels, labels = labels)
}

base_teste$sex <- sex2factor(base_teste$sex)
base_treinamento$sex <- sex2factor(base_treinamento$sex)
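A quick way to check the function, using small hypothetical data frames (the values are invented; the leading spaces mimic ' Female' and ' Male' from the question):

```r
# Hypothetical toy data; the leading spaces match the levels used by sex2factor.
base_teste <- data.frame(sex = c(" Female", " Male", " Male"),
                         age = c(-30, 41, 25), stringsAsFactors = FALSE)
base_treinamento <- data.frame(sex = c(" Male", " Female"),
                               age = c(52, -19), stringsAsFactors = FALSE)

# Same function as above: fixed levels and labels, input coerced to character.
sex2factor <- function(x, levels = c(' Female', ' Male'), labels = c(0, 1)){
  y <- as.character(x)
  factor(y, levels = levels, labels = labels)
}

base_teste$sex <- sex2factor(base_teste$sex)
base_treinamento$sex <- sex2factor(base_treinamento$sex)

levels(base_teste$sex)          # "0" "1"
as.character(base_teste$sex)    # "0" "1" "1"

# A value that does not match a level exactly (no leading space) becomes NA.
is.na(sex2factor("Female"))     # TRUE
```

Because the levels are matched exactly, it is worth checking the raw values (for example with unique()) before converting.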

In the second case, abs is an internal generic primitive function, so methods can be defined for it. The function below is an S3 method for objects of class "data.frame": it applies abs only to the numeric columns and leaves the other columns unchanged.

abs.data.frame <- function(x){
  i <- sapply(x, is.numeric)
  x[i] <- lapply(x[i], abs)
  x
}

base_teste <- abs(base_teste)
base_treinamento <- abs(base_treinamento)
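A minimal check of the method on a hypothetical data frame that mixes a character column with numeric ones (the data are invented for illustration):

```r
# S3 method for data frames: abs() on numeric columns only.
abs.data.frame <- function(x){
  i <- sapply(x, is.numeric)      # logical vector, one entry per column
  x[i] <- lapply(x[i], abs)       # replace only the numeric columns
  x
}

# Hypothetical toy data with one non-numeric column.
base_teste <- data.frame(sex = c("F", "M"),
                         age = c(-30, 41),
                         income = c(-1.5, 2),
                         stringsAsFactors = FALSE)

base_teste <- abs(base_teste)     # dispatches to abs.data.frame
base_teste$age      # 30 41
base_teste$income   # 1.5 2.0
base_teste$sex      # "F" "M" (left unchanged)
```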
  • Excellent! I thought there was some simple way to do it. Creating functions really helps a lot. Someone could create a package that allows this kind of multiple manipulation.

As @Macros-Unes answered, it is not possible to manipulate two variables at the same time; the best practice in R, for any procedure that will be repeated several times, is to write a function. If you have multiple data sets that follow the same pattern, you can put the data frames in a list and use lapply to apply functions to every element of the list:

base_lista <- list(teste = base_teste, treino = base_treinamento)

# The levels include the leading space, as in the question's data.
sex2factor <- function(df, var = "sex", levels = c(' Female', ' Male'), labels = c(0, 1)) {
  df[[var]] <- factor(as.character(df[[var]]), levels = levels, labels = labels)
  df
}

base_lista <- lapply(base_lista, sex2factor)

Note that, since it will be applied to a list of data frames, the function was written to take a data.frame rather than a single variable.
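The list approach can also chain both transformations from the question and then copy the results back to the original variables. This is a sketch under assumptions: the toy data are hypothetical, and abs_numeric is a hypothetical helper that behaves like the abs.data.frame method from the other answer, so it only touches numeric columns.

```r
# Hypothetical toy data frames standing in for the question's data sets.
base_teste <- data.frame(sex = c(" Female", " Male"), age = c(-30, 41),
                         stringsAsFactors = FALSE)
base_treinamento <- data.frame(sex = c(" Male", " Female", " Male"),
                               age = c(52, -19, 7), stringsAsFactors = FALSE)

base_lista <- list(teste = base_teste, treino = base_treinamento)

# Converts one column of a data frame to a 0/1 factor.
sex2factor <- function(df, var = "sex", levels = c(" Female", " Male"), labels = c(0, 1)) {
  df[[var]] <- factor(as.character(df[[var]]), levels = levels, labels = labels)
  df
}

# Hypothetical helper: abs() on the numeric columns only (factors are skipped).
abs_numeric <- function(df) {
  i <- vapply(df, is.numeric, logical(1))
  df[i] <- lapply(df[i], abs)
  df
}

# Apply both transformations to every data frame in the list.
base_lista <- lapply(base_lista, function(df) abs_numeric(sex2factor(df)))

# Copy the results back to the original variable names.
base_teste <- base_lista$teste
base_treinamento <- base_lista$treino
```

Keeping the data sets in a named list means any further step is a single lapply call, and the names make it easy to pull each data frame back out.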