Renaming the levels of a factor based on a data frame

5

Suppose I have the data frame iris, which is built into R:

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Suppose I also have a data frame called flores, with the following structure:

flores <- data.frame(Especies=c("setosa", "virginica", "versicolor"), 
                     Nome=c("Flor 1", "Flor 2", "Flor 3"))
    Especies   Nome
1     setosa Flor 1
2  virginica Flor 2
3 versicolor Flor 3

I would like to replace the values of iris$Species with the corresponding values of flores$Nome. That is, every occurrence of setosa in iris$Species should be replaced by Flor 1, every occurrence of virginica by Flor 2, and every occurrence of versicolor by Flor 3.

Using something like if or ifelse is out of the question, because the data set I'm actually working with has thousands of different species. It would be impossible to type out every case by hand.

2 answers

5

I believe the following code solves the problem.
However, I ran into some trouble because the columns involved are of class factor. First, I added the argument stringsAsFactors = FALSE when creating the data frame flores. Then I converted the column Species to character.

flores <- data.frame(Especies=c("setosa", "virginica", "versicolor"), 
                     Nome=c("Flor 1", "Flor 2", "Flor 3"),
                     stringsAsFactors = FALSE)

iris$Species <- as.character(iris$Species)

for(s in unique(iris$Species)){
    # replace every row of species s with its name from the lookup table
    iris$Species[iris$Species == s] <- flores$Nome[flores$Especies == s]
}

iris$Species <- factor(iris$Species)    # convert back to factor

If the column Nome of flores has to be a factor, then you must use

iris$Species[iris$Species == s] <- as.character(flores$Nome[flores$Especies == s])

inside the for loop.
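
The loop above can also be written as a single vectorized lookup with match(), which avoids iterating over the species. What follows is only a sketch under the question's setup; the names iris2, iris3 and pos are chosen here for illustration and are not part of the original code.

```r
# Lookup table, as in the question
flores <- data.frame(Especies = c("setosa", "virginica", "versicolor"),
                     Nome = c("Flor 1", "Flor 2", "Flor 3"),
                     stringsAsFactors = FALSE)

# Work on a copy so the built-in iris stays untouched (iris2 is a hypothetical name)
iris2 <- iris

# Row of flores that corresponds to each row of iris2
pos <- match(as.character(iris2$Species), flores$Especies)

# Pick the new names by position; as.character() also covers a factor Nome
iris2$Species <- factor(as.character(flores$Nome)[pos])

# Alternative: rename only the factor levels, leaving the data untouched
iris3 <- iris
levels(iris3$Species) <- as.character(flores$Nome)[
  match(levels(iris3$Species), flores$Especies)
]
```

Any species that does not appear in flores$Especies would end up as NA with either variant, so it may be worth checking anyNA(pos) before overwriting the column.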

3


I would do a left_join and then drop the original column. For example:

> library(dplyr)
> flores <- data.frame(Especies=c("setosa", "virginica", "versicolor"), 
+                      Nome=c("Flor 1", "Flor 2", "Flor 3"))
> 
> iris <- left_join(iris, flores, by = c("Species" = "Especies")) %>%
+   select(-Species) %>%
+   rename(Species = Nome)
> 
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  Flor 1
2          4.9         3.0          1.4         0.2  Flor 1
3          4.7         3.2          1.3         0.2  Flor 1
4          4.6         3.1          1.5         0.2  Flor 1
5          5.0         3.6          1.4         0.2  Flor 1
6          5.4         3.9          1.7         0.4  Flor 1

Using case_when could also be an option, but it makes little sense if you already have this data.frame of names.
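
For comparison, a case_when version might look like the sketch below (iris_cw is a name chosen here for illustration). Every species has to be typed by hand, which is why it does not scale to thousands of species.

```r
library(dplyr)

# Work on a copy of the built-in iris (iris_cw is a hypothetical name)
iris_cw <- iris %>%
  mutate(Species = case_when(
    Species == "setosa"     ~ "Flor 1",
    Species == "virginica"  ~ "Flor 2",
    Species == "versicolor" ~ "Flor 3"
  ))
```

Note that Species comes out as a character column here; wrap the result in factor() if a factor is needed.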

By the way, there is also the function fct_recode in the forcats package.
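
The original example for fct_recode is not shown here, so the following is only a sketch of how it might be used with the flores table. fct_recode expects pairs of the form new = "old", so the lookup is first turned into a named vector and then spliced in with !!!; the names lookup and iris_fc are assumptions made for this illustration.

```r
library(forcats)

# Lookup table, as in the question
flores <- data.frame(Especies = c("setosa", "virginica", "versicolor"),
                     Nome = c("Flor 1", "Flor 2", "Flor 3"),
                     stringsAsFactors = FALSE)

# Named vector: names are the new levels, values are the old levels
lookup <- setNames(as.character(flores$Especies), as.character(flores$Nome))

# Work on a copy of the built-in iris (iris_fc is a hypothetical name)
iris_fc <- iris
iris_fc$Species <- fct_recode(iris_fc$Species, !!!lookup)

head(iris_fc)
```

Since only the levels are renamed, Species stays a factor and no conversion to character is needed.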
