I'm having trouble coding the variable `rota` in R so that it takes a single value for the same route, regardless of which airport is the origin (the first 4 characters of `rota`) and which is the destination (the last 4 characters). The data are as follows:
```r
base <- data.frame(rota = c("SBAA - SBEE", "SBAA - SBBR", "SBAA - SBCI",
                            "SBEE - SBAA", "SBEE - SBBR", "SBBR - SBEE"),
                   assentos = c(1231, 1021, 715, 989, 759, 695))
# Only needed before R 4.0.0, where data.frame() turned strings into factors
base$rota <- as.character(base$rota)
```
```
         rota assentos
        <chr>    <dbl>
1 SBAA - SBEE     1231
2 SBAA - SBBR     1021
3 SBAA - SBCI      715
4 SBEE - SBAA      989
5 SBEE - SBBR      759
6 SBBR - SBEE      695
```
I thought of generating the variable `codigo` with a transformation:

```r
base$codigo <- as.numeric(as.factor(base$rota))
```
However, this gives different codes to routes that connect the same airports in opposite directions. For example, "SBAA - SBEE" and "SBEE - SBAA" should share a code, but every row ends up with its own code (the exact numbers depend on the alphabetical order of the factor levels), something like:

```
         rota assentos codigo
        <chr>    <dbl>  <dbl>
1 SBAA - SBEE     1231      1
2 SBAA - SBBR     1021      2
3 SBAA - SBCI      715      3
4 SBEE - SBAA      989      4
5 SBEE - SBBR      759      5
6 SBBR - SBEE      695      6
```
I need routes that connect the same airports to have the same code, so that `codigo` looks like this:

```
         rota assentos codigo
        <chr>    <dbl>  <dbl>
1 SBAA - SBEE     1231      1
2 SBAA - SBBR     1021      2
3 SBAA - SBCI      715      3
4 SBEE - SBAA      989      1
5 SBEE - SBBR      759      4
6 SBBR - SBEE      695      4
```
Note that the code for "SBAA - SBEE" is identical to "SBEE - SBAA".
Solution
```r
library(dplyr)
library(stringr)
library(purrr)

base <- base %>%
  mutate(codigo = as.integer(factor(
    # split each route into its airport codes, sort them and rebuild the string
    map_chr(str_extract_all(rota, "\\w+"), ~ str_c(sort(.x), collapse = " - "))
  )))
```
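To see what each step of the pipeline does, here is a base R sketch of the same logic. `regmatches()` with `gregexpr()` stands in for `str_extract_all()`, and `vapply()` for `map_chr()`; these are substitutions for illustration, not the answer's code, and the names `aeroportos` and `chave` are hypothetical. Because `factor()` sorts its levels alphabetically, the codes come out as 3, 1, 2, 3, 4, 4 rather than the 1, 2, 3, 1, 4, 4 shown in the question, although the rows are grouped identically.

```r
# Sample data from the question
base <- data.frame(
  rota = c("SBAA - SBEE", "SBAA - SBBR", "SBAA - SBCI",
           "SBEE - SBAA", "SBEE - SBBR", "SBBR - SBEE"),
  assentos = c(1231, 1021, 715, 989, 759, 695),
  stringsAsFactors = FALSE
)

# Extract the airport codes: a list with one character vector per row
aeroportos <- regmatches(base$rota, gregexpr("\\w+", base$rota, perl = TRUE))

# Sort the two codes within each row and paste them back together,
# so "SBEE - SBAA" becomes "SBAA - SBEE"
chave <- vapply(aeroportos,
                function(x) paste(sort(x), collapse = " - "),
                character(1))

# factor() uses the sorted unique keys as levels, so the integer codes
# follow the alphabetical order of the keys, not the row order
base$codigo <- as.integer(factor(chave))
print(base$codigo)  # codes: 3 1 2 3 4 4
```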
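A base R alternative that avoids the extra packages, from a comment by JdeMello:

```r
# Convert to character (a no-op when rota is already character)
base[["rota"]] <- as.character(base[["rota"]])

# Split each route at " - ", sort the two airports and paste them back
base[["rota_unica"]] <- unlist(lapply(strsplit(base[["rota"]], " - "), function(x) {
  x <- sort(x, method = "radix")
  paste0(x, collapse = " - ")
}))
```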
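Since the question defines the origin as the first 4 characters and the destination as the last 4, a vectorized base R sketch can also build the key without splitting strings, assuming every `rota` has the fixed `XXXX - YYYY` layout of the sample. `pmin()` and `pmax()` put the two airports in alphabetical order, and `match(chave, unique(chave))` numbers the keys by first appearance, which reproduces the desired 1, 2, 3, 1, 4, 4. The names `origem`, `destino` and `chave` are illustrative.

```r
# Sample data from the question
base <- data.frame(
  rota = c("SBAA - SBEE", "SBAA - SBBR", "SBAA - SBCI",
           "SBEE - SBAA", "SBEE - SBBR", "SBBR - SBEE"),
  assentos = c(1231, 1021, 715, 989, 759, 695),
  stringsAsFactors = FALSE
)

# Assumes the fixed "XXXX - YYYY" layout: origin = first 4 characters,
# destination = last 4 characters
origem  <- substr(base$rota, 1, 4)
destino <- substr(base$rota, nchar(base$rota) - 3, nchar(base$rota))

# Order the pair alphabetically so both directions give the same key
chave <- paste(pmin(origem, destino), pmax(origem, destino), sep = " - ")

# Number the keys by order of first appearance
base$codigo <- match(chave, unique(chave))
print(base)
```

The same `match()` step turns the comment's `rota_unica` into an integer code: `base$codigo <- match(base$rota_unica, unique(base$rota_unica))`.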