Creating a new variable using if and else

I have a data set with 9 million rows. I'm trying to create a new variable, base$trs_localidade, from the variable base$iso_pais, where:

base$iso_pais <- c('076', '840', '076', '442', '052', '076')

if (x == '076') {
  print('NACIONAL')
} else {
  print('INTERNACIONAL')
}

What I can't figure out is how to create the variable base$trs_localidade using the logic above.

  • You can't create a variable this way. You have to create the variable base$trs_localidade and assign it a value for each country, following one branch or the other depending on the country code.
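The logic in the question works for a single value, because `if` evaluates only one condition at a time. A minimal sketch of applying that same scalar `if`/`else` to every row, assuming the column holds character codes (the helper name `classify_one` is hypothetical), could look like this:

```r
# Sample data from the question, kept as character codes
base <- data.frame(
  iso_pais = c('076', '840', '076', '442', '052', '076'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the question's if/else logic for ONE country code
classify_one <- function(code) {
  if (code == '076') 'NACIONAL' else 'INTERNACIONAL'
}

# Apply the scalar logic element by element; vapply checks that every
# call returns exactly one character string
base$trs_localidade <- vapply(base$iso_pais, classify_one, character(1),
                              USE.NAMES = FALSE)
base
```

This calls an R function once per row, so on 9 million rows it is likely to be much slower than the vectorized approaches in the answers (ifelse, case_when, or indexing). It also stops with an error if a code is NA, because `if` cannot take a missing condition.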

3 answers

4


To create variables, I suggest using the dplyr package with the mutate function. Here are some examples.

library(dplyr)
base <- data.frame(x = c(rep("076", 4), "840", "442",rep("076", 4)))
base
> base
     x
1  076
2  076
3  076
4  076
5  840
6  442
7  076
8  076
9  076
10 076
  • You can keep using ifelse, but I prefer the structure shown here.
base1 <- base %>% 
  dplyr::mutate(iso_pais = ifelse(x == "076", "NACIONAL", "INTERNACIONAL"))
base1
> base1
     x      iso_pais
1  076      NACIONAL
2  076      NACIONAL
3  076      NACIONAL
4  076      NACIONAL
5  840 INTERNACIONAL
6  442 INTERNACIONAL
7  076      NACIONAL
8  076      NACIONAL
9  076      NACIONAL
10 076      NACIONAL
  • Or you can use dplyr's case_when.
base2 <- base %>% 
  dplyr::mutate(iso_pais = dplyr::case_when(x == "076" ~ "NACIONAL",
                                            x != "076" ~ "INTERNACIONAL"))
base2
> base2
     x      iso_pais
1  076      NACIONAL
2  076      NACIONAL
3  076      NACIONAL
4  076      NACIONAL
5  840 INTERNACIONAL
6  442 INTERNACIONAL
7  076      NACIONAL
8  076      NACIONAL
9  076      NACIONAL
10 076      NACIONAL

Depending on your problem, there are many other solutions, such as the ones in: How to create a column in R under specific conditions?

3

One more alternative: create the new variable filled with 'INTERNACIONAL' and then change only the rows you want using basic indexing:

base <- data.frame(
  iso_pais = c( '076', '840', '076','442' , '052' , '076')
)

base$trs_localidade <- 'INTERNACIONAL'
base$trs_localidade[base$iso_pais == '076'] <- 'NACIONAL'

> base
  iso_pais trs_localidade
1      076       NACIONAL
2      840  INTERNACIONAL
3      076       NACIONAL
4      442  INTERNACIONAL
5      052  INTERNACIONAL
6      076       NACIONAL

This is potentially faster than ifelse on large data sets.
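To check the speed claim on your own machine, a rough sketch is to time both approaches on a simulated column. The 1 million rows and the sampled codes below are arbitrary assumptions, and timings vary by machine and R version, so no numbers are claimed here:

```r
set.seed(42)
# Simulated column of country codes (arbitrary size and values)
codes <- sample(c('076', '840', '442', '052'), 1e6, replace = TRUE)

# Approach 1: vectorized ifelse
t_ifelse <- system.time(
  via_ifelse <- ifelse(codes == '076', 'NACIONAL', 'INTERNACIONAL')
)

# Approach 2: fill with the default label, then overwrite matching rows
t_index <- system.time({
  via_index <- rep('INTERNACIONAL', length(codes))
  via_index[codes == '076'] <- 'NACIONAL'
})

# Both approaches should produce exactly the same labels
stopifnot(identical(via_ifelse, via_index))
print(t_ifelse)
print(t_index)
```

Comparing the `elapsed` column of the two printed timings gives a rough idea; for more reliable numbers, repeat the measurement several times.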

  • I used the script above and it worked! Thank you, very practical!

3

Building on bbiasi's answer, with x renamed to iso_pais as in the question, and using only base R, this can be done in the following two ways.

The first is essentially @bbiasi's first ifelse, but without dplyr::mutate. The second is an indexing trick that may be faster when the data set is very large, but it is less readable than the first.

base <- data.frame(iso_pais = c(rep("076", 4), "840", "442",rep("076", 4)))

base$trs_localidade <- ifelse(base$iso_pais == '076', 'NACIONAL', 'INTERNACIONAL')
base$trs_localidade2 <- c('INTERNACIONAL', 'NACIONAL')[(base$iso_pais == '076') + 1L]
base
#   iso_pais trs_localidade trs_localidade2
#1       076       NACIONAL        NACIONAL
#2       076       NACIONAL        NACIONAL
#3       076       NACIONAL        NACIONAL
#4       076       NACIONAL        NACIONAL
#5       840  INTERNACIONAL   INTERNACIONAL
#6       442  INTERNACIONAL   INTERNACIONAL
#7       076       NACIONAL        NACIONAL
#8       076       NACIONAL        NACIONAL
#9       076       NACIONAL        NACIONAL
#10      076       NACIONAL        NACIONAL
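To see why the indexing trick works, here is a small sketch of its intermediate step: adding `1L` to a logical vector turns `FALSE` into 1 and `TRUE` into 2, and those integers are then used as positions in the two-element label vector. The variable names are only illustrative:

```r
codes <- c('076', '840', '076', '442', '052', '076')
labels <- c('INTERNACIONAL', 'NACIONAL')  # position 1 = FALSE, position 2 = TRUE

# Logical comparison, then coercion to integer positions 1 or 2
idx <- (codes == '076') + 1L
idx
# [1] 2 1 2 1 1 2

# Pick the label at each position
labels[idx]
```

A missing code (NA) produces an NA index, and indexing with NA returns NA, which matches what ifelse returns for missing values.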
