What is the purpose of the functions with an underscore (_) at the end?


Consider the following functions from these packages:

dplyr

library(dplyr)

group_by_

summarise_

mutate_

transmute_

tidyr

library(tidyr)

gather_

spread_

separate_

unite_
  • What are these functions for?
  • When should I use them instead of the functions without the trailing underscore?

Reproducible data.frame:

set.seed(4321)

dataset <- data.frame(replicate(6, runif(30, 20, 100)), 
                      X7 = rep(c('a', 'b', 'c'), 10, 3))

1 answer



The reason these functions still exist today is historical.

dplyr was designed with interactive use in mind, so it offers some conveniences to people working that way, such as:

  • not having to put variable names in quotes
  • not having to repeat the data.frame name all the time.

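Example, first with dplyr and then with the base R equivalent:

library(dplyr)
# dplyr: no quotes and no repeated data.frame name
mtcars <- mtcars %>% 
  mutate(cyl = cyl*2)

# base R equivalent
mtcars$cyl <- mtcars$cyl * 2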

This makes users more productive when working interactively. What makes it possible is non-standard evaluation (NSE). NSE makes interactive programming more pleasant, but it complicates the code when you want to write more general functions.

For example, it is not obvious how to make dplyr take the name of a column from the value of another variable:

> variavel <- "cyl"
> 
> mtcars %>% 
+   mutate(variavel = variavel *2)
Error in variavel * 2 : non-numeric argument to binary operator

dplyr does not use the value stored in variavel, which is "cyl". Instead, it tries to create a new column called variavel whose value is "cyl" * 2, and that raises an error.

Over time, the authors of dplyr proposed several ways to solve this problem. One of them was to add equivalent functions with a _ at the end of the name.

These functions used standard evaluation (SE) and were therefore useful when writing your own functions, which is what is meant by programming with dplyr.

See how awkward the syntax was with mutate_, for example:

variavel <- "cyl"
mtcars %>% 
  mutate_(
    .dots = 
      list(lazyeval::interp(~ 2*(var), var = as.name(variavel))) %>% setNames(variavel)
    )

However, dplyr now uses a concept called tidy evaluation, and this is the recommended way to program with dplyr. Example:

variavel <- "cyl"
mtcars %>% 
  mutate(!!sym(variavel) := 2*!!sym(variavel))
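The same pattern is usually wrapped in a function, since that is where programming with dplyr matters most. Below is a minimal sketch, assuming a hypothetical helper named double_column (it is not part of dplyr) that receives the column name as a string:

```r
library(dplyr)

# Hypothetical helper for illustration only: doubles the column whose
# name is passed as a string, using the same !!sym() pattern as above.
double_column <- function(data, column_name) {
  data %>%
    mutate(!!sym(column_name) := 2 * !!sym(column_name))
}

# Uses the built-in mtcars dataset; the original data is left untouched.
result <- double_column(mtcars, "cyl")
head(result$cyl)
```

Because the column name arrives as an ordinary string, the caller can compute it, loop over several names, or read it from a configuration file.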

In short, answering your questions:

  • They were useful for programming with dplyr, that is, for writing your own functions on top of it.
  • They should no longer be used. The recommended way is to use tidy evaluation.

You can find here the version of the vignette that introduced this concept.

The current dplyr documentation for these functions says:

dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.
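To illustrate how the plain verbs replace group_by_ and summarise_ inside a function, here is a small sketch. It assumes a reasonably recent dplyr and rlang, which support the embrace operator {{ }}, and the helper name mean_by is hypothetical:

```r
library(dplyr)

# Hypothetical helper for illustration only: the caller passes bare column
# names, and {{ }} forwards them to the NSE verbs.
mean_by <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    summarise(mean_value = mean({{ value_col }}))
}

# Mean mpg per number of cylinders in the built-in mtcars dataset.
by_cyl <- mean_by(mtcars, cyl, mpg)
by_cyl
```

The caller writes mean_by(mtcars, cyl, mpg) just as in interactive code, so no underscored variant is needed.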
