How does the `dplyr::n()` function know that it is not being called from the global environment?

When calling the function dplyr::n() in the global environment, an error occurs.

library(dplyr)
n()
# Error: This function should not be called directly

This error makes sense and I was curious to see how it was implemented.

n
# function () 
# {
#     abort("This function should not be called directly")
# }
# <bytecode: 0x000000001650f200>
# <environment: namespace:dplyr>

To my surprise, however, there is no if or any other condition check; the function simply throws the error. Yet the error does not occur when we call n() in its natural habitat.

mtcars %>% 
  group_by(cyl) %>% 
  summarise(n = n())

# # A tibble: 3 x 2
#     cyl     n
#   <dbl> <int>
# 1     4    11
# 2     6     7
# 3     8    14

So two questions remain:

  1. How does the function n() know that it is being called in another context? and
  2. How does the function n() count? (Where is the source code for that part?)

1 answer

The function n() only works inside dplyr and is part of an internal mechanism of the package called hybrid evaluation. The full description is here.

Hybrid evaluation is one of the main reasons dplyr is fast at some tasks.

In principle, when you call summarise, for example summarise(n = n()), dplyr would need to evaluate this function once for each group of the data. That could be costly if the data has many groups. So dplyr recognizes some expressions, such as n() and sum(variable), and handles them directly in C++ code.
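To make the idea of "recognizing an expression" concrete, here is a minimal sketch in plain R. It is not dplyr's code (which, as said, is in C++); toy_summarise and its arguments are hypothetical names invented for illustration. The function captures the expression without evaluating it, and if the expression is exactly n() it computes the group sizes itself, so n() is never actually called.

```r
# Hypothetical toy function, NOT part of dplyr: it only illustrates the
# "recognize the expression, then take a shortcut" idea.
toy_summarise <- function(data, group, expr) {
  expr   <- substitute(expr)   # capture the expression without evaluating it
  caller <- parent.frame()     # where to look up anything not found in the data
  groups <- split(data, data[[group]])

  if (identical(expr, quote(n()))) {
    # Fast path: the expression was recognized, so n() is never called.
    # The group sizes are computed directly.
    return(vapply(groups, nrow, integer(1)))
  }

  # Slow path: evaluate the expression once per group, with the columns of
  # that group visible as variables.
  vapply(groups, function(g) eval(expr, g, caller), numeric(1))
}

# Works even if no usable n() exists, because n() is never evaluated.
toy_summarise(mtcars, "cyl", n())
# Any other expression goes through the generic per-group path.
toy_summarise(mtcars, "cyl", mean(mpg))
```

The first call returns the same counts as the summarise example in the question (11, 7 and 14 for cyl 4, 6 and 8).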

For the function n(), the entry point of its implementation is in this file: https://github.com/tidyverse/dplyr/blob/master/inst/include/dplyr/hybrid/scalar_result/n.h#L1

So the R function n() does not actually know that it is being called in another context; rather, it is dplyr that changes its meaning when the function is used inside mutate or summarise.
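As a rough illustration of how a caller can change the meaning of a function that, on its own, only throws an error, here is a small sketch in plain R. This is an assumption-laden toy and not how dplyr does it: toy_count is a hypothetical name, and the n defined below is only a stand-in with the same behaviour as the one printed in the question. The trick shown is evaluating the captured expression in an environment where n is bound to something else.

```r
# Stand-in for the exported function: it always fails when called directly.
n <- function() stop("This function should not be called directly")

# Hypothetical verb, NOT dplyr's implementation: it evaluates the captured
# expression in an environment where `n` has a different meaning.
toy_count <- function(data, expr) {
  expr <- substitute(expr)                   # capture without evaluating
  mask <- new.env(parent = parent.frame())   # fall back to the caller's scope
  mask$n <- function() nrow(data)            # rebind n inside the mask
  eval(expr, mask)
}

toy_count(mtcars, n())       # 32, the number of rows in mtcars
toy_count(mtcars, n() * 2)   # 64
try(n(), silent = TRUE)      # called directly, n() still fails
```

The global n() never changes here; only the lookup performed inside toy_count finds a different function.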
