Note: update for magrittr 1.5
As of magrittr 1.5, the dot (.) placeholder of the %>% operator works in nested calls. It is therefore substituted correctly inside row.names(.), and the example now works without any modification.
dados <- mtcars %>% mutate(nomes=row.names(.))
head(dados)
mpg cyl disp hp drat wt qsec vs am gear carb nomes
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Datsun 710
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive
5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout
6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 Valiant
Answer given before magrittr 1.5
Complementing Roger's answer.
What is %>% doing?
If you look at the source of %>%, roughly speaking, it creates a new environment and evaluates the left-hand side in that environment. It then takes the call on the right-hand side, modifies it, and evaluates the modified call in the new environment.
For example, if you run mtcars %>% mutate(., nomes = row.names(.)), the left-hand side is mtcars and the right-hand side is mutate(., nomes = row.names(.)):
lhs <- substitute(mtcars)
rhs <- substitute(mutate(., nomes = row.names(.)))
Create a new environment and a name for the left-hand side:
env <- new.env(parent = parent.frame())
nm <- paste(deparse(lhs), collapse = "")
Store the left-hand side in the new environment under that name:
env[[nm]] <- eval(lhs, env)
# To check that the object was created:
head(env$mtcars)
Now the dots on the right-hand side need to be replaced. This is the part that identifies where the dots are:
dots <- c(FALSE, vapply(rhs[-1], identical, quote(.),
                        FUN.VALUE = logical(1)))
But note that it only looks at the first level of the call.
dots
nomes
FALSE TRUE FALSE
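To make the "first level only" behaviour concrete, here is a small sketch in base R (no magrittr or dplyr needed; the variable names first_is_dot, second_is_dot and nested_is_dot are only for illustration). It inspects each position of the quoted call and shows that the second dot sits inside the third element, where a top-level scan never looks.

```r
# The right-hand side as an unevaluated call
rhs <- quote(mutate(., nomes = row.names(.)))

# Top-level elements: the function name and its two arguments
as.list(rhs)

# The first argument is the dot itself ...
first_is_dot <- identical(rhs[[2]], quote(.))

# ... but the second argument is a call, not the dot,
# so a top-level comparison does not flag it
second_is_dot <- identical(rhs[[3]], quote(.))

# The remaining dot lives one level down, inside row.names(.)
nested_is_dot <- identical(rhs[[3]][[2]], quote(.))

c(first_is_dot, second_is_dot, nested_is_dot)
# TRUE FALSE TRUE
```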
As a result, only the first dot is replaced:
rhs[dots] <- rep(list(as.name(nm)), sum(dots))
e <- rhs
e
# note that only the first dot was replaced
mutate(mtcars, nomes = row.names(.))
So when you evaluate the call in the env environment, an error occurs because there is no object called ".":
eval(e, env)
Error in row.names(.) : object '.' not found
The fix would be for the replacement to happen at every level of the call. For example, if we change the other dot in e manually:
e[[3]][[2]] <- as.name("mtcars")
Now it works:
eval(e, env)
# output omitted because it is long
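As a side note, base R can already replace a symbol at every level of an expression through substitute(). The sketch below is not how magrittr does it; it is only an illustration, and it uses transform() as a stand-in for mutate() so that it runs without dplyr (that swap is my assumption, not part of the original example).

```r
# Base-R stand-in for the right-hand side: transform() instead of mutate(),
# so the sketch runs without dplyr (an assumption made for illustration)
rhs <- quote(transform(., nomes = row.names(.)))

# substitute() walks the whole expression, so every `.` is replaced,
# including the one nested inside row.names(.)
e <- do.call("substitute", list(rhs, setNames(list(as.name("mtcars")), ".")))
e
# transform(mtcars, nomes = row.names(mtcars))

# The rewritten call can now be evaluated normally
res <- eval(e)
head(res$nomes)
```

Because every dot is now a reference to mtcars, no object called "." is needed when the call is evaluated.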
Why did it work with %.% when using '__prev'?
The function behind %.% is chain_q. To see its code, type dplyr:::chain_q.
function (calls, env = parent.frame())
{
    if (length(calls) == 0)
        return()
    if (length(calls) == 1)
        return(eval(calls[[1]], env))
    e <- new.env(parent = env)
    e$`__prev` <- eval(calls[[1]], env)
    for (call in calls[-1]) {
        new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
        e$`__prev` <- eval(new_call, e)
    }
    e$`__prev`
}
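Note that the function creates a new environment called e and stores the result of the first call in the chain under the name '__prev' (e$`__prev` <- eval(calls[[1]], env)). Each following call is then evaluated inside e with `__prev` inserted as its first argument, and its result overwrites `__prev`. That's why you can access the result of the previous command this way.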
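Since %.% may not be available in the dplyr version you have installed, here is a self-contained sketch of the same mechanism in base R. The function name chain_sketch and the example calls are hypothetical; the body simply mirrors the loop shown above, so treat it as an illustration rather than dplyr's actual API.

```r
# Minimal re-implementation of the `__prev` idea, for illustration only
chain_sketch <- function(calls, env = parent.frame()) {
  e <- new.env(parent = env)
  # The first call is evaluated as-is and stored as `__prev`
  e$`__prev` <- eval(calls[[1]], env)
  for (call in calls[-1]) {
    # Insert `__prev` as the first argument of each subsequent call
    new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
    e$`__prev` <- eval(new_call, e)
  }
  e$`__prev`
}

# sqrt(c(4, 9, 16)) gives 2, 3, 4; their sum is 9
total <- chain_sketch(list(quote(c(4, 9, 16)), quote(sqrt()), quote(sum())))
total

# Because calls are evaluated inside `e`, `__prev` can also be used explicitly:
# the last step becomes head(`__prev`, length(`__prev`) - 1)
shorter <- chain_sketch(list(quote(c(4, 9, 16)), quote(sqrt()),
                             quote(head(length(`__prev`) - 1))))
shorter
```

The second chain works for the same reason row.names(`__prev`) worked in the original question: the object exists by that name in the environment where each step is evaluated.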
Hacking %>% (for illustration only)
If we define a function that replaces all the dots, like this one (based on this question on the English Stack Overflow):
convert.call <- function(x, replacement) {
    if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
    if (identical(x, quote(.))) as.name(replacement) else
    x
}
# testing
expr <- substitute(mean(exp(sqrt(.)), .))
convert.call(expr, "x")
# mean(exp(sqrt(x)), x)
Then we can hack the definition of %>% so that every dot is replaced:
`%>%` <- function (lhs, rhs)
{
    convert.call <- function(x, replacement) {
        if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
        if (identical(x, quote(.))) as.name(replacement) else
        x
    }
    lhs <- substitute(lhs)
    rhs <- substitute(rhs)
    if (is.call(rhs) && identical(rhs[[1]], quote(`(`)))
        rhs <- eval(rhs, parent.frame(), parent.frame())
    if (!any(is.symbol(rhs), is.call(rhs), is.function(rhs)))
        stop("RHS should be a symbol, a call, or a function.")
    env <- new.env(parent = parent.frame())
    nm <- paste(deparse(lhs), collapse = "")
    nm <- if (nchar(nm) < 9900 && (is.call(lhs) || is.name(lhs)))
        nm
    else "__LHS"
    env[[nm]] <- eval(lhs, env)
    if (is.function(rhs)) {
        res <- withVisible(rhs(env[[nm]]))
    }
    else if (is.call(rhs) && deparse(rhs[[1]]) == "function") {
        res <- withVisible(eval(rhs, parent.frame(), parent.frame())(eval(lhs,
            parent.frame(), parent.frame())))
    }
    else {
        if (is.symbol(rhs)) {
            if (!exists(deparse(rhs), parent.frame(), mode = "function"))
                stop("RHS appears to be a function name, but it cannot be found.")
            e <- call(as.character(rhs), as.name(nm))
        }
        else {
            e <- convert.call(rhs, nm)
        }
        res <- withVisible(eval(e, env))
    }
    if (res$visible)
        res$value
    else invisible(res$value)
}
Note that mtcars %>% mutate(., nomes = row.names(.)) now works. I include this only to explain what is going on, though; I would not recommend using the hacked version of %>%, since it may cause bugs in other situations. For example, as written you have to put the dots in explicitly every time, as in mtcars %>% filter(., cyl==4) %>% mutate(., nomes = row.names(.)).
dplyr does not necessarily preserve row names in operations
One last note: neither dplyr nor data.table keeps row names intact during operations. Note that dplyr replaces the row names in filter, and data.table already replaces them when you convert the data.frame:
mt_dplyr <- filter(mtcars, cyl==4)
row.names(mt_dplyr)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
mt_dt <- data.table(mtcars)
row.names(mt_dt)
 [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22"
[23] "23" "24" "25" "26" "27" "28" "29" "30" "31" "32"
So, at the end of the day, if the row names contain relevant information, it seems safer to turn them into a column before manipulating the data any further.
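A minimal sketch of that advice, using only base R so it does not depend on any particular dplyr or data.table version (the object names dados and quatro_cil are just for illustration):

```r
# Copy the row names into a regular column first
dados <- mtcars
dados$nomes <- row.names(mtcars)

# Even if the row names are dropped afterwards, the information survives
row.names(dados) <- NULL
quatro_cil <- dados[dados$cyl == 4, ]
head(quatro_cil$nomes)
```

After this step, any operation that resets the row names to "1", "2", ... no longer loses the car names, because they travel with the data as an ordinary column.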
An alternative "solution": creating your own mutate function that has a local row_names
One possible workaround is the following: you write your own mutate that stores a vector row_names in the frame it is called from (in this context that is the environment created by %>%, but if you call the function on its own it will be the global environment, so be careful) and then runs dplyr's mutate in that environment. So whenever you want the row names, just use the object row_names. Let's call our version mutate2:
mutate2 <- function(x, ...){
    assign("row_names", row.names(x), parent.frame())
    eval(substitute(mutate(x, ...)), parent.frame())
}
mtcars %>% mutate2(z = cyl^2, nomes=row_names) %>% filter(z==36)
mpg cyl disp hp drat wt qsec vs am gear carb z nomes
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 36 Mazda RX4
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 36 Mazda RX4 Wag
3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 36 Hornet 4 Drive
4 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 36 Valiant
5 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 36 Merc 280
6 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 36 Merc 280C
7 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 36 Ferrari Dino
Good question! I would also like to know.
– Athos
+1 very good question.
– Carlos Cinelli