Add rownames as column using dplyr

I would like to do something that is quite simple in base R syntax, but using the dplyr package.

The task is basically to add the row.names of a data.frame as a column of that same object. Using mtcars as an example, it could be done like this:

dados <- mtcars
dados$nomes <- row.names(mtcars)

I’d like to do something like

dados <- mtcars %>% mutate(nomes=row.names(.))

But that code fails with Error: unsupported type for column 'nomes' (NILSXP) (because I'm doing something wrong).

I wonder if there is a way to solve this "problem".

  • 1

    Good question! I would also like to know.

  • +1 very good question.

4 answers

17


Note: update for magrittr 1.5

As of magrittr 1.5, the dot (.) placeholder of the %>% operator works inside nested calls. The dot within row.names(.) is therefore replaced correctly, and the example now works without any modification.

dados <- mtcars %>% mutate(nomes=row.names(.))
head(dados)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb             nomes
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4         Mazda RX4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     Mazda RX4 Wag
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1        Datsun 710
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    Hornet 4 Drive
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 Hornet Sportabout
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1           Valiant

Answer given before magrittr 1.5

Complementing Roger's answer.

What is %>% doing?

If you look at the source code of %>%, roughly speaking, it creates a new environment and evaluates the left-hand side in it. It then takes the call on the right-hand side, modifies it, and evaluates the modified call within this new environment.

For example, if you run mtcars %>% mutate(., nomes = row.names(.)), the left-hand side is mtcars and the right-hand side is mutate(., nomes = row.names(.)):

lhs <- substitute(mtcars)
rhs <- substitute(mutate(., nomes = row.names(.)))

Create a new environment and a name for the left-hand side:

env <- new.env(parent = parent.frame())
nm <- paste(deparse(lhs), collapse = "")

Store the left-hand side in the new environment under the name just created:

env[[nm]] <- eval(lhs, env)

# To check that the object was created:
head(env$mtcars)

Now the dots on the right-hand side need to be replaced. This is the part that identifies where the dots are:

dots <- c(FALSE, vapply(rhs[-1], identical, quote(.), 
                              FUN.VALUE = logical(1)))

But note that it only looks at the first level of the call:

dots
            nomes 
FALSE  TRUE FALSE 
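To make the "first level only" point concrete, here is a small base R sketch (it does not need dplyr, since the call is only quoted, never evaluated). The variable names top_level, every_symbol, n_top and n_all are just illustrative. It compares the top-level check used above with all.names(), which walks the whole call tree:

```r
# The right-hand side of the pipe, captured unevaluated.
rhs <- quote(mutate(., nomes = row.names(.)))

# Top-level check, as in the walkthrough: one flag per argument of mutate().
top_level <- vapply(as.list(rhs)[-1], identical, quote(.),
                    FUN.VALUE = logical(1))

# all.names() descends into nested calls, so it also sees the inner dot.
every_symbol <- all.names(rhs)

n_top <- sum(top_level)            # dots visible at the top level
n_all <- sum(every_symbol == ".")  # dots anywhere in the call

print(c(top_level = n_top, anywhere = n_all))
```

The difference between the two counts is the dot inside row.names(.), which is the one the pre-1.5 replacement step never reaches.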

When replacing, therefore, only the first dot is replaced:

rhs[dots] <- rep(list(as.name(nm)), sum(dots))
e <- rhs
e
# note that only the first dot was replaced
mutate(mtcars, nomes = row.names(.))

Thus, when you evaluate the call in the env environment, an error occurs because there is no object called ".":

eval(e, env)
Error in row.names(.) : object '.' not found

The solution would be for the replacement to happen at all levels of the call. For example, if we change the other dot in e manually:

e[[3]][[2]] <- as.name("mtcars")

Now it works:

eval(e, env)
# output omitted because it is long

Why did it work with %.% using `__prev`?

The function behind the %.% is chain_q. To see the code, type dplyr:::chain_q.

function (calls, env = parent.frame()) 
{
    if (length(calls) == 0) 
        return()
    if (length(calls) == 1) 
        return(eval(calls[[1]], env))
    e <- new.env(parent = env)
    e$`__prev` <- eval(calls[[1]], env)
    for (call in calls[-1]) {
        new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
        e$`__prev` <- eval(new_call, e)
    }
    e$`__prev`
}

Note that the function creates a new environment called e and saves the result of the first call in the chain under the name `__prev` (e$`__prev` <- eval(calls[[1]], env)). Each subsequent call is evaluated in that environment, which is why you can access the result of the previous command this way.
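Since %.% may not be available in the dplyr version you have installed, the mechanism can be tried out with a base R sketch. The function chain_sketch below is a hypothetical name; its body is the loop from chain_q shown above, minus the early returns, and the calls use base subset() and transform() in place of dplyr verbs:

```r
# Hypothetical stand-in for chain_q: same loop, base R only.
chain_sketch <- function(calls, env = parent.frame()) {
  e <- new.env(parent = env)
  # Result of the first call is stored under the name `__prev`.
  e$`__prev` <- eval(calls[[1]], env)
  for (call in calls[-1]) {
    # Insert `__prev` as the first argument of each subsequent call.
    new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
    e$`__prev` <- eval(new_call, e)
  }
  e$`__prev`
}

calls <- list(
  quote(mtcars),
  quote(subset(cyl == 4)),
  # `__prev` here refers to the already filtered data, not to mtcars.
  quote(transform(nomes = row.names(`__prev`)))
)

res <- chain_sketch(calls)
head(res[, c("mpg", "cyl", "nomes")])
```

Because every step is evaluated inside e, any nested reference to `__prev` resolves to the result of the previous step, with no symbol replacement needed.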

Hacking %>% (for illustration only)

If we write a function that replaces all the dots, like this one (based on this question from the English-language Stack Overflow):

convert.call <- function(x, replacement) {
  if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
    if (identical(x, quote(.))) as.name(replacement) else
      x
}
# testing
expr <- substitute(mean(exp(sqrt(.)), .))
convert.call(expr, "x")
# mean(exp(sqrt(x)), x)
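As a side note, base R's substitute() also replaces symbols at every level of a call, so the same test can be sketched without a hand-written recursion. This is an alternative technique, not what magrittr does internally; the names bindings and replaced are illustrative:

```r
expr <- quote(mean(exp(sqrt(.)), .))

# A list mapping the symbol "." to the symbol x.
bindings <- setNames(list(as.name("x")), ".")

# do.call() passes the stored expression itself to substitute(),
# which then swaps every occurrence of the bound symbol.
replaced <- do.call(substitute, list(expr, bindings))

print(replaced)
```

The recursive convert.call version is still useful when you need more control, for example to skip certain sub-calls.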

Then we can hack the definition of %>% so that all the dots are replaced:

`%>%` <- function (lhs, rhs) 
{
  convert.call <- function(x, replacement) {
    if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
      if (identical(x, quote(.))) as.name(replacement) else
        x
  }
  
  lhs <- substitute(lhs)
  rhs <- substitute(rhs)
  if (is.call(rhs) && identical(rhs[[1]], quote(`(`))) 
    rhs <- eval(rhs, parent.frame(), parent.frame())
  if (!any(is.symbol(rhs), is.call(rhs), is.function(rhs))) 
    stop("RHS should be a symbol, a call, or a function.")
  env <- new.env(parent = parent.frame())
  nm <- paste(deparse(lhs), collapse = "")
  nm <- if (nchar(nm) < 9900 && (is.call(lhs) || is.name(lhs))) 
    nm
  else "__LHS"
  env[[nm]] <- eval(lhs, env)
  if (is.function(rhs)) {
    res <- withVisible(rhs(env[[nm]]))
  }
  else if (is.call(rhs) && deparse(rhs[[1]]) == "function") {
    res <- withVisible(eval(rhs, parent.frame(), parent.frame())(eval(lhs, 
                                                                      parent.frame(), parent.frame())))
  }
  else {
    if (is.symbol(rhs)) {
      if (!exists(deparse(rhs), parent.frame(), mode = "function")) 
        stop("RHS appears to be a function name, but it cannot be found.")
      e <- call(as.character(rhs), as.name(nm))
    }
    else {
      e <- convert.call(rhs, nm)
    }
    res <- withVisible(eval(e, env))
  }
  if (res$visible) 
    res$value
  else invisible(res$value)
}

Now mtcars %>% mutate(., nomes = row.names(.)) works. I put this here only to explain what is going on; I wouldn't recommend using the hacked version of %>%, as it may cause bugs elsewhere. For example, as written, you have to put the dots in explicitly every time, as in mtcars %>% filter(., cyl==4) %>% mutate(., nomes = row.names(.)).

dplyr does not necessarily preserve row names across operations

One last note: neither dplyr nor data.table keeps row names intact during operations. Note that dplyr resets the row names in filter, and data.table already resets them when you convert the data.frame:

mt_dplyr <- filter(mtcars, cyl==4)
row.names(mt_dplyr)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

mt_dt <- data.table(mtcars)
row.names(mt_dt)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22"
[23] "23" "24" "25" "26" "27" "28" "29" "30" "31" "32"

So, at the end of the day, if the row names contain relevant information, it seems safer to turn them into a column before manipulating the data any further.
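A minimal base R sketch of that ordering: copy the row names into a column first, and they survive any later step that resets the row names. The object name quatro is illustrative, and resetting with row.names(...) <- NULL is only a stand-in for what a dplyr or data.table step would do:

```r
dados <- mtcars
dados$nomes <- row.names(dados)   # store the names as a regular column first

quatro <- dados[dados$cyl == 4, ]
row.names(quatro) <- NULL         # mimic a step that resets row names to 1..n

row.names(quatro)                 # now just "1", "2", ...
quatro$nomes                      # the car names are still available
```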

An alternative "solution": creating your own mutate function with a local row_names

One possible solution is the following: you create your own mutate that stores a row_names vector in its parent environment (in a pipeline this will be the environment of %>%, but if you call the function on its own it will be the global environment, so be careful) and then runs dplyr's mutate in that environment. Then, whenever you want the row names, just use the row_names object. Let's call our version mutate2:

mutate2 <- function(x, ...){
  assign("row_names", row.names(x), parent.frame())
  eval(substitute(mutate(x, ...)), parent.frame())
}

mtcars %>% mutate2(z = cyl^2, nomes=row_names) %>% filter(z==36)

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb  z          nomes
1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 36      Mazda RX4
2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 36  Mazda RX4 Wag
3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 36 Hornet 4 Drive
4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 36        Valiant
5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 36       Merc 280
6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4 36      Merc 280C
7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6 36   Ferrari Dino
  • You mean you can’t put rownames in the df column just with the %>% ?

  • @Athos, exactly: the way the package currently works, mutate(., nomes = row.names(.)) won't work because the second dot isn't replaced, and I can't think of a simple way to solve it.

  • 2

    Fantastic reviews! Thank you!

  • @Juliotrecenti I added an alternative: creating your own mutate2 that has a row_names vector, in case you need to use it.

  • 1

    Super stylish! Tks

  • @Juliotrecenti there was a bug in the function; it is now fixed.


8

Julio,

I couldn't think of a solution using dplyr itself, but a simple approach that might make the code cleaner is to create a function row_names as follows:

row_names <- function(x, var){
  var <- deparse(substitute(var))
  x[var] <- row.names(x)
  return(x)
}

Then you can use it like this:

mtcars %>% row_names(nomes) %>% filter(cyl == 6)

This may lose some of dplyr's efficiency, but it looks neat.

Edit:

It is possible to write a function that does something similar to Roger's first solution, using dplyr:

row_names_d <- function(x, var){
  var <- deparse(substitute(var))
  x <- mutate(x, rn = row.names(x))
  names(x)[length(names(x))] <- var
  return(x)
}

mtcars %>% row_names_d(nome)

But I benchmarked it, and it doesn't seem worth it:

> library(microbenchmark)
> microbenchmark(
+   
+   mtcars %>% row_names(nome),
+   mtcars %>% row_names_d(nome)
+   
+   
+   )
# Unit: microseconds
#                         expr     min       lq   median      uq     max neval
#   mtcars %>% row_names(nome) 183.334 194.0015 202.4965 210.399 326.760   100
# mtcars %>% row_names_d(nome) 244.972 259.5905 268.4810 279.149 551.581   100
  • 1

    Very good, Daniel! I will accept this as the answer because it is in fact a solution. I recommend that everyone look at the answers by Rogério and Carlos, because they show an alternative solution and give an R lesson, respectively.

  • 1

    +1 cool Daniel
  • 1

    Just one comment: the answers here on SOpt are getting pretty cool, the community is growing!

5

  • 1

    Thank you! The %.% with `__prev` helps. The other solution doesn't exactly solve my problem, because I wanted to avoid passing the previous data frame as an argument, which would take two assignments. For example, it wouldn't work if I wanted to run dados %>% filter(cyl==2) %>% mutate(names=row.names(dados)), unless I kept the filtered data in another variable, and I didn't want that.

4

There is a function called rownames_to_column in the tibble package that allows you to do this:

mtcars %>% rownames_to_column()
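By default the new column is called rowname. To get the column name used in the question, the var argument can be set. A short sketch, assuming the tibble package is installed:

```r
library(tibble)

# var sets the name of the new column; it is placed first in the result.
dados <- rownames_to_column(mtcars, var = "nomes")

head(dados[, c("nomes", "mpg", "cyl")])
```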
