Sort the k highest results using dplyr

5

I can select the k greatest results from a table in R. For example, if k equals 5, I get the following result:

library(dplyr)
library(ggplot2)

top_n(mpg, 5, wt=displ)
# A tibble: 5 × 11
  manufacturer              model displ  year   cyl      trans   drv   cty
         <chr>              <chr> <dbl> <int> <int>      <chr> <chr> <int>
1    chevrolet           corvette   6.2  2008     8 manual(m6)     r    16
2    chevrolet           corvette   6.2  2008     8   auto(s6)     r    15
3    chevrolet           corvette   7.0  2008     8 manual(m6)     r    15
4    chevrolet    k1500 tahoe 4wd   6.5  1999     8   auto(l4)     4    14
5         jeep grand cherokee 4wd   6.1  2008     8   auto(l5)     4    11
# ... with 3 more variables: hwy <int>, fl <chr>, class <chr>

However, the results are not sorted by the displ column. I would like the rows to be in descending order, as follows:

top_n(mpg, 5, wt=displ)[order(top_n(mpg, 5, wt=displ)$displ, decreasing=TRUE), ]
# A tibble: 5 × 11
  manufacturer              model displ  year   cyl      trans   drv   cty
         <chr>              <chr> <dbl> <int> <int>      <chr> <chr> <int>
1    chevrolet           corvette   7.0  2008     8 manual(m6)     r    15
2    chevrolet    k1500 tahoe 4wd   6.5  1999     8   auto(l4)     4    14
3    chevrolet           corvette   6.2  2008     8 manual(m6)     r    16
4    chevrolet           corvette   6.2  2008     8   auto(s6)     r    15
5         jeep grand cherokee 4wd   6.1  2008     8   auto(l5)     4    11
# ... with 3 more variables: hwy <int>, fl <chr>, class <chr>

The code works, but I find it ugly. How can I simplify it and still get the same result? Note that I call top_n(mpg, 5, wt=displ) twice, which I imagine could slow my code down if the table is large. Is there a more elegant way to get the same result?

  • About the "ugly" part: wouldn't using dplyr's %>% syntax help?

2 answers

4


dplyr uses chaining with the pipe operator (%>%) to make code easier to read and more succinct. It also provides the function arrange() to sort the results.

res1 <- top_n(mpg, 5, wt=displ)[order(top_n(mpg, 5, wt=displ)$displ, decreasing=TRUE), ]

res2 <- mpg %>% top_n(5, displ) %>% arrange(desc(displ))

identical(res1, res2)
[1] TRUE

The Stack Overflow documentation on the pipe operator is excellent.
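As a side note, newer dplyr releases mark top_n() as superseded in favour of slice_max(). The following is only a minimal sketch, assuming dplyr >= 1.0.0 is installed; the names k and top_k are made up for illustration and are not part of the question. The explicit arrange() is kept so the ordering does not depend on how slice_max() orders its output.

```r
library(dplyr)    # slice_max() assumes dplyr >= 1.0.0
library(ggplot2)  # only needed for the mpg dataset

k <- 5  # hypothetical name for the number of rows to keep

top_k <- mpg %>%
  slice_max(displ, n = k, with_ties = FALSE) %>%  # exactly k rows, ties at the cut-off dropped
  arrange(desc(displ))                            # explicit descending sort

top_k$displ
```

With with_ties = TRUE (the default), slice_max() may return more than k rows when several rows share the k-th value, much like top_n().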

  • 1

    I’m just starting to use dplyr and did not know about arrange(). Thanks for the tip.

3

Other ways to do the same thing:

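library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Sort first, then keep the first 5 rows
mpg %>% arrange(desc(displ)) %>% slice(1:5)
# Keep the 5 top-ranked rows, then sort them (filter() alone keeps the original row order)
mpg %>% filter(row_number(desc(displ)) <= 5) %>% arrange(desc(displ))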
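One thing worth checking before treating these approaches as interchangeable is how they behave when values are tied at the cut-off. Here is a small sketch on a made-up tibble (df, id and x are hypothetical and not from the question), comparing top_n(), arrange() plus slice(), and filter() with row_number():

```r
library(dplyr)

# Hypothetical toy data: two 5s and two 4s, so third place is tied
df <- tibble(id = 1:6, x = c(3, 4, 5, 1, 5, 4))

by_top_n  <- df %>% top_n(3, x)                       # keeps every row tied at the cut-off
by_slice  <- df %>% arrange(desc(x)) %>% slice(1:3)   # exactly 3 rows, sorted
by_filter <- df %>% filter(row_number(desc(x)) <= 3)  # exactly 3 rows, original row order

nrow(by_top_n)   # 4
by_slice$x       # 5 5 4
by_filter$x      # 4 5 5
```

On mpg with k = 5 there is no tie at the fifth value of displ, so all three return the same five rows there; only the ordering differs unless arrange() is applied.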
