Select multiple rows of a data.frame with the highest values in R

I have the following data.frame in R:

df <- data.frame(x = c(10,10,2,3,4,8,8,8),
                 y = c(5,4,6,7,8,3,2,4))
df
   x y
1 10 5
2 10 4
3  2 6
4  3 7
5  4 8
6  8 3
7  8 2
8  8 4

I would like to get all rows containing the 5 highest values in column x; repeated values count.

Example:

The five largest values in column x are: 10, 10, 8, 8, 8.

I can get these values with the following code:

rev(sort(df$x))[1:5]
[1] 10 10  8  8  8

But I'd like to get the whole rows, not just the values of column x. The result I want is:

1 10 5
2 10 4
6  8 3
7  8 2
8  8 4

And not:

[1] 10 10  8  8  8

2 answers

Using the dplyr package:

library(dplyr)
df %>%
  top_n(n = 5, wt = x)  # keep the rows with the 5 largest values of x
   x y
1 10 5
2 10 4
3  8 3
4  8 2
5  8 4

Using order(), a base R function:

df[order(df$x, decreasing=TRUE), ][1:5, ]
   x y
1 10 5
2 10 4
6  8 3
7  8 2
8  8 4

Note that the dplyr solution returns rows renumbered from 1, with no link to the original data frame, while the order() solution keeps the original row names, so you can see which rows of the original data frame were selected.
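One difference between the two approaches is how ties are handled: top_n() keeps extra rows when several values tie at the cut-off, whereas [1:5, ] after order() always returns exactly five rows. If the tie-keeping behaviour is wanted in base R, a minimal sketch is to compare x with its fifth-largest value (the name `cutoff` is only illustrative):

```r
df <- data.frame(x = c(10, 10, 2, 3, 4, 8, 8, 8),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))

# Fifth-largest value of x; duplicates are counted separately (here 8)
cutoff <- sort(df$x, decreasing = TRUE)[5]

# Keep every row whose x reaches the cut-off. Rows tied at the cut-off
# are all kept, and the original row names and row order are preserved.
top_rows <- df[df$x >= cutoff, ]
top_rows  # rows 1, 2, 6, 7 and 8 of df
```

With this data the result is the same five rows as above. If another row with x = 8 were added, this version would return six rows, while the order() version would still return five.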

  • What if I wanted to remove the repetitions? In that case, the values would be 10, 8, 4, 3, 2. How would I do that?

  • I want to get all the rows whose value in column x is one of 10, 8, 4, 3, 2.
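For the follow-up question in the comments, one base R sketch is to rank the distinct values of x with unique() and then keep every row whose x is among the top five. The name `top_values` is illustrative; head() is used instead of [1:5] so that a column with fewer than five distinct values does not produce NA.

```r
df <- data.frame(x = c(10, 10, 2, 3, 4, 8, 8, 8),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))

# Five largest distinct values of x (here 10, 8, 4, 3, 2)
top_values <- head(sort(unique(df$x), decreasing = TRUE), 5)

# All rows whose x is one of those values, then sorted by x, largest first
selected <- df[df$x %in% top_values, ]
selected <- selected[order(selected$x, decreasing = TRUE), ]
selected
```

In this particular data frame x has only five distinct values, so every row is returned; with more distinct values, rows outside the top five would be dropped.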

As a complement, here is how to do it with data.table:

library(data.table)
setDT(df)  # converts df to a data.table in place
df[order(x, decreasing = TRUE), ][1:5, ]
    x y
1: 10 5
2: 10 4
3:  8 3
4:  8 2
5:  8 4

To remove duplicates in column x, keep the first row for each value, sort by x and take the first 5:

df[!duplicated(x), ][order(x, decreasing = TRUE), ][1:5, ]
  • Great! I completed the code with b <- df[!duplicated(x), ][order(x, decreasing = TRUE), ][1:5, ] followed by df[which(df$x %in% b$x), ]
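The data.table code refers to x without df$, which only works inside a data.table's [ ]. On a plain data.frame the same idea needs explicit df$x references. A minimal base R sketch, assuming one row per distinct value (its first occurrence) is what is wanted; `first_per_x` and `distinct_top` are illustrative names:

```r
df <- data.frame(x = c(10, 10, 2, 3, 4, 8, 8, 8),
                 y = c(5, 4, 6, 7, 8, 3, 2, 4))

# Keep only the first row for each distinct value of x
first_per_x <- df[!duplicated(df$x), ]

# Sort those rows by x, largest first, and take at most five of them
distinct_top <- head(first_per_x[order(first_per_x$x, decreasing = TRUE), ], 5)
distinct_top  # x values 10, 8, 4, 3, 2 from rows 1, 6, 5, 4, 3
```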