How to filter a data frame?

Question

I have a data frame with 5597 rows and 7 columns. I would like to filter it so that only the rows whose second column contains "AC" remain. I tried the command `dr=subset(df, df[2]=="AC")`, where `df` is my data frame and 2 is the column where "AC" appears. Unfortunately, the command did not work. Is there something I can do to fix the code?

At first glance your code looks fine. What error do you get? Please also post a sample of your data.

– Carlos Cinelli

2014/05/28 at 19:05

2 answers

by Carlos Cinelli • **16,826** points · Answer 1 · 2014-05-29T00:42:58+00:00

In principle your code is correct: it should subset the data. Whatever went wrong is probably some other problem, which could only be diagnosed with your specific data.

Here it is on a sample data frame:

```r
set.seed(1)
df <- data.frame(valor = rnorm(100), categoria = rep(c("AB", "AC"), 50), stringsAsFactors = FALSE)
dr <- subset(df, df[2] == "AC")
```

Note that `dr` contains only rows whose second column is "AC":

```r
unique(dr[2])
  categoria
2        AC

head(dr)
        valor categoria
2   0.1836433        AC
4   1.5952808        AC
6  -0.8204684        AC
8   0.7383247        AC
10 -0.3053884        AC
12  0.3898432        AC
```
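Rather than inspecting the printed output by eye, the same check can be done programmatically. A minimal sketch in base R, rebuilding the sample data from above; `stopifnot()` raises an error if any of its conditions is false:

```r
set.seed(1)
df <- data.frame(valor = rnorm(100),
                 categoria = rep(c("AB", "AC"), 50),
                 stringsAsFactors = FALSE)
dr <- subset(df, df[2] == "AC")

# Every remaining row has "AC" in the second column...
stopifnot(all(dr$categoria == "AC"))
# ...and no "AC" row was lost: half of the 100 rows are "AC"
stopifnot(nrow(dr) == sum(df$categoria == "AC"), nrow(dr) == 50)
```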

There are several other ways to filter a data frame. One of them is R's `[` operator. Example:

```r
dr <- df[df[2] == "AC", ]
```

or

```r
dr <- df[df$categoria == "AC", ]
```
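On the sample data, `subset()` and the two `[` forms select the same rows. One difference worth knowing shows up when the column has missing values: `subset()` drops rows where the condition is `NA`, while `[` returns them as rows full of `NA`; wrapping the condition in `which()` gives the `subset()` behaviour. A minimal sketch, where `df_na` is a small hypothetical data frame made up for illustration:

```r
set.seed(1)
df <- data.frame(valor = rnorm(100),
                 categoria = rep(c("AB", "AC"), 50),
                 stringsAsFactors = FALSE)

# Three ways to keep the rows where the second column is "AC"
dr1 <- subset(df, df[2] == "AC")
dr2 <- df[df[2] == "AC", ]
dr3 <- df[df$categoria == "AC", ]
stopifnot(isTRUE(all.equal(dr1, dr2)), isTRUE(all.equal(dr2, dr3)))

# Hypothetical data with a missing category
df_na <- data.frame(valor = 1:4,
                    categoria = c("AC", NA, "AB", "AC"),
                    stringsAsFactors = FALSE)

nrow(subset(df_na, categoria == "AC"))         # 2: subset() treats NA as FALSE
nrow(df_na[df_na$categoria == "AC", ])         # 3: the NA comparison adds an all-NA row
nrow(df_na[which(df_na$categoria == "AC"), ])  # 2: which() keeps only TRUE positions
```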

There are also packages dedicated to data manipulation. An excellent one is dplyr: it is quite fast and has an intuitive syntax (for example, the function for filtering rows is called `filter`).

In dplyr it would look like this:

```r
library(dplyr)
dr <- df %>% filter(categoria == "AC")
```

If you are going to work with data a lot, it is worth taking a look.

by bbiasi • **774** points · Answer 2 · 2019-05-27T16:19:11+00:00

Here is also an example with the data.table package, using @Carlos Cinelli's example data.

```r
library(data.table)

set.seed(1)
df <- data.frame(valor = rnorm(100),
                 categoria = rep(c("AB", "AC"), 50),
                 stringsAsFactors = FALSE)

# Convert to a data.table; inside [ ], columns can be referenced by name
df <- data.table::data.table(df)
df <- df[categoria == "AC"]
```