How to filter a data frame?


I have a data frame with 5597 rows and 7 columns. I would like to filter it so that only the rows whose second column contains "AC" remain. I tried the command dr=subset(df, df[2]=="AC"), where df is my data frame and 2 is the column where "AC" appears. Unfortunately, the command did not work. Is there something I can do to improve the code?

  • At first glance your code is OK. What error do you get? Please also post a sample of your data.

2 answers

6

Well, in principle your code is correct and should subset the data. Some other problem may have occurred, which could only be checked against your specific case.

Here it is on a sample data frame:

set.seed(1)
df <- data.frame(valor= rnorm(100), categoria = rep(c("AB", "AC"), 50), stringsAsFactors=FALSE)
dr <- subset(df, df[2]=="AC")

Note that dr contains only rows whose second column is "AC":

unique(dr[2])
  categoria
2        AC

head(dr)
        valor categoria
2   0.1836433        AC
4   1.5952808        AC
6  -0.8204684        AC
8   0.7383247        AC
10 -0.3053884        AC
12  0.3898432        AC
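
If the same call returns no rows (or fewer than expected) on your own data, one possible cause is that the values are not exactly "AC". This is only an assumption, since the question does not show the data. The sketch below uses a hypothetical data frame, df_raw, whose second column is a factor with stray whitespace, and shows how to inspect and clean it before filtering:

```r
# Hypothetical data: the second column is a factor with stray whitespace.
df_raw <- data.frame(
  valor = c(1.2, 3.4, 5.6, 7.8),
  categoria = c("AC ", "AB", " AC", "AC"),
  stringsAsFactors = TRUE
)

# Inspect what the column really contains before filtering.
str(df_raw[[2]])
unique(df_raw[[2]])

# An exact comparison only matches the single clean value.
nrow(subset(df_raw, df_raw[[2]] == "AC"))   # 1

# Convert to character and trim whitespace, then filter again.
df_raw[[2]] <- trimws(as.character(df_raw[[2]]))
dr_clean <- subset(df_raw, df_raw[[2]] == "AC")
nrow(dr_clean)                              # 3
```

Checking unique() on the column first is usually the quickest way to see whether the comparison value matches what is actually stored.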

There are several other ways to filter a data frame. One is to use R's [ operator. For example:

dr <- df[df[2]=="AC", ]

or

dr <- df[df$categoria=="AC", ]
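
One difference between [ and subset() is worth knowing: if the filter column contains NA, logical indexing with [ returns an extra all-NA row for each missing value, while subset() drops those rows. A minimal sketch with a hypothetical data frame, df_na:

```r
# Hypothetical data with a missing value in the filter column.
df_na <- data.frame(
  valor = c(1, 2, 3, 4),
  categoria = c("AC", NA, "AB", "AC"),
  stringsAsFactors = FALSE
)

# Logical indexing keeps an all-NA row where the comparison is NA.
with_na <- df_na[df_na$categoria == "AC", ]
nrow(with_na)   # 3: two matches plus one NA row

# which() keeps only the positions where the comparison is TRUE.
no_na <- df_na[which(df_na$categoria == "AC"), ]
nrow(no_na)     # 2

# subset() also treats NA as FALSE.
sub_na <- subset(df_na, categoria == "AC")
nrow(sub_na)    # 2
```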

There are also packages dedicated to data manipulation. An excellent one is dplyr, because it is quite fast and has an intuitive syntax (for example, the filtering function is simply called filter).

With dplyr it would look like this:

library(dplyr)
dr <- df%>%filter(categoria=="AC")

If you are going to work a lot with data sets, it is worth taking a look.
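
As a small extension of the dplyr example, filter() accepts several conditions separated by commas, and they are combined with a logical AND. The sketch below reuses the sample data from this answer; the second condition (valor > 0) and the name dr_pos are only illustrative and are not part of the original question:

```r
library(dplyr)

set.seed(1)
df <- data.frame(valor = rnorm(100),
                 categoria = rep(c("AB", "AC"), 50),
                 stringsAsFactors = FALSE)

# Keep rows where categoria is "AC" AND valor is positive.
dr_pos <- df %>% filter(categoria == "AC", valor > 0)

# Same row count as the equivalent base R condition.
nrow(dr_pos) == sum(df$categoria == "AC" & df$valor > 0)
```

Note that if dplyr is not attached, the name filter refers to stats::filter, which does something unrelated, so writing dplyr::filter explicitly can avoid confusing errors.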

  • tested and worked!

  • Very good, Carlos Cinelli. Carlos, wouldn't the package be plyr?

  • @Lucascosta no, plyr is another package; this one is dplyr.

0

Here is also an example using the data.table package, with @Carlos Cinelli's example data.

library(data.table)

set.seed(1)
df <- data.frame(valor = rnorm(100),
                 categoria = rep(c("AB", "AC"), 50), 
                 stringsAsFactors=FALSE)

df <- data.table::data.table(df)
df <- df[categoria == "AC"]
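
As a quick check, the data.table result can be compared with the base R subset() result. This is a sketch that assumes data.table is installed; it uses separate names (dt, dt_ac, base_ac, all introduced here for illustration) instead of overwriting df, so both versions stay available:

```r
library(data.table)

set.seed(1)
df <- data.frame(valor = rnorm(100),
                 categoria = rep(c("AB", "AC"), 50),
                 stringsAsFactors = FALSE)

dt <- as.data.table(df)          # copy of df as a data.table
dt_ac <- dt[categoria == "AC"]   # column names are resolved inside dt

base_ac <- subset(df, categoria == "AC")

# Same values in the same order; only the class and row names differ.
all.equal(dt_ac$valor, base_ac$valor)
nrow(dt_ac)                      # 50
```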
