How to remove a data.frame column in R?

Viewed 31,940 times

7

Suppose a generic data.frame, such as:

set.seed(1)
dados <- data.frame(y=rnorm(100), x= rnorm(100), z=rnorm(100), w=rnorm(100))
head(dados)
           y           x          z          w
1 -0.6264538 -0.62036668  0.4094018  0.8936737
2  0.1836433  0.04211587  1.6888733 -1.0472981
3 -0.8356286 -0.91092165  1.5865884  1.9713374
4  1.5952808  0.15802877 -0.3309078 -0.3836321
5  0.3295078 -0.65458464 -2.2852355  1.6541453
6 -0.8204684  1.76728727  2.4976616  1.5122127

How do I delete columns from this data.frame?

6 answers

6


There are many ways to do this.

The simplest is to assign NULL to the column, for example to remove the column x:

dados$x <- NULL
head(dados)
           y          z          w
1 -0.6264538  0.4094018  0.8936737
2  0.1836433  1.6888733 -1.0472981
3 -0.8356286  1.5865884  1.9713374
4  1.5952808 -0.3309078 -0.3836321
5  0.3295078 -2.2852355  1.6541453
6 -0.8204684  2.4976616  1.5122127
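
The same idea extends to several columns at once. A minimal sketch, assuming a small illustrative data.frame called exemplo (it is not the question's data): assigning list(NULL) to a set of named columns removes all of them.

```r
# Hypothetical example data, much smaller than the question's data.frame
exemplo <- data.frame(y = 1:3, x = 4:6, z = 7:9, w = 10:12)

# Assigning list(NULL) to the selected columns deletes them
exemplo[c("x", "z")] <- list(NULL)

names(exemplo)  # "y" "w"
```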

It is also possible to delete several columns at once by putting a minus sign in front of the positions of the columns you do not want. For example, starting again from the original data.frame, to delete the first and third columns:

dados<-dados[,-c(1,3)]
head(dados)
            x          w
1 -0.62036668  0.8936737
2  0.04211587 -1.0472981
3 -0.91092165  1.9713374
4  0.15802877 -0.3836321
5 -0.65458464  1.6541453
6  1.76728727  1.5122127

Another way is to reference the columns by name: create a vector with the names of the columns to be deleted and keep only the columns of the data.frame that are not in that vector (again starting from the original data.frame):

excluir <- c("x", "y")
dados <- dados[,!(names(dados)%in% excluir)]
head(dados)
           z          w
1  0.4094018  0.8936737
2  1.6888733 -1.0472981
3  1.5865884  1.9713374
4 -0.3309078 -0.3836321
5 -2.2852355  1.6541453
6  2.4976616  1.5122127
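
One reason to prefer the logical form !(names(...) %in% ...) over a negated which() is what happens when none of the names match. The sketch below uses a hypothetical data.frame exemplo and a name "k" that it does not contain; setdiff() is shown as a further option, not as something the answer itself uses.

```r
# Hypothetical example data
exemplo <- data.frame(y = 1:3, x = 4:6, z = 7:9)
excluir <- c("k")  # a name that is not in the data.frame

# Logical negation keeps every column when nothing matches
n_logical <- ncol(exemplo[, !(names(exemplo) %in% excluir), drop = FALSE])

# -which(...) on an empty match is -integer(0), which selects NO columns
n_which <- ncol(exemplo[, -which(names(exemplo) %in% excluir), drop = FALSE])

# setdiff() on the names also keeps the complement safely
n_setdiff <- ncol(exemplo[, setdiff(names(exemplo), excluir), drop = FALSE])

c(n_logical, n_which, n_setdiff)  # 3 0 3
```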

4

To remove only one column, I prefer the approach used by @carloscinelli:

dados$x <- NULL
head(dados)
           y          z          w
1 -0.6264538  0.4094018  0.8936737
2  0.1836433  1.6888733 -1.0472981
3 -0.8356286  1.5865884  1.9713374
4  1.5952808 -0.3309078 -0.3836321
5  0.3295078 -2.2852355  1.6541453
6 -0.8204684  2.4976616  1.5122127

For the other cases, I prefer to use the subset function.

To keep the columns x and w, use:

dados <- subset(dados, select = c(x, w))
head(dados)
            x          w
1 -0.62036668  0.8936737
2  0.04211587 -1.0472981
3 -0.91092165  1.9713374
4  0.15802877 -0.3836321
5 -0.65458464  1.6541453
6  1.76728727  1.5122127

To delete the columns x and y, put a minus sign before the vector of column names:

dados <- subset(dados, select = -c(x, y))
head(dados)
           z          w
1  0.4094018  0.8936737
2  1.6888733 -1.0472981
3  1.5865884  1.9713374
4 -0.3309078 -0.3836321
5 -2.2852355  1.6541453
6  2.4976616  1.5122127

One caveat about the [] operator is worth noting. To keep only the column w we could use:

excluir <- c("x", "y", "z")
dados <- dados[,!(names(dados) %in% excluir)]
head(dados)
[1]  0.8936737 -1.0472981  1.9713374 -0.3836321  1.6541453  1.5122127

but in that case dados is turned into a vector. To avoid this, use the argument drop = FALSE:

dados <- dados[,!(names(dados) %in% excluir), drop = FALSE]
head(dados)
           w
1  0.8936737
2 -1.0472981
3  1.9713374
4 -0.3836321
5  1.6541453
6  1.5122127
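
The difference is easy to check with is.data.frame(). A small sketch on a hypothetical two-column data.frame exemplo; the third form (list-style indexing with a single index and no comma) is an addition to the answer and also keeps the data.frame class.

```r
# Hypothetical example data
exemplo <- data.frame(z = 1:3, w = c(0.5, 1.5, 2.5))

sem_drop <- exemplo[, names(exemplo) != "z"]                # collapses to a numeric vector
com_drop <- exemplo[, names(exemplo) != "z", drop = FALSE]  # stays a data.frame
lista    <- exemplo[names(exemplo) != "z"]                  # list-style indexing, stays a data.frame

is.data.frame(sem_drop)  # FALSE
is.data.frame(com_drop)  # TRUE
is.data.frame(lista)     # TRUE
```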

or, if you prefer to use subset:

dados <- subset(dados, select = c(w))
head(dados)
           w
1  0.8936737
2 -1.0472981
3  1.9713374
4 -0.3836321
5  1.6541453
6  1.5122127
  • 1

    drop=FALSE is magical; I spent a good few years writing code that had to adapt between data.frames/matrices and vectors...

2

Hello. I usually use the select function from dplyr:

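# install.packages("dplyr")  # run once if dplyr is not installed
library(dplyr)

# keep only x, y and z; any column not listed (here w) is dropped
dados <- dados %>%
  select(x, y, z)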

Using your example, I use select to keep only the columns that interest me. To illustrate, I did not select the column w, so the column w is excluded.
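
select can also be told which columns to drop instead of which to keep. A minimal sketch, assuming dplyr 1.0 or later is installed and using a hypothetical data.frame exemplo; all_of() is the tidyselect helper for column names held in a character vector.

```r
library(dplyr)

# Hypothetical example data
exemplo <- data.frame(y = 1:3, x = 4:6, z = 7:9, w = 10:12)

# Drop a single column by name
sem_w <- exemplo %>% select(-w)

# Drop columns whose names are stored in a character vector
excluir <- c("x", "y")
sem_xy <- exemplo %>% select(-all_of(excluir))

names(sem_w)   # "y" "x" "z"
names(sem_xy)  # "z" "w"
```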

I hope I’ve helped.

0

This is also possible with the data.table package.

set.seed(1)
dados <- data.frame(y=rnorm(100), x= rnorm(100), z=rnorm(100), w=rnorm(100))
library(data.table)
df <- data.table::as.data.table(dados)
df2 <- df[, names(df)[c(-1)], with = FALSE]
  • Note that names(df)[c(-1)] leaves out the first column of df (y).
> head(df2, 5)
             x          z          w
1: -0.62036668  0.4094018  0.8936737
2:  0.04211587  1.6888733 -1.0472981
3: -0.91092165  1.5865884  1.9713374
4:  0.15802877 -0.3309078 -0.3836321
5: -0.65458464 -2.2852355  1.6541453
  • With names(df)[c(-2, -3)] the columns x and z are left out of the result.
df3 <- df[, names(df)[c(-2, -3)], with = FALSE]
> head(df3, 5)
            y          w
1: -0.6264538  0.8936737
2:  0.1836433 -1.0472981
3: -0.8356286  1.9713374
4:  1.5952808 -0.3836321
5:  0.3295078  1.6541453
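
data.table also accepts the unwanted names directly, and it can delete a column in place. The sketch below assumes the data.table package is installed and uses a hypothetical table dt; := NULL modifies dt by reference, so no copy is made.

```r
library(data.table)

# Hypothetical example data
dt <- data.table(y = 1:3, x = 4:6, z = 7:9, w = 10:12)

# Return a copy without x and z; dt itself is unchanged
dt2 <- dt[, !c("x", "z"), with = FALSE]

# Remove y from dt in place (by reference)
dt[, y := NULL]

names(dt2)  # "y" "w"
names(dt)   # "x" "z" "w"
```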

0

All of the answers given solve the problem. But consider this situation:

  • a data set with many variables (e.g. 50+);

  • the variables belong to several classes (numeric, character...);

  • the variables are not grouped by class (e.g. two numeric variables followed by a character variable, and so on).

The best alternative in these cases is to select columns by their class, not by their names or indexes.

Borrowing the data from the question and adding another variable (k), which is of class character, we have:

dados <- data.frame(y=rnorm(100), x= rnorm(100), z=rnorm(100), w=rnorm(100), 
                k = rep(c('a', 'b', 'c', 'd', 'e'), times = 2, each = 10), 
                stringsAsFactors = FALSE)

head(dados)

           y          z          w k
1 -1.5570357  1.5468813  0.8500435 a
2  1.9231637  0.1789210 -0.9253130 a
3 -1.8568296 -0.2825466  0.8935812 a
4 -2.1061184 -0.7672988 -0.9410097 a
5  0.6976485 -0.5764042  0.5389521 a
6  0.9074444 -0.9148558 -0.1819744 a

The select_if function (dplyr) solves this problem by looking only at the class of each column of the data.frame.

Suppose I want to remove all variables of class numeric:

library(tidyverse)

dados %>% 
  select_if(negate(is.numeric)) %>% 
  head()

  k
1 a
2 a
3 a
4 a
5 a
6 a

Doing this avoids typing the names one by one or looking up the index of each variable to be deleted.
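
The same class-based selection can be sketched in base R, without any packages. This is an alternative to select_if, not what the answer itself uses, and exemplo is a hypothetical data.frame.

```r
# Hypothetical example data with mixed column classes
exemplo <- data.frame(y = c(0.1, 0.2), x = c(1.5, 2.5), k = c("a", "b"),
                      stringsAsFactors = FALSE)

# TRUE for each numeric column, FALSE otherwise
is_num <- vapply(exemplo, is.numeric, logical(1))

# Keep only the non-numeric columns; list-style indexing keeps a data.frame
nao_numericas <- exemplo[!is_num]

names(nao_numericas)  # "k"
```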

0

Unfortunately, dados <- dados[,-c(1,3)] didn't work for me. However, dados <- subset(dados, select = -c(x, y)) worked perfectly, although I needed to remove the quotes.
