Delete rows containing NA in a data frame

Question


I have a data frame, dataframe1, whose fourth column has several NA cells. I would like to know how to delete all rows that have NA. I used the command below, but those rows keep showing up:

r <- with(dataframe1, which(dataframe1[4]==NA, arr.ind=TRUE))
newd <- dataframe1[-r, ]

The structure of my data is:

dput(head(dataframe1, 10))

structure(list(Sigla = c("AC", "AC", "AC", "AC", "AC", "AC", 
"AC", "AC", "AC", "AC"), Código = c(1200013L, 1200054L, 1200104L, 
1200138L, 1200179L, 1200203L, 1200252L, 1200302L, 1200328L, 1200336L
), Município = c("Acrelândia", "Assis Brasil", "Brasiléia", 
"Bujari", "Capixaba", "Cruzeiro do Sul", "Epitaciolândia", "Feijó", 
"Jordão", "Mâncio Lima"), `numero de homicidios` = c(4L, NA, 
1L, NA, 1L, 1L, NA, 1L, NA, 1L), `media escolaridade` = c(3.268, 
3.72, 3.788, 2.816, 2.417, 4.108, 3.681, 1.948, 1.038, 3.537), 
    rendimento = c(1042.3834261349, 429.2221666106, 2243.2492197717, 
    786.6815828794, 603.835515482, 9363.3159742031, 1503.420009265, 
    1737.0793588989, 130.7838314018, 1040.2388777272), populacao = c(7935L, 
    3490L, 17013L, 5826L, 5206L, 67441L, 11028L, 26722L, 4454L, 
    11095L)), .Names = c("Sigla", "Código", "Município", "numero de homicidios", 
"media escolaridade", "rendimento", "populacao"), row.names = c(NA, 
10L), class = "data.frame")

Here is the structure of my data using dput(head(dataframe1, 1)):

structure(list(Sigla = "AC", Código = 1200013L, Município = "Acrelândia", `numero de homicidios` = 4L, `media escolaridade` = 3.268, rendimento = 1042.3834261349, populacao = 7935L), .Names = c("Sigla", "Código", "Município", "numero de homicidios", "media escolaridade", "rendimento", "populacao"), row.names = 1L, class = "data.frame")

– user7004

2014/05/27 at 23:49

2 answers


by Carlos Cinelli • **16,826** points · Answer 1 · 2014-05-27T23:09:01+00:00

There are two solutions. If you want to drop every row that has an NA in any column of the data.frame, you can use the function na.omit.

For example, suppose a data.frame with two columns, both of which contain NAs:

### Building an example data.frame ###
set.seed(1)
df <- data.frame(x=rnorm(100), y = rnorm(100))
df[sample(1:100,20),1] <- NA
df[sample(1:100,20),2] <- NA

The command na.omit removes every row that has at least one NA:

df2 <- na.omit(df)
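A small sketch of what na.omit returns, using a hypothetical toy data frame `toy` (not from the question): only complete rows survive, and the numbers of the dropped rows are recorded in the result's "na.action" attribute.

```r
# Hypothetical toy data: row 2 has an NA in x, row 3 has an NA in y
toy <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))

# na.omit() keeps only the rows with no NA in any column (rows 1 and 4)
toy2 <- na.omit(toy)
print(toy2)

# The numbers of the dropped rows are stored in the "na.action" attribute
print(as.integer(attr(toy2, "na.action")))  # 2 3

# The surviving rows keep their original row names ("1", "4");
# reset them if you prefer consecutive numbering
rownames(toy2) <- NULL
```

Keeping the original row names is what lets you trace each remaining row back to the input; resetting them is optional.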

But if you want to omit only the rows that have NA in a specific column, you can use the function is.na to subset the data.frame. is.na returns TRUE where the value is NA, so you negate the result with ! when subsetting.

For example, the command below removes only the rows that have NA in x:

df3 <- df[!is.na(df$x),]
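Applied to the question, the same idea explains why the original attempt fails: any comparison with NA, including `x == NA`, returns NA rather than TRUE, and which() skips NA values, so it never finds a match. A minimal sketch, recreating only two columns of the question's dput (the other columns are left out for brevity):

```r
# Two columns taken from the question's dput; the rest are omitted
dataframe1 <- data.frame(
  Sigla = rep("AC", 10),
  `numero de homicidios` = c(4L, NA, 1L, NA, 1L, 1L, NA, 1L, NA, 1L),
  check.names = FALSE  # keep the column name with spaces as-is
)
hom <- dataframe1[["numero de homicidios"]]

# Comparing with NA never yields TRUE: every element is NA
print(hom == NA)
# which() ignores NA, so this is integer(0)
print(which(hom == NA))

# is.na() is the correct test; negate it to keep the non-missing rows
newd <- dataframe1[!is.na(hom), ]
print(nrow(newd))  # 6
```

On the full data frame from the question, where this is the fourth column, `dataframe1[!is.na(dataframe1[[4]]), ]` selects the same rows. Note also that negative indexing with an empty vector, as in `dataframe1[-r, ]` when `r` is empty, selects zero rows rather than all of them, which is another reason to prefer the logical `!is.na(...)` form.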

by Laura • **123** points · Answer 2 · 2015-01-24T01:06:05+00:00

Another option is to use the function complete.cases:

df2 <- df[complete.cases(df),]

The function complete.cases returns a logical vector with one TRUE or FALSE per row: TRUE when that row has no missing values. Used as in the example above, it selects only the rows in which no variable is NA.
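A short sketch of that logical vector, using a hypothetical toy data frame. Passing only some columns to complete.cases restricts the NA check to those columns, so `complete.cases(toy["x"])` behaves like `!is.na(toy$x)`:

```r
# Hypothetical toy data: row 2 has an NA in x, row 3 has an NA in y
toy <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))

# One logical value per row: TRUE when the row has no NA in any column
cc <- complete.cases(toy)
print(cc)  # TRUE FALSE FALSE TRUE

# Same rows as na.omit(toy), but without the "na.action" attribute
all_complete <- toy[cc, ]

# Checking only column x: row 3 (NA only in y) is now kept
only_x <- toy[complete.cases(toy["x"]), ]
print(nrow(only_x))  # 3
```

In the question's case, `dataframe1[complete.cases(dataframe1[4]), ]` would apply the same column-restricted check to the fourth column.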