Delete lines containing NA in a data frame

11

I have a data frame, dataframe1, whose fourth column contains several NA values. I would like to know how to delete every row that has an NA. I tried the command below, but the NA rows keep showing up:

r <- with(dataframe1, which(dataframe1[4]==NA, arr.ind=TRUE))
newd <- dataframe1[-r, ]

The structure of my data is:

dput(head(dataframe1, 10))

structure(list(Sigla = c("AC", "AC", "AC", "AC", "AC", "AC", 
"AC", "AC", "AC", "AC"), Código = c(1200013L, 1200054L, 1200104L, 
1200138L, 1200179L, 1200203L, 1200252L, 1200302L, 1200328L, 1200336L
), MunicÃ.pio = c("Acrelândia", "Assis Brasil", "Brasiléia", 
"Bujari", "Capixaba", "Cruzeiro do Sul", "Epitaciolândia", "Feijó", 
"Jordão", "Mâncio Lima"), `numero de homicidios` = c(4L, NA, 
1L, NA, 1L, 1L, NA, 1L, NA, 1L), `media escolaridade` = c(3.268, 
3.72, 3.788, 2.816, 2.417, 4.108, 3.681, 1.948, 1.038, 3.537), 
    rendimento = c(1042.3834261349, 429.2221666106, 2243.2492197717, 
    786.6815828794, 603.835515482, 9363.3159742031, 1503.420009265, 
    1737.0793588989, 130.7838314018, 1040.2388777272), populacao = c(7935L, 
    3490L, 17013L, 5826L, 5206L, 67441L, 11028L, 26722L, 4454L, 
    11095L)), .Names = c("Sigla", "Código", "MunicÃ.pio", "numero de homicidios", 
"media escolaridade", "rendimento", "populacao"), row.names = c(NA, 
10L), class = "data.frame")
  • Here is the structure of my data, from dput(head(dataframe1, 1)): structure(list(Sigla = "AC", Código = 1200013L, MunicÃ.pio = "Acrelândia", `numero de homicidios` = 4L, `media escolaridade` = 3.268, rendimento = 1042.3834261349, populacao = 7935L), .Names = c("Sigla", "Código", "MunicÃ.pio", "numero de homicidios", "media escolaridade", "rendimento", "populacao"), row.names = 1L, class = "data.frame")

2 answers

10

There are two solutions. If you want to omit all NA from the data.frame, you can use the function na.omit.

For example, suppose a data.frame with two columns, both of which contain NAs.

### Building an example data.frame ###
set.seed(1)
df <- data.frame(x=rnorm(100), y = rnorm(100))
df[sample(1:100,20),1] <- NA
df[sample(1:100,20),2] <- NA

The command na.omit will remove every row that has at least one NA:

df2 <- na.omit(df)
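
As a quick check on what na.omit did, the sketch below rebuilds the same example data and inspects the result. It relies on the "na.action" attribute that na.omit attaches to its return value when rows are dropped; the variable name dropped is only illustrative.

```r
# Rebuild the example data.frame from above
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100))
df[sample(1:100, 20), 1] <- NA
df[sample(1:100, 20), 2] <- NA

# Drop every row that has at least one NA
df2 <- na.omit(df)

# No NA should remain in the result
anyNA(df2)

# na.omit records the indices of the removed rows in the "na.action" attribute
dropped <- attr(df2, "na.action")

# Rows kept plus rows dropped should add up to the original row count
nrow(df2) + length(dropped) == nrow(df)
```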

But if you want to omit only the rows that have NA in a specific column, you can use the function is.na to subset the data.frame. is.na returns TRUE where the value is NA, so you negate its result with ! inside the subset.

For example, the command below removes only the rows that have NA in x:

df3 <- df[!is.na(df$x),]
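
The same idea carries over to the fourth column in the question. The sketch below uses a small stand-in data frame (toy, with made-up columns a, b and c; only the homicide counts are taken from the first rows of the question's data) to show why a comparison with == NA finds nothing, while is.na does.

```r
# First five values of the question's fourth column
homicidios <- c(4L, NA, 1L, NA, 1L)

# Comparing with == NA never yields TRUE: every result is itself NA
cmp <- homicidios == NA
all(is.na(cmp))

# which() only reports TRUE positions, so it finds no rows at all
length(which(cmp))

# is.na() is the test that actually detects missing values
missing_rows <- which(is.na(homicidios))

# Hypothetical stand-in for dataframe1: keep rows whose 4th column is not NA
toy <- data.frame(a = 1:5, b = letters[1:5], c = 5:1, homicidios = homicidios)
newd <- toy[!is.na(toy[[4]]), ]
```

With the real data, the equivalent call would presumably be dataframe1[!is.na(dataframe1[[4]]), ].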
  • 1

    Thank you very much :)

3

Another option is to use the function complete.cases:

df2 <- df[complete.cases(df),]

The function complete.cases returns a logical vector of TRUE and FALSE, one value per row. As used in the example above, only the rows in which no variable is NA are selected.
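
As a rough illustration, complete.cases can also be applied to a subset of the columns, which gives the column-specific filter from the other answer. The sketch below rebuilds the example data from that answer; the names keep_all, by_cases, by_omit, only_x and by_isna are illustrative.

```r
# Same example data.frame as in the other answer
set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100))
df[sample(1:100, 20), 1] <- NA
df[sample(1:100, 20), 2] <- NA

# One logical value per row: TRUE when the row has no NA in any column
keep_all <- complete.cases(df)

# Filtering on it selects the same rows as na.omit
by_cases <- df[keep_all, ]
by_omit  <- na.omit(df)
identical(rownames(by_cases), rownames(by_omit))

# Restricting the check to column x selects the same rows as !is.na(df$x)
only_x  <- df[complete.cases(df[, "x", drop = FALSE]), ]
by_isna <- df[!is.na(df$x), ]
identical(only_x, by_isna)
```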
