How to remove rows that have missing values (NA)?

Question

Asked 10 years, 3 months ago

Viewed 8,758 times

8

I have a data set with some missing values (NAs), but only in one variable (one column), and I need to remove every row that has a missing value.

5 answers

3

In my opinion, the subset function solves this directly and most clearly. Use it together with is.na applied to the variable of interest.

> data.frame(x=1:12, y=rnorm(12), z=c(TRUE, TRUE, NA))
    x           y    z
1   1  1.02572367 TRUE
2   2  0.03988014 TRUE
3   3 -0.33269252   NA
4   4  0.05357787 TRUE
5   5 -0.05166907 TRUE
6   6 -0.68981171   NA
7   7  1.14728375 TRUE
8   8 -0.76820827 TRUE
9   9 -0.45425148   NA
10 10 -0.27369393 TRUE
11 11 -0.12687725 TRUE
12 12 -0.38773276   NA

> df <- data.frame(x=1:12, y=rnorm(12), z=c(TRUE, TRUE, NA))
> subset(df, !is.na(z))
    x          y    z
1   1 -0.2223889 TRUE
2   2 -0.7398008 TRUE
4   4 -1.6382330 TRUE
5   5  1.2596270 TRUE
7   7  1.0555701 TRUE
8   8 -1.5904792 TRUE
10 10 -0.0942284 TRUE
11 11 -0.3278851 TRUE

You can also add more conditions to the filter.

> subset(df, !is.na(z) & x %% 2 == 0)
    x          y    z
2   2 -0.7398008 TRUE
4   4 -1.6382330 TRUE
8   8 -1.5904792 TRUE
10 10 -0.0942284 TRUE
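
The same filter can also be written with base bracket indexing. A minimal sketch, assuming the same data frame as above; the seed is an assumption added only so that the example is reproducible:

```r
set.seed(42)  # assumed seed, not part of the original answer
df <- data.frame(x = 1:12, y = rnorm(12), z = c(TRUE, TRUE, NA))

# Keep rows where z is not NA, once with subset() and once with `[`
by_subset  <- subset(df, !is.na(z))
by_bracket <- df[!is.na(df$z), ]

# Compare the two results; all.equal() returns TRUE when they match
all.equal(by_subset, by_bracket)
```

Note the trailing comma in `df[!is.na(df$z), ]`: the condition selects rows, and the empty second position keeps all columns.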

1

The function subset is not recommended for trivial operations like this: "This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences." It would be simpler to do df[!is.na(df$z),]

– Molx

2015/09/19 at 02:22
2

@Molx you have a point, but for me readability counts a lot, and the subset command expresses the goal better. That is why several newer R packages for data manipulation (dplyr, magrittr, ...) use non-standard evaluation.

– Wilson Freitas

2015/09/19 at 22:29
1

I found this question on Stack Overflow discussing this issue with subset: http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset

– Wilson Freitas

2015/09/23 at 16:39

by Daniel Falbel • **12,504** points · Answer 1 · 2015-09-18T20:16:27+00:00

Consider the following data frame:

> dados <- data.frame(
+     var1 = c(NA, 1),
+     var2 = c(1, NA)
+   )
>   
>   dados
  var1 var2
1   NA    1
2    1   NA

You can drop every row that has at least one missing value with na.omit:

> na.omit(dados)
[1] var1 var2
<0 rows> (or 0-length row.names)

Or drop only the rows that have a missing value (NA) in one particular variable:

> dados[!is.na(dados$var1),]
  var1 var2
2    1   NA
> dados[!is.na(dados$var2),]
  var1 var2
1   NA    1

To check whether a vector element is NA in R, use the function is.na:

> is.na(NA)
[1] TRUE
> is.na(1)
[1] FALSE

To actually remove the missing cases from the data.frame, you need to overwrite it:

dados <- na.omit(dados)
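
The same overwriting applies when only one column should be checked, which is the case described in the question. A minimal sketch using the `dados` example from this answer:

```r
dados <- data.frame(var1 = c(NA, 1), var2 = c(1, NA))

# Drop rows only where var1 is NA and store the result back in `dados`
dados <- dados[!is.na(dados$var1), ]

# The remaining row still has an NA in var2, because var2 was not checked
dados
```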

by Carlos Cinelli • **16,826** points · Answer 2 · 2015-09-19T13:08:03+00:00

You can also use the filter function from dplyr:

Creating sample data (based on Daniel’s data):

dados <- data.frame(var1 = c(NA, 1, 3), var2 = c(1, NA, 3))

Loading dplyr:

library(dplyr)

Remove NAs only from the column var1:

dados %>% filter(!is.na(var1))

Remove NAs only from the column var2:

dados %>% filter(!is.na(var2))

To remove all NAs, use na.omit() as before. It fits into a pipe chain easily:

# remove all NAs
dados %>% na.omit

by Daniel • **149** points · Answer 3 · 2015-09-18T15:56:52+00:00

5

To remove rows with missing data in R, you can use the complete.cases() function.

For example, for a data set x:

y <- x[complete.cases(x),]
str(y)

complete.cases(x) returns a logical vector that is TRUE for rows with no missing values and FALSE for rows with at least one missing value.
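
complete.cases also accepts a single vector, so the check can be limited to one column. A minimal sketch; the data frame and its column names `a` and `b` are hypothetical and used only for illustration:

```r
# Hypothetical data: NA appears in both columns, on different rows
x <- data.frame(a = c(1, NA, 3), b = c(NA, "u", "v"))

keep_all <- complete.cases(x)    # TRUE only where no column is NA
keep_a   <- complete.cases(x$a)  # TRUE where column a is not NA

x[keep_all, ]  # rows complete in every column
x[keep_a, ]    # rows complete in column a only
```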

I understand, but in my data the values are NA, not empty.

– Wagner Jorge

2015/09/18 at 16:08
You could use the function na.omit().

– Daniel

2015/09/18 at 16:14
But this does not remove the row that has the NA.

– Wagner Jorge

2015/09/18 at 17:35
@Wagnerjorge If your data really are NA, complete.cases should work. If it doesn't, add the output of dput(dados) to the question; maybe the values are empty strings or something.

– Molx

2015/09/18 at 17:46
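
If the "missing" entries turn out to be empty strings, as the last comment suggests, is.na will not flag them. A minimal sketch of one way to handle that, under the assumption that empty strings should count as missing; the data below are hypothetical:

```r
# Hypothetical data: one empty string and one real NA in var1
dados <- data.frame(var1 = c("a", "", NA, "b"), var2 = 1:4,
                    stringsAsFactors = FALSE)

# Only the real NA is counted here; the empty string is not
sum(is.na(dados$var1))

# Recode empty strings as NA (%in% never returns NA, so indexing is safe)
dados$var1[dados$var1 %in% ""] <- NA

# Now both rows are dropped
dados <- dados[complete.cases(dados), ]
dados
```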

by neves • **5,644** points · Answer 4 · 2021-06-22T17:11:37+00:00

You can also use the drop_na function from tidyr:

Data:

df_1 <- data.frame(
  x = c(NA, 1:4), 
  y = c(1:4, NA)
)

Code:

library(tidyr)

drop_na(df_1) # remove rows with NA in any column

#  x y
#1 1 2
#2 2 3
#3 3 4

and

drop_na(df_1, x) # remove rows with NA only in `x`

#  x  y
#1 1  2
#2 2  3
#3 3  4
#4 4 NA