8
I have a dataset with some missing values (NAs), but only in one variable (one column), and I need to remove every row that has a missing value.
5
Consider the following database:
> dados <- data.frame(
+ var1 = c(NA, 1),
+ var2 = c(1, NA)
+ )
>
> dados
var1 var2
1 NA 1
2 1 NA
You can remove all rows that have at least one missing value using na.omit:
> na.omit(dados)
[1] var1 var2
<0 rows> (or 0-length row.names)
Or remove only the rows that have a missing value (NA) in a specific variable:
> dados[!is.na(dados$var1),]
var1 var2
2 1 NA
> dados[!is.na(dados$var2),]
var1 var2
1 NA 1
To check whether a vector element is NA in R, use the function is.na:
> is.na(NA)
[1] TRUE
> is.na(1)
[1] FALSE
To actually remove the missing cases from the data.frame, you need to overwrite it:
dados <- na.omit(dados)
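As a small sketch of the difference between the two approaches, assuming a hypothetical three-row version of dados (not the data above) and illustrative result names:

```r
# Hypothetical three-row data, used only for illustration
dados <- data.frame(var1 = c(NA, 1, 3), var2 = c(1, NA, 3))

# Keep rows where var1 is not NA (var2 may still be NA)
sem_na_var1 <- dados[!is.na(dados$var1), ]

# Keep only rows with no NA in any column
sem_na <- na.omit(dados)

nrow(sem_na_var1)  # rows 2 and 3 remain
nrow(sem_na)       # only row 3 remains
```

If only one column matters, the first form is probably what you want; na.omit is stricter because it looks at every column.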
5
You can also use the filter function from dplyr:
Creating sample data (based on Daniel’s data):
dados <- data.frame(var1 = c(NA, 1, 3), var2 = c(1, NA, 3))
Loading dplyr:
library(dplyr)
Remove NAs only from the column var1:
dados %>% filter(!is.na(var1))
Remove NAs only from the column var2:
dados %>% filter(!is.na(var2))
To remove all NAs, just use na.omit(). It fits easily into the pipe chain:
# remove all NAs
dados %>% na.omit
5
To remove rows with missing data in R, you can use the complete.cases() function.
For example, for a dataset x:
y <- x[complete.cases(x),]
str(y)
complete.cases(x) is a logical vector that is TRUE for rows with no missing values and FALSE for rows with at least one missing value.
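Since the question is about a single column, complete.cases can also be given just that column. A minimal sketch, assuming a hypothetical data frame x with illustrative columns a and b:

```r
# Hypothetical data; the column names are illustrative
x <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 3))

# Logical vector: TRUE where column a is not missing
ok_a <- complete.cases(x$a)

# Keep rows where a is present, regardless of b
y <- x[ok_a, ]
str(y)
```

Here the row with NA in b is kept, because only a is checked.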
3
The subset function solves this directly and more clearly, in my opinion.
It can be used together with is.na applied to the variable of interest.
> data.frame(x=1:12, y=rnorm(12), z=c(TRUE, TRUE, NA))
x y z
1 1 1.02572367 TRUE
2 2 0.03988014 TRUE
3 3 -0.33269252 NA
4 4 0.05357787 TRUE
5 5 -0.05166907 TRUE
6 6 -0.68981171 NA
7 7 1.14728375 TRUE
8 8 -0.76820827 TRUE
9 9 -0.45425148 NA
10 10 -0.27369393 TRUE
11 11 -0.12687725 TRUE
12 12 -0.38773276 NA
> df <- data.frame(x=1:12, y=rnorm(12), z=c(TRUE, TRUE, NA))
> subset(df, !is.na(z))
x y z
1 1 -0.2223889 TRUE
2 2 -0.7398008 TRUE
4 4 -1.6382330 TRUE
5 5 1.2596270 TRUE
7 7 1.0555701 TRUE
8 8 -1.5904792 TRUE
10 10 -0.0942284 TRUE
11 11 -0.3278851 TRUE
And it is also possible to include more rules in the filter.
> subset(df, !is.na(z) & x %% 2 == 0)
x y z
2 2 -0.7398008 TRUE
4 4 -1.6382330 TRUE
8 8 -1.5904792 TRUE
10 10 -0.0942284 TRUE
The subset function is not recommended for trivial operations like this. From its documentation: "This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences." It would be simpler to do df[!is.na(df$z),]
@Molx you have a point, but for me readability counts a lot, and the subset command expresses the intent better. For this reason several newer R packages for data manipulation (dplyr, magrittr, ...) use non-standard evaluation.
I found this question on Stack Overflow discussing this issue with subset:
http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset
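For comparison, here is a small sketch of the two forms discussed in these comments, side by side. It assumes a reduced, hypothetical version of df with fixed values instead of rnorm(), so the result is reproducible:

```r
# Fixed values instead of rnorm() so the example is reproducible
df <- data.frame(x = 1:6, z = c(TRUE, TRUE, NA))

# Non-standard evaluation: z is looked up inside df
via_subset <- subset(df, !is.na(z))

# Standard subsetting: the column is referenced explicitly
via_bracket <- df[!is.na(df$z), ]

# Compare the two results
all.equal(via_subset, via_bracket)
```

Both forms select the same rows here; the choice is mostly between readability at the console and predictability inside functions.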
0
You can also use the drop_na function from tidyr:
Data:
df_1 <- data.frame(
x = c(NA, 1:4),
y = c(1:4, NA)
)
Code:
library(tidyr)
drop_na(df_1) # removes rows with NA in any column
# x y
#1 1 2
#2 2 3
#3 3 4
and
drop_na(df_1, x) # removes rows with NA only in `x`
# x y
#1 1 2
#2 2 3
#3 3 4
#4 4 NA
I understand, but in the dataset it appears as NA, not as empty.
– Wagner Jorge
You could use the function na.omit() .
– Daniel
But this does not remove the line that has the NA.
– Wagner Jorge
@Wagnerjorge If your data are NA, complete.cases should work. If it didn’t work, add the result of dput(dados) to the question; maybe they are empty strings or something.
– Molx
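If the values do turn out to be empty strings rather than NA, one possible approach is to convert them to NA first and then drop the incomplete rows. This is only a sketch under that assumption; the data and the name dados_limpos are hypothetical:

```r
# Hypothetical data where a "missing" value was read in as an empty string
dados <- data.frame(var1 = c("a", "", "c"),
                    var2 = c(1, 2, NA),
                    stringsAsFactors = FALSE)

# Turn empty strings into real NAs
dados$var1[dados$var1 == ""] <- NA

# Now complete.cases sees them as missing
dados_limpos <- dados[complete.cases(dados), ]
dados_limpos
```

Without the conversion step, complete.cases would treat "" as a valid value and keep that row.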