Replacing missing data <NA> with "something else"

When I import a .sav file, I don't want <NA> to appear in mydata1. Instead of NA, I would like something like "Something else" to appear.

library(foreign)  # provides read.spss

mydata <- read.spss('mydata.sav', use.value.labels = TRUE, to.data.frame = TRUE,
                    max.value.labels = Inf, trim.factor.names = FALSE,
                    trim_values = FALSE, reencode = "UTF-8")

(mydata1 <- mydata[10:20, 25:31])
   Q_16_O3 Q_16_O4 Q_16_O5 Q_16_O6 Q_16_O7 Q_16_O8 Q_16_O9
10    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
11    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
12    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
13    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
14    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
15    Trem    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
16    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
17    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
18    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
19    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
20    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>

Here is the output of str and dput:

str(mydata1)
'data.frame':   11 obs. of  7 variables:
 $ Q_16_O3: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA 4 NA NA NA NA ...
 $ Q_16_O4: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O5: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O6: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O7: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O8: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...
 $ Q_16_O9: Factor w/ 10 levels "Ônibus","Vans",..: NA NA NA NA NA NA NA NA NA NA ...


dput(head(mydata1))


    structure(list(Q_16_O3 = structure(c(NA, NA, NA, NA, NA, 4L), .Label = c("Ônibus", 
    "Vans", "Metrô", "Trem", "BRT", "Barca", "Catamarã", "Fretados", 
    "VLT/Monotrilho", "Lotação (micro-ônibus especial)"), class = "factor"), 
    Q_16_O4 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = c("Ônibus", 
    "Vans", "Metrô", "Trem", "BRT", "Barca", "Catamarã", "Fretados", 
    "VLT/Monotrilho", "Lotação (micro-ônibus especial)"), class = "factor"), 
    Q_16_O5 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = c("Ônibus", 
    "Vans", "Metrô", "Trem", "BRT", "Barca", "Catamarã", "Fretados", 
    "VLT/Monotrilho", "Lotação (micro-ônibus especial)"), class = "factor"), 
    Q_16_O6 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = c("Ônibus", 
    "Vans", "Metrô", "Trem", "BRT", "Barca", "Catamarã", "Fretados", 
    "VLT/Monotrilho", "Lotação (micro-ônibus especial)"), class = "factor"), 
    Q_16_O7 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = c("Ônibus", 
    "Vans", "Metrô", "Trem", "BRT", "Barca", "Catamarã", "Fretados", 
    "VLT/Monotrilho", "Lotação (micro-ônibus especial)"), class = "factor"), 
    Q_16_O8 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = c("Ônibus", 
    "Vans", "Metrô", "Trem", "BRT", "Barca", "Catamarã", "Fretados", 
    "VLT/Monotrilho", "Lotação (micro-ônibus especial)"), class = "factor"), 
    Q_16_O9 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = c("Ônibus", 
    "Vans", "Metrô", "Trem", "BRT", "Barca", "Catamarã", "Fretados", 
    "VLT/Monotrilho", "Lotação (micro-ônibus especial)"), class = "factor")), .Names = c("Q_16_O3", 
    "Q_16_O4", "Q_16_O5", "Q_16_O6", "Q_16_O7", "Q_16_O8", "Q_16_O9"
     ), row.names = 10:15, class = "data.frame")
  • What technology are you using? Please use the tags to better indicate what you are working on.

  • I don't know much R, but it seems that mydata[is.na(mydata)] <- 0 replaces <NA> with 0 in this case. See if this helps: http://stackoverflow.com/questions/18562680/replacing-nas-with-0s-in-r-dataframe

  • Doesn’t work...

  • In fact, if it were only NA instead of <NA> it would work

  • Vasco, could you add the output of str(mydata) to the question (and, if possible, the output of dput(head(mydata)))? It would make it easier to understand what's going on.

  • Hi Carlos Cinelli. See if it looks good now. Thank you.

  • Cinelli, what I actually want is to run a "freq" on one of the rows of my new "mydata1".

  • Vasco, I think I understand your problem. I posted an answer below; see if it solves it. Regards.


1 answer

By default, read.spss converts categorical variables into factors.

When a variable is a factor, it only accepts the values you have defined as its levels. So when you try mydata[is.na(mydata)] <- "Outra coisa", R gives you the following message:

 Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "outra coisa") :
  invalid factor level, NA generated

In other words, R is warning you that there is no level "Outra coisa" ("Something else"), so it is putting NA in its place.
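The behaviour is easy to reproduce on a small factor. The sketch below uses a made-up vector x with shortened levels (not the question's data), and wraps the assignment in suppressWarnings only so the script runs quietly:

```r
# Toy factor shaped like one column of mydata1 (levels shortened for the example).
x <- factor(c(NA, "Trem", NA), levels = c("Vans", "Trem", "BRT"))

# "Outra coisa" is not one of the levels, so the assignment triggers the
# "invalid factor level, NA generated" warning and the entries stay NA.
y <- x
suppressWarnings(y[is.na(y)] <- "Outra coisa")

print(levels(y))      # the levels are unchanged
print(sum(is.na(y)))  # the NA entries are still there
```

So the assignment itself goes through, but nothing visible changes in the data.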

The first thing to keep in mind is this: why do you want to replace NA with another category? In general, NA means that the observation does not exist, so it may be most appropriate to leave it as NA, because R can handle this kind of thing.

For example, if you want a frequency table of the first column of mydata1, you can use the table function, which omits NA by default (here I am using the data you posted with dput(head(mydata1)), i.e. only the first 6 observations):

table(mydata1[,1])
                         Ônibus                            Vans                           Metrô 
                              0                               0                               0 
                           Trem                             BRT                           Barca 
                              1                               0                               0 
                       Catamarã                        Fretados                  VLT/Monotrilho 
                              0                               0                               0 
Lotação (micro-ônibus especial) 
                              0 

If you want it to count the NAs as well, just add the argument useNA="always":

 table(mydata1[,1], useNA="always")

                          Ônibus                            Vans                           Metrô                            Trem                             BRT 
                              0                               0                               0                               1                               0 
                          Barca                        Catamarã                        Fretados                  VLT/Monotrilho Lotação (micro-ônibus especial) 
                              0                               0                               0                               0                               0 
                           <NA> 
                              5

Note that a <NA> field has now appeared with the 5 observations that are NA.
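As a small aside, table also accepts useNA = "ifany", which adds the <NA> entry only when missing values are actually present. A minimal sketch on a made-up factor (x and its levels are invented for the example):

```r
# Toy factor with two missing values and one unused level.
x <- factor(c(NA, "Trem", NA), levels = c("Vans", "Trem"))

counts_default <- table(x)                    # NA values are dropped
counts_ifany   <- table(x, useNA = "ifany")   # NA column added because NAs exist

print(counts_default)
print(counts_ifany)
```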

But assuming you really do want to change NA to something else, I think the easiest way is the following: first convert the factor columns of your data.frame to character, then replace the NA values with something else.

For example, the command below selects all columns of mydata1 that are factors and converts them to character:

mydata1[sapply(mydata1, is.factor)] <- lapply(mydata1[sapply(mydata1, is.factor)], as.character)

Now you can run mydata1[is.na(mydata1)] <- "Outra coisa" and it will not produce a warning.

Making the table from the first column again, notice that we now have 5 "Outra coisa" and 1 "Trem":

table(mydata1[,1])
Outra coisa        Trem 
          5           1 
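If the columns need to stay factors (for example, so that the unused levels still show up in frequency tables), one possible alternative is to add the new level first and only then assign it. This is just a sketch: the helper name replace_na_level and the toy data frame df are made up for illustration and are not part of the question's data.

```r
# Hypothetical helper: register `label` as a level, then fill the NAs with it.
replace_na_level <- function(f, label = "Outra coisa") {
  levels(f) <- union(levels(f), label)  # union() avoids duplicating the level
  f[is.na(f)] <- label
  f
}

# Toy data frame standing in for mydata1.
df <- data.frame(a = factor(c(NA, "Trem")), b = factor(c("Vans", NA)))

# Apply the helper to every factor column, leaving other columns untouched.
is_fac <- sapply(df, is.factor)
df[is_fac] <- lapply(df[is_fac], replace_na_level)

print(table(df$a))
```

Unlike the conversion to character, this keeps every original level in the table output, including those with a count of zero.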
  • Sensational... I struggled to find this answer.
