Warning message: NAs introduced by coercion

I'm having trouble converting a column from "chr" to "num".

When I load the table, the column VA.VG comes in as "chr", and when I try to convert it, the following warning appears:

# Load:
t2019_12 = read.csv("C:/Users/joao.moura/Desktop/Unificador/Apogi/12_2019 APOGI.csv", fill = F, dec = ",", header = T, sep = ";", stringsAsFactors = F)

## Converting to "num":
t2019_12$VA.VG <- as.numeric(t2019_12$VA.VG, rm.na = T)

Warning message:
NAs introduced by coercion
  • Welcome to Stack Overflow! Unfortunately, as written, this question cannot be reproduced by anyone trying to answer it. Please take a look at this link on how to ask a reproducible question in R, so that people who want to help you can do so as effectively as possible.

  • Can you please edit the question with the output of dput(t2019_12$VA.VG) or, if the dataset is too large, dput(head(t2019_12$VA.VG, 20))? Also, rm.na is certainly wrong: you probably meant na.rm, but as.numeric has no such argument anyway.

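  • Note: read.csv2("C:/Users/joao.moura/Desktop/Unificador/Apogi/12_2019 APOGI.csv", stringsAsFactors = FALSE) is simpler and does the same thing, because read.csv2 already defaults to header = TRUE, sep = ";" and dec = ","; the only difference is that its default is fill = TRUE rather than fill = F. read.csv, read.csv2, etc. are simply read.table with certain argument values preset; see help("read.table") for details.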
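To illustrate the two suggestions above, here is a minimal sketch that uses a small made-up CSV passed through the `text` argument (not the original file). It assumes, only for illustration, that the real file writes numbers with "," as the decimal mark and sometimes "." as a thousands separator. A single value like "12.345,6" is enough to keep the whole column as character, and dput(head(...)) prints the column in a form that can be pasted into a question.

```r
# Hypothetical sample data: ";" separated, "," as decimal mark,
# and one value that also uses "." as a thousands separator.
csv_text <- "VA.VG\n1,5\n2,25\n12.345,6\n"

# read.csv2 already defaults to header = TRUE, sep = ";" and dec = ","
t_sample <- read.csv2(text = csv_text, stringsAsFactors = FALSE)

# "12.345,6" cannot be parsed as a number, so the column stays character
class(t_sample$VA.VG)

# Reproducible output to paste into the question
dput(head(t_sample$VA.VG, 20))
```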

2 answers

Note: This is not really an answer; it is an extended comment that tries to explain one possible way to approach the problem.

First, a dataset with a column of class "character" that we want to convert to "numeric".

t2019_12 <- data.frame(VA.VG = c(1, pi, "NA", "12.345.6"), stringsAsFactors = FALSE)

Now, when a conversion like the one in the question produces this warning, it is best to store the result in a temporary vector, which will hold the NA values, and then determine what is wrong with the corresponding elements of the original vector.

tmp <- as.numeric(t2019_12$VA.VG)
#Warning message:
#NAs introduced by coercion

If NAs were introduced, where are they?

na <- which(is.na(tmp))
na
#[1] 3 4

And what values were in the original vector? This is the fundamental step: only once we know exactly what the problem is can we solve it.

t2019_12$VA.VG[na]
#[1] "NA"        "12.345.6"

In the first case the value genuinely seems to be missing. In the second case, it looks as if a value that used dots as thousands separators was processed with dec = ",". The first should be left as is, but the second comes from an error that can be corrected by removing the thousands separator.

t2019_12$VA.VG[4] <- sub("\\.", "", t2019_12$VA.VG[4])
t2019_12$VA.VG <- as.numeric(t2019_12$VA.VG)
# No warning this time: as.numeric() turns the string "NA" into NA silently

And the data frame is now:

t2019_12
#         VA.VG
#1     1.000000
#2     3.141593
#3           NA
#4 12345.600000
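The correction above fixes one element by position and removes only its first dot. If, as suspected above, the raw file writes numbers like "12.345,6" (dot for thousands, comma for decimals), a more general sketch is to strip every thousands dot and turn the decimal comma into a point before calling as.numeric, applied to the column as read (still character). The helper name `parse_br_number` is hypothetical, and the assumption about the number format should first be checked against the dput output.

```r
# Hypothetical helper, assuming "." is only ever a thousands separator
# and "," is the decimal mark (e.g. "12.345,6" means 12345.6).
parse_br_number <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)  # drop every thousands separator
  x <- sub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)                        # "NA" strings become NA silently
}

parse_br_number(c("1,5", "12.345,6", "1.234.567", "NA"))
```

Any value that still turns into NA (with the coercion warning) does not follow the assumed format and should be inspected as described above.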

Finally, some cleanup: we no longer need tmp.

rm(tmp)
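To repeat the diagnostic steps of this answer on other columns (convert into a temporary vector, locate the new NAs, look at the original values), they can be wrapped in a small function. This is only a sketch: the name `failed_numeric` is hypothetical, and it deliberately ignores entries that are already missing (NA, the string "NA" or blanks), since as.numeric turns those into NA without a warning.

```r
# Hypothetical helper: positions and original values that as.numeric()
# could not convert, ignoring entries that are already missing.
failed_numeric <- function(x) {
  tmp <- suppressWarnings(as.numeric(x))
  already_missing <- is.na(x) | trimws(x) %in% c("", "NA")
  idx <- which(is.na(tmp) & !already_missing)
  data.frame(position = idx, value = x[idx], stringsAsFactors = FALSE)
}

# Same example values as above
vals <- c(1, pi, "NA", "12.345.6")
failed_numeric(vals)
```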
