Inconsistent numerical format

3

I am an experienced SAS programmer but a beginner in R. I am working with RStudio Version 0.99.903 (2009-2016 RStudio, Inc.) on Windows 8. I have the following question:

  1. The file "a_us" has 4 numeric fields and 2 alphanumerics as follows:

    str(a_us)  # command to display the file structure

    'data.frame': 1039992 obs. of  7 variables:
    $ 'dsSisOriginario'             : chr  "Construcard" "Construcard" "Construcard" "Construcard" ...
    $ 'nrContrato'                  : chr "000002160000023630," "000002160000116565," "000002160000225267," ...
    $ 'vlCredInadimplenciaLancadoCa': num  9570 4455 6791 2678 4483 ...
    $ 'dtCredInadimplenciaEntradaCa': chr "03/11/2002" "17/10/2004" "25/03/2007" "15/12/2006" ...
    $ 'vlCredFcvsCessao'            : num  271 216 329 130 217 ...
    $ PercentPagoCarteira           : num  0.0283 0.0484 0.0484 0.0484 ...
    $ QtdCredDiasAtraso             : int  5110 4396 3507 3607 2768 2407 2640 ...

  2. Using summary(a_us), the result comes out as expected; that is, the statistics for the numerical variables are perfect.

  3. However, when I try to take, for example, the average (mean()) or run any other quantitative procedure, such as hist(), on these same numerical variables ('vlCredInadimplenciaLancadoCa', 'vlCredFcvsCessao', PercentPagoCarteira, QtdCredDiasAtraso), it works only for PercentPagoCarteira and QtdCredDiasAtraso. For the others ('vlCredInadimplenciaLancadoCa', 'vlCredFcvsCessao'), I get the message:

> mean(a_us$'vlCredFcvsCessao')
>     [1] NA
>     Warning message:
>     In mean.default(a_us$vlCredFcvsCessao) :
>       argumento não é numérico nem lógico: retornando NA

Although the variable is numerical, I get this error message!

Can someone give me a hint of what’s going on and how to fix it?

  • I’m not the one who downvoted your question, but its formatting is very confusing and rather hard to fix. That makes the question difficult to answer, because it obscures the data format, which is essential for anyone trying to help. That said, since I don’t know anything about R, I can’t help much.

2 answers

1


Because of the way your data was imported, some columns ended up with quotes in their names. This prevents the $ operator from working the way you expected. The best fix is to re-import the data. But it is also possible to refer to the column this way:

mean(a_us$`'vlCredFcvsCessao'`)

Note the backticks (grave accents) surrounding the column name.

Look at this simple example:

> df <- dplyr::data_frame("'colunacomaspas'" = 1, colunasemaspas = 1)
> str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   1 obs. of  2 variables:
 $ 'colunacomaspas': num 1
 $ colunasemaspas  : num 1
> mean(df$`'colunacomaspas'`)
[1] 1
> mean(df$'colunacomaspas')
[1] NA
Warning messages:
1: Unknown column 'colunacomaspas' 
2: In mean.default(df$colunacomaspas) :
  argument is not numeric or logical: returning NA

Notice that in your example, too, str displays the names of the columns with quotation marks differently from those without.
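The same behaviour can be checked in base R, without dplyr. This is only a minimal sketch: the data frame and its column names ('quoted', plain) are made up for illustration, and check.names = FALSE is used so that R keeps the quotes in the name instead of sanitising it.

```r
# Hypothetical data frame: the first column name contains literal single quotes
df <- data.frame("'quoted'" = c(1, 2, 3), plain = c(4, 5, 6),
                 check.names = FALSE)

names(df)                # shows that the quotes are part of the first name

# $'quoted' is read as $quoted, so no column matches and the result is NULL
is.null(df$'quoted')

# Backticks keep the quotes as part of the name
mean(df$`'quoted'`)

# [[ ]] with the full name as a string reaches the same column
mean(df[["'quoted'"]])
```

Since df$'quoted' is NULL, passing it to mean() is what produces the "argument is not numeric or logical" warning, even though the column itself is numeric.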

Another fix is to rename the columns, removing these quotes. For example:

> names(df) <- gsub("'", "", names(df))
> mean(df$colunacomaspas)
[1] 1
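
Before renaming, it can help to list which names actually carry quotes. The following is a rough sketch on a hypothetical vector of names (nms) that mimics the import problem; with real data it would be names(a_us). It also strips double quotes, which is an assumption beyond the single quotes seen in the question.

```r
# Hypothetical column names mimicking the import problem
nms <- c("'vlCredFcvsCessao'", "PercentPagoCarteira")

# TRUE for every name that contains a single or double quote character
has_quote <- grepl("['\"]", nms)
nms[has_quote]

# Remove the quote characters and confirm no duplicate names were created
clean <- gsub("['\"]", "", nms)
anyDuplicated(clean) == 0
```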
  • Thank you, Daniel, that’s right. I imported from MS Access, but I have no idea why it put those quotes there. Anyway, the problem has been solved. Thanks!

  • @Marcelo glad it worked out! These quotes are hard to find anyway...

0

I will echo a complaint common among contributors to the English version :). For us to be able to reproduce the problem, it is critical that you post a piece of R code that can easily be copied and pasted into other environments and then compared by the people trying to help you. It is also important to mention the version of R, whether or not you are using RStudio, and the version of the operating system.

As there is no example data frame, here is a small example of the ways to reference the columns of a data frame, and of how you can provide more details about your problem. It is not an answer yet. "Apparently" everything is perfect.

# build a small data frame with two integer columns
df <- data.frame(a = 1:10, b = 11:20)
summary(df)

# check the class of a column
class(df$a)

# three equivalent ways to reference column a
mean(df$a)
mean(df[, 'a'])
mean(df$'a')
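
As a rough sketch of how the details requested above could be collected, the base R calls below print a copy-pasteable version of the data together with the R version and operating system. The data frame here is a hypothetical stand-in; in a real question it would be a few rows of the actual data.

```r
# Hypothetical stand-in for a few rows of the real data
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

# Prints R code that recreates the data, ready to paste into a question
dput(head(df))

# R version and operating system name
R.version.string
Sys.info()[["sysname"]]

# Fuller report: R version, platform, locale and loaded packages
sessionInfo()
```

Because dput() prints the column names exactly as they are stored, its output would also expose stray quotes in a name.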
  • Thanks, jcarlos. I edited the question and I think it should be clearer now. I’m still learning how to format text in this environment. Anyway, my problem is that some numerical variables are not recognized as numerical. Have you run into this problem?

  • Hi @Matcio, Daniel already answered :). Be sure to mark his answer as accepted. It is another step we often forget as beginners on Stack Overflow.
