Why do columns change from numeric or integer to character when creating a zoo series object?

Good evening! I am quite confused. I will explain step by step until I get to the doubts that I could not solve:

My goal is to create a file that stores ticker quotes, where each row is a trading date.

I downloaded the quotes from Yahoo Finance:

library(quantmod)

# The dates must be defined before the loop that uses them
dataInicial.teste <- Sys.Date() - 10 # for the sample in this example
dataFinal.teste <- Sys.Date() # today

tickers <- c("PETR4.SA", "^BVSP", "^DJI", "^FTSE", "CL=F")
teste.dados <- NULL

for (Ticker in tickers) {
    # Keep only the 4th column (Close) of each ticker and bind it as a new column
    teste.dados <- cbind(teste.dados, getSymbols.yahoo(Ticker, env = NULL, return.class = "xts", index.class = "Date", from = dataInicial.teste, to = dataFinal.teste, thresh.bad.data = 0.75, auto.assign = FALSE)[, 4])
}

teste.dados is an xts/zoo object.

> str(teste.dados)
An 'xts' object on 2020-05-08/2020-05-17 containing:
  Data: num [1:8, 1:6] 18.5 NA 18.1 18.1 17.6 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:6] "PETR4.SA.Close" "BVSP.Close" "DJI.Close" "FTSE.Close" ...
  Indexed by objects of class: [Date] TZ: UTC
  xts Attributes:  
List of 2
 $ src    : chr "yahoo"
 $ updated: POSIXct[1:1], format: "2020-05-18 21:24:04"

DOUBT 1: Why does the file I write with write.csv(teste.dados, "C:\\Users\\...\\teste.dados_cotacao_serie_xts_zoo.csv") not contain the time column? Instead, an unnamed column ("") appears with a numeric sequence that simply counts the observations. [image]

To get around this, I had the brilliant idea of converting the object from xts/zoo to a data.frame:

teste.dados.df <- data.frame(teste.dados) 
write.csv(teste.dados.df, "C:\\Users\\...\\Mercado_Financeiro\\teste.dados_cotacao_data_frame.csv")

Now the unnamed column ("") holds the dates, which appear as a factor. Perfect. [image]

DOUBT 2: Why, thunder and lightning, are the columns no longer classified as "num" and turn into "chr" when I create a zoo object?

teste.dados.bd <- read.csv("C:\\Users\\...\\teste.dados_cotacao_data_frame.csv", sep = ",")

It’s a data.frame:

> str(teste.dados.bd)
'data.frame':   8 obs. of  7 variables:
 $ X               : Factor w/ 8 levels "2020-05-08","2020-05-10",..: 1 2 3 4 5 6 7 8
 $ PETR4.SA.Close  : num  18.5 NA 18.1 18.1 17.6 ...
 $ BVSP.Close      : int  80263 NA 79065 77872 77772 79011 77557 NA
 $ DJI.Close       : num  24331 NA 24222 23765 23248 ...
 $ FTSE.Close      : num  NA NA 5940 5995 5904 ...
 $ CL.F.Close      : num  24.6 24.7 24.5 25.2 25.7 ...
 $ PETR4.SA.Close.1: num  NA 18.5 NA 18.1 18.1 ...

Now I create a date object, tempo <- as.Date(teste.dados.bd$X, format = "%Y-%m-%d"), and use it as the index of the data.frame to create a time series, teste.dados.bd.ts <- zoo(teste.dados.bd, tempo). At this point, this happens:

> str(teste.dados.bd.ts)
'zoo' series from 2020-05-08 to 2020-05-17
  Data: chr [1:8, 1:7] "2020-05-08" "2020-05-10" "2020-05-11" "2020-05-12" ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:7] "X" "PETR4.SA.Close" "BVSP.Close" "DJI.Close" ...
  Index:  Date[1:8], format: "2020-05-08" "2020-05-10" "2020-05-11" "2020-05-12" "2020-05-13" ...

I remove the "X" column, because it is unnecessary now: teste.dados.bd.ts.1 <- subset(teste.dados.bd.ts, select = -c(X)). And this happens:

> tail(teste.dados.bd.ts.1)
           PETR4.SA.Close BVSP.Close DJI.Close FTSE.Close CL.F.Close
2020-05-11 18.15          79065      24221.99  5939.7     24.51     
2020-05-12 18.14          77872      23764.78  5994.8     25.23     
2020-05-13 17.59          77772      23247.97  5904.1     25.69     
2020-05-14 17.40          79011      23625.34  5741.5     27.73     
2020-05-15 17.15          77557      23685.42  5799.8     29.65     
2020-05-17 <NA>           <NA>       <NA>      <NA>       30.27     
           PETR4.SA.Close.1
2020-05-11 <NA>
2020-05-12 18.15           
2020-05-13 18.14
2020-05-14 17.59           
2020-05-15 17.40
2020-05-17 17.15

> str(teste.dados.bd.ts.1)
'zoo' series from 2020-05-08 to 2020-05-17
  Data: chr [1:8, 1:6] "18.48" NA "18.15" "18.14" "17.59" "17.40" "17.15" NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:6] "PETR4.SA.Close" "BVSP.Close" "DJI.Close" "FTSE.Close" ...
  Index:  Date[1:8], format: "2020-05-08" "2020-05-10" "2020-05-11" "2020-05-12" "2020-05-13" ...

I do not understand why the columns that were "num" or "int" in the data frame became "chr" in the time series.

I couldn’t find a way to upload the two files here, but the images show exactly what they contain. In short: I downloaded quotes from Yahoo Finance, which gives an xts object. I want to store it in a csv file, but the time column is not written. I convert it to a data.frame, which does write the date column, but when I turn that column into the time index of a zoo object, the columns change from num or int to chr. I have no idea why.

I appreciate any and all help.

  • Have you tried removing the X column before turning it into a zoo-like object? That’s for problem 2

1 answer

Both of your questions come down to how the zoo class is structured and how matrices work in R. A zoo object is a matrix together with an ordering, in your case the dates, which is stored as an extra attribute called index.

Doubt 2

A matrix in R can hold only one type, so when a matrix is built from values of different types, R converts them all to a common type. In your case R converts every value to character because the date column is not numeric. The solution is to leave that column out of the first argument of the function zoo.

library(zoo)

# Create a table with one date column and two numeric columns
tabela_inicial <- data.frame(
  data = as.Date("2004-01-01") + 0:9,
  valor1 = 1:10,
  valor2 = 11:20
)

# Using the whole data frame (date column included)

tabela_zoo <- zoo(tabela_inicial, tabela_inicial[, 1])

str(tabela_zoo)
#> 'zoo' series from 2004-01-01 to 2004-01-10
#>   Data: chr [1:10, 1:3] "2004-01-01" "2004-01-02" "2004-01-03" "2004-01-04" ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:3] "data" "valor1" "valor2"
#>   Index:  Date[1:10], format: "2004-01-01" "2004-01-02" "2004-01-03" "2004-01-04" "2004-01-05" ...

# With the non-numeric column removed, the values stay numeric
tabela_zoo <- zoo(tabela_inicial[,-1], tabela_inicial[, 1])

str(tabela_zoo)
#> 'zoo' series from 2004-01-01 to 2004-01-10
#>   Data: int [1:10, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:2] "valor1" "valor2"
#>   Index:  Date[1:10], format: "2004-01-01" "2004-01-02" "2004-01-03" "2004-01-04" "2004-01-05" ...

# Showing that the zoo object is a matrix
is.matrix(tabela_zoo)
#> [1] TRUE
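
Applied to the data frame from the question, the same idea might look like the sketch below. It is only an illustration: the temporary file stands in for the real CSV (with a few rows copied from the question), and the names arquivo, bd and serie are made up for this example.

```r
library(zoo)

# Hypothetical stand-in for the CSV from the question (a few rows only)
arquivo <- tempfile(fileext = ".csv")
writeLines(c(
  '"","PETR4.SA.Close","BVSP.Close"',
  '"2020-05-11",18.15,79065',
  '"2020-05-12",18.14,77872',
  '"2020-05-13",17.59,77772'
), arquivo)

# The unnamed first column is read back as "X"
bd <- read.csv(arquivo)

# Build the index from the date column...
tempo <- as.Date(bd$X, format = "%Y-%m-%d")

# ...and pass only the numeric columns as data
serie <- zoo(bd[, names(bd) != "X"], order.by = tempo)

str(serie)
```

Because the date column never enters the matrix, there is nothing to force the conversion to character, and no subset() step is needed afterwards.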

Doubt 1

The function write.csv expects its data as a data.frame or a matrix. Since zoo objects are matrices underneath, the function only uses the matrix dimensions (and their dimnames) to build the table, and these do not include the index attribute, so the rows are simply numbered. The solution is to convert to a data.frame first, as you did: write.csv then writes the dates, because for data.frames it writes the row names.

# Dimensions of the zoo matrix

ncol(tabela_zoo)
#> [1] 2

dimnames(tabela_zoo)
#> [[1]]
#> NULL
#> 
#> [[2]]
#> [1] "valor1" "valor2"
# The matrix only has the two numeric columns; the dates are not among its dimnames

dimnames(as.data.frame(tabela_zoo))
#> [[1]]
#>  [1] "2004-01-01" "2004-01-02" "2004-01-03" "2004-01-04" "2004-01-05"
#>  [6] "2004-01-06" "2004-01-07" "2004-01-08" "2004-01-09" "2004-01-10"
#> 
#> [[2]]
#> [1] "valor1" "valor2"
# The dates are the row names, which write.csv writes out
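
To see the difference in the files themselves, a small sketch along these lines can help. It writes to temporary files, and the names serie, arq_zoo and arq_df are only illustrative.

```r
library(zoo)

serie <- zoo(
  cbind(valor1 = 1:3, valor2 = 11:13),
  order.by = as.Date("2004-01-01") + 0:2
)

# Written directly: treated as a plain matrix, so the rows are just numbered
arq_zoo <- tempfile(fileext = ".csv")
write.csv(serie, arq_zoo)
readLines(arq_zoo)

# Converted first: the dates are the row names and land in the first column
arq_df <- tempfile(fileext = ".csv")
write.csv(as.data.frame(serie), arq_df)
readLines(arq_df)
```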

The zoo package also has a function called write.zoo that takes care of writing the index, but you would have to adjust its arguments a little (for example the separator) for the result to be a csv.
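
A minimal sketch of that route, assuming a comma separator is all that is needed; the file is temporary and the names serie, arquivo and de_volta are made up for the example. read.zoo is used to read the file back with the dates as the index.

```r
library(zoo)

serie <- zoo(
  cbind(valor1 = 1:3, valor2 = 11:13),
  order.by = as.Date("2004-01-01") + 0:2
)

arquivo <- tempfile(fileext = ".csv")

# sep is passed on to write.table; index.name sets the header of the date column
write.zoo(serie, file = arquivo, sep = ",", index.name = "data")
readLines(arquivo)

# Reading back: the first column becomes the Date index, the rest stay numeric
de_volta <- read.zoo(arquivo, header = TRUE, sep = ",", format = "%Y-%m-%d")
str(de_volta)
```

This avoids the detour through a data.frame and the extra "X" column on the way back.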

  • Jorge, good afternoon. Thank you very much.
