Select vector only with dates shared by all matrices in the list

Asked

Viewed 49 times

2

I have a database with a lot of stock market assets. After running several filters to get only assets with imposed number of observations, I arrive at a large data frame and a list, as will be exposed in the following code

# Pacotes necessários ---------------------------------------------------
library(BatchGetSymbols)
library(quantmod)
library(tidyverse)

lista$SUZB3.SA

# Inputs necessários ------------------------------------------------------
#Definição do número de observações mínimo que se deseja obter
#aconselho olhar o arquivo base01, ver um número de observações máximo para o período para maioria dos ativos
nmin<-2200

#data inicial
first.date <- as.Date("2010-01-01")

#data final
last.date <- Sys.Date()

#frequencia das observações
freq.data <- 'daily'


#Tickers para o teste inicial
tickers<-c("^BVSP","^GSPC","BRL=X","ABCB4.SA","EALT3.SA","EALT4.SA","ADHM3.SA",
           "TIET3.SA","TIET4.SA","AFLT3.SA",
           "BRGE3.SA","BRGE5.SA","BRGE7.SA","BRGE8.SA","BRGE11.SA",
           "BRGE12.SA","CRIV3.SA","CRIV4.SA","RPAD3.SA","RPAD5.SA","RPAD6.SA","BRIV3.SA","BRIV4.SA","ALSC3.SA",
           "AALR3.SA","ALPA3.SA","ALPA4.SA",
           "ALUP3.SA","ALUP4.SA","BAZA3.SA","ABEV3.SA",
           "CBEE3.SA","ANIM3.SA","ARZZ3.SA","ATOM3.SA",
           "AZEV3.SA","AZEV4.SA","AZUL4.SA","BTOW3.SA","BAHI3.SA","BPAN4.SA","BGIP3.SA","BGIP4.SA","BEES3.SA",
           "BEES4.SA","BRSR3.SA","BRSR5.SA","BRSR6.SA","BDLL4.SA","BTTL3.SA",
           "BALM4.SA","BBSE3.SA","BMKS3.SA","BIOM3.SA",
           "BSEV3.SA","BOBR4.SA","BBRK3.SA","BRML3.SA",
           "BPHA3.SA","BRPR3.SA","BBDC3.SA","BBDC4.SA","BRAP3.SA","BRAP4.SA","BBAS3.SA","AGRO3.SA",
           "BRKM3.SA","BRKM5.SA","BRKM6.SA","BRFS3.SA",
           "BPAC3.SA","BPAC5.SA","CAMB4.SA","CAML3.SA",
           "CRFB3.SA","CCRO3.SA","CCXC3.SA","CEBR5.SA","CEBR6.SA","CEDO3.SA","CEDO4.SA",
           "EEEL3.SA","EEEL4.SA","CEGR3.SA","CLSC3.SA","CLSC4.SA","CELP3.SA","CELP5.SA",
           "CELP6.SA","CELP7.SA","CEPE5.SA","CEPE6.SA","RANI3.SA","RANI4.SA",
           "MAPT4.SA","CMIG3.SA","CMIG4.SA","CESP3.SA","CESP5.SA","CESP6.SA","HGTX3.SA",
           "CIEL3.SA","CEEB3.SA",
           "COCE3.SA","COCE5.SA","CGAS3.SA","CGAS5.SA",
           "CALI4.SA","CSMG3.SA","CPLE3.SA","CPLE6.SA","CORR4.SA","RLOG3.SA",
           "CSAN3.SA","CSRN3.SA","CSRN5.SA","CTNM3.SA","CTNM4.SA","CPFE3.SA","CPRE3.SA",
           "CRDE3.SA","CRPG5.SA","CRPG6.SA","CARD3.SA","CVCB3.SA","CCPR3.SA","CYRE3.SA",
           "DASA3.SA","PNVL3.SA","PNVL4.SA","DIRR3.SA","DOHL3.SA","DOHL4.SA","DMMO3.SA","DTCY3.SA",
           "DTEX3.SA","ECOR3.SA","ELEK3.SA","ELEK4.SA","EKTR4.SA","ELET3.SA",
           "ELET5.SA","ELET6.SA","LIPR3.SA","ELPL3.SA","EMAE4.SA","EMBR3.SA","ECPR3.SA",
           "ENBR3.SA","ENMT3.SA","ENMT4.SA","ENGI3.SA","ENGI4.SA","ENEV3.SA","EGIE3.SA","EQTL3.SA","ESTC3.SA",
           "ESTR4.SA","ETER3.SA","EUCA3.SA","EUCA4.SA","EVEN3.SA","BAUH4.SA","EZTC3.SA",
           "FHER3.SA","FESA3.SA","FESA4.SA",
           "FLRY3.SA","FJTA3.SA",
           "FJTA4.SA","FRAS3.SA","GFSA3.SA","GSHP3.SA",
           "GEPA3.SA","GEPA4.SA","GOAU3.SA","GOAU4.SA","GGBR3.SA","GGBR4.SA","GOLL4.SA",
           "GPCP3.SA","CGRA3.SA","CGRA4.SA","GRND3.SA","GUAR3.SA","HBTS5.SA",
           "HAGA3.SA","HAGA4.SA","HBOR3.SA","HETA4.SA",
           "HOOT4.SA","HYPE3.SA",
           "IDNT3.SA","IGBR3.SA","IGTA3.SA","PARD3.SA",
           "MEAL3.SA","ROMI3.SA","IDVL3.SA","IDVL4.SA",
           "INEP3.SA","INEP4.SA","MYPK3.SA","IRBR3.SA",
           "ITSA3.SA","ITSA4.SA","ITUB3.SA","ITUB4.SA","JBDU3.SA",
           "JBDU4.SA","JBSS3.SA","JHSF3.SA","JFEN3.SA","JOPA3.SA","JSLG3.SA",
           "CTKA3.SA","CTKA4.SA","KEPL3.SA","KLBN3.SA","KLBN4.SA","KROT3.SA",
           "LLIS3.SA","LIGT3.SA","LINX3.SA","RENT3.SA","LCAM3.SA",
           "LOGN3.SA","LAME3.SA","LAME4.SA","AMAR3.SA","LREN3.SA","LPSB3.SA",
           "LUPA3.SA","MDIA3.SA","MGLU3.SA","MAGG3.SA","MGEL4.SA","POMO3.SA","POMO4.SA",
           "MRFG3.SA","MSPA3.SA","MSPA4.SA","MEND5.SA","MEND6.SA",
           "BMEB3.SA","BMEB4.SA","MERC4.SA","BMIN3.SA","BMIN4.SA","MTIG4.SA","LEVE3.SA",
           "FRIO3.SA","MTSA3.SA","MTSA4.SA","MILS3.SA","BEEF3.SA",
           "MNPR3.SA","MMXM3.SA","MOAR3.SA","MOVI3.SA","MRSA3B.SA","MRSA5B.SA","MRSA6B.SA",
           "MRVE3.SA","MULT3.SA","MPLU3.SA","NAFG3.SA","NAFG4.SA","NATU3.SA",
           "BNBR3.SA","NORD3.SA","NUTR3.SA","ODPV3.SA","OIBR3.SA","OIBR4.SA","OMGE3.SA",
           "OSXB3.SA","OFSA3.SA","PCAR3.SA","PCAR4.SA","PATI3.SA","PATI4.SA","PEAB3.SA","PEAB4.SA",
           "PMAM3.SA","PDGR3.SA","RPMG3.SA","BRDT3.SA","PETR3.SA","PETR4.SA","PRIO3.SA","PTNT3.SA",
           "PTNT4.SA","PINE4.SA","PLAS3.SA","FRTA3.SA","PSSA3.SA","PTBL3.SA",
           "POSI3.SA","PFRM3.SA","QGEP3.SA","QUAL3.SA","RADL3.SA","RAPT3.SA",
           "RAPT4.SA","RCSL3.SA","RCSL4.SA","REDE3.SA","RNEW3.SA","RNEW4.SA","RSUL4.SA","RDNI3.SA",
           "RSID3.SA","RAIL3.SA","SBSP3.SA","SAPR3.SA","SAPR4.SA","SNSY3.SA","SNSY5.SA",
           "SANB3.SA","SANB4.SA","CTSA3.SA","CTSA4.SA","STBP3.SA","SCAR3.SA","SMTO3.SA",
           "SLED3.SA","SLED4.SA","SHUL4.SA","CSAB3.SA","CSAB4.SA",
           "SNSL3.SA","SEER3.SA","CSNA3.SA","SSBR3.SA","SLCE3.SA","SMLS3.SA","SEDU3.SA",
           "SOND6.SA","SPRI3.SA","SPRI5.SA","SGPS3.SA","AHEB3.SA",
           "SULA3.SA","SULA4.SA",
           "SUZB3.SA","TAEE3.SA","TAEE4.SA","TRPN3.SA","TECN3.SA","TCSA3.SA","TCNO3.SA","TCNO4.SA",
           "TOYB3.SA","TOYB4.SA","TGMA3.SA","TEKA4.SA","TKNO4.SA","TELB3.SA","TELB4.SA","VIVT3.SA","VIVT4.SA",
           "TEND3.SA","TESA3.SA","TXRX4.SA","TIMP3.SA","SHOW3.SA",
           "TOTS3.SA","TRPL3.SA","TRPL4.SA","LUXM4.SA","TRIS3.SA","TPIS3.SA","TUPY3.SA",
           "UGPA3.SA","UCAS3.SA","UNIP3.SA","UNIP5.SA","UNIP6.SA",
           "USIM3.SA","USIM5.SA","USIM6.SA","VALE3.SA","VLID3.SA",
           "VVAR3.SA","VIVR3.SA","VULC3.SA","DISB34.SA","WEGE3.SA",
           "MWET3.SA","MWET4.SA","WHRL3.SA","WHRL4.SA","WIZS3.SA")



# Importando ativos -------------------------------------------------------
ativos <- BatchGetSymbols(tickers = tickers, 
                          first.date = first.date,
                          last.date = last.date, 
                          freq.data = freq.data,
                          cache.folder = file.path(tempdir(), 
                                                   'BGS_Cache') ) 


#Eliminar Tickers que não foram encontrados nenhum dado
base01<-as.data.frame(ativos$df.control)

#vetor vazio para receber tickers corrigidos
tickerscorrigidos<-NULL

#vetor com tickers corrigidos, contendo apenas aqueles que tem o número mínimo de observações desejado
for(i in 1:nrow(base01)){
  if(base01[i,4] >= nmin){
    tickerscorrigidos<-c(tickerscorrigidos,as.character(base01[i,1]))
  }
}

tickerscorrigidos
rm(ativos)

# Importando ativos NOVAMENTE, PORÉM AGORA SÓ OS QUE TEM NÚMERO DE OBSERVAÇÕES MÍNIMO DESEJADO --------------------------------------------
ativos <- BatchGetSymbols(tickers = tickerscorrigidos, 
                          first.date = first.date,
                          last.date = last.date, 
                          freq.data = freq.data,
                          cache.folder = file.path(tempdir(), 
                                                   'BGS_Cache') )
#cabeçalho para base de dados
cabecalho<-c("abertura","máxima", "mínima", "fechamento", "volume", "ajuste", "data", "ativo")

#inserir cabecalho
colnames(ativos$df.tickers)<-cabecalho



# Contador de obsevações válidas ------------------------------------------
#transformando em data.frame
ativos<-as.data.frame(ativos$df.tickers)
#Eliminando valores faltantes na base de dados
ativos<-na.omit(ativos)

#Contar quantas observações por ativo
nobservacao<-matrix(nrow = NROW(tickerscorrigidos), ncol = 2,0)
nobservacao[,1]<-tickerscorrigidos

for(i in 1:NROW(tickerscorrigidos)){

  temp<-ativos[ativos[,8] == tickerscorrigidos[i], ]
  nobservacao[i,2]<-nrow(temp)

}

nobservacao<-nobservacao[nobservacao[,2] >= nmin, ]

tickerscorrigidos<-nobservacao[,1]


# Importando ativos NOVAMENTE 2, PORÉM AGORA SÓ OS QUE TEM NÚMERO DE OBSERVAÇÕES MÍNIMO DESEJADO --------------------------------------------
ativos <- BatchGetSymbols(tickers = tickerscorrigidos, 
                          first.date = first.date,
                          last.date = last.date, 
                          freq.data = freq.data,
                          cache.folder = file.path(tempdir(), 
                                                   'BGS_Cache') )
#inserir cabecalho
colnames(ativos$df.tickers)<-cabecalho 

#transformando em data.frame
ativos<-as.data.frame(ativos$df.tickers)

#eliminando colunas desnecessárias
ativos<-ativos[,-10:-9]

# Validação dos Dados ---------------------------------------------
#Eliminar valores NA
ativos<-na.omit(ativos)

# Cálculos ----------------------------------------------------------------

#Variação diária
#criando uma matriz para calculo da variação diária
variacao<-matrix( ncol = 1, nrow = nrow(ativos), 0)

#______________________________________________________
#calculando variação percentual diária
for(i in 1:(nrow(ativos)-1)){
  #se for o mesmo ativo, calcular a variação percentual
  if(ativos[i+1,8] == ativos[i,8]){
    variacao[i+1,1] <-  (ativos$fechamento[i+1]-ativos$fechamento[i])/ativos$fechamento[i]*100
    #se não for, calcular zero
  }else{ variacao[i+1,1] <- 0 }
}

#inserindo vetor gerado na base de dados
ativos<-cbind(ativos,variacao)
rm(variacao)

#____________________________________________________________________________

#Statuso de Sobe ou Desce

#crianto matriz para receber status
status<-matrix(ncol = 1, nrow = nrow(ativos),0)

#Inserindo variável categórica de sobe/desce
for(j in 1:nrow(ativos)){
  #se variação for positiva Sobe
  if(ativos[j,9]>0){status[j]<-"Sobe"}
  #Se variação for negativa Desce
  if(ativos[j,9]<=0){status[j]<-"Desce"}
}

#Inserindo vetor status na base de dados
ativos<-cbind(ativos,status)
#removendo objeto desnecessário
rm(status)

#___________________________________________________________________________
#variação acumulada
#vetor para variação acumulada
acumulado<-matrix( ncol = 1, nrow = nrow(ativos), 0)

#calculando variação acumulada
for(i in 1:(nrow(ativos))){
  #se for o mesmo ativo, calcular a variação percentual acumulada
  if(ativos[i+1,8] == ativos[i,8]){
    acumulado[i+1,1] <-acumulado[i,1]+ativos$variacao[i+1]
    #se não for, calcular zero
  }else{acumulado[i+1,1] <- 0 }
}

ativos<-cbind(ativos,acumulado)
rm(acumulado)

# Lista com data.frame dos ativos -----------------------------------------
#tem que colocar os ativos em lista
#função split cria uma lista separada pelo criterio de nome de ativo
lista<-split(ativos, ativos$ativo)

After running this code will have the result described above. The list of assets is large, but it is necessary to solve the problem.

I now want to get an array where I have only the dates on which all assets share, that is, a database only with dates that all actions have observation .

I tried to do it this way:

# Filtro para data em que todos os ativos possuem observação --------------
data.igual<- ativos %>% count(ativos$data) %>% filter(n == 
NROW(tickerscorrigidos))

#Objeto com todas as datas comuns a todos os ativos
data.igual<-as.data.frame.Date(data.igual$`ativos$data`)
NROW(data.igual)

But it is not working well, because it is lost more than half of the observations, fact that does not make sense because there are several previous filters in the code so that when arriving in this part the file only contains actions with number of very similar observations, where the number was determined at the beginning of the code.

Realize that if you do

NROW(data.igual)

the result will be less than 1000, and it should be more than 2000, so many dates are being deleted erroneously.

Therefore, I would like help to arrive at a vector that contains only dates that all actions share.

1 answer

3


Your code is correct. I came to another way to make the intersection between the data and got the same 841 observations that you got:

datas.comuns <- as.Date(Reduce(intersect, lapply(lista, `[[`, "data")))
length(datas.comuns)
## 841

I believe that the result is correct, however strange it may seem to you. If you are doubting the answer, I suggest you manually search the dates that nay appear in the vector data.comuns that I created above, and see why at least one action is not reporting anything on this day.

For example, February 18, 2019 was a business day, Bovespa closed down, but it is a day that is not registered for all your actions. As there are 167 assets, it will take a little work, but it is a start. Who knows so it will be possible for you to find out what is going on.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.