Update data and add new data

Asked

Viewed 87 times

1

I am looking for data from IBGE (SIDRA), I need to add the last data and update the 3 previous periods, without changing or messing with the old data.

Follows the code:

library(sidrar) 

df <- get_sidra (api = "/t/1846/n1/all/v/all/p/all/c11255/90707/d/v585%200")
#Selecionando as colunas que me interessa
df1 <- df[,c(7,10,13)]

This returns me the historical series of GDP

df1 <- Trimestre (Código)           Setores e subsetores           Valor
        199601                      PIB a preço de mercado         189323
        ...                             ...                         ...
        201903                      PIB a preço de mercado         1842700
        201904                      PIB a preço de mercado         1842740

And saved this file on my computer

read.xlsx("df1.xlsx", sheetName = "PIB", header = TRUE)

When a new data comes out, 202001, IBGE revises 3 previous data (201902, 201903, 201904) and publishes 202001,

then I just run the code to fetch the last 4 data, to avoid having to fetch the complete series

df2 <- get_sidra (api = "/t/1846/n1/all/v/all/p/last%203/c11255/90707/d/v585%200")

After having my df2, I want to update the data for 201902, 201903 and 201904, and add the data for 202001

To add only the new data, I was using :

wb <- loadWorkbook("df1.xlsx")
addDataFrame(df2,getSheets(wb)$df2,  col.names = FALSE, row.names = FALSE, startRow = nrow(read.xlsx("df1.xlsx", sheetName = "PIB"))+2)
saveWorkbook(wb,"df1.xlsx")

But this function doesn’t help me update my previous information. Could someone tell me how I can do this?

2 answers

0

I don’t know a function to update directly in excel but you can build a data.frame updated with a function, and then save it in excel.

library(readr) #só para ler as tabelas
df1 <- read_table("Trimestre_(Código)           Setores_e_subsetores           Valor
                  199601                      PIB_a...         189323
                  199602                      PIB_a...         189324
                  201902                      PIB_a...         1845626
                  201903                      PIB_a...         1842700
                  201904                      PIB_a...         1842740")

df2 <- read_table("Trimestre_(Código)           Setores_e_subsetores           Valor
                  201902                      PIB_a...         0
                  201903                      PIB_a...         1
                  201904                      PIB_a...         2
                  202001                      PIB_a...         123456")



update_ultimos_valores <- function(tabela_antiga, tabela_nova){
  coluna_codigo <- names(tabela_antiga)[1]
  indice_entradas_novas <- 1:3

  codigos_para_atualizar <- tabela_nova[[coluna_codigo]][indice_entradas_novas]

  tabela_antiga[tabela_antiga[[coluna_codigo]] %in% codigos_para_atualizar,] <- 
    tabela_nova[indice_entradas_novas,]

  tabela_atualizada <- rbind(tabela_antiga, tabela_nova[4,])
  return(tabela_atualizada)
}


df_atualizado <- update_ultimos_valores(df1, df2)

df_atualizado
#> # A tibble: 6 x 3
#>   `Trimestre_(Código)` Setores_e_subsetores  Valor
#>                  <dbl> <chr>                 <dbl>
#> 1               199601 PIB_a...             189323
#> 2               199602 PIB_a...             189324
#> 3               201902 PIB_a...                  0
#> 4               201903 PIB_a...                  1
#> 5               201904 PIB_a...                  2
#> 6               202001 PIB_a...             123456

#Salvar com write_xlsx ou write.xlsx do mesmo modo que df1 foi salvo

Created on 2020-05-27 by the reprex package (v0.3.0)

  • Hi, Jorge! Thanks for the help, I also managed to play here!

0


There is no option for Excel files (as well as for flat data files such as CSV) to update specific lines, just add new ones. The solution is to load the entire saved spreadsheet, delete the lines to be updated, attach the new data and rewrite the spreadsheet in the file.

# Dados de exemplo
df1 <- data.frame(Trimestre = c(201801:201804, 201901:201904), Valor = 1:8)
df2 <- data.frame(Trimestre = c(201902:201904, 202001), Valor = 11:14)

> rbind(df1[!df1$Trimestre %in% df2$Trimestre,], df2)
  Trimestre Valor
1    201801     1
2    201802     2
3    201803     3
4    201804     4
5    201901     5
6    201902    11
7    201903    12
8    201904    13
9    202001    14

If for any reason you need to update specific lines without loading and rewriting the whole set, you will have to save it in a database.

  • Hello, thank you so much for the help, it worked!!! your suggestion was even simpler than what you were trying. Thanks!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.