How to calculate percentage change with 3 variables in R

I have the following data:

library(sidrar)
Tab1612SojaQde <- get_sidra(1612, variable = 214, period = c("last" = 22),
  geo = "State", classific = "c81", category = list(2713))

head(Tab1612SojaQde)
  Unidade da Federação (Código) Unidade da Federação Ano (Código)  Ano
2                            11             Rondônia         1996 1996
3                            11             Rondônia         1997 1997
4                            11             Rondônia         1998 1998
5                            11             Rondônia         1999 1999
6                            11             Rondônia         2000 2000
7                            11             Rondônia         2001 2001
  Variável (Código)             Variável
2               214 Quantidade produzida
3               214 Quantidade produzida
4               214 Quantidade produzida
5               214 Quantidade produzida
6               214 Quantidade produzida
7               214 Quantidade produzida
  Produto das lavouras temporárias (Código) Produto das lavouras temporárias
2                                      2713                   Soja (em grão)
3                                      2713                   Soja (em grão)
4                                      2713                   Soja (em grão)
5                                      2713                   Soja (em grão)
6                                      2713                   Soja (em grão)
7                                      2713                   Soja (em grão)
  Unidade de Medida (Código) Unidade de Medida Valor
2                       1017         Toneladas  1090
3                       1017         Toneladas  1296
4                       1017         Toneladas 15790
5                       1017         Toneladas 16100
6                       1017         Toneladas 36222
7                       1017         Toneladas 68687

How can I calculate the change in the value from one year to the next, for each unit of the federation? Only the "Federal District" appears here, but there are 11 states.

1 answer

One way to do this in a few lines of code is with the dplyr package. I also recommend studying it, and the tidyverse in general, if you want to learn to manipulate data effectively in R.

I believe the code below solves your problem:

library(dplyr)  # provides %>%, select, group_by, mutate and lag

Tab1612SojaQde %>%
  select(`Unidade da Federação`, Ano, Valor) %>%
  group_by(`Unidade da Federação`) %>%
  mutate(Difference = Valor - lag(Valor))

Here is what each command above does:

  • %>% is the pipe operator. It comes from the magrittr package and is made available by dplyr. It takes the output of one line and uses it as the input of the next line.

  • select is another dplyr function. With it, I keep only the columns Unidade da Federação, Ano and Valor, because the other information in this data set does not seem important at this point. This step is optional. I included it because, with fewer columns, the result is easier to read.

  • group_by is one more dplyr function. With it, I tell R to group the data according to a column of the data set. Grouping lets us apply the same function k times, once per group, where k is the number of levels of the grouping variable. In this case, the groups are the units of the federation.

  • finally, mutate, also from dplyr, is used together with lag to calculate the year-over-year change. With mutate, I create a new column in my data set, called Difference (it could have any other name). The function lag returns the value from the previous row, Valor[n-1], so Valor - lag(Valor) computes Valor[n] - Valor[n-1]. Note, in the result below, that there is no variation for 1996. There is no value for 1995 in the data, so it is not possible to know the variation from 1995 to 1996.

    # A tibble: 594 x 4
    # Groups:   Unidade da Federação [27]
       `Unidade da Federação` Ano    Valor Difference
       <chr>                  <chr>  <dbl>      <dbl>
     1 Rondônia               1996    1090         NA
     2 Rondônia               1997    1296        206
     3 Rondônia               1998   15790      14494
     4 Rondônia               1999   16100        310
     5 Rondônia               2000   36222      20122
     6 Rondônia               2001   68687      32465
     7 Rondônia               2002   83782      15095
     8 Rondônia               2003  126396      42614
     9 Rondônia               2004  163029      36633
    10 Rondônia               2005  233281      70252
    # ... with 584 more rows
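
To see that lag restarts at each group, here is a minimal sketch on made-up data. The data frame toy, its column uf, and all of its values are hypothetical and only serve to illustrate the grouping; the sketch also sorts by year first, on the assumption that the rows may not already be in chronological order.

```r
library(dplyr)

# Hypothetical toy data: two groups ("A" and "B"), three years each.
toy <- data.frame(
  uf    = c("A", "A", "A", "B", "B", "B"),
  Ano   = c("1996", "1997", "1998", "1996", "1997", "1998"),
  Valor = c(10, 30, 60, 20, 25, 20),
  stringsAsFactors = FALSE
)

result <- toy %>%
  arrange(uf, Ano) %>%                      # put years in order within each group
  group_by(uf) %>%                          # lag() is now applied group by group
  mutate(Difference = Valor - lag(Valor)) %>%
  ungroup()

print(result)
```

The first row of each group gets NA, because lag has no previous value to look at inside that group.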
    

With the absolute difference in hand, it is possible to calculate the percentage difference. Just change a small part of the code above by dividing the difference by the previous year's value and multiplying by 100:

Tab1612SojaQde %>%
  select(`Unidade da Federação`, Ano, Valor) %>%
  group_by(`Unidade da Federação`) %>%
  mutate(Difference = 100*(Valor - lag(Valor))/lag(Valor))

    # A tibble: 594 x 4
    # Groups:   Unidade da Federação [27]
       `Unidade da Federação` Ano    Valor Difference
       <chr>                  <chr>  <dbl>      <dbl>
     1 Rondônia               1996    1090      NA   
     2 Rondônia               1997    1296      18.9 
     3 Rondônia               1998   15790    1118.  
     4 Rondônia               1999   16100       1.96
     5 Rondônia               2000   36222     125.  
     6 Rondônia               2001   68687      89.6 
     7 Rondônia               2002   83782      22.0 
     8 Rondônia               2003  126396      50.9 
     9 Rondônia               2004  163029      29.0 
    10 Rondônia               2005  233281      43.1 
    # ... with 584 more rows
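
As a quick check of the percentage formula, the sketch below recomputes the first rows by hand, using the Rondônia values for 1996 to 1999 copied from the output above. The names ro, Previous and PctChange are only illustrative.

```r
library(dplyr)

# Rondônia values for 1996-1999, copied from the output above.
ro <- data.frame(
  Ano   = c("1996", "1997", "1998", "1999"),
  Valor = c(1090, 1296, 15790, 16100),
  stringsAsFactors = FALSE
)

ro_pct <- ro %>%
  mutate(
    Previous  = lag(Valor),                          # value of the previous year
    PctChange = 100 * (Valor - Previous) / Previous  # change relative to the previous year
  )

print(ro_pct)
```

Keeping the previous year's value in its own column makes it easier to confirm that the denominator is the earlier value, not the current one.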
  • It worked perfectly, @Marcus Nunes. I’m trying to learn how to manipulate data in R. Thank you.

  • It’s great to know that my answer helped you. Please consider voting for and accepting the answer, so that other people who run into the same problem in the future have a reference to solve it.

  • Warning: the last line of the code used to be mutate(Difference = 100*(Valor - lag(Valor))/Valor). This is wrong. The correct code is mutate(Difference = 100*(Valor - lag(Valor))/lag(Valor)), as it appears in the current version of the post.

  • Yes, I did that. mutate(Difference = (Valor/lag(Valor) - 1)*100) also worked.
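
The three formulas mentioned in the comments can be compared side by side. This is a small sketch using the Rondônia values for 1996 to 1998 from the post; the column names pct_diff, pct_ratio and pct_wrong are illustrative only.

```r
library(dplyr)

# Rondônia values for 1996-1998, taken from the post.
x <- data.frame(Valor = c(1090, 1296, 15790))

cmp <- x %>%
  mutate(
    pct_diff  = 100 * (Valor - lag(Valor)) / lag(Valor),  # formula in the answer
    pct_ratio = (Valor / lag(Valor) - 1) * 100,           # formula from the last comment
    pct_wrong = 100 * (Valor - lag(Valor)) / Valor        # earlier version, divides by the current value
  )

print(cmp)
```

The first two columns agree because (a - b)/b equals a/b - 1. The third divides by the current year's value instead of the previous one, so it understates every increase.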
