How to consolidate data set with dplyr?

Asked

Viewed 47 times

1

Hello,

I have a data set sinan18 that contains information about food outbreaks in Brazil. Formatting the original table - which is huge with dplyr, I arrived at the following result:

> str(sinan18)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   13163 obs. of  8 variables:
 $ estado  : chr  "DF" "RS" "RS" "RS" ...
 $ regiao  : Factor w/ 5 levels "Centro-Oeste",..: 1 5 5 5 5 5 5 5 5 5 ...
 $ data    : chr  "05/05/2000" "22/05/2000" "22/05/2000" "22/05/2000" ...
 $ ano     : int  2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
 $ agente1 : chr  "Ignorado" "Salmonella spp." "Salmonella spp." "Ignorado" ...
 $ agente2 : chr  "" "" "" "" ...
 $ alimento: chr  "Alimentos mistos" "Ovos e produtos à base de ovos" "Ovos e produtos à base de ovos" "Ovos e produtos à base de ovos" ...
 $ local   : Factor w/ 13 levels "Asilo","Casos dispersos em mais de um município",..: 6 12 12 12 12 12 10 12 12 12 .

sinan18 %>%
  group_by(agente1)%>%
  count(regiao)

 1 " Cryptosporidium"      Sudeste     12
 2 Adenovírus              Nordeste     2
 3 Adenovírus              Sudeste      2
 4 Aeromonas               Nordeste     2
 5 Aeromonas               Sudeste      5
 6 Aeromonas               Sul          2
 7 "Aeromonas hidrophila " Nordeste     1
 8 "Aeromonas hidrophila " Sudeste      1
 9 Aeromonas spp.          Nordeste     3
 10 Amebíase                Sul          1

However, I need to consolidate a table where the regions are the variables (columns) and the microorganisms the observations (rows). How do I do it?

  • 1

    Hello, use the command dput(sinan18) to take a sample of your data, and edit your topic, so it will be easier to understand your question and test in your data.

  • Are you looking for (OS in English)?

1 answer

1

To transform the values into columns, it is possible to use the function spread() package . See more about these transformations here.

sinan_count <- tibble::tribble(
                ~agente1,    ~regiao,  ~n,
       "Cryptosporidium",  "Sudeste", 12L,
            "Adenovírus", "Nordeste",  2L,
            "Adenovírus",  "Sudeste",  2L,
             "Aeromonas", "Nordeste",  2L,
             "Aeromonas",  "Sudeste",  5L,
             "Aeromonas",      "Sul",  2L,
  "Aeromonas hidrophila", "Nordeste",  1L,
  "Aeromonas hidrophila",  "Sudeste",  1L,
        "Aeromonas spp.", "Nordeste",  3L,
              "Amebíase",      "Sul",  1L
  )

tidyr::spread(sinan_count, regiao, n)
#> # A tibble: 6 x 4
#>   agente1              Nordeste Sudeste   Sul
#>   <chr>                   <int>   <int> <int>
#> 1 Adenovírus                  2       2    NA
#> 2 Aeromonas                   2       5     2
#> 3 Aeromonas hidrophila        1       1    NA
#> 4 Aeromonas spp.              3      NA    NA
#> 5 Amebíase                   NA      NA     1
#> 6 Cryptosporidium            NA      12    NA

Created on 2019-03-24 by the reprex package (v0.2.1)

If the absence of information in this count represents a distinct value of AN (missing values), for example zero cases, it is possible to inform it with the argument fill.

tidyr::spread(sinan_count, regiao, n, fill = c(n = 0L))
#> # A tibble: 6 x 4
#>   agente1              Nordeste Sudeste   Sul
#>   <chr>                   <int>   <int> <int>
#> 1 Adenovírus                  2       2     0
#> 2 Aeromonas                   2       5     2
#> 3 Aeromonas hidrophila        1       1     0
#> 4 Aeromonas spp.              3       0     0
#> 5 Amebíase                    0       0     1
#> 6 Cryptosporidium             0      12     0
  • It worked perfectly here! Thank you very much!

Browser other questions tagged

You are not signed in. Login or sign up in order to post.