Data frame handling

Asked

Viewed 99 times

2

Hello,

I’m trying to manipulate a database with 300+ observations, but each observation has a different number of columns. The problem is that from a certain point, the columns start to repeat themselves, but they have different values and I wanted to somehow join them.

What I essentially want is, within an object, to save the information of each observation in a way that helps in the visualization.

For example:

            a   b   c   d    e  c   d   e   c     d    e
    Obs_1   AA  BB  CC  8.3  A  SSD 2.3 RN  S76   5    A
    Obs_2   SS  DA  LL  5    A  SDD         AD23  8.2  A

And I’d like to turn it into something like this:

         a   b    c      d   e
Obs_1   AA  BB  CC76    8.3  A
                SSD89   2.3  RN
                S76      5   A
Obs_2   SS  DA  LL       5   A
                SDD     
                AD23    8.2  A

Is it possible to do this in a non-manual way in R? Note that there are values in the database that have no value, some values R complete with NA, others it simply leaves blank.

  • 1

    Please, if the data.frame is called dados, edit the question with the output of dput(head(dados, 20)), to have an exact copy of the data structure.

  • The way the question was asked, without a sample dataset, and difficult to understand exactly the nature of the problem.

1 answer

2

See if this code helps. I’m testing with mtcars. You need to install the packages dplyr, tidyr and janitor

mtcars
# renomeia as colunas para ficar parecido com seu exemplo
names(mtcars) <- c("a", "b", "c", "d", "b", "c", "d", "a", "b", "c", "d")

mtcars %>% 
  slice(c(1, 2))

# A tibble: 2 x 11
      a     b     c     d     b     c     d     a     b     c     d
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1   21.    6.  160.  110.    6.  160.  110.   21.    6.  160.  110.
2   21.    6.  160.  110.    6.  160.  110.   21.    6.  160.  110.


res <- mtcars %>% 
  # pega as primeiras linhas para ficar parecido com seu exemplo
  head(2) %>% 
  # transforma os nomes em unicos
  janitor::clean_names() %>% 
  # empilha todos os dados
  gather() %>% 
  # separa as colunas respectivas e os respectivos índices
  separate(key, c("letra", "numero"), sep = "_", fill = "right") %>% 
  # substitui NA por 1 no caso da primeira separação
  replace_na(list(numero = "1")) %>% 
  # completa as combinacoes de indices e letras que nao apareceram
  complete(letra, numero) %>% 
  # essa parte nao sei se precisa 
  group_by(letra, numero) %>% 
  summarise(res = first(value)) %>% 
  ungroup() %>% 
  # espalha as letras nas colunas
  spread(letra, res)


res

# A tibble: 3 x 5
  numero     a     b      c      d
  <chr>  <dbl> <dbl>  <dbl>  <dbl>
1 1        21.  6.00 160.   110.  
2 2         0.  3.90   2.62  16.5 
3 3        NA   1.00   4.00   4.00

Browser other questions tagged

You are not signed in. Login or sign up in order to post.