Problem organizing a tidyr dataframe in R

Question


Asked 7 years, 9 months ago


I have this data frame and I need to reshape it so that the unique dates form the first column and the remaining columns are the Bovespa tickers, with each cell holding that ticker's price on that date:

df<-structure(list(data = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 
    5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
    5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("02/01/2018", 
    "03/01/2018", "04/01/2018", "05/01/2018", "06/01/2018", "07/01/2018"
    ), class = "factor"), code = structure(1:99, .Label = c("AALR3", 
    "AAPL34", "ABCB2", "ABCB4", "ABCP11", "ABEV3", "ADHM3", "AEFI11", 
    "AFLT3", "AGCX11", "AGRO3", "ALMI11", "ALPA4", "ALSC3", "ALUP11", 
    "ALUP3", "ALUP4", "AMAR3", "AMZO34", "ANCR11B", "ANIM3", "ARZZ3", 
    "ATOM3", "ATTB34", "AZEV4", "AZUL4", "BAUH4", "BAZA3", "BBAS3", 
    "BBDC3", "BBDC4", "BBFI11B", "BBPO11", "BBRC11", "BBRK3", "BBSD11", 
    "BBSE3", "BBVJ11", "BCFF11", "BCIA11", "BCRI11", "BDLL4", "BEEF3", 
    "BEES3", "BEES4", "BERK34", "BGIP4", "BIOM3", "BKBR3", "BMEB3", 
    "BMEB4", "BMIN4", "BMKS3", "BMLC11B", "BOAC34", "BOBR4", "BOVA11", 
    "BOVV11", "BPAC11", "BPAC3", "BPAC5", "BPAN4", "BPFF11", "BPHA3", 
    "BRAP3", "BRAP4", "BRAX11", "BRCR11", "BRDT3", "BRFS3", "BRIN3", 
    "BRIV3", "BRIV4", "BRKM5", "BRML3", "BRPR3", "BRSR3", "BRSR6", 
    "BSEV3", "BTOW3", "BVMF3", "CAMB4", "CAML3", "CARD3", "CARE11", 
    "CBOP11", "CCPR3", "CCRO3", "CCXC3", "CELP3", "CEOC11", "CEPE5", 
    "CESP3", "CESP5", "CESP6", "CGAS3", "CGAS5", "CGRA3", "CGRA4"
    ), class = "factor"), price = c(14.94, 56.81, 4.11, 16.83, 16.1, 
    21.33, 2, 158.55, 4.98, 1244.01, 12.45, 1777.99, 17.33, 18.2, 
    18.45, 7.8, 5.35, 7.3, 1921.2, 2000, 28.22, 54.7, 3.5, 126, 1.02, 
    27, 12.45, 25.05, 32.09, 32.28, 34.05, 2979.9, 143.89, 143.49, 
    1, 60.84, 28.65, 60.11, 78.47, 117.49, 106.99, 29, 10.77, 3.64, 
    3.82, 644.3, 35.5, 8.99, 17.6, 6.49, 5.01, 18.24, 350, 84.45, 
    97.3, 4.97, 74, 77.56, 19.1, 8.48, 6.5, 1.87, 97.99, 3.65, 24.96, 
    28.84, 62.5, 106.15, 17, 36.97, 12.3, 5.62, 6.46, 42.88, 12.65, 
    10.67, 24, 14.91, 4.51, 20.47, 22.89, 8.99, 7.85, 9.85, 2.2, 
    740.3, 8.9, 16.3, 0.63, 1.81, 89.01, 15.5, 11.12, 20, 13.19, 
    55.5, 59.35, 27.12, 27.3)), class = "data.frame", row.names = c(NA, 
    -99L))

I used this command: tidyr::spread(df, code, price), but in my original data frame the dates come out scrambled, i.e., not in chronological order. How can I fix this?

I tried to fix it by converting the data column into a Date vector, but it didn't work.
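One plausible reason the rows come out scrambled: dates stored as dd/mm/yyyy text (or as a factor built from that text, like the data column above) sort alphabetically, not chronologically. A minimal base-R sketch with hypothetical dates (not taken from the question's data):

```r
# Hypothetical dd/mm/yyyy dates spanning two months.
raw <- c("02/01/2018", "01/02/2018", "03/01/2018")

# Factor levels are sorted like strings, so "01/02/2018" (1 February)
# comes before "02/01/2018" (2 January).
lv <- levels(factor(raw))
print(lv)
```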


by Daniel Falbel • **12,504** points · Answer 1 · 2018-09-26T21:53:24+00:00

From what I understand, you want the dates to be sorted after the spread. I don't know whether that is possible with the spread function alone.

I would add an arrange right after the spread in the pipeline. Something like:

library(tidyverse)
library(lubridate)

df %>%
  spread(code, price) %>%
  arrange(dmy(data))

Note that I use lubridate to convert the dates, which are in dd/mm/yyyy format, into R's Date type so that the ordering is chronological.

The idea behind the tidyverse is that each function performs a single task, and by combining several functions (like Lego bricks) you arrive at the result you want.
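Without lubridate, base R's as.Date() can do the same conversion, but only if the format is given explicitly: by default it expects year-first dates such as 2018-01-02. A minimal sketch with hypothetical dates, showing that ordering by the parsed Date gives chronological order:

```r
# Hypothetical dd/mm/yyyy dates spanning two months.
raw <- c("02/01/2018", "01/02/2018", "03/01/2018")

# Spell out day/month/year explicitly; this mirrors what lubridate::dmy() does.
parsed <- as.Date(raw, format = "%d/%m/%Y")

# Order the original strings by their parsed dates.
sorted_raw <- raw[order(parsed)]
print(sorted_raw)
```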
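In the same spirit, the date conversion can be its own step before reshaping, so the date column in the wide result is already a proper Date. A minimal sketch, assuming dplyr, tidyr and lubridate are installed; df_small is a hypothetical stand-in for the question's df with made-up prices:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical long-format sample: data is a dd/mm/yyyy factor, as in the question.
df_small <- data.frame(
  data  = factor(c("03/01/2018", "02/01/2018", "03/01/2018", "02/01/2018")),
  code  = c("ABEV3", "ABEV3", "BBAS3", "BBAS3"),
  price = c(1, 2, 3, 4)
)

wide <- df_small %>%
  mutate(data = dmy(as.character(data))) %>% # parse dd/mm/yyyy into Date first
  spread(code, price) %>%                     # one column per ticker
  arrange(data)                               # chronological order, made explicit

print(wide)
```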