Select first with conditional

Asked

Viewed 88 times

2

I am starting in R, and I have a data frame similar to this one below:

x <- data.frame(cod_produto = c(1,1,1,2,2,2), 
            valordia = c(0,0,150.23,110.98,18.65,0), 
            data = c("2019-01-01","2019-01-02","2019-01-03","2019-01-01","2019-01-02","2019-01-03"))

How to return the first day the sale took place (valordia > 0)?

I tried to use the package dplyr.

2 answers

2

With dplyr:

library(dplyr)

x %>% 
  filter(valordia > 0) %>% 
  arrange(data) %>% 
  slice(1)

cod_produto valordia       data
          2   110.98 2019-01-01

filter filters the values > 0;

arrange sorts the dates;

slice returns only the first observation, in this case.

If you want to know when the first sale took place, no matter the date, just remove the arrange function. It would look like this:

x %>% 
  filter(valordia > 0) %>% 
  slice(1)

  cod_produto valordia       data
            1   150.23 2019-01-03
  • Thanks for the return. There is possibility to use within a Summarize?

  • How would the use (for example, add the valordia)?

  • Something similar to the code below. Somando o valor total das vendas e incluindo dois campos com a data da primeira e da última venda:&#xA;&#xA;y <- x %>% &#xA; dplyr::group_by(cod_produto) %>%&#xA; dplyr::summarize(totalVenda = sum(valordia),&#xA; dataPrimeiraVenda = ??? , dataUltimaVenda = ???)

  • Instead of summarise which returns a simple summary of an operation, use mutate. Make sure you answer to what you want: y <- x %>% &#xA; group_by(cod_produto) %>% &#xA; mutate(totalVenda = sum(valordia)) %>% &#xA; arrange(data) %>% &#xA; slice(c(1, n()))

  • The code you created returns two lines for each product (first date and last date). However, what I need is to group by product code (that’s why I was using the summarise). The result would look something like this (in single record for each product): cod_produto, totalVenda, dataPrimeiraVenda, dataUltimaVenda. The dates of the first and last sale must be by field valordia > 0

  • Do you want the date in separate columns? This is, dataPrimeiraVenda and dataUltimaVenda and, if the valordia > 0? If so, it follows: library(tidyr)&#xA;&#xA;x %>% group_by(cod_produto) %>% &#xA; mutate(totalVenda = sum(valordia)) %>% &#xA; arrange(data) %>% &#xA; slice(c(1, n())) %>% &#xA; spread(key = data, value = totalVenda) %>% &#xA; filter(valordia > 0). The totalVenda is inside the dated columns.

  • For the question date frame, the result should be like this: structure(data.frame(cod_produto = c(1,2),totalVenda = c(150.23,129.63), dataPrimeiraVenda = c("2019-01-03","2019-01-01"), dataUltimaVenda = c("2019-01-03","2019-01-02")))

Show 2 more comments

0

With R base can be as follows.
First transform the column data in a class column "Date".

x$data <- as.Date(x$data)

Now, first get an index of the table rows ordered by date and then extract the smallest row with valordia > 0.

i <- order(x$data)
x[i, ][min(which(x$valordia[i] > 0)), ]
#  cod_produto valordia       data
#4           2   110.98 2019-01-01
  • Rui, thanks for the feedback.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.