Select first with conditional

Question

Select first with conditional

Asked 6 years, 9 months ago

Viewed 88 times

2

I am starting in R, and I have a data frame similar to this one below:

x <- data.frame(cod_produto = c(1,1,1,2,2,2), 
            valordia = c(0,0,150.23,110.98,18.65,0), 
            data = c("2019-01-01","2019-01-02","2019-01-03","2019-01-01","2019-01-02","2019-01-03"))

How to return the first day the sale took place (valordia > 0)?

I tried to use the package dplyr.

2 answers

Browser other questions tagged r dplyr

You are not signed in. Login or sign up in order to post.

by neves • **5,644** points · Answer 1 · 2019-05-16T18:18:30+00:00

2

With dplyr:

library(dplyr)

x %>% 
  filter(valordia > 0) %>% 
  arrange(data) %>% 
  slice(1)

cod_produto valordia       data
          2   110.98 2019-01-01

filter filters the values > 0;

arrange sorts the dates;

slice returns only the first observation, in this case.

If you want to know when the first sale took place, no matter the date, just remove the arrange function. It would look like this:

x %>% 
  filter(valordia > 0) %>% 
  slice(1)

  cod_produto valordia       data
            1   150.23 2019-01-03

Thanks for the return. There is possibility to use within a Summarize?

– Ceconello

2019/05/16 at 19:38
How would the use (for example, add the valordia)?

– neves

2019/05/16 at 19:40
Something similar to the code below. Somando o valor total das vendas e incluindo dois campos com a data da primeira e da última venda:

y <- x %>% 
 dplyr::group_by(cod_produto) %>%
 dplyr::summarize(totalVenda = sum(valordia),
 dataPrimeiraVenda = ??? , dataUltimaVenda = ???)

– Ceconello

2019/05/17 at 11:36
Instead of summarise which returns a simple summary of an operation, use mutate. Make sure you answer to what you want: y <- x %>% 
 group_by(cod_produto) %>% 
 mutate(totalVenda = sum(valordia)) %>% 
 arrange(data) %>% 
 slice(c(1, n()))

– neves

2019/05/17 at 12:19
The code you created returns two lines for each product (first date and last date). However, what I need is to group by product code (that’s why I was using the summarise). The result would look something like this (in single record for each product): cod_produto, totalVenda, dataPrimeiraVenda, dataUltimaVenda. The dates of the first and last sale must be by field valordia > 0

– Ceconello

2019/05/17 at 12:34
Do you want the date in separate columns? This is, dataPrimeiraVenda and dataUltimaVenda and, if the valordia > 0? If so, it follows: library(tidyr)

x %>% group_by(cod_produto) %>% 
 mutate(totalVenda = sum(valordia)) %>% 
 arrange(data) %>% 
 slice(c(1, n())) %>% 
 spread(key = data, value = totalVenda) %>% 
 filter(valordia > 0). The totalVenda is inside the dated columns.

– neves

2019/05/17 at 12:54
For the question date frame, the result should be like this: structure(data.frame(cod_produto = c(1,2),totalVenda = c(150.23,129.63), dataPrimeiraVenda = c("2019-01-03","2019-01-01"), dataUltimaVenda = c("2019-01-03","2019-01-02")))

– Ceconello

2019/05/17 at 13:04

Show 2 more comments

by Rui Barradas • **15,422** points · Answer 2 · 2019-05-16T18:31:08+00:00

With R base can be as follows.
First transform the column data in a class column "Date".

x$data <- as.Date(x$data)

Now, first get an index of the table rows ordered by date and then extract the smallest row with valordia > 0.

i <- order(x$data)
x[i, ][min(which(x$valordia[i] > 0)), ]
#  cod_produto valordia       data
#4           2   110.98 2019-01-01