Create a column in R from the number of records within groups defined by 2 IDs

I am new to R and to Stack Overflow, and I would appreciate your help. I have the following data frame:

library(lubridate)

ID = c("000225", "000225", "000225", "000225", "000226", "000226", "000227", "000227", "000227", "000227", "000225", "000225", "000225", "000225", "000226", "000226", "000227", "000227", "000227", "000227")
Hr = c("08:00","12:00","13:00" ,"17:00", "13:00" ,"17:00","08:00","12:00","13:00" ,"17:00",
       "08:00","12:00","13:00" ,"17:00", "13:00" ,"17:00","08:00","12:00","13:00" ,"17:00")
data =dmy(c("12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020"))

dados = data.frame(ID, Hr, data)
dados

The first three columns contain the time clock records of a company's employees. What I need is to generate a fourth column that identifies the type of each record:

       ID    Hr       data   tipo
1  000225 08:00 2020-11-12 início
2  000225 12:00 2020-11-12 almoço
3  000225 13:00 2020-11-12  volta
4  000225 17:00 2020-11-12  saída
5  000226 13:00 2020-11-12 início
6  000226 17:00 2020-11-12 almoço
7  000227 08:00 2020-11-12 início
8  000227 12:00 2020-11-12 almoço
9  000227 13:00 2020-11-12  volta
10 000227 17:00 2020-11-12  saída
11 000225 08:00 2020-11-13 início
12 000225 12:00 2020-11-13 almoço
13 000225 13:00 2020-11-13  volta
14 000225 17:00 2020-11-13  saída
15 000226 13:00 2020-11-13 início
16 000226 17:00 2020-11-13 almoço
17 000227 08:00 2020-11-13 início
18 000227 12:00 2020-11-13 almoço
19 000227 13:00 2020-11-13  volta
20 000227 17:00 2020-11-13  saída

Although I created this data for illustration, the database I will pull the data from forces the user to record the times in the sequence "start, lunch, return, exit" (início, almoço, volta, saída in the table above). So, at first, I do not need to sort an employee's times within a day, because the database already provides them in that order.

So, what I need is: when an employee has only 1 record on a given day, label it "start". If there are 2 records, label the first "start" and the second "lunch". If there are 3 records, "start", "lunch" and "return", and so on. I am counting on your help. Thank you!

1 answer

One solution: number the rows within each group and convert those numbers to types. The dplyr and data.table packages make grouped operations easy. The conversion can be done with a dictionary (a named lookup vector).

# Dictionary: maps a record's position within the day to its type
tipos <- setNames(c("inicio", "almoco", "volta", "saida"), 1:4)
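To make the lookup concrete, here is a small sketch of how indexing the dictionary by position behaves. The variable names `dois` and `cinco` are only illustrative, and the five-record case is a hypothetical that does not occur in the example data:

```r
# Same dictionary as above
tipos <- setNames(c("inicio", "almoco", "volta", "saida"), 1:4)

# A group with two records receives the first two labels
dois <- unname(tipos[seq_len(2)])
print(dois)

# A hypothetical group with five records: the fifth position has no
# label in the dictionary, so it comes back as NA
cinco <- unname(tipos[seq_len(5)])
print(cinco)

# Lookup by name also works, because the names are "1" to "4"
print(tipos[["2"]])
```

So if an employee could ever have more than four records on a day, the extra rows would be labelled NA unless the dictionary is extended.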

dplyr

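library(dplyr)

# Number the rows within each ID/date group and look up the type.
# Plain assignment is used because %<>% comes from magrittr, not dplyr.
dados <- dados %>%
  group_by(ID, data) %>%
  mutate(tipo = tipos[seq_len(n())]) %>%
  ungroup()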

data.table

library(data.table)

setDT(dados)

dados[, tipo := tipos[1:.N], .(ID, data)]

In both cases, the result is the same:

> head(dados, 10)
        ID    Hr       data   tipo
 1: 000225 08:00 2020-11-12 inicio
 2: 000225 12:00 2020-11-12 almoco
 3: 000225 13:00 2020-11-12  volta
 4: 000225 17:00 2020-11-12  saida
 5: 000226 13:00 2020-11-12 inicio
 6: 000226 17:00 2020-11-12 almoco
 7: 000227 08:00 2020-11-12 inicio
 8: 000227 12:00 2020-11-12 almoco
 9: 000227 13:00 2020-11-12  volta
10: 000227 17:00 2020-11-12  saida
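If you would rather not load any package, the same idea can probably be sketched in base R with `ave()`. This is only an alternative sketch, not part of the two solutions above: it rebuilds a reduced version of the example data with `as.Date()` instead of `lubridate::dmy()`, and the helper column name `ordem` is my own choice.

```r
# Reduced version of the example data (two employees, two days)
dados <- data.frame(
  ID   = rep(c(rep("000225", 4), rep("000226", 2)), times = 2),
  Hr   = rep(c("08:00", "12:00", "13:00", "17:00", "13:00", "17:00"), times = 2),
  data = as.Date(rep(c("2020-11-12", "2020-11-13"), each = 6))
)

# Labels in the order the database records them
tipos <- c("inicio", "almoco", "volta", "saida")

# Position of each row within its ID/date group (1, 2, 3, ...)
ordem <- ave(seq_len(nrow(dados)), dados$ID, format(dados$data),
             FUN = seq_along)

# Convert the position to the record type
dados$tipo <- tipos[ordem]
print(dados)
```

As with the other two versions, this relies on the rows already being in chronological order within each employee and day.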

    Carlos, thank you so much for your help. I had researched how to do this and had not found a solution. Your answer, besides solving my problem, was a learning experience: I learned how to group in R, which will make my life much easier. I'm just starting out and find R amazing, but a little hard to learn. I'll keep working at it. Again, thanks for the help!
