Create R column from the number of records by 2 Ids

Question

I'm new to R and to Stack Overflow, and I'd appreciate your help. I have the following data frame:

library(lubridate)

ID = c("000225", "000225", "000225", "000225", "000226", "000226", "000227", "000227", "000227", "000227", "000225", "000225", "000225", "000225", "000226", "000226", "000227", "000227", "000227", "000227")
Hr = c("08:00","12:00","13:00" ,"17:00", "13:00" ,"17:00","08:00","12:00","13:00" ,"17:00",
       "08:00","12:00","13:00" ,"17:00", "13:00" ,"17:00","08:00","12:00","13:00" ,"17:00")
data =dmy(c("12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "12-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020", "13-11-2020"))

dados = data.frame(ID, Hr, data)
dados

The first three columns contain the time-clock punches of a company's employees. I need to generate a fourth column that identifies the type of each punch:

       ID    Hr       data   tipo
1  000225 08:00 2020-11-12 início
2  000225 12:00 2020-11-12 almoço
3  000225 13:00 2020-11-12  volta
4  000225 17:00 2020-11-12  saída
5  000226 13:00 2020-11-12 início
6  000226 17:00 2020-11-12 almoço
7  000227 08:00 2020-11-12 início
8  000227 12:00 2020-11-12 almoço
9  000227 13:00 2020-11-12  volta
10 000227 17:00 2020-11-12  saída
11 000225 08:00 2020-11-13 início
12 000225 12:00 2020-11-13 almoço
13 000225 13:00 2020-11-13  volta
14 000225 17:00 2020-11-13  saída
15 000226 13:00 2020-11-13 início
16 000226 17:00 2020-11-13 almoço
17 000227 08:00 2020-11-13 início
18 000227 12:00 2020-11-13 almoço
19 000227 13:00 2020-11-13  volta
20 000227 17:00 2020-11-13  saída

Although I made up this data for illustration, the database I will pull the real data from forces users to record their times in the sequence "start, lunch, return, exit". So, in principle, I don't need to sort an employee's punches within a day, because the database already returns them in that order.

So, what I need is this: when an employee has only 1 record on a given day, label it "início" (start). With 2 records, label the first "início" and the second "almoço" (lunch). With 3 records, "início", "almoço" and "volta" (return), and so on. Thanks in advance!
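For reference, the rule can also be sketched in base R with no extra packages. This uses a different technique from the answer below (`ave()` with `seq_along` to number rows within each group), and it assumes, as stated above, that rows already arrive in punch order within each employee and day. To stay self-contained, it uses a shortened copy of the data (12 November only), `as.Date()` instead of `lubridate::dmy()`, and unaccented labels.

```r
# Shortened copy of the question's data (12 November only); as.Date() replaces
# lubridate::dmy() so the sketch needs no extra packages
dados <- data.frame(
  ID   = c("000225", "000225", "000225", "000225", "000226", "000226",
           "000227", "000227", "000227", "000227"),
  Hr   = c("08:00", "12:00", "13:00", "17:00", "13:00", "17:00",
           "08:00", "12:00", "13:00", "17:00"),
  data = as.Date("2020-11-12")
)

# Labels in the order the database enforces: start, lunch, return, exit
tipos <- c("inicio", "almoco", "volta", "saida")

# Number each row within its ID/date group (1, 2, 3, ...), assuming rows are
# already in punch order, then use that number to pick the label
n_reg <- ave(seq_len(nrow(dados)), dados$ID, dados$data, FUN = seq_along)
dados$tipo <- tipos[n_reg]
dados
```

Because `ave()` returns a vector aligned with the original rows, the labels can be assigned directly without any reordering.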

1 answer

by Carlos Eduardo Lagosta • **5,497** points · Answer 1 · 2020-11-14T18:02:58+00:00

One solution: number the rows within each group (employee ID and date) and convert those numbers into record types. The dplyr and data.table packages make grouped operations easy. The conversion can be done with a lookup vector (a "dictionary").

# Dictionary
tipos <- setNames(c("inicio", "almoco", "volta", "saida"), 1:4)
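As a quick illustration of how the lookup behaves on its own (plain base R indexing, not part of the answer's code): indexing `tipos` with `1:n()` selects labels by position, so a group with two records gets the first two labels, and any position past 4 returns `NA`. The names "1" to "4" set by `setNames()` are not used by this numeric indexing.

```r
# The dictionary from the answer
tipos <- setNames(c("inicio", "almoco", "volta", "saida"), 1:4)

# A group with 2 records: positions 1 and 2 select the first two labels
tipos[1:2]

# A hypothetical group with 5 records: position 5 has no label, so it is NA
tipos[1:5]
```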

dplyr

library(dplyr)

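# %<>% comes from magrittr and is not exported by dplyr, so use plain assignment
dados <- dados %>%
  group_by(ID, data) %>%
  mutate(tipo = tipos[1:n()]) %>%
  ungroup()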
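The question notes that the database already returns punches in order. If that ever stops being true, one hedged variant is to sort by time within each ID/date before numbering. This sketch assumes `Hr` is always a zero-padded "HH:MM" string, so sorting the text also sorts the times chronologically; the shuffled rows are illustrative data, not from the question.

```r
library(dplyr)

tipos <- setNames(c("inicio", "almoco", "volta", "saida"), 1:4)

# Illustrative data: one employee/day with rows deliberately out of order
dados <- data.frame(
  ID   = c("000225", "000225", "000225", "000225"),
  Hr   = c("13:00", "08:00", "17:00", "12:00"),
  data = as.Date("2020-11-12")
)

dados <- dados %>%
  # zero-padded "HH:MM" strings sort in chronological order
  arrange(ID, data, Hr) %>%
  group_by(ID, data) %>%
  mutate(tipo = tipos[seq_len(n())]) %>%
  ungroup()

dados
```

`seq_len(n())` is used instead of `1:n()` only as a defensive habit; inside a group `n()` is always at least 1, so both give the same result here.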

data.table

library(data.table)

setDT(dados)

dados[, tipo := tipos[1:.N], .(ID, data)]
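An alternative in data.table, not taken from the answer, is `rowid()`, which returns each row's running position inside the groups defined by the columns passed to it. That replaces the `by` argument and `1:.N`. The small table below is illustrative and uses `as.Date()` so the sketch runs without lubridate.

```r
library(data.table)

tipos <- setNames(c("inicio", "almoco", "volta", "saida"), 1:4)

# Illustrative subset of the question's data
dados <- data.table(
  ID   = c("000225", "000225", "000225", "000225", "000226", "000226"),
  Hr   = c("08:00", "12:00", "13:00", "17:00", "13:00", "17:00"),
  data = as.Date("2020-11-12")
)

# rowid(ID, data) numbers rows 1, 2, ... within each ID/date combination,
# assuming rows are already in punch order
dados[, tipo := tipos[rowid(ID, data)]]
dados
```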

Both approaches produce the same labels (the output below is in data.table's print format):

> head(dados, 10)
        ID    Hr       data   tipo
 1: 000225 08:00 2020-11-12 inicio
 2: 000225 12:00 2020-11-12 almoco
 3: 000225 13:00 2020-11-12  volta
 4: 000225 17:00 2020-11-12  saida
 5: 000226 13:00 2020-11-12 inicio
 6: 000226 17:00 2020-11-12 almoco
 7: 000227 08:00 2020-11-12 inicio
 8: 000227 12:00 2020-11-12 almoco
 9: 000227 13:00 2020-11-12  volta
10: 000227 17:00 2020-11-12  saida