Create a new column in a dataframe based on a partial string match, without repeats

I have a dataframe with two columns, GL and GLDESC, and I want to add a third column called KIND based on the contents of GLDESC.

Dataframe:

      GL                             GLDESC
1 515100                        Payroll-ISL
2 515900                        Payroll-ICA
3 532300                           Bulk Gas
4 551000                          Supply AB
5 551000                        Supply XPTO
6 551100                          Supply AB
7 551300                   Material Interno

Where:

  • If GLDESC contains the word Payroll anywhere in the string, KIND should be Payroll.

  • If GLDESC contains the word Supply anywhere in the string, KIND should be Supply.

  • In all other cases, KIND is Other.

This is easily solved with:


DF$KIND <- ifelse(grepl("supply", DF$GLDESC, ignore.case = TRUE), "Supply", 
         ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

But with that, every row that mentions Supply, for example, gets classified. However, as in rows 4 and 5 of the DF, the same GL has two Supply entries, which is unnecessary for me. What I actually need is for only one GLDESC to be classified when the string repeats for the same GL.

How can I do this?

Edit: Deleting duplicates is not an option for me. I need to keep every row where it is, just classify the first and skip the second.

1 answer

You can use grepl to get logical indices and then compute positions in the vector of intended results.

i <- grepl("Payroll", dados$GLDESC)
j <- grepl("Supply", dados$GLDESC)
dados$KIND <- c("Other", "Payroll", "Supply")[1 + i + 2*j]
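
To see how the index arithmetic picks a label, here is a small trace on a toy vector. The vector `desc` and its last element, which matches both words, are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy input, not the asker's data
desc <- c("Payroll-ISL", "Bulk Gas", "Supply AB", "Payroll Supply")

i <- grepl("Payroll", desc)   # TRUE  FALSE FALSE TRUE
j <- grepl("Supply", desc)    # FALSE FALSE TRUE  TRUE

# Logicals are coerced to 0/1, so the index is
# 1 (neither), 2 (Payroll only), 3 (Supply only) or 4 (both)
idx <- 1 + i + 2 * j          # 2 1 3 4

labels <- c("Other", "Payroll", "Supply")
kind <- labels[idx]           # "Payroll" "Other" "Supply" NA
```

A description that matches both words gives index 4, which is past the end of the three labels, so it comes out as NA. None of the rows in the question's data hit that case.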

dados
#      GL           GLDESC    KIND
#1 515100      Payroll-ISL Payroll
#2 515900      Payroll-ICA Payroll
#3 532300         Bulk Gas   Other
#4 551000        Supply AB  Supply
#5 551000      Supply XPTO  Supply
#6 551100        Supply AB  Supply
#7 551300 Material Interno   Other
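
The output above still labels both rows 4 and 5 as Supply, which the edit to the question asks to avoid. One possible way to skip the repeat while keeping every row in place is base R's `duplicated()`. This is only a sketch under assumptions the question does not settle: the skipped row gets NA, and only Payroll and Supply repeats are skipped, not Other. The helper name `is_repeat` is made up here.

```r
# Same data as in the question, built inline so the sketch runs on its own
dados <- data.frame(
  GL = c(515100, 515900, 532300, 551000, 551000, 551100, 551300),
  GLDESC = c("Payroll-ISL", "Payroll-ICA", "Bulk Gas", "Supply AB",
             "Supply XPTO", "Supply AB", "Material Interno"),
  stringsAsFactors = FALSE
)

# Classify every row first, as in the question
kind <- ifelse(grepl("supply", dados$GLDESC, ignore.case = TRUE), "Supply",
        ifelse(grepl("payroll", dados$GLDESC, ignore.case = TRUE), "Payroll",
               "Other"))

# duplicated() is TRUE for the second and later rows with the same GL and kind
is_repeat <- duplicated(data.frame(GL = dados$GL, KIND = kind))

# Assumption: a skipped repeat becomes NA, and "Other" rows are left alone
kind[is_repeat & kind != "Other"] <- NA

dados$KIND <- kind
```

With this, row 4 stays Supply and row 5 becomes NA, while row 6 keeps Supply because its GL is different. If a different placeholder is wanted for skipped rows, replace the NA in the assignment.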

Data.

dados <- read.table(text = "
      GL                             GLDESC
1 515100                        Payroll-ISL
2 515900                        Payroll-ICA
3 532300                           'Bulk Gas'
4 551000                          'Supply AB'
5 551000                        'Supply XPTO'
6 551100                          'Supply AB'
7 551300                   'Material Interno'
", header = TRUE)