Distinct count in R


Viewed 5,361 times

1

I would like help building a distinct count on a data frame. Here is the data:

Filial  Matrícula   Valor
ABC      100       R$ 500,00
XYZ      200       R$ 850,00
XYZ      100       R$ 320,00
JCT      300       R$ 512,00
JCT      300       R$ 98,00
ABC      300       R$ 1.012,00

I would like R to give me a summary per "Filial" (branch) showing the distinct count of the "Matrícula" (registration) column as well as the sum of the "Valor" (value) column, similar to what an Excel pivot table does. The result I want is:

Filial  Contagem/Matricula      Valor
ABC          2                R$ 1.512,00
JCT          1                R$ 610,00
XYZ          2                R$ 1.170,00

3 answers

1

Your expected output seems to be wrong: there are two rows with Filial equal to JCT.
Also, we cannot sum objects of class character, which is what values containing R$ are, so I read in only the numbers, without the currency symbol.

agg <- aggregate(Valor ~ Filial, dados, sum)
agg$Contagem <- tapply(dados$Matrícula, dados$Filial, FUN = function(x) length(unique(x)))
agg <- agg[, c(1, 3, 2)]
agg
#  Filial Contagem Valor
#1    ABC        2  1512
#2    JCT        1   610
#3    XYZ        2  1170

Data.

dados <-
structure(list(Filial = structure(c(1L, 3L, 3L, 2L, 2L, 1L), .Label = c("ABC", 
"JCT", "XYZ"), class = "factor"), Matrícula = c(100L, 200L, 100L, 
300L, 300L, 300L), Valor = c(500, 850, 320, 512, 98, 1012)), .Names = c("Filial", 
"Matrícula", "Valor"), class = "data.frame", row.names = c(NA, 
-6L))
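
If the values arrive as text such as "R$ 1.012,00", they have to be converted to numbers before summing. A minimal sketch of that conversion follows; the helper name parse_brl and the vector valor_txt are hypothetical and not part of the original answer, and it assumes "." is the thousands separator and "," the decimal mark, as in the question.

```r
# Hypothetical raw values, copied from the question's table.
valor_txt <- c("R$ 500,00", "R$ 850,00", "R$ 320,00",
               "R$ 512,00", "R$ 98,00", "R$ 1.012,00")

# Hypothetical helper: strip "R$" and whitespace, drop the thousands
# separator ".", then turn the decimal comma into a decimal point.
parse_brl <- function(x) {
  x <- gsub("R\\$|[[:space:]]", "", x)
  x <- gsub(".", "", x, fixed = TRUE)
  x <- sub(",", ".", x, fixed = TRUE)
  as.numeric(x)
}

valor_num <- parse_brl(valor_txt)
valor_num
```

The resulting numeric vector can then be stored in the Valor column before calling aggregate.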
  • Thanks Rui, but the registration column is not being counted correctly (distinct count): the expected result for Filial JCT should be 1. That is, registration "300" should be counted only once.

  • @Brunoavila Take a look now, I edited the code. It looks right.

  • Rui Barrada, how would you do the distinct count with more than one grouping variable? In my example, only the variable "Filial" was used. How would it work using "tapply" with more than one variable? Thanks.

  • @Brunoavila It would be tapply(dados$Matrícula, list(var1, var2), FUN = ...). But to know whether it works you always have to test it; it would be better to edit the question with a case like that, or perhaps to ask a new question with a link to this one.
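
The two-variable tapply call suggested in the last comment can be sketched as follows. The second grouping column Ano and the data frame dados2 are hypothetical and invented only for illustration; they are not in the question's data. Column names are quoted here only to sidestep locale problems with the non-ASCII name.

```r
# Hypothetical data: the question's rows plus an invented "Ano" column.
dados2 <- data.frame(
  Filial      = c("ABC", "XYZ", "XYZ", "JCT", "JCT", "ABC"),
  Ano         = c(2019, 2019, 2020, 2019, 2019, 2019),
  "Matrícula" = c(100L, 200L, 100L, 300L, 300L, 300L),
  check.names = FALSE
)

n_unique <- function(x) length(unique(x))

# With a list of grouping variables, tapply returns a Filial x Ano matrix.
# Combinations that never occur in the data come back as NA.
tab <- tapply(dados2[["Matrícula"]],
              list(Filial = dados2$Filial, Ano = dados2$Ano),
              FUN = n_unique)
tab

# aggregate gives the same counts in long format,
# one row per combination that actually occurs.
long <- aggregate(dados2["Matrícula"],
                  by = dados2[c("Filial", "Ano")],
                  FUN = n_unique)
long
```

In this made-up data, ABC in 2019 has registrations 100 and 300, so its count is 2, while JCT in 2019 has 300 twice, so its count is 1. The long format from aggregate is usually easier to merge with the summed values.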

1


Using the dplyr package:

library(dplyr)
dados %>% 
  group_by(Filial) %>% 
  summarise(Contagem = length(unique(Matrícula)), Valor = sum(Valor))
  • Thanks Rafael, but the registration column is not being counted correctly (distinct count): the expected result for Filial JCT must be 1. That is, registration "300" must be counted only once.

  • Why should registration 300 of Filial JCT be counted only once if it appears twice in your data?

  • Anyway, I updated the code so that it generates the result you expect.

-1

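With dplyr:

library(dplyr)
dados <- dados %>% 
         group_by(Filial) %>% 
         # count the distinct registrations within each Filial;
         # n_distinct(Filial) would always return 1 inside its own group
         summarise(Contagem = n_distinct(Matrícula),
                   Valor = sum(Valor))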
