Group and sum columns - r

Data:

P<-c("Alemanha", "USA", "Alemanha", "USA", "USA", "França")
Citacoes<-c(1,5,8,0,9,20)
df<-data.frame(P,Citacoes)

Each P (country) represents 1 document and each document has an amount of Citacoes (citations) associated with it.

I need to group by P and sum Citacoes.

I can do that with the code below:

library(dplyr)

a <- group_by(df, P) %>%
  summarise(Total = sum(Citacoes))
a

But in addition, I also need to show, in the same table, the number of documents per country. In this case, "USA" has three documents, "Alemanha" has two documents and "França" has one.

That is, in the end I need a table with 3 columns: the country, the total of Citacoes for that country, and the number of documents.

Finally, I would like to create a new column with the average of Citacoes per country. I tried mutate, but without success. I would also like to sort the data in descending order by the number of documents from each country.

I’m open to trying solutions beyond dplyr.

Thanks.

1 answer

The code in the question is almost there; just add the count with n():

library(dplyr)

a <- df %>%
  group_by(P) %>%
  summarise(Total = sum(Citacoes),
            Count = n())
a
# A tibble: 3 x 3
#  P        Total Count
#  <fct>    <dbl> <int>
#1 Alemanha     9     2
#2 França      20     1
#3 USA         14     3

To calculate the averages, it is sufficient to include the calculation formula in the summarise.

a <- df %>%
  group_by(P) %>%
  summarise(Total = sum(Citacoes),
            Count = n(),
            Media = Total/Count)
a
# A tibble: 3 x 4
#  P        Total Count Media
#  <fct>    <dbl> <int> <dbl>
#1 Alemanha     9     2  4.5 
#2 França      20     1 20   
#3 USA         14     3  4.67
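
The question also asks for the result sorted in descending order by the number of documents, which the code above does not do. A minimal sketch of one way to do it, assuming dplyr's arrange() with desc() is acceptable; the name resumo is only illustrative and does not come from the original post:

```r
library(dplyr)

# Data from the question
P <- c("Alemanha", "USA", "Alemanha", "USA", "USA", "França")
Citacoes <- c(1, 5, 8, 0, 9, 20)
df <- data.frame(P, Citacoes)

# 'resumo' is a hypothetical name used only for this sketch
resumo <- df %>%
  group_by(P) %>%
  summarise(Total = sum(Citacoes),   # total citations per country
            Count = n(),             # number of documents per country
            Media = Total / Count) %>%  # average citations per document
  arrange(desc(Count))               # most documents first
resumo
```

With the sample data, the rows should come out in the order USA, Alemanha, França. In base R, the same sorting can be done with order(), as in a[order(a$Count, decreasing = TRUE), ].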
  • Thank you! And how do I create a column with the average citations per country? I tried mutate, but couldn’t get it to work. Finally, it would also be useful to sort by the Count column...

  • @Gustavooliveirapinto In the summarise, just include Media = Total/Count. I’ll edit.

  • Rui, I tested it this way and it worked as well: a <- group_by(df, P) %>% summarise(Total = sum(Citacoes), Count = n()) %>% mutate(ratio = Total / Count); b <- a[order(a$Count, decreasing = TRUE), ]
