Occurrence count of a dataframe giving error

I'm a beginner, and I know my code is still a little awkward, but let's take it one step at a time. I have a data frame with 2 columns (SO, the source, and DT, the document type). I need to split the SO values by DT, count how often each SO occurs, and build a top-15 ranking for each DT. Following examples from this site, I wrote code that counts the frequencies and then splits them by document type. However, when I compare the initial frequencies with the ones obtained after the split, the latter are always lower. Here is a small sample.

SO                                              DT 
ACM SIGMIS DATABASE                             ARTICLE
ACM SIGPLAN NOTICES                             ARTICLE
MODERN CASTING                                  BOOK
MODERN DEVELOPMENTS IN POWDER METALLURGY        BOOK
ELECTRICAL COMMUNICATION                        CONFERENCE PAPER
ELECTRONIC DESIGN                               CONFERENCE PAPER
ELECTRONIC ENGINEERING (LONDON)                 CONFERENCE PAPER
ELECTRONIC PACKAGING AND PRODUCTION             CONFERENCE PAPER

Initially my data was in a data frame named q1.

q1_so <- data.frame(q1$SO, q1$DT) # take columns SO and DT and build a data frame
names(q1_so)[1:2] <- c("SO", "DT") # rename the columns for convenience
# count the rows per DT; the counts end up in a Freq column
q1_soma_dt <- data.frame(with(q1_so,table(DT)))

q1_freq <- with(q1_so,table(SO,DT)) 
q1_freq <- data.frame(q1_freq) # number of occurrences of each SO per DT class

Subset for ARTICLE

q1_art <- subset(q1_freq,DT =='ARTICLE' & Freq >0) 
library(plyr)
q1_art <-arrange(q1_art,desc(Freq)) # sort in descending order of Freq
sum(q1_art$Freq)

Top 20 ranking

q1_art <- head(q1_art, 20)  # unlike q1_art[1:20, ], adds no NA rows when fewer than 20 exist

Thanks for your help

  • I solved it with this result. Thanks, Rui.

1 answer

This solution uses the package dplyr.

library(dplyr)

dados %>%
  group_by(DT, SO) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  slice(1:15)
## A tibble: 8 x 3
## Groups:   DT [3]
#  DT               SO                                       count
#  <chr>            <chr>                                    <int>
#1 ARTICLE          ACM SIGMIS DATABASE                          1
#2 ARTICLE          ACM SIGPLAN NOTICES                          1
#3 BOOK             MODERN CASTING                               1
#4 BOOK             MODERN DEVELOPMENTS IN POWDER METALLURGY     1
#5 CONFERENCE PAPER ELECTRICAL COMMUNICATION                     1
#6 CONFERENCE PAPER ELECTRONIC DESIGN                            1
#7 CONFERENCE PAPER ELECTRONIC ENGINEERING (LONDON)              1
#8 CONFERENCE PAPER ELECTRONIC PACKAGING AND PRODUCTION          1
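
One detail of the pipeline above is easy to miss: after group_by(DT, SO), summarise() drops the last grouping variable, so the result is still grouped by DT, and slice() then takes rows within each DT. The sketch below spells this out and stores the result in an object. The names exemplo and top_por_dt and the toy data are made up for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data, not the asker's real data
exemplo <- data.frame(
  SO = c("A", "A", "B", "C", "C", "C", "D"),
  DT = c("ARTICLE", "ARTICLE", "ARTICLE", "BOOK", "BOOK", "BOOK", "BOOK"),
  stringsAsFactors = FALSE
)

top_por_dt <- exemplo %>%
  group_by(DT, SO) %>%
  summarise(count = n(), .groups = "drop_last") %>%  # result stays grouped by DT
  arrange(DT, desc(count)) %>%                        # highest counts first inside each DT
  slice(1:15) %>%                                     # at most 15 rows per DT
  ungroup()                                           # plain tibble for later use

top_por_dt
```

Because the result is assigned to top_por_dt, it can be reused later or written out, for example with write.csv().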

Data in dput format.

dados <-
structure(list(SO = c("ACM SIGMIS DATABASE", 
"ACM SIGPLAN NOTICES", "MODERN CASTING", 
"MODERN DEVELOPMENTS IN POWDER METALLURGY", 
"ELECTRICAL COMMUNICATION", "ELECTRONIC DESIGN", 
"ELECTRONIC ENGINEERING (LONDON)", 
"ELECTRONIC PACKAGING AND PRODUCTION"), 
DT = c("ARTICLE", "ARTICLE", "BOOK", "BOOK", 
"CONFERENCE PAPER", "CONFERENCE PAPER", 
"CONFERENCE PAPER", "CONFERENCE PAPER")), 
row.names = c(NA, -8L), class = "data.frame")
  • Thank you. I can see the result for the first 10 rows, but my smallest sample has ARTICLE 662, ARTICLE 162, BOOK 8, CONFERENCE PAPER 542, ... How can I save this information?

  • I thought the problem had been solved, but it only works for a small sample; for a large one I can't see the whole result. Using library(dplyr) and q1_so %>% group_by(DT, SO) %>% summarise(Count = n()) %>% arrange(desc(Count)) %>% slice(1:5) %>% head(20), I see the first 5 rows of each category until 20 rows are reached, and then the output is cut off. With head(50) the printout is still cut at 10 rows, so I can't see the result for all 8 categories.

  • I tried result = q1_so to save the result, but that shows the original q1_so data frame without the changes. This is the output I get:

    # A tibble: 21 x 3
    # Groups: DT [8]
       DT               SO             Count
       <fct>            <fct>          <int>
     1 ARTICLE          ACM SIGPL         17
     2 ARTICLE          METAL POW         16
     3 ARTICLE          IBM TECHNIETIN    15
     4 "ARTICLE "       JOURNAL O         17
     5 "ARTICLE "       DATAMATIO          8
     6 "ARTICLE "       JOURNAL ONT        6
     7 BOOK             NA                 3
     8 BOOK             HUM-COMPU          2
     9 BOOK             AI, GRAPHIC        1
    10 CONFERENCE PAPER NA
    # ... with 11 more rows
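
The output pasted in the comment above lists both ARTICLE and "ARTICLE " (with a trailing space) as separate DT levels. That may be why the total after filtering with DT == 'ARTICLE' comes out lower than expected, although this cannot be confirmed without the real data. The following is a minimal sketch, assuming stray whitespace is the cause: it trims both columns, assigns the pipeline result to a name so it is kept, and prints every row instead of the default first 10. The names q1_exemplo, q1_limpo and resultado and the toy values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with a trailing space in some DT values
q1_exemplo <- data.frame(
  SO = c("X", "X", "Y", "Z"),
  DT = c("ARTICLE", "ARTICLE ", "ARTICLE ", "BOOK"),
  stringsAsFactors = TRUE
)

# Before cleaning, only exact matches are counted
sum(q1_exemplo$DT == "ARTICLE")

# Strip leading/trailing whitespace; trimws() works on character vectors
q1_limpo <- q1_exemplo %>%
  mutate(DT = trimws(as.character(DT)),
         SO = trimws(as.character(SO)))

# Assign the pipeline to a name; without the assignment nothing is saved
resultado <- q1_limpo %>%
  count(DT, SO, name = "Count") %>%          # one row per DT/SO pair
  group_by(DT) %>%
  arrange(desc(Count), .by_group = TRUE) %>% # sort inside each DT
  slice(1:15) %>%                            # at most 15 rows per DT
  ungroup()

# A tibble prints 10 rows by default; n = Inf shows all of them
print(resultado, n = Inf)
```

Adding head(50) does not change how many rows a tibble prints, which is why the printout still stopped at 10 rows; as.data.frame(resultado) is another way to see everything.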
