I'm a beginner, and I know my code is still a little awkward, but let's take it one step at a time. I have a data frame with 2 columns (SO, the source, and DT, the document type). I need to split SO by DT, count the frequency of each SO, and generate a ranking (top 15+) for each DT. I wrote some code based on the examples on this site: it counts the frequencies and then separates them by document type. However, when I compare the initial frequencies with the ones after the separation, the latter are always lower. Here is a small sample.
SO                                        DT
ACM SIGMIS DATABASE                       ARTICLE
ACM SIGPLAN NOTICES                       ARTICLE
MODERN CASTING                            BOOK
MODERN DEVELOPMENTS IN POWDER METALLURGY  BOOK
ELECTRICAL COMMUNICATION                  CONFERENCE PAPER
ELECTRONIC DESIGN                         CONFERENCE PAPER
ELECTRONIC ENGINEERING (LONDON)           CONFERENCE PAPER
ELECTRONIC PACKAGING AND PRODUCTION       CONFERENCE PAPER
Initially my data was in a data frame, q1.
q1_so <- data.frame(q1$SO, q1$DT) # take the SO and DT columns and turn them into a data frame
names(q1_so)[1:2] <- c("SO", "DT") # rename the columns for convenience
# create the Freq column and count the frequency of SO
q1_soma_dt <- data.frame(with(q1_so, table(DT)))
q1_freq <- with(q1_so, table(SO, DT))
q1_freq <- data.frame(q1_freq) # number of SO per DT class
# subset: articles only
q1_art <- subset(q1_freq, DT == 'ARTICLE' & Freq > 0)
library(plyr)
q1_art <- arrange(q1_art, desc(Freq)) # sort in descending order
sum(q1_art$Freq)
# ranking: top 20
q1_art <- head(q1_art, 20) # head() avoids NA rows when there are fewer than 20
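The same steps can also be sketched in base R for all document types at once. This is only an illustration, not a diagnosis: the sample data frame below is made up (it reuses a few names from the sample above and adds one duplicate row and one missing DT), the names `freq`, `top_n` and `ranking` are hypothetical, and the cutoff of 15 is an assumption. It also shows one possible reason the totals could come out lower: `table()` silently drops rows with `NA` unless `useNA` is set.

```r
# Made-up stand-in for q1_so (one duplicated row and one missing DT added
# purely for illustration).
q1_so <- data.frame(
  SO = c("ACM SIGMIS DATABASE", "ACM SIGMIS DATABASE", "ACM SIGPLAN NOTICES",
         "MODERN CASTING", "ELECTRONIC DESIGN", "ELECTRONIC DESIGN"),
  DT = c("ARTICLE", "ARTICLE", "ARTICLE", "BOOK", "CONFERENCE PAPER", NA),
  stringsAsFactors = FALSE
)

# Count SO per DT. useNA = "ifany" keeps rows whose SO or DT is missing;
# without it, table() leaves those rows out of the counts.
freq <- as.data.frame(table(SO = q1_so$SO, DT = q1_so$DT, useNA = "ifany"),
                      stringsAsFactors = FALSE)
freq <- freq[freq$Freq > 0, ]

# Sanity check: the counts should add up to the number of rows.
stopifnot(sum(freq$Freq) == nrow(q1_so))

# Sort by DT, then by descending frequency, and keep at most top_n rows
# per document type. split() drops the group whose DT is NA.
top_n <- 15  # assumed cutoff
freq <- freq[order(freq$DT, -freq$Freq), ]
ranking <- do.call(rbind, lapply(split(freq, freq$DT), head, n = top_n))
```

If the sanity check fails on the real data, comparing `sum(table(q1_so$SO, q1_so$DT))` with `nrow(q1_so)` shows how many rows were left out of the counts.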
Thanks for your help
I solved it with that result. Thanks, Rui.
– user108753