How to count the frequencies for each column in a data.frame in R?

1

I need to count the number of occurrences in each month, where each month is represented by a column:

       USUARIO jan fev mar abr mai jun jul ago set out nov dez 
         1160   0   1   1   1   1   1   1   1   1   1   1   1  
         2505   1   1   1   1   1   0   1   0   1   1   1   0           
         3042   1   1   1   0   0   0   1   1   1   1   1   0              
         3554   1   1   0   1   1   1   1   0   1   1   1   0     

How do I get the total frequencies of each month?

2 answers

5


First of all, let's reproduce the data:

txt <- "USUARIO jan fev mar abr mai jun jul ago set out nov dez 
 1160   0   1   1   1   1   1   1   1   1   1   1   1  
 2505   1   1   1   1   1   0   1   0   1   1   1   0           
 3042   1   1   1   0   0   0   1   1   1   1   1   0              
 3554   1   1   0   1   1   1   1   0   1   1   1   0 "

dados <- read.table(text = txt, header = TRUE)
dados
  USUARIO jan fev mar abr mai jun jul ago set out nov dez
1    1160   0   1   1   1   1   1   1   1   1   1   1   1
2    2505   1   1   1   1   1   0   1   0   1   1   1   0
3    3042   1   1   1   0   0   0   1   1   1   1   1   0
4    3554   1   1   0   1   1   1   1   0   1   1   1   0

Now, with the data in hand, we'll reshape it into long (tidy) format.

library(tidyverse)
long <- dados %>% 
  gather(mes, n, -USUARIO)
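
As a side note that is not part of the original answer: in tidyr 1.0.0 and later, gather() is documented as superseded by pivot_longer(). A minimal sketch of the same reshaping step, assuming a recent tidyr is installed, could look like this (the rows come out in a different order than with gather(), which does not affect the per-month sums):

```r
library(tidyr)

txt <- "USUARIO jan fev mar abr mai jun jul ago set out nov dez
 1160   0   1   1   1   1   1   1   1   1   1   1   1
 2505   1   1   1   1   1   0   1   0   1   1   1   0
 3042   1   1   1   0   0   0   1   1   1   1   1   0
 3554   1   1   0   1   1   1   1   0   1   1   1   0"

dados <- read.table(text = txt, header = TRUE)

# Every column except USUARIO becomes a (mes, n) pair:
# 4 users x 12 months = 48 rows
long <- pivot_longer(dados, cols = -USUARIO, names_to = "mes", values_to = "n")
```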

With the data in long format, we can summarise it by summing the 1s that appear for each month.

long %>% 
  group_by(mes) %>% 
  summarise(n = sum(n))
# A tibble: 12 x 2
   mes       n
   <chr> <int>
 1 abr       3
 2 ago       2
 3 dez       1
 4 fev       4
 5 jan       3
 6 jul       4
 7 jun       2
 8 mai       3
 9 mar       3
10 nov       4
11 out       4
12 set       4
  • 3

    Suggestion: to have the result ordered by calendar month rather than alphabetically, define lvls <- format(seq(as.Date("2018-01-01"), as.Date("2018-12-01"), by = "month"), format = "%b") before the first pipe, and finish the gather step with %>% mutate(mes = factor(mes, levels = lvls)).
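
One caveat about the suggestion above: format(..., "%b") returns locale-dependent month abbreviations, so it only matches the column names (jan, fev, ...) under a Portuguese locale. A hedged sketch that avoids this by taking the order from the column names themselves, assuming the columns are already in calendar order as in the question (por_mes is just an illustrative name):

```r
library(dplyr)
library(tidyr)

txt <- "USUARIO jan fev mar abr mai jun jul ago set out nov dez
 1160   0   1   1   1   1   1   1   1   1   1   1   1
 2505   1   1   1   1   1   0   1   0   1   1   1   0
 3042   1   1   1   0   0   0   1   1   1   1   1   0
 3554   1   1   0   1   1   1   1   0   1   1   1   0"

dados <- read.table(text = txt, header = TRUE)

# Month order taken from the data frame itself (all columns but USUARIO)
lvls <- names(dados)[-1]

por_mes <- dados %>%
  gather(mes, n, -USUARIO) %>%
  mutate(mes = factor(mes, levels = lvls)) %>%  # factor levels fix the group order
  group_by(mes) %>%
  summarise(n = sum(n))

por_mes
```

Since group_by() orders groups by factor level, the summary rows follow jan, fev, mar, ... instead of alphabetical order.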

5

One way to solve this problem in base R is with the function apply. Using the data set provided by Tomás, we have the following:

txt <- "USUARIO jan fev mar abr mai jun jul ago set out nov dez 
 1160   0   1   1   1   1   1   1   1   1   1   1   1  
 2505   1   1   1   1   1   0   1   0   1   1   1   0           
 3042   1   1   1   0   0   0   1   1   1   1   1   0              
 3554   1   1   0   1   1   1   1   0   1   1   1   0 "

dados <- read.table(text = txt, header = TRUE)

The function apply has three main arguments:

  1. The data set to analyze, in array form (a matrix, or a data frame, which apply coerces to a matrix)

  2. A value equal to 1 or 2. 1 applies the function over the rows of the array, while 2 applies it over its columns

  3. The function to apply.
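
To make the second argument concrete, here is a small illustrative sketch on a toy matrix (m, row_totals and col_totals are names made up for this example, not part of the question's data):

```r
# matrix() fills column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_totals <- apply(m, 1, sum)  # 1 = over rows:    9 12
col_totals <- apply(m, 2, sum)  # 2 = over columns: 3 7 11
```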

For your problem, apply can be used as follows:

apply(dados, 2, sum)[-1]
# jan fev mar abr mai jun jul ago set out nov dez 
#  3   4   3   3   3   2   4   2   4   4   4   1 

I used the data frame dados, with the function sum applied to its columns (2). Since it has a user column (USUARIO), the sum of that column is dropped at the end with [-1], which tells R to remove the first element of the vector returned by apply.
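
A possible alternative, not part of the answer above: base R also provides colSums, which sums the columns directly. A minimal sketch, dropping the USUARIO column before summing rather than afterwards (totais is an illustrative name):

```r
txt <- "USUARIO jan fev mar abr mai jun jul ago set out nov dez
 1160   0   1   1   1   1   1   1   1   1   1   1   1
 2505   1   1   1   1   1   0   1   0   1   1   1   0
 3042   1   1   1   0   0   0   1   1   1   1   1   0
 3554   1   1   0   1   1   1   1   0   1   1   1   0"

dados <- read.table(text = txt, header = TRUE)

# dados[-1] drops the first column (USUARIO); colSums then sums each month
totais <- colSums(dados[-1])
totais
```

The result is a named vector with one total per month, in the same column order as the data frame.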
