Create a sequential counter

3

I need to create a column that counts the occurrences of each value in another column. For example, column "y" below holds a running count of the elements in column "x":

    x     y
1   A     1
2   B     1
3   A     2
4   C     1
5   B     2
6   A     3

I'll probably have to write some kind of loop, but I couldn't come up with anything efficient.
Usually only the final result of a counter matters (as in an event occurrence counter). However, I need a counter that "stores" every step of the count, so that each occurrence gets its own identification number.

  • Just to understand, you want to group the data by y and count the values of x within each group, correct?

2 answers

4

There is no need to write a loop. You can solve this problem with the dplyr package:

dados <- structure(list(x = structure(c(1L, 3L, 2L, 2L, 2L, 3L, 3L, 3L, 2L, 2L), 
  .Label = c("A", "B", "C"), class = "factor")), .Names = "x", 
  row.names = c(NA, -10L), class = "data.frame")

library(dplyr)

# Number the rows within each group of x: 1, 2, ..., n()
dados %>%
  group_by(x) %>%
  mutate(y = 1:n())

# A tibble: 10 x 2
# Groups:   x [3]
        x     y
   <fctr> <int>
 1      A     1
 2      C     1
 3      B     1
 4      B     2
 5      B     3
 6      C     2
 7      C     3
 8      C     4
 9      B     4
10      B     5
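
For readers who would rather not load dplyr, the same running counter can be sketched in base R with `ave()`. This is only an illustrative alternative, not part of the answer above; it rebuilds the six-row example from the question, and the expected `y` values are the ones shown there.

```r
# Data from the question: the values to be numbered
dados <- data.frame(x = c("A", "B", "A", "C", "B", "A"))

# ave() applies FUN within each group defined by dados$x and returns the
# results in the original row order; seq_along() numbers each group 1, 2, 3, ...
dados$y <- ave(seq_along(dados$x), dados$x, FUN = seq_along)

print(dados)
```

The first argument of `ave()` only needs to be a numeric vector of the right length, which is why `seq_along(dados$x)` is used there instead of the column itself.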

2

Your problem is straightforward and, as Marcus said, dplyr can handle it. However, I found his solution not very general.

The following code counts the occurrences of x within each group of y (note that I slightly altered your data so that at least one count is greater than 1).

library(dplyr)

df <- data.frame(
    x = c('A', 'B', 'A', 'C', 'B', 'A', 'A'),
    y = c(1, 1, 2, 1, 2, 3, 1)
)

df %>% 
    group_by(y, x) %>% 
    count()

Resulting in:

# A tibble: 6 x 3
# Groups:   y, x [6]
      y      x     n
    <dbl> <fctr> <int>
1     1      A     2
2     1      B     1
3     1      C     1
4     2      A     1
5     2      B     1
6     3      A     1

Another way to count the elements of a group is by using the function n() within a summarise:

df %>% 
    group_by(y, x) %>% 
    summarise(contagem = n())

The result is the same as the previous one, except that the count column is named contagem instead of n. If you need to split the table into several smaller tables according to the values of y, you can do it like this:

df %>% 
    group_by(y, x) %>% 
    count() %>% 
    split(.$y)

Resulting in a list of tibbles (easily convertible to data frames):

$`1`
# A tibble: 3 x 3
# Groups:   y, x [3]
      y      x     n
    <dbl> <fctr> <int>
1     1      A     2
2     1      B     1
3     1      C     1

$`2`
# A tibble: 2 x 3
# Groups:   y, x [2]
      y      x     n
    <dbl> <fctr> <int>
1     2      A     1
2     2      B     1

$`3`
# A tibble: 1 x 3
# Groups:   y, x [1]
      y      x     n
    <dbl> <fctr> <int>
1     3      A     1
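
The tables above give one total per group, whereas the question asks for a number on every row. As a minimal sketch of how the two relate, both can be put side by side on the question's original six rows; the column name `total` is made up for this illustration.

```r
library(dplyr)

# Data from the question
df <- data.frame(x = c("A", "B", "A", "C", "B", "A"))

res <- df %>%
  group_by(x) %>%
  mutate(
    y     = row_number(),  # running counter: 1, 2, 3, ... within each value of x
    total = n()            # total number of occurrences of that value of x
  ) %>%
  ungroup()

print(res)
```

In each group, the last value of `y` equals `total`, so the running counter carries the same information as the totals plus an identifier for each occurrence.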

  • I ended up using the dplyr approach suggested by Marcus. I will try this way too, William. Thank you.
