How to count the cumulative number of occurrences of an element in a vector?


Viewed 7,167 times

4

Suppose I have a vector in R:

x <- c("a", "a", "b", "a", "b", "c")

For each position i of the vector, I need to determine how many times the element x[i] has appeared up to and including that position. The result for the vector above would be

[1] 1 2 1 3 2 1

How can I do this calculation efficiently and without using native code (C)? By efficient I mean something that runs in a second or less on a vector of about 1 million elements (e.g., x <- sample.int(99, size=999999, replace=TRUE))

1 answer

3


It is possible to use the function ave as follows:

contagem <- ave(rep(1, length(x)), x, FUN=cumsum)

First, rep(1, length(x)) generates a vector of ones with the same length as the input vector. ave then splits this vector into groups according to the value of x and computes the cumulative sum (cumsum) within each group, returning the results in their original positions. Since the vector contains only ones, the cumulative sum within each group is the sequence 1, 2, 3, ..., i.e., a running count.
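The steps described above can be checked on the example vector from the question. The following is a minimal sketch: the names contagem2, big, and big_count, the seq_along variant, and the seed are illustrative assumptions, not part of the original answer. The timing depends on the machine, so no figure is claimed here.

```r
# Example vector from the question.
x <- c("a", "a", "b", "a", "b", "c")

# ave() splits the vector of ones by the values of x, applies cumsum to
# each group, and puts the results back in their original positions.
contagem <- ave(rep(1, length(x)), x, FUN = cumsum)
print(contagem)
# [1] 1 2 1 3 2 1

# Assumed variant: number the elements of each group with seq_along
# instead of summing ones. It yields the same counts, as integers.
contagem2 <- ave(seq_along(x), x, FUN = seq_along)

# Rough timing on a vector of the size mentioned in the question.
set.seed(1)  # arbitrary seed, only for reproducibility
big <- sample.int(99, size = 999999, replace = TRUE)
print(system.time(big_count <- ave(rep(1, length(big)), big, FUN = cumsum)))
```

The cost is dominated by splitting the vector into groups, so with only 99 distinct values the grouping step stays cheap relative to the vector length.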

  • Rodrigo, in this beta version of Stack Overflow in Portuguese, is it common practice to ask questions and answer them ourselves right away (to stimulate the community)?

  • 1

    @carloscinelli, the practice hasn't been very common, but I think it's beneficial. The community will decide which questions are relevant and which answers are most appropriate. I have answered my own questions before, and I have also accepted someone else's answer over mine. There is a discussion of this practice on Meta Stack Exchange: http://meta.stackexchange.com/questions/17845/etiquette-for-answering-your-own-question

  • I also think it's very good. It's not very different from posting on a blog, but it reaches a larger audience. I also think it helps to filter out superficial questions, since a quick search will already turn up these answers here.
