My dear, please see the recommendations in the comment, as they are very important to ensure the quality of questions and answers. Since you are new here on SO, I will make an exception and answer the question, which, incidentally, is not trivial, especially for those who are just starting out with R.
REPRODUCIBLE EXAMPLE
The first thing you should have done in your question is to provide a reproducible example, so I will create a series of numbers that might well be from a count.
data <- sample(x = 1:10, size = 100000, replace = TRUE)
What this code does is take a sample, with replacement, of 100,000 numbers between 1 and 10. Obviously there will be some repetitions, and I want to know how many times the same number appears 5 times in a row.
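If you also want the draw itself to be reproducible, so that anyone running the code gets the same 100,000 numbers, base R's set.seed() fixes the random number generator before sampling. A minimal sketch; the seed 123 is an arbitrary choice, and the counts obtained with it will not necessarily match the ones printed further down in this answer.

```r
# Fix the seed so everyone gets exactly the same 100,000 draws
# (the value 123 is arbitrary)
set.seed(123)
data <- sample(x = 1:10, size = 100000, replace = TRUE)

# Re-running with the same seed reproduces the same vector
set.seed(123)
data_again <- sample(x = 1:10, size = 100000, replace = TRUE)
identical(data, data_again)
# [1] TRUE
```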
FINDING ANY REPETITION OF 5 NUMBERS IN A ROW
For that I will use the function rollapply from the zoo package (install it once with install.packages("zoo") and load it with library(zoo)). This function creates a "window" that slides along the sequence of numbers, and inside this window you can run any function, the mean for example. Here I will use the mean:
> data[1:10]
[1] 8 2 8 6 10 10 4 6 7 10
> rollapply(data = data[1:10], 5, mean)
[1] 6.8 7.2 7.6 7.2 7.4 7.4
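To make the sliding window concrete, here is a minimal base-R sketch of what rollapply returns for a plain numeric vector with no padding and a step of 1. The helper roll_windows() is hypothetical (it is not part of zoo); it lists the six 5-number windows over the first 10 values shown above and their means.

```r
# Hypothetical helper: apply FUN to every window of `width` consecutive values,
# moving one position at a time (mimics zoo::rollapply on a plain vector)
roll_windows <- function(x, width, FUN) {
  stopifnot(width >= 1, length(x) >= width)
  n_windows <- length(x) - width + 1          # 10 - 5 + 1 = 6 windows here
  sapply(seq_len(n_windows), function(i) FUN(x[i:(i + width - 1)]))
}

x <- c(8, 2, 8, 6, 10, 10, 4, 6, 7, 10)

# The windows themselves, one per row (6 rows of 5 values)
t(roll_windows(x, 5, identity))

# The mean of each window
roll_windows(x, 5, mean)
# [1] 6.8 7.2 7.6 7.2 7.4 7.4
```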
Note that the function returned only 6 numbers. This happens because a window of 5 values can only be placed in 10 - 5 + 1 = 6 positions on a series of 10 numbers.
[Figure: the six windows of 5 consecutive values that rollapply slides over the first 10 numbers]
For each of these 6 windows, rollapply calculated the mean. Now, how can this help you find sequences of 5 consecutive days with the same number?
Just compute, for each window, something that equals zero when all the numbers are the same! The standard deviation does exactly that: if all numbers are equal, the standard deviation is 0. So just use rollapply together with the function sd, and that settles it! For example:
> dados = c(5,5,5,5,5,1,3,4,5,5,5,5,5)
> rollapply(dados, 5, sd)
[1] 0.0000000 1.7888544 1.7888544 1.6733201 1.6733201
[6] 1.6733201 0.8944272 0.4472136 0.0000000
There are two runs of five 5s, and therefore two standard deviations equal to 0 appear. Now just count how many times zero appears.
> sum(rollapply(dados, 5, sd) == 0)
[1] 2
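Testing sd(...) == 0 is exact for whole numbers like these, but with non-integer data floating-point rounding could in principle leave a tiny non-zero standard deviation. A comparison that never does arithmetic on the values is "maximum equals minimum". A minimal sketch of that variant; it swaps zoo for base R's embed() (package stats) so it runs without extra packages, and embed() stores each window as a row in reverse order, which does not matter for this test.

```r
dados <- c(5, 5, 5, 5, 5, 1, 3, 4, 5, 5, 5, 5, 5)

# One row per window of 5 consecutive values (13 - 5 + 1 = 9 rows)
windows <- embed(dados, 5)

# A window is constant when its maximum equals its minimum; this compares the
# stored values directly instead of relying on a computed result being exactly 0
all_same <- apply(windows, 1, function(w) max(w) == min(w))

sum(all_same)
# [1] 2
```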
Doing the same with the data I generated, a much longer sequence that we could not inspect by eye, gives:
> sum(rollapply(data, 5, sd) == 0)
[1] 6
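One detail worth checking: this counts windows, not runs. A run of 6 identical values contains two windows of 5, so it adds 2 to the sum, a run of 7 adds 3, and so on. If what you want is the number of distinct runs of at least 5 equal values, base R's rle() (run-length encoding) counts each run once, so that number can only be equal to or smaller than the window count. A minimal sketch; count_windows() and count_runs() are hypothetical helper names, and count_windows() uses embed() in place of rollapply to avoid the zoo dependency.

```r
dados  <- c(5, 5, 5, 5, 5, 1, 3, 4, 5, 5, 5, 5, 5)
longer <- c(5, 5, 5, 5, 5, 5, 1)   # a single run of six 5s

# Windows of width k whose standard deviation is 0 (the sd approach above)
count_windows <- function(x, k = 5) {
  sum(apply(embed(x, k), 1, sd) == 0)
}

# Distinct runs of at least k equal values
count_runs <- function(x, k = 5) {
  r <- rle(x)
  sum(r$lengths >= k)
}

count_windows(dados)    # [1] 2
count_runs(dados)       # [1] 2
count_windows(longer)   # [1] 2  (the run of six is counted twice)
count_runs(longer)      # [1] 1
```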
SPECIFICALLY FINDING A SEQUENCE OF ZEROS
If your series has runs of several different numbers and you are only interested in runs of zeros, not in repetitions of other numbers, you can transform the data by adding a little random noise (with jitter) to the values that are not zero:
data2 <- ifelse(data == 0, data, jitter(data))
sum(rollapply(data2, 5, sd) == 0)
Now only the windows made up entirely of zeros are counted, because the other values that used to repeat now carry noise and no longer repeat exactly.
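Adding noise with jitter() works, but the result then depends on random numbers. A deterministic alternative, swapped in here and not the method above, is to ask directly whether every value in a window is zero, or to count runs of zeros with rle(). A base-R sketch on a small made-up series (not the question's data):

```r
# Made-up series: one run of five 0s and one run of five 3s
x <- c(0, 0, 0, 0, 0, 2, 3, 3, 3, 3, 3, 1)

# Windows of width 5, one per row; count those made only of zeros
windows <- embed(x, 5)
n_zero_windows <- sum(apply(windows, 1, function(w) all(w == 0)))
n_zero_windows
# [1] 1

# Or count distinct runs of at least 5 zeros
r <- rle(x == 0)
n_zero_runs <- sum(r$values & r$lengths >= 5)
n_zero_runs
# [1] 1
```

The run of five 3s is ignored in both counts without any noise being added.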
CAUTION
Note that in the previous example I generated 100,000 numbers by randomly drawing from 1 to 10 and, even so, I found six sequences of repeated numbers! The warning is: when you find this kind of pattern in a series, especially a long one such as a rainfall series spanning more than 30 years, ask yourself: is the pattern I found so unlikely that I can call it significant? It is easy to assign meaning to random noise in the data, and many academic papers out there have this problem.
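A short calculation puts the six windows in perspective. Assuming independent draws with each of the 10 values equally likely (the model used to generate data, not a claim about real rainfall), the expected number of constant windows of 5 is (n - k + 1) × 10 × (1/10)^5, about 10, so six is well within what pure noise produces. The sketch below computes that expectation and checks it by simulation; count_equal_windows() is a hypothetical helper and the seed is arbitrary.

```r
n <- 100000   # length of the series drawn in this answer
k <- 5        # length of the run we look for
m <- 10       # number of equally likely values (1 to 10)

# A given window is constant with probability m * (1/m)^k:
# m possible values, and all k draws must hit that value
p_window <- m * (1 / m)^k
expected <- (n - k + 1) * p_window
expected
# [1] 9.9996

# Hypothetical helper: count constant windows of k values without zoo.
# A window is constant when all of its k - 1 neighbouring pairs are equal.
count_equal_windows <- function(x, k) {
  same_as_next <- diff(x) == 0           # TRUE where x[i] == x[i + 1]
  pairs <- embed(same_as_next, k - 1)    # one row per window of k values
  sum(rowSums(pairs) == k - 1)
}

# Repeat the random draw many times and average the counts
set.seed(2024)   # arbitrary seed
counts <- replicate(100, count_equal_windows(sample(1:m, n, replace = TRUE), k))
mean(counts)     # should land close to the expected value above
```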
Welcome to Stack Overflow! Unfortunately, this question cannot be reproduced by anyone trying to answer it. Please take a look at this link to see how to ask a reproducible question in R, so that people who wish to help you can do so in the best possible way.
– Marcus Nunes