Integer overflow in R


I’m working on a population dynamics simulation that involves generating whole numbers. Because of the assumptions of my model, I generate random numbers with R's rmultinom function. However, I am running into overflow problems in my simulation.

The largest integer R can represent is given by .Machine$integer.max. On my PC, this number is 2147483647 (2^31 - 1), which in scientific notation is approximately 2.147 × 10^9.

But many of the simulations I run exceed this limit: the size parameter of rmultinom can be greater than 10^10 or even 10^12. When that happens, I cannot generate random numbers with the distribution I want.
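A minimal sketch of the limit described above (the probability vector c(1, 2, 4) is only an illustrative assumption): .Machine$integer.max is 2^31 - 1, integer arithmetic past it gives NA, and rmultinom() rejects a size that cannot be converted to an integer.

```r
# Largest value of R's 32-bit integer type: 2^31 - 1
int_max <- .Machine$integer.max
print(int_max)

# Integer arithmetic past the limit yields NA (R also emits an overflow warning)
overflowed <- suppressWarnings(int_max + 1L)
print(overflowed)

# rmultinom() converts size to an integer internally, so a size of 1e10
# cannot be used directly; the call stops with an error
res <- tryCatch(
  suppressWarnings(rmultinom(n = 1, size = 1e10, prob = c(1, 2, 4))),
  error = function(e) e
)
print(inherits(res, "error"))
```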

What could I do to solve this problem? Any suggestions?

1 answer



As I understand it, rmultinom does the following:

  • You have length(prob) different types of balls, where prob is the function's prob parameter.

  • You then draw size balls independently, according to these probabilities (size is the function's size parameter).

  • This procedure is repeated n times (n is the function's n parameter).
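The ball-drawing description above can be sketched directly in R. draw_balls() is a hypothetical helper, not part of R: it draws size labelled balls with replacement via sample.int() and counts each type with tabulate(), which has the same distribution as one column of rmultinom(n = 1, size, prob). It is only practical for small size, since it materialises every ball.

```r
# Hypothetical helper: one multinomial draw, simulated ball by ball
draw_balls <- function(size, prob) {
  k <- length(prob)  # number of ball types
  # draw size balls with replacement; sample.int() normalises prob itself
  balls <- sample.int(k, size = size, replace = TRUE, prob = prob)
  tabulate(balls, nbins = k)  # count how many balls of each type were drawn
}

set.seed(1)
draw_balls(20, c(1, 2, 4, 5))                      # 4 counts that add up to 20
rmultinom(n = 1, size = 20, prob = c(1, 2, 4, 5))  # built-in, same distribution
```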

Given this, I think you can use sizes larger than what fits in an R integer, as follows:

A possible function that splits the sample into independent chunks of 10^9 balls, each of which fits in an R integer:

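library(bit64)

# Draw one multinomial sample whose total size may exceed .Machine$integer.max,
# by splitting it into independent chunks of at most 1e9 balls each
rmultinom2 <- function(size, prob) {
  chunk <- 1000000000L
  n <- size %/% chunk     # number of full chunks
  resto <- size %% chunk  # leftover balls ("resto" = remainder)

  # each column is an independent draw of chunk balls
  amostra <- rmultinom(n = as.integer(n), size = chunk, prob = prob)
  # convert the integer64 remainder to a plain integer before passing it on
  amostra_resto <- rmultinom(n = 1, size = as.integer(resto), prob = prob)

  # rowSums() returns doubles, which are exact for totals up to 2^53
  rowSums(cbind(amostra, amostra_resto))
}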

Repeating the experiment 100 times:

# each row of amostra is one draw of 10^10 balls
amostra <- plyr::ldply(1:100, function(x, size, prob) {
  rmultinom2(size, prob)
}, size = as.integer64("10000000000"), prob = c(1, 2, 4, 5))

I think the key idea here is to use the bit64 package, which supports larger integers, and to draw several independent samples and then add them up. It may also be necessary to convert the rows of amostra (inside the function) to large integers so that the sum does not overflow either; note, though, that rowSums() returns doubles, which represent whole numbers exactly up to 2^53 (about 9 × 10^15).

Now, if length(prob) is larger than the largest integer, I don’t know.
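On the summation concern above, a minimal base-R check (no bit64 needed for this part): adding plain integers past .Machine$integer.max gives NA, while rowSums() returns a double that holds the exact total. The one-row matrix is just an illustrative stand-in for two per-chunk counts.

```r
# Two per-chunk counts whose total exceeds the integer limit
counts <- matrix(c(.Machine$integer.max, 1L), nrow = 1)

# Adding them as integers overflows to NA (R warns; the warning is suppressed here)
int_total <- suppressWarnings(counts[1, 1] + counts[1, 2])

# rowSums() returns a double, so the total is exact here
dbl_total <- rowSums(counts)

print(int_total)          # NA
print(typeof(dbl_total))  # "double"
print(dbl_total)          # 2147483648
```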

  • Yes, that is exactly the definition of the multinomial distribution. length(prob) won’t be a problem for me, because it is fixed at 3. I tested your function here and, in my tests, it no longer overflowed in the cases where it did before.

  • Good! I think it will break if size is greater than 1000000000L^2. Does it make sense to draw the samples independently and then add them up?

  • It makes perfect sense, because they are independent draws. One way to check this is to run your amostra code a large number of times (say 100000 instead of 100) with a smaller size value. Then run the original rmultinom with the same number of replications, size and prob. If the category estimates match what was defined in prob, everything is fine. I ran it here and everything matched as it should.
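The check described above can be sketched as follows. rmultinom_chunked() is a hypothetical variant of rmultinom2() with a configurable chunk size, an assumption made so the chunk-and-add logic can be exercised with small numbers and without bit64. It is compared against a direct rmultinom() call and against prob normalised to sum to 1; fewer replications than the suggested 100000 are used to keep it quick.

```r
# Hypothetical variant of rmultinom2(): same chunk-and-add logic,
# but with a configurable chunk size and plain (double) arithmetic
rmultinom_chunked <- function(size, prob, chunk) {
  n_full <- size %/% chunk  # number of full chunks
  resto <- size %% chunk    # leftover balls
  full <- rmultinom(n = n_full, size = chunk, prob = prob)
  rest <- rmultinom(n = 1, size = resto, prob = prob)
  rowSums(cbind(full, rest))
}

set.seed(42)
reps <- 20000
size <- 1000
prob <- c(1, 2, 4, 5)

# each column is one replicated draw of size balls
chunked <- replicate(reps, rmultinom_chunked(size, prob, chunk = 300))
direct <- rmultinom(n = reps, size = size, prob = prob)

# estimated category proportions from both methods next to the target
rbind(chunked = rowMeans(chunked) / size,
      direct  = rowMeans(direct) / size,
      target  = prob / sum(prob))
```

If chunking is valid, the chunked and direct rows should both sit close to the target row.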
