How to generate random values for a known distribution?

Asked

Viewed 4,952 times

5

I am aware that, for example runif(1000,0,3) generates 1000 random values for a uniform distribution for x between 0 and 3. But how to do this for any probability density function? Every clue is grateful!

  • 3

    I’m no expert, but a quick search seems to indicate that the problem is more mathematical/statistical than programming. Apparently the solution involves manipulating its function so that it can be fed random (uniform) values, and hence generate the values with such a distribution.

  • According to this link, you can get an approximation of this simply by reversing the function and solving to values with uniform distribution (if the function is discrete). Ideally you would post the function so we can try to solve the problem.

  • 2

    Each distribution has a specific function in R to obtain samples. For example, to obtain values from a binomial distribution, the function is rbinom, for the distribution of Poisson, rpois. Both in the package Stats which is already installed with R. On the project website (in English) there is a list with various distributions and packages that support them. What is listed in Base Functionality is included in any updated version of R.

  • If the density function is known, it is worth the comment of @Ailtonandradedeoliveira, because there is a function for each distribution. If it is unknown, but you have a sample, the question becomes complicated (eg. what is the domain of the function? Real? Non-negative Real? Defined range?).

1 answer

5

Every statistical distribution can be defined by a cumulative distribution function F(x).

A well-known result states that if you have a random variable U with uniform distribution in the range (0.1). Then inserir a descrição da imagem aqui follows the distribution defined by F.

The proof of this result is simple:

inserir a descrição da imagem aqui

So if you can generate random numbers with the uniform distribution and know the cumulative distribution function, you can also generate random numbers according to that distribution.

In the R, this is already programmed for several distributions: see list that Ailton posted. But, it’s not too complicated to program for another distribution if you can reverse your F(x).


If you only know the cumulative distribution function, you can write as an optimization problem. Define the cumulative distribution ( Here is the cumulative exponential distribution):

dF <- function(x, lambda = 0.5){
  1 - exp(-lambda*x)
}

Draw a random number between 0 and 1 with uniform distribution. In my case I obtained:

n <- 0.917915

Then you have to find x of distribution dF which is closer to n. This can be done as follows:

op <- optim(runif(1), function(x){
  abs(dF(x, 0.5) - n)
}, method = "BFGS")

See that:

> op$par
[1] 5

And that :

> dF(5, 0.5)
[1] 0.917915

This wikipedia article explains more fully what I said here: https://en.wikipedia.org/wiki/Inverse_transform_sampling


This procedure can be repeated for, from a random sample of a random variable with uniform distribution we obtain a sample with cumulative distribution defined on the object dF. In this case we use the exponential distribution:

dF <- function(x, lambda = 0.5){
  1 - exp(-lambda*x)
}
amostra <- runif(1000)

To obtain the sample with exponential distribution it is necessary to find the inverse value of dF for each random number generated. This can be obtained as follows:

inverter_distribuicao <- function(x){
  m <- nlm(function(y){
      abs(dF(y, 0.5) - x)
    }, p = 1
  )
  return(m$estimate)
}
amostra_exp <- sapply(amostra, inverter_distribuicao)

See now the histogram of the generated sample:

inserir a descrição da imagem aqui

Browser other questions tagged

You are not signed in. Login or sign up in order to post.