I want to overlay two graphs

Question

I want to overlay two graphs

Asked 5 years, 2 months ago

Viewed 43 times

1

I’m trying to overlay two graphs, one from the binomial distribution and the other from Poisson. They should have close values for large N. N here was called x and saved the values generated by runif within a vector x. The way I’m packing, the function lines is always limiting Y to about 170 and I don’t know why. What other function can I use?

My intention is to simulate even n=x=1000.

k<-2
p<-0.0001
i<-1
x<-0
while(i<=1000){
  x[i]<-runif(1,min=5,max=200)
  i<-i+1
}
fun<-function(x){
  (exp(-x*p))*((x*p)**k)/factorial(k)
}

plot(x, fun(x), main="b(k=2;n=60:180,p=0.2)", col="red", type="l")

func<-function(x){
  (factorial(x)/(factorial(k)*factorial(x-k)))*(p**k)*(1-p)**(x-k)
}
lines(x, func(x), col="blue")

2 answers

Browser other questions tagged r

You are not signed in. Login or sign up in order to post.

by Marcus Nunes • **17,915** points · Answer 1 · 2020-06-04T23:43:34+00:00

The R will always plot the points in the order they appear in the vector. See the example below, in which I plot the function x 2:

x <- c(1, 5, 3, 6, 2, 4)
plot(x, x^2, type = "l")

See how the chart seems to make no sense. The best way to organize these results and view the desired chart is by sorting the values of x using the command sort:

x.ord <- sort(x)
plot(x.ord, x.ord^2, type = "l")

See that now the graph makes much more sense.

Your data suffers from the same problem. Like the values of x are generated randomly, they are not ordered. Therefore, just sort them to have a chart that makes sense:

k<-2
p<-0.0001
i<-1
x<-0
while(i<=1000){
    x[i]<-runif(1,min=5,max=200)
    i<-i+1
}

fun<-function(x){
    (exp(-x*p))*((x*p)**k)/factorial(k)
}

x.ord <- sort(x)

plot(x.ord, fun(x.ord), main="b(k=2;n=60:180,p=0.2)", col="red", type="l")

func<-function(x){
    (factorial(x)/(factorial(k)*factorial(x-k)))*(p**k)*(1-p)**(x-k)
}
lines(x.ord, func(x.ord), col="blue")

The graph of the binomial is cut around 170 because the R has a number problem for factorial above 170. It cannot calculate (overflows, laughs), which implies that the density value of the binomial cannot be calculated.

factorial(171)/(factorial(171-2)*factorial(2))
## Inf

One way to solve this is by using the function choose, which calculates the combination of two numbers:

choose(171, 2)
## [1] 14535

Another way to solve this is to use the function lfactorial, that calculates the logarithm of the factorial. Thus, overflow is avoided. Just apply some properties of logarithm and the desired result comes out:

exp(lfactorial(171) - (lfactorial(171-2)+lfactorial(2)))
## [1] 14535

by Rui Barradas • **15,422** points · Answer 2 · 2020-06-05T12:11:16+00:00

The problem

As Marcus Nunes says in his reply, the problem comes from calculating the mass probability function of the binomial distribution. Factorials grow very fast and an overflow condition occurs.

The problem of calculating the function dbinom is a very difficult problem. R code is an implementation of

Catherine Loader (2000). Fast and Accurate Computation of Binomial Probabilities.

See a recent thread of r-devel on the calculation of dbinom:

thread start,
current link for the article of C. Loader.

Later I will calculate the values of func with dbinom. This new function will be called func2.

The question

The code to simulate uniform variables can be greatly simplified. Just see that in R many functions are vectorized, and one of them is runif. Any of the three modes of generating the vector x (or x2 or y) which gives identical results.

set.seed(2020)
i <- 1
x <- 0
while(i <= 1000){
  x[i] <- runif(1, min = 5, max = 200)
  i <- i + 1
}

set.seed(2020)
x2 <- numeric(1000)
for(i in seq.int(1000)){
  x2[i] <- runif(1, min = 5, max = 200)
}

set.seed(2020)
y <- runif(1000, min = 5, max = 200)

identical(x, x2)    #[1] TRUE
identical(x, y)     #[1] TRUE

In addition, the support sets of binomial and Poisson distributions are sets of integer numbers. It makes no sense to use the runif.

fun <- function(x, p){
  (exp(-x*p))*((x*p)**k)/factorial(k)
}

func <- function(x, p){
  choose(x, k)*(p**k)*(1 - p)**(x-k)
}

n <- 1000
k <- 2
p <- 0.0001

set.seed(2020)
x <- sample(n, n, TRUE)
x.ord <- sort(x)
plot(x.ord, fun(x.ord, p), main="b(k=2;n=60:180,p=0.2)", col="red", type="l")
lines(x.ord, func(x.ord, p), col="blue", lty = "dotted")

Now with dbinom.

func2 <- function(x, p) dbinom(k, size = x, prob = p)

plot(x.ord, fun(x.ord, p), main="b(k=2;n=60:180,p=0.2)", col="red", type="l")
lines(x.ord, func2(x.ord, p), col="blue", lty = "dotted")