Use of Seed in R

2

Regarding the seed, I don’t understand when to use set.seed(1), set.seed(123) or set.seed(12345). What is the difference between them?

  • 2
    1) No difference: you can use any of them. 2) All the difference! The seed initializes the pseudo-random number generator; whenever you want to repeat the results of a program, just use the same seed value.

2 answers

4

The function set.seed is used to make the results of pseudo-random number generators (RNGs) reproducible. This matters for any data analysis in which RNGs are used.
For example, when you run simulations, or when you want to fit a classification model and need two subsets of the data, one to train the model and one to test it.
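
The train/test case mentioned above can be sketched as follows. This is only an illustration: the helper split_data, its train_frac argument and the use of the built-in iris data set are assumptions made for the example, not part of the original answer.

```r
# Hypothetical helper: split a data frame into train and test subsets.
# The built-in iris data set stands in for "the data".
split_data <- function(data, seed, train_frac = 0.8) {
  set.seed(seed)                                   # fix the RNG state before sampling
  n <- nrow(data)
  train_idx <- sample(n, size = floor(train_frac * n))
  list(train = data[train_idx, ], test = data[-train_idx, ])
}

s1 <- split_data(iris, seed = 123)
s2 <- split_data(iris, seed = 123)

# Same seed, same rows in the training subset
identical(s1$train, s2$train)
#[1] TRUE
```

Without the call to set.seed inside the helper, each run would most likely pick a different set of rows, so the fitted model and its test error would change from run to run.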

The seed value itself is not important as long as it is used consistently. What truly matters is that what the program does can be reproduced faithfully.

See the following examples.
First I’ll create two vectors, x and y, using exactly the same instruction for both. Even so, the results are different.

n <- 100

set.seed(12)

x <- rnorm(n)
mean(x)
#[1] -0.03116866

y <- rnorm(n)
mean(y)
#[1] 0.009693236

Now I will restore the state of the random number generator to what it was just before I created the vector x.

set.seed(12)

z <- rnorm(n)
mean(z)
#[1] -0.03116866

identical(mean(x), mean(z))
#[1] TRUE

It’s not just the means that are identical; the vectors x and z themselves are identical.

identical(x, z)
#[1] TRUE

Now an example of a resampling technique, the bootstrap. The following example is the first example of help('boot').

library(boot)

ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)

b1 <- boot(city, ratio, R = 999, stype = "w")
b2 <- boot(city, ratio, R = 999, stype = "w")

mean(b1$t)
#[1] 1.562257

mean(b2$t)
#[1] 1.560816

The values are different.
Now let's make the results reproducible.

set.seed(1234)
b3 <- boot(city, ratio, R = 999, stype = "w")
set.seed(1234)
b4 <- boot(city, ratio, R = 999, stype = "w")

identical(mean(b3$t), mean(b4$t))
#[1] TRUE

Again, it is not only the statistics that are equal; the objects created are identical too.

identical(b3, b4)
#[1] TRUE

Finally, note once again that you can use 12, 123, 2319 or any other value. But once you pick a value, always use that same value every time you run the same analysis, simulation, or any other program that calls the RNGs.
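
As a small sketch of this last point: the helper draw below is hypothetical, introduced only to wrap set.seed and rnorm in one call. Any seed works, but only a repeated seed repeats the result.

```r
# Hypothetical helper: seed the generator, then draw n normal values.
draw <- function(seed, n = 5) {
  set.seed(seed)
  rnorm(n)
}

# The same seed reproduces the same vector
identical(draw(12), draw(12))
#[1] TRUE

# A different seed is equally valid, but gives a different vector
identical(draw(12), draw(2319))
#[1] FALSE
```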

2


The difference between using different numbers in set.seed() is basically that each different number in the parentheses leads to a different sequence of random numbers.

Since set.seed() sets the starting state of the random number generator, the value used is a way to ensure that the same random numbers are produced again later. For example:

If you use rnorm() to draw 10 random values from a normal distribution, the result could be:

> rnorm(10)
 [1]  1.2240818  0.3598138  0.4007715  0.1106827 -0.5558411  1.7869131  0.4978505
 [8] -1.9666172  0.7013559 -0.4727914

Repeating the same command, the values could be:

> rnorm(10)
 [1] -1.0678237 -0.2179749 -1.0260044 -0.7288912 -0.6250393 -1.6866933  0.8377870
 [8]  0.1533731 -1.1381369  1.2538149

So the results differ even though the function call is the same. On the other hand, if you want to start from the same generator state, you can use set.seed():

> set.seed(123); rnorm(10)
 [1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499  0.46091621
 [8] -1.26506123 -0.68685285 -0.44566197

Repeating the same command with set.seed() and the same number within the parentheses:

> set.seed(123); rnorm(10)
 [1] -0.56047565 -0.23017749  1.55870831  0.07050839  0.12928774  1.71506499  0.46091621
 [8] -1.26506123 -0.68685285 -0.44566197

So the results are equal, because rnorm() started from the same seed both times.

If you used the same function but "forgot" the 3 in set.seed(123), you could get:

> set.seed(12); rnorm(10)
 [1] -1.4805676  1.5771695 -0.9567445 -0.9200052 -1.9976421 -0.2722960 -0.3153487
 [8] -0.6282552 -0.1064639  0.4280148

which is completely different.
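
One more sketch that may help: the seed fixes the whole stream of values that follows, not a single number. The example assumes R's default generator settings (in particular the default method for normal variates); the variable names are chosen only for illustration.

```r
# Draw 10 values in one call
set.seed(123)
all_at_once <- rnorm(10)

# Reset the generator, then draw the same 10 values in two calls of 5
set.seed(123)
first_five <- rnorm(5)
next_five  <- rnorm(5)

# The stream is the same; only the way it is consumed differs
identical(all_at_once, c(first_five, next_five))
#[1] TRUE

# The second batch continues the stream, so it differs from the first
identical(first_five, next_five)
#[1] FALSE
```

This is also why calling set.seed() once at the top of a script is usually enough: every later random draw is then determined, as long as the code between the seed and the draw does not change.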
