About the seed, I don't understand when to use seed(1), seed(123) or seed(12345). What would be the difference between them?
The function set.seed() is used to make the results of pseudo-random number generators (RNGs) reproducible. This matters for any data analysis whose results depend on RNGs.
For example, when you run simulations, or when you fit a classification model and need to split the data into two subsets, one to train the model and one to test it.
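As a minimal sketch of the train/test case, assuming base R's built-in iris data set and an arbitrary 70/30 split (both illustrative choices, not part of the original answer; split_indices is a hypothetical helper), setting the same seed before sample() yields the same partition on every run:

```r
# Hypothetical helper: row indices for a training set of proportion `prop`
split_indices <- function(n, prop, seed) {
  set.seed(seed)                      # fix the RNG state before sampling
  sample(n, size = round(prop * n))   # draw training rows without replacement
}

n <- nrow(iris)                       # iris is only an illustrative data set
idx1 <- split_indices(n, 0.7, seed = 42)
idx2 <- split_indices(n, 0.7, seed = 42)

train <- iris[idx1, ]                 # 70% of the rows
test  <- iris[-idx1, ]                # the remaining 30%

identical(idx1, idx2)                 # TRUE: same seed, same split
```

Any other seed would give a different, but equally reproducible, split.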
The seed value itself does not matter, as long as it is used consistently. What truly matters is that what the program does can be reproduced faithfully.
See the following examples.
First I'll create two vectors, x and y, using exactly the same instructions. Yet the results are different.
n <- 100
set.seed(12)
x <- rnorm(n)
mean(x)
#[1] -0.03116866
y <- rnorm(n)
mean(y)
#[1] 0.009693236
Now I will restore the state of the random number generator to what it was just before I created the vector x.
set.seed(12)
z <- rnorm(n)
mean(z)
#[1] -0.03116866
identical(mean(x), mean(z))
#[1] TRUE
It's not just the means that are identical; the vectors x and z themselves are identical.
identical(x, z)
#[1] TRUE
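Re-seeding is not the only way to restore the generator. R keeps the current RNG state in the vector .Random.seed in the global environment, so the following sketch snapshots and restores that vector instead of calling set.seed() again. This is an alternative technique, not the one used in the answer above:

```r
set.seed(12)
saved_state <- .Random.seed   # snapshot of the RNG state right after seeding

x <- rnorm(100)               # drawing numbers advances the state

# Put the saved state back; .Random.seed must live in the global environment
assign(".Random.seed", saved_state, envir = globalenv())
z <- rnorm(100)

identical(x, z)               # TRUE: same starting state, same draws
```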
Now an example with a resampling technique, the bootstrap. The following code is the first example in help('boot').
library(boot)
ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)
b1 <- boot(city, ratio, R = 999, stype = "w")
b2 <- boot(city, ratio, R = 999, stype = "w")
mean(b1$t)
#[1] 1.562257
mean(b2$t)
#[1] 1.560816
The values are different. Now let's make the results reproducible.
set.seed(1234)
b3 <- boot(city, ratio, R = 999, stype = "w")
set.seed(1234)
b4 <- boot(city, ratio, R = 999, stype = "w")
identical(mean(b3$t), mean(b4$t))
#[1] TRUE
Again, it is not only the statistics that are equal; the objects created are identical too.
identical(b3, b4)
#[1] TRUE
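The same principle does not depend on the boot package. Here is a minimal base-R sketch of a nonparametric bootstrap of the mean; the helper boot_mean and the data vector are hypothetical, chosen only for illustration. Re-seeding with the same value reproduces every replicate:

```r
# Hypothetical helper: bootstrap distribution of the sample mean
boot_mean <- function(v, R, seed) {
  set.seed(seed)                                  # fix the resampling sequence
  replicate(R, mean(sample(v, replace = TRUE)))   # R resamples with replacement
}

v <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9)    # arbitrary illustrative data

t1 <- boot_mean(v, R = 999, seed = 1234)
t2 <- boot_mean(v, R = 999, seed = 1234)

identical(t1, t2)                                 # TRUE
```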
Finally, note once again that you can use 12, 123, 2319 or any other value. But whichever value you choose, use that same value every time you run the same analysis, simulation, or any other program that calls the RNGs.
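To illustrate that the particular value is irrelevant, this sketch runs the same simulation twice for each of the seeds mentioned above and checks that each seed reproduces its own result. The function run_sim is hypothetical, and the simulation (a mean of 100 standard normal draws) is only a placeholder:

```r
# Hypothetical placeholder simulation: mean of n standard normal draws
run_sim <- function(seed, n = 100) {
  set.seed(seed)
  mean(rnorm(n))
}

for (s in c(12, 123, 2319)) {
  first  <- run_sim(s)
  second <- run_sim(s)
  cat("seed", s, "reproducible:", identical(first, second), "\n")
}
```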
The difference between using different numbers in set.seed() is basically that each different number in the parentheses produces a different sequence of random numbers.
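One way to see what the number does: set.seed() initialises the generator's internal state, which R stores in the vector .Random.seed. In this sketch the same seed always produces the same state vector, while a different seed produces a different one, and that state is what determines every subsequent draw:

```r
set.seed(123)
state_123a <- .Random.seed   # state right after seeding with 123

set.seed(123)
state_123b <- .Random.seed   # seeding again with 123

set.seed(12)
state_12 <- .Random.seed     # seeding with a different number

identical(state_123a, state_123b)   # TRUE: same seed, same starting state
identical(state_123a, state_12)     # FALSE: different seed, different state
```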
Since set.seed() does not generate random numbers itself but fixes the starting point of the random number generator, the value used is a way to ensure that the same random numbers are produced later, for example:
If you use the command rnorm() to generate 10 values randomly sampled from a normal distribution, you might get:
> rnorm(10)
[1] 1.2240818 0.3598138 0.4007715 0.1106827 -0.5558411 1.7869131 0.4978505
[8] -1.9666172 0.7013559 -0.4727914
Repeating the same command, the values might be:
> rnorm(10)
[1] -1.0678237 -0.2179749 -1.0260044 -0.7288912 -0.6250393 -1.6866933 0.8377870
[8] 0.1533731 -1.1381369 1.2538149
They are therefore different for the same function call. On the other hand, if you want to start from the same random numbers, you can use the function set.seed():
> set.seed(123); rnorm(10)
[1] -0.56047565 -0.23017749 1.55870831 0.07050839 0.12928774 1.71506499 0.46091621
[8] -1.26506123 -0.68685285 -0.44566197
Repeating the same command with set.seed() and the same number within the parentheses:
> set.seed(123); rnorm(10)
[1] -0.56047565 -0.23017749 1.55870831 0.07050839 0.12928774 1.71506499 0.46091621
[8] -1.26506123 -0.68685285 -0.44566197
The values are therefore equal, because rnorm() started from the same generator state (the same seed) both times.
If you used the same function but "forgot" the 3 in set.seed(123), you could get:
> set.seed(12); rnorm(10)
[1] -1.4805676 1.5771695 -0.9567445 -0.9200052 -1.9976421 -0.2722960 -0.3153487
[8] -0.6282552 -0.1064639 0.4280148
which is completely different.
seed
.– Rui Barradas