Rui Barradas already gave a good answer on how to calculate the confidence interval, this is to understand what was doing wrong. Because it is not only a programming error, but understanding of the IC.
The error message
The error that R is returning comes from using a negative value for the standard deviation in the function rnorm
:
> rnorm(5, -.4, .1)
[1] -0.4070113 -0.3860859 -0.3772370 -0.4746763 -0.3817434
> rnorm(5, -.4, -.1)
[1] NaN NaN NaN NaN NaN
Warning message:
In rnorm(5, -0.4, -0.1) : NAs produced
This is because standard deviation does not take negative values. But why are you using rnorm
to calculate the limits? Why use the absolute value of the average?
Calculation of the CI
There is a reason that basic R packages do not have a function for computing IC. It is a measure of how reliable the estimate of a parameter is, how it is calculated depends on the parameter and the model assigned to the data.
Your formula is correct for mu estimated small sample (or large variance) following close to normal distribution. Compare with the result of a test t:
ic <- media + qt(c(0.025, 0.975), df = n-1) * sqrt(var/n)
> ic
[1] -0.8479307 -0.4110068
> t.test(dados$y)$conf.int
[1] -0.8479307 -0.4110068
attr(,"conf.level")
[1] 0.95
But are these good assumptions for your data? Contrast with the result of the @Rui-Barradas function. And compare with the result using different methods with the average estimated by bootstrap:
library(boot)
media.b <- boot(dados$y, function(x,i) mean(x[i]), 10000)
> boot.ci(media.b, conf = .95, type = c("norm","basic", "perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = media.b, conf = 0.95, type = c("norm", "basic",
"perc", "bca"))
Intervals :
Level Normal Basic
95% (-0.8366, -0.4206 ) (-0.8527, -0.4365 )
Level Percentile BCa
95% (-0.8224, -0.4063 ) (-0.8041, -0.3735 )
Calculations and Intervals on Original Scale
Adding limits to the data
Once you have calculated the range, simply add/subtract the values of y:
df$lim_inf <- df$y - ic[1]
df$lim_sup <- df$y + ic[2]
But this is wrong. The confidence interval applies to the estimated parameter (the mean, in this case), not to the individual values. A more appropriate option is to use the standard error:
ep <- sqrt(var/n)
df$lim_inf <- df$y - ep
df$lim_sup <- df$y + ep
I can’t understand what’s going on here. For example, why calculate
media<-abs(mean(df$y))
? Is there any restriction to the mean being non-negative? Because, without calculating its absolute value, its value is negative.– Marcus Nunes
rnorm(nrow(df), media/-0.832, media/-0.399)
: how the standard deviation can be negative?– Rui Barradas