How to make a prediction interval for a restricted group?

Question


Asked 6 years, 5 months ago


Considering the model with only these two explanatory variables, give a 95% prediction interval for an individual in the first quartile (1st Qu.) of X1 and the second quartile of X2.

I know the generic code, but I can't narrow it down to the requested group. The code I used: `pr.p <- predict(model, interval = "prediction", level = 0.95)`

Sample of the data set:

glucose	insulin	FIDADE
89	94	1
78	88	1
118	230	1
126	235	1
97	140	1
158	245	1
88	54	1
145	130	2
126	22	2
187	392	2
130	79	2
187	200	2
128	110	2
166	175	3
143	146	3
150	342	3
136	110	3
134	60	4
173	265	4
195	145	4
145	165	4
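To reproduce the problem, the sample above can be loaded straight from the pasted text. This is a minimal sketch, assuming the columns are whitespace-separated with a header row; it uses base R's `read.table()` and names the result `dados`, the name the answer below uses.

```r
# Load the sample table above from pasted text
# (whitespace-separated columns, first row is the header)
txt <- "
glucose insulin FIDADE
89   94 1
78   88 1
118 230 1
126 235 1
97  140 1
158 245 1
88   54 1
145 130 2
126  22 2
187 392 2
130  79 2
187 200 2
128 110 2
166 175 3
143 146 3
150 342 3
136 110 3
134  60 4
173 265 4
195 145 4
145 165 4
"
dados <- read.table(text = txt, header = TRUE)  # blank lines are skipped by default
str(dados)  # 21 observations of 3 integer columns
```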

I appreciate any help!!


by Rui Barradas • **15,422** points · Answer 1 · 2019-01-05T13:32:12+00:00

To predict with a model fitted by lm, you need a data frame containing the regressors at the points where you want the predictions. The code below creates a sub-data frame with the rows where insulin is at or below its 1st quartile and FIDADE is category 2.

Assuming the fitted model is:

```r
model <- lm(glucose ~ insulin + FIDADE, data = dados)
```

One can obtain a prediction interval with:

```r
qq <- quantile(dados$insulin, probs = 0.25)    # 1st quartile of insulin
i1 <- with(dados, insulin <= qq)               # rows at or below the 1st quartile
i2 <- with(dados, FIDADE == 2)                 # rows in FIDADE category 2
new <- dados[i1 & i2, c("insulin", "FIDADE")]  # keep only the regressors
predict(model, newdata = new, interval = "prediction", level = 0.95)
#        fit     lwr      upr
#9  108.6813 60.2474 157.1153
#11 118.9752 72.0415 165.9090
```
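For reference, `predict()` builds each interval as fit +/- t * s * sqrt(1 + x0' (X'X)^-1 x0), where s is the residual standard error, X the model matrix, x0 the regressor row (with intercept) of the new point, and t the 0.975 quantile of the t distribution on the residual degrees of freedom. A minimal sketch that reproduces the two rows above by hand, assuming the same model and data; the names `X0`, `XtXi`, `manual` and `auto` are illustrative only.

```r
dados <- data.frame(
  glucose = c(89, 78, 118, 126, 97, 158, 88, 145, 126, 187, 130,
              187, 128, 166, 143, 150, 136, 134, 173, 195, 145),
  insulin = c(94, 88, 230, 235, 140, 245, 54, 130, 22, 392, 79,
              200, 110, 175, 146, 342, 110, 60, 265, 145, 165),
  FIDADE  = rep(1:4, times = c(7, 6, 4, 4))
)
model <- lm(glucose ~ insulin + FIDADE, data = dados)

# Same selection as the answer: insulin <= 1st quartile and FIDADE == 2
qq  <- quantile(dados$insulin, probs = 0.25)
new <- dados[dados$insulin <= qq & dados$FIDADE == 2, c("insulin", "FIDADE")]

X0   <- model.matrix(~ insulin + FIDADE, data = new)  # x0' rows, with intercept
XtXi <- solve(crossprod(model.matrix(model)))         # (X'X)^-1 of the fitted design
fit  <- drop(X0 %*% coef(model))                      # point predictions
s    <- summary(model)$sigma                          # residual standard error
se   <- s * sqrt(1 + rowSums((X0 %*% XtXi) * X0))     # std. error for a new observation
tq   <- qt(0.975, df = df.residual(model))
manual <- cbind(fit = fit, lwr = fit - tq * se, upr = fit + tq * se)

# Compare with the built-in computation
auto <- predict(model, newdata = new, interval = "prediction", level = 0.95)
print(manual)
all.equal(unname(manual), unname(auto))
```

The leading 1 inside the square root is what distinguishes a prediction interval from a confidence interval for the mean (`interval = "confidence"`), which is why the prediction intervals above are so wide.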

Edit:

Given the request in the comments to simulate a 20% increase in the amplitude (range) of the insulin variable, the only difficulty seems to be creating a data set whose insulin range is 20% wider within each FIDADE category, that is, extended by 10% of the observed range at each end. (At least, that is the interpretation that makes the most sense to me.)

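```r
# Observed (min, max) of insulin within each FIDADE category
rng <- with(dados, tapply(insulin, FIDADE, FUN = range))
# Widen each range by 20%: 10% of its width below the min and 10% above the max
rng <- lapply(rng, function(r){
  d <- diff(r)
  c(min(r) - 0.1*d, max(r) + 0.1*d)
})
# FIDADE label for each new insulin value (two values per category)
tmp <- rep(as.integer(names(rng)), lengths(rng))
nova_ampl <- data.frame(insulin = unlist(rng), FIDADE = tmp)
rm(rng, tmp)
```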
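A quick sanity check, as a sketch: widening each range by 10% of its width at both ends, which is what the code above does, should give new widths exactly 1.2 times the observed ones, with unchanged midpoints. The names `old`, `wide` and `ratio` are illustrative.

```r
dados <- data.frame(
  insulin = c(94, 88, 230, 235, 140, 245, 54, 130, 22, 392, 79,
              200, 110, 175, 146, 342, 110, 60, 265, 145, 165),
  FIDADE  = rep(1:4, times = c(7, 6, 4, 4))
)
# Observed insulin range per FIDADE category
old <- with(dados, tapply(insulin, FIDADE, FUN = range))
# Widened ranges: 10% of the width past each end (20% wider in total)
wide <- lapply(old, function(r) {
  d <- diff(r)
  c(min(r) - 0.1 * d, max(r) + 0.1 * d)
})
ratio <- sapply(wide, diff) / sapply(old, diff)  # expected: 1.2 in every category
print(ratio)
print(wide[["2"]])                               # widened range for FIDADE 2
```

Note that for FIDADE 2 the observed range is 22 to 392, so the widened lower limit is 22 - 0.1 * 370 = -15, an insulin value that is not physically meaningful; the predictions at that point are a pure extrapolation of the linear model.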

Now just pass this data frame to the newdata argument.

```r
predict(model, newdata = nova_ampl, interval = "prediction", level = 0.95)
#         fit       lwr      upr
#11  94.76547  45.69869 143.8323
#12 136.15787  87.45688 184.8589
#21 101.99931  52.22080 151.7778
#22 182.18353 128.06123 236.3058
#31 136.62942  89.30538 183.9535
#32 186.90710 135.84374 237.9705
#41 144.33280  93.69015 194.9755
#42 188.75920 138.68448 238.8339
```
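The printed matrix identifies each row only by names such as 11 or 42 (the FIDADE category followed by 1 for the lower or 2 for the upper end of the widened range). As a convenience sketch, not part of the original answer, the predictions can be bound next to the new data so each interval sits beside its insulin and FIDADE values; the names `pred`, `res` and `width` are illustrative.

```r
dados <- data.frame(
  glucose = c(89, 78, 118, 126, 97, 158, 88, 145, 126, 187, 130,
              187, 128, 166, 143, 150, 136, 134, 173, 195, 145),
  insulin = c(94, 88, 230, 235, 140, 245, 54, 130, 22, 392, 79,
              200, 110, 175, 146, 342, 110, 60, 265, 145, 165),
  FIDADE  = rep(1:4, times = c(7, 6, 4, 4))
)
model <- lm(glucose ~ insulin + FIDADE, data = dados)

# Widened insulin ranges per category, as in the answer
rng <- with(dados, tapply(insulin, FIDADE, FUN = range))
rng <- lapply(rng, function(r) { d <- diff(r); c(min(r) - 0.1 * d, max(r) + 0.1 * d) })
nova_ampl <- data.frame(insulin = unlist(rng),
                        FIDADE  = rep(as.integer(names(rng)), lengths(rng)))

pred <- predict(model, newdata = nova_ampl, interval = "prediction", level = 0.95)
res <- cbind(nova_ampl, pred)       # data frame: insulin, FIDADE, fit, lwr, upr
res$width <- res$upr - res$lwr      # width of each prediction interval
print(res, digits = 5)
```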

Data in dput format:

```r
dados <-
structure(list(glucose = c(89L, 78L, 118L, 126L, 97L, 
158L, 88L, 145L, 126L, 187L, 130L, 187L, 128L, 166L, 
143L, 150L, 136L, 134L, 173L, 195L, 145L), 
insulin = c(94L, 88L, 230L, 235L, 140L, 245L, 
54L, 130L, 22L, 392L, 79L, 200L, 110L, 175L, 146L, 
342L, 110L, 60L, 265L, 145L, 165L), 
FIDADE = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L)), 
class = "data.frame", row.names = c(NA, -21L))
```