R gurus,
I have the following data frame (Df) that establishes the relationship between variables X and Y:
X Y
1 25 2457524
2 25 2391693
3 25 2450828
4 25 2391252
5 25 2444638
6 25 2360293
7 50 4693194
8 50 4844527
9 50 4835596
10 50 4878092
11 50 4809226
12 50 4722253
13 75 7142763
14 75 7182769
15 75 7135550
16 75 7173920
17 75 7216871
18 75 7076359
19 100 9496553
20 100 9537788
21 100 9405825
22 100 9439201
23 100 9609870
24 100 9707734
25 125 12031958
26 125 12027037
27 125 11935594
28 125 11930086
29 125 12154132
30 125 12096462
31 150 14298064
32 150 14396607
33 150 13964716
34 150 14221039
35 150 14283992
36 150 14042220
(Note that we have 6 levels of the variable X, with 6 points in each level.)
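To make the examples below reproducible, Df can be rebuilt exactly as printed above:

# Rebuild the data frame shown above: 6 levels of X, 6 replicates each
Df <- data.frame(
  X = rep(c(25, 50, 75, 100, 125, 150), each = 6),
  Y = c(2457524, 2391693, 2450828, 2391252, 2444638, 2360293,
        4693194, 4844527, 4835596, 4878092, 4809226, 4722253,
        7142763, 7182769, 7135550, 7173920, 7216871, 7076359,
        9496553, 9537788, 9405825, 9439201, 9609870, 9707734,
        12031958, 12027037, 11935594, 11930086, 12154132, 12096462,
        14298064, 14396607, 13964716, 14221039, 14283992, 14042220)
)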
If we fit a second-degree polynomial model to these data, we obtain the following:
Model <- lm(Y ~ X + I(X^2), data = Df)
print(Model)

Call:
lm(formula = Y ~ X + I(X^2), data = Df)

Coefficients:
(Intercept)            X       I(X^2)
  -26588.12     97310.61       -14.02
The graphic representation of this model, which looks more like a straight line, is as follows:
If we want to use the model to predict the values of "Y" from the values of the variable "X", we just run this code:
> predicted.intervals <- predict(Model, newdata = data.frame(X = Df$X),
+                                interval = "confidence", level = 0.95)
> predicted.intervals
fit lwr upr
1 2397413 2315346 2479481
2 2397413 2315346 2479481
3 2397413 2315346 2479481
4 2397413 2315346 2479481
5 2397413 2315346 2479481
6 2397413 2315346 2479481
7 4803887 4753705 4854070
8 4803887 4753705 4854070
9 4803887 4753705 4854070
10 4803887 4753705 4854070
11 4803887 4753705 4854070
12 4803887 4753705 4854070
13 7192834 7137649 7248019
14 7192834 7137649 7248019
15 7192834 7137649 7248019
16 7192834 7137649 7248019
17 7192834 7137649 7248019
18 7192834 7137649 7248019
19 9564252 9509067 9619438
20 9564252 9509067 9619438
21 9564252 9509067 9619438
22 9564252 9509067 9619438
23 9564252 9509067 9619438
24 9564252 9509067 9619438
25 11918144 11867961 11968326
26 11918144 11867961 11968326
27 11918144 11867961 11968326
28 11918144 11867961 11968326
29 11918144 11867961 11968326
30 11918144 11867961 11968326
31 14254507 14172440 14336574
32 14254507 14172440 14336574
33 14254507 14172440 14336574
34 14254507 14172440 14336574
35 14254507 14172440 14336574
36 14254507 14172440 14336574
The burning question:
What would be the line(s) of code to do the inverse prediction, that is, to predict "X" from values of the variable "Y" under this model? Searching on Google, I tried several packages and specific functions, but unfortunately I was not successful (perhaps for lack of familiarity with them). Could any of you help me unravel this mystery? Big hug to all.
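One way to get a point estimate by hand, with no extra package, is to solve the fitted equation numerically with uniroot(). The helper name inverse.predict and the search interval are illustrative choices, and this gives only the point estimate, not the calibration interval:

# Solve b0 + b1*x + b2*x^2 = y0 for x within the observed range of X.
# y0 must lie between the fitted values at the interval endpoints.
inverse.predict <- function(model, y0, interval = c(25, 150)) {
  f <- function(x) predict(model, newdata = data.frame(X = x)) - y0
  uniroot(f, interval = interval)$root
}
inverse.predict(Model, y0 = 7e6)  # X whose fitted Y equals 7,000,000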
Is there any special reason you want to do this? Why don't you fit a model lm(X ~ Y + I(Y^2)) and predict X from Y directly? Statistically, what you want to do is strange, because X is treated as an observed variable, so there would be no reason to want to predict it... – Daniel Falbel
In addition to what Daniel said, I would also run summary(Model), which gives a statistically more interesting answer than print(Model). With it you can test whether each of the three coefficients of your model is equal to zero. Given the graph I am seeing, I bet the quadratic term is not significant (i.e., p-value > 0.05). That is, you have data that follow a linear model, not a quadratic one. – Marcus Nunes
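(Marcus's check is a one-liner; what matters is the Pr(>|t|) column of the coefficient table:)

summary(Model)  # t-tests for (Intercept), X and I(X^2)
# If Pr(>|t|) for I(X^2) exceeds 0.05, the quadratic term adds little and
# lm(Y ~ X, data = Df) would be the simpler candidate model.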
Yes, there is! In short, I am working with a special class of regression models called "calibration" models. These models are little known in academia. For these models the prediction is inverse: first you build the model, then you determine X.
– Weidson C. de Souza
Hello Marcus! It's always a pleasure to see you here... My problem isn't just about seeing which variables are important to the model; I did that at the beginning of the analysis. Now I need to predict the value of "X". I have already found that R has a function that does this, called invest, from the investr package. But I can't get it to work for the model in question.
– Weidson C. de Souza
Good evening, Marcus! The significance of the terms was already evaluated with summary(Model), and the p-value was not significant for the quadratic term (p = 0.237). On the other hand, due to the 6 repetitions at each level of X, the lack-of-fit test indicated lack of fit (p-value of 0.0232) for the first-degree polynomial model. But that is not the focus of the problem.
– Weidson C. de Souza
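(The lack-of-fit test mentioned here can be reproduced, assuming Df as rebuilt above, by comparing the straight line against the saturated one-mean-per-level model:)

linear    <- lm(Y ~ X, data = Df)           # first-degree polynomial
saturated <- lm(Y ~ factor(X), data = Df)   # one mean per level of X
anova(linear, saturated)  # a small p-value signals lack of fit of the line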
What is really relevant is learning to estimate values of "X" from given values of "Y" in second-degree polynomial models. I have already verified that there is an R package (investr) that does this inverse calculation. However, so far I have not succeeded with these data. Perhaps some member of this group can come up with an elegant solution to this kind of problem.
– Weidson C. de Souza
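For the record, a sketch of the investr route discussed in these comments, assuming Df and the quadratic model above. invest() numerically inverts the fitted curve and returns a point estimate with a confidence interval; one likely pitfall is that it needs to recover the data from the model call, so the model should be fitted with an explicit data = Df argument (y0 = 7e6 is just an example target):

# install.packages("investr")   # if not yet installed
library(investr)
Model <- lm(Y ~ X + I(X^2), data = Df)   # explicit data argument
invest(Model, y0 = 7e6, interval = "inversion", level = 0.95)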