How to include a variable raised to the power n in a regression


Suppose I have the following data:

x<-rnorm(100,1,10000)
y<-rnorm(100,1,10000)+2*x+x^2

If I use the lm function as follows:

model1<-lm(y~x+x^2)

R does not understand that it should include the squared term x^2 among the independent variables. It simply ignores the term and fits the same model as the code below:

model2<-lm(y~x)
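A minimal sketch to confirm the behavior described above, reusing the question's data-generating code. The seed is an arbitrary assumption added only so the run is reproducible. Both formulas end up with the same single slope for x:

```r
# Arbitrary seed (assumption), only for reproducibility
set.seed(1)
x <- rnorm(100, 1, 10000)
y <- rnorm(100, 1, 10000) + 2 * x + x^2

model1 <- lm(y ~ x + x^2)  # x^2 is read as formula syntax, not arithmetic
model2 <- lm(y ~ x)

# model1 has only an intercept and a slope for x
print(names(coef(model1)))  # "(Intercept)" "x"

# The two fits are identical
print(all.equal(coef(model1), coef(model2)))  # TRUE
```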

3 answers


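Another way to fit this regression is to use the poly function with raw = TRUE, so that the predictors are x and x^2 themselves rather than orthogonal polynomials: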

x <- rnorm(100, 1, 10000)
y <- rnorm(100, 1, 10000) + 2*x + x^2
model1 <- lm(y ~ poly(x, degree = 2, raw = TRUE))
  • Yes! I remembered I had a function for it, but I couldn’t remember what it was :(
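A sketch comparing the poly approach with the I() approach, on data with a smaller spread (an assumption chosen to keep the numbers well conditioned; the seed is also arbitrary). With raw = TRUE the coefficients match lm(y ~ x + I(x^2)); the default orthogonal basis gives different coefficients but the same fitted values:

```r
set.seed(42)  # arbitrary seed (assumption)
x <- rnorm(100, 1, 100)
y <- rnorm(100, 0, 10) + 2 * x + x^2

m_raw  <- lm(y ~ poly(x, degree = 2, raw = TRUE))  # columns: x, x^2
m_I    <- lm(y ~ x + I(x^2))                       # same columns, built with I()
m_orth <- lm(y ~ poly(x, degree = 2))              # orthogonal polynomial basis

# Raw poly and I() give the same coefficients (only the names differ)
print(all.equal(unname(coef(m_raw)), unname(coef(m_I))))

# The orthogonal basis spans the same space, so the fitted values agree
print(all.equal(fitted(m_raw), fitted(m_orth)))
```

Orthogonal polynomials are numerically more stable, but their coefficients are not the slopes of x and x^2, which is why raw = TRUE is used when you want to read those directly.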


Whenever you want a formula term to be an arithmetic expression of some variable, wrap it in the function I().

x<-rnorm(100,1,100)
y<-rnorm(100,0,10)+2*x+x^2

mod <- lm(y~x+I(x^2))

The advantage of using I() over creating a new variable holding the values of x^2 is that you do not need to supply the values of x^2 to make predictions; you only need to provide x.

predict(mod, data.frame(x=1:3))
        1         2         3 
 2.211883  7.209663 14.207509 
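To make that advantage concrete, here is a sketch contrasting I() with a precomputed column. The data frame d and the column name x2 are hypothetical, introduced only for the comparison, and the seed is arbitrary. The precomputed version only predicts when x2 is supplied as well:

```r
set.seed(7)  # arbitrary seed (assumption)
x <- rnorm(100, 1, 100)
y <- rnorm(100, 0, 10) + 2 * x + x^2

# I() builds the squared term from x inside the formula
mod <- lm(y ~ x + I(x^2))

# Hypothetical alternative: store x^2 in its own column, x2
d <- data.frame(x = x, x2 = x^2, y = y)
mod_manual <- lm(y ~ x + x2, data = d)

new_x <- data.frame(x = 1:3)
p_I <- predict(mod, new_x)  # only x is needed

# mod_manual cannot find x2 in new_x, so prediction fails...
p_fail <- tryCatch(predict(mod_manual, new_x),
                   error = function(e) conditionMessage(e))
print(p_fail)

# ...unless x2 is computed and passed explicitly
p_manual <- predict(mod_manual, data.frame(x = 1:3, x2 = (1:3)^2))
print(all.equal(p_I, p_manual))  # TRUE
```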



Use model1 <- lm(y ~ x + I(x^2)).

The problem is that operators such as +, -, * and ^ have special meanings within a formula; for example, x^2 means x crossed with itself, which reduces to just x. The I function makes its argument (x^2) be evaluated literally, as exponentiation.
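One way to see this is to inspect the terms R builds from each formula with terms(); no data are needed because the formulas are not evaluated. The helper lab is hypothetical, and the third formula, with made-up variables a and b, shows what ^ normally does in a formula: crossing terms up to the given order.

```r
# Hypothetical helper: the term labels R derives from a formula
lab <- function(f) attr(terms(f), "term.labels")

print(lab(y ~ x + x^2))     # "x"            (x^2 collapses to x)
print(lab(y ~ x + I(x^2)))  # "x" "I(x^2)"   (I() protects the arithmetic)
print(lab(y ~ (a + b)^2))   # "a" "b" "a:b"  (^ means crossing)
```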
