Poorly calculated coefficients in Linear Regression in R due to NAs

Question


Asked 6 years, 11 months ago


This is my data frame, volcker.ini:

volcker.ini <- structure(list(Year = c(1979L, 1979L, 1979L, 1980L, 1980L, 1980L, 
1981L, 1981L, 1981L, 1982L, 1982L, 1982L, 1983L, 1983L, 1983L, 
1984L, 1984L, 1984L, 1985L, 1985L, 1985L, 1986L, 1986L, 1986L, 
1987L, 1987L, 1987L, 1988L, 1988L, 1988L), Month = c(10L, 11L, 
12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 
10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 11L, 12L, 10L, 
11L, 12L), Y.1 = c(8.00983263923528, 2.41267858341867, -0.701122343112104, 
-3.93438481559836, 1.61989462202274, -0.0837521649979607, -1.18856075379809, 
-5.79109166398385, -6.02656788564288, 3.57285443621284, 5.28086890954826, 
4.61968948421691, 1.6450358083769, 2.09679639676383, 3.13330926488653, 
7.03433470051535, 8.82984898471047, 6.35665464823924, -2.06916023327692, 
-6.80818412035661, -2.55840141236052, 5.93892137387166, 3.73139295521127, 
-2.43756307587375, -7.88332536927916, -11.1612368255376, -14.9073451470428, 
-3.39210451580797, -9.45264055248482, -6.71777033430725), X.1 = c(0.308656857874223, 
1.04586629806642, 0.861945545932596, 0.375970358978561, -0.347308458564966, 
-0.29159098146565, 0.658969566870815, 0.777325096646653, 0.819638059706351, 
0.14348380776068, 0.320980128297688, 0.422457840273038, 0.0753279027397413, 
-0.00412826834750302, -0.0306969460488249, 0.202590024491522, 
0.144588970489035, 0.299274727728394, 0.924086583854944, 0.903017497665926, 
0.964001122879932, 1.26678884737668, 1.24568369535494, 1.17738738727233, 
0.855877205956479, 0.778924677659654, 0.601219806786069, 0.967781164852632, 
1.10343758488876, 1.02401236754546), Y.2 = c("NA", "NA", "NA", 
"5.33565675549722", "-0.477469962261498", "0.743881752912509", 
"0.946947439972276", "5.26357788348063", "6.20317011981397", 
"-3.44416166730468", "-4.98209173294852", "-4.17799392953961", 
"-1.60319913629998", "-2.07841411022162", "-3.07277915798255", 
"-6.81314462908097", "-8.99190729955144", "-6.41231440381122", 
"2.93695557772259", "7.71262044640592", "3.48797284502131", "-5.06072963216373", 
"-2.74288427337241", "3.50049327959275", "8.56226731314113", 
"12.0144762810381", "15.6527185635863", "4.17084966096979", "10.4311905060596", 
"7.6861205071862"), X.2 = c(0.288003451, 0.873662015, 0.874190316, 
0.36027826, -0.120926336, -0.276130722, 0.633675698, 0.849582846, 
0.778756432, 0.20203225, 0.221280623, 0.467109312, 0.07783831, 
-0.008749708, -0.023401276, 0.196393036, 0.18439037, 0.294919158, 
0.908446718, 0.922729322, 0.962361556, 0.74, 0.74, 0.77, 2.36, 
2.79, 1.76, 1.26, 1.48, 1.21)), class = "data.frame", row.names = c(NA, 
-30L))

When I run the following regression, because there are some NAs, R adjusts by deleting the first 3 rows of Y.1 and the first 3 rows of X.1. It was supposed to delete the last three rows of X.1:

library(dplyr)

summary(volcker.ini %>% lm(Y.1 ~ X.1, data = .))

How can I make this adjustment in the above code?

1 answer


by Rui Barradas • **15,422** points · Answer 1 · 2018-08-30T21:10:44+00:00

There must be something non-standard with your R session.
As you can read in help("lm"), in the Arguments section:

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

This means that lm will omit the rows containing NA values unless you change the value of options()$na.action. This value can be checked with

options()$na.action
#[1] "na.omit"
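To see what na.action actually does, here is a minimal sketch on a made-up data frame (toy is hypothetical, not from the question) with one missing response. getOption("na.action") is an equivalent way to read the same setting. With na.omit the incomplete row is dropped before fitting; with na.fail the fit stops with an error.

```r
# Hypothetical toy data: the third response is missing
toy <- data.frame(y = c(1.2, 2.3, NA, 4.1, 5.0),
                  x = c(1, 2, 3, 4, 5))

# Same value as options()$na.action
getOption("na.action")

# na.omit: the row with NA is dropped, so only 4 observations are used
fit_omit <- lm(y ~ x, data = toy, na.action = na.omit)
nobs(fit_omit)   # 4

# na.fail: lm refuses to fit and signals an error instead
res <- tryCatch(lm(y ~ x, data = toy, na.action = na.fail),
                error = function(e) conditionMessage(e))
res              # the error message, as a character string
```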

If it returns something else, just run the following command:

options(na.action = "na.omit")

On my system that is the value; I never modify it. And when I ran your code, everything was all right:

library(dplyr)

summary(volcker.ini %>% lm(Y.1 ~ X.1,data = .))
#
#Call:
#lm(formula = Y.1 ~ X.1, data = .)
#
#Residuals:
#     Min       1Q   Median       3Q      Max 
#-14.1342  -4.0814   0.0258   4.5236  10.2769 
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)  
#(Intercept)    2.447      1.675   1.461   0.1552  
#X.1           -5.356      2.259  -2.371   0.0249 *
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 5.613 on 28 degrees of freedom
#Multiple R-squared:  0.1672,   Adjusted R-squared:  0.1375 
#F-statistic: 5.621 on 1 and 28 DF,  p-value: 0.02486
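The residual degrees of freedom show that no rows were dropped: 28 plus 2 estimated coefficients gives 30, the number of rows in the data frame. To check which columns really contain missing values, here is a sketch on a hypothetical miniature of the data (df_demo: the first four rows, values rounded). Note that in the dput output Y.2 is a character column, so its "NA" entries are strings that is.na() does not count until the column is converted with as.numeric().

```r
# Hypothetical miniature of volcker.ini: first four rows, values rounded.
# As in the dput output, Y.2 is stored as character.
df_demo <- data.frame(
  Y.1 = c(8.010, 2.413, -0.701, -3.934),
  X.1 = c(0.309, 1.046, 0.862, 0.376),
  Y.2 = c("NA", "NA", "NA", "5.336"),
  stringsAsFactors = FALSE
)

# Column types: Y.2 is character, not numeric
sapply(df_demo, class)

# Real NA counts per column: the "NA" strings are not counted
colSums(is.na(df_demo))

# Converting Y.2 to numeric turns the "NA" strings into real NA values
df_demo$Y.2 <- as.numeric(df_demo$Y.2)
colSums(is.na(df_demo))
```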

The quote above says that na.exclude can be useful. See its help page, help("na.exclude"), and if you find it useful, the modified code will be:

summary(volcker.ini %>% lm(Y.1 ~ X.1,data = ., na.action = na.exclude))
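A minimal sketch of the practical difference, again on a hypothetical toy data frame: both options fit the model on the same complete rows, so the coefficients are identical, but with na.exclude the extractors residuals() and fitted() pad their results with NA so they line up with the original rows.

```r
# Hypothetical toy data: the third response is missing
toy <- data.frame(y = c(1.2, 2.3, NA, 4.1, 5.0),
                  x = c(1, 2, 3, 4, 5))

fit_omit    <- lm(y ~ x, data = toy, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = toy, na.action = na.exclude)

# Same coefficients: both fits use the same 4 complete rows
all.equal(coef(fit_omit), coef(fit_exclude))

# na.omit: one residual per row actually used (length 4)
length(residuals(fit_omit))
# na.exclude: residuals padded with NA at the dropped row (length 5)
length(residuals(fit_exclude))

# With na.exclude the residuals can be stored back in the data frame directly
toy$res <- residuals(fit_exclude)
toy
```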

And by the way, why not split this instruction in two: one to assign the result of lm and another to call summary?

modelo <- volcker.ini %>% lm(Y.1 ~ X.1,data = ., na.action = na.exclude)
summary(modelo)

Later you may want coef(modelo) or other values such as the residuals, residuals(modelo).
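Once the model is stored in an object, the standard extractor functions from the stats package apply to it. A short sketch, using a hypothetical toy data frame as a stand-in for volcker.ini:

```r
# Hypothetical stand-in for volcker.ini, with one missing response
toy <- data.frame(y = c(1.2, 2.3, NA, 4.1, 5.0),
                  x = c(1, 2, 3, 4, 5))

modelo <- lm(y ~ x, data = toy, na.action = na.exclude)

coef(modelo)                    # intercept and slope
residuals(modelo)               # residuals, NA where the row was excluded
fitted(modelo)                  # fitted values, also NA-padded
confint(modelo)                 # 95% confidence intervals for the coefficients
summary(modelo)$coefficients    # estimate, std. error, t value, p-value
summary(modelo)$r.squared       # multiple R-squared
```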

Finally, so that it doesn't happen again, check whether you have a file called .RData (this is not an extension, it is the full name of the file) and, if you do, remove it.