# 4. Fitting curves with the linear model

```r
require(s20x)
```

```
Loading required package: s20x
```


## 4.1. Identifying a curved relationship

```r
## Load the s20x library into our R session
library(s20x)
## Importing data into R
## Examine the data
plot(Exam ~ Assign, data = Stats20x.df)
```

Hmm, not quite a straight line: there could be some curvature. Maybe `trendscatter()` will paint a clearer picture.
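`trendscatter()` overlays a smoothed trend on the scatter plot. The sketch below is a rough base-R stand-in using `lowess()`. That choice is an assumption: it is not necessarily how s20x implements `trendscatter()`.

- It smooths simulated data that has genuine upward curvature.
- It then compares the smoothed trend's average slope at the low and high ends of the x range.
- The data, the seed and the names `slope_first` and `slope_last` are all hypothetical.

```r
## Hypothetical data: a convex (curved) relationship plus noise.
## Simulated for illustration only; this is NOT Stats20x.df.
set.seed(20)
x <- seq(0, 20, length.out = 150)
y <- 28 - 0.7 * x + 0.16 * x^2 + rnorm(length(x), sd = 3)

## lowess() fits a locally weighted smoother and returns a list
## with components x (sorted) and y (smoothed values).
sm <- lowess(x, y)

plot(y ~ x)
lines(sm, col = "blue")

## Average slope of the smoothed trend over the first and last
## thirds of the x range: a steeper slope at the high end is the
## numerical signature of upward curvature.
slope_first <- (sm$y[50] - sm$y[1]) / (sm$x[50] - sm$x[1])
slope_last <- (sm$y[150] - sm$y[100]) / (sm$x[150] - sm$x[100])
c(first = slope_first, last = slope_last)
```

A straight-line relationship would give roughly equal slopes at both ends. Here the high-end slope is clearly larger.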

```r
trendscatter(Exam ~ Assign, data = Stats20x.df)
```

Let's fit a simple linear model to these data and see whether it works.

```r
examassign.fit <- lm(Exam ~ Assign, data = Stats20x.df)
plot(examassign.fit, which = 1)
```

The assumption that the errors are identically distributed with expected value 0 looks questionable here. There tend to be more negative residuals in the middle of the range of fitted values, but more positive residuals at the extremes. A potential solution is to add a quadratic (squared) term for `Assign`.
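To see why a missed curve produces this residual pattern, here is a minimal sketch on simulated data. The data are hypothetical, not `Stats20x.df`.

- It fits a straight line to a convex relationship.
- It then averages the residuals within the lower, middle and upper thirds of the fitted values.
- The names `fit_line`, `third` and `m` are illustrative.

```r
## Hypothetical simulated data with genuine curvature (NOT Stats20x.df).
set.seed(42)
x <- seq(0, 20, length.out = 150)
y <- 28 - 0.7 * x + 0.16 * x^2 + rnorm(length(x), sd = 3)

## Fit a straight line and extract residuals and fitted values.
fit_line <- lm(y ~ x)
res <- residuals(fit_line)
fv <- fitted(fit_line)

## Split the fitted values into thirds and average the residuals in
## each third: positive, negative, positive is the U-shape that the
## residuals-vs-fitted plot shows when curvature is missed.
third <- cut(fv, breaks = quantile(fv, c(0, 1/3, 2/3, 1)),
             include.lowest = TRUE, labels = c("low", "middle", "high"))
m <- tapply(res, third, mean)
m
```

With a convex truth, the low and high thirds average positive residuals and the middle third averages negative ones.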

## 4.2. Fitting a quadratic model

The standard notation for a quadratic curve is:

$y = a x^2 + b x + c$

Here we will use different notation, $\beta_0 = c$, $\beta_1 = b$ and $\beta_2 = a$, and use the quadratic curve to describe the expected value of our dependent variable $y$. That is, we will use the following notation:

$E[Y|x] = \beta_0 + \beta_1 x + \beta_2 x^2$

If $\beta_2 > 0$, the slope of the quadratic increases as $x$ increases. If $\beta_2 < 0$, the slope decreases as $x$ increases. If $\beta_2 = 0$, the "quadratic" is simply a straight line with constant slope $\beta_1$.
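Differentiating the model for $E[Y|x]$ with respect to $x$ shows where these three cases come from:

$\frac{d}{dx} E[Y|x] = \beta_1 + 2\beta_2 x$

The slope therefore changes by $2\beta_2$ for each one-unit increase in $x$. It increases when $\beta_2 > 0$, decreases when $\beta_2 < 0$, and stays fixed at $\beta_1$ when $\beta_2 = 0$.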

```r
examassign.fit2 <- lm(Exam ~ Assign + I(Assign^2), data = Stats20x.df)
plot(examassign.fit2, which = 1)
```

That is looking much better.
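The `I()` wrapper in the model formula above matters. In R formulas, `^` denotes crossing of terms, so `Assign^2` on its own would reduce to plain `Assign`. `I()` tells R to compute the arithmetic square instead.

A minimal sketch on simulated data shows the difference. The variables `x` and `y` are hypothetical, not `Stats20x.df`. The sketch also includes the equivalent `poly(x, 2, raw = TRUE)` specification.

```r
## Hypothetical simulated data (NOT Stats20x.df).
set.seed(1)
x <- runif(100, 0, 20)
y <- 28 - 0.7 * x + 0.16 * x^2 + rnorm(100, sd = 3)

## Without I(), ^ is formula syntax for crossing terms, and x^2
## collapses to plain x: the squared term silently disappears.
fit_wrong <- lm(y ~ x + x^2)
names(coef(fit_wrong))

## I() makes R evaluate x^2 arithmetically, adding a genuine
## squared term to the model.
fit_quad <- lm(y ~ x + I(x^2))
names(coef(fit_quad))

## poly(x, 2, raw = TRUE) builds the same x and x^2 columns,
## so it gives the same fitted values.
fit_poly <- lm(y ~ poly(x, 2, raw = TRUE))
all.equal(fitted(fit_quad), fitted(fit_poly))
```

`fit_wrong` has only an intercept and a slope, while `fit_quad` and `fit_poly` fit the same quadratic curve.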

```r
normcheck(examassign.fit2)
cooks20x(examassign.fit2)
```

```r
plot(Exam ~ Assign, data = Stats20x.df)
x <- 0:20 # Assignment values at which to predict exam mark
## Plot model 1 (straight line) in red
lines(x, predict(examassign.fit, data.frame(Assign = x)), col = "red")
## Plot model 2 (quadratic) in blue
lines(x, predict(examassign.fit2, data.frame(Assign = x)), col = "blue")
```

```r
summary(examassign.fit2)
```

```
Call:
lm(formula = Exam ~ Assign + I(Assign^2), data = Stats20x.df)

Residuals:
    Min      1Q  Median      3Q     Max
-32.541  -9.149   1.273   9.087  41.116

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.41396    5.99081   4.743 5.05e-06 ***
Assign      -0.68172    1.07242  -0.636 0.525999
I(Assign^2)  0.16102    0.04545   3.542 0.000536 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.65 on 143 degrees of freedom
Multiple R-squared:  0.5477,  Adjusted R-squared:  0.5414
F-statistic: 86.59 on 2 and 143 DF,  p-value: < 2.2e-16
```
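The `t value` and `Pr(>|t|)` columns can be reproduced by hand, which is a useful check on how to read the table. The minimal sketch below uses the rounded values printed for `I(Assign^2)` above, so its results agree with the output only up to rounding. The variable names are illustrative.

```r
## Values copied from the I(Assign^2) row of the summary output.
est <- 0.16102
se <- 0.04545
df_resid <- 143  # residual degrees of freedom reported in the output

## t statistic for H0: beta_2 = 0 is estimate / standard error.
t_val <- est / se

## Two-sided p-value from the t distribution with 143 df.
p_val <- 2 * pt(-abs(t_val), df = df_resid)

## 143 residual df with 3 estimated coefficients implies n = 146.
n <- df_resid + 3
c(t = t_val, p = p_val, n = n)
```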


Note that the estimated coefficient of the term `I(Assign^2)` is positive ($\hat\beta_2 = 0.161 > 0$, p-value $= 0.000536$). This indicates an increase in expected exam mark that starts slowly and ‘accelerates’ as `Assign` increases.
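To make the ‘acceleration’ concrete, the sketch below plugs the printed coefficient estimates into two formulas at a few assignment marks:

- the fitted curve $\hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2$;
- its slope $\hat\beta_1 + 2\hat\beta_2 x$.

The chosen `Assign` values are illustrative and span 0 to 20, the range of the prediction grid above. This reproduces by hand what `predict(examassign.fit2, ...)` computes, up to rounding of the printed estimates.

```r
## Coefficient estimates copied from the summary output above.
b0 <- 28.41396
b1 <- -0.68172
b2 <- 0.16102

assign_vals <- c(0, 5, 10, 15, 20)

## Fitted mean exam mark: b0 + b1 * x + b2 * x^2.
exam_hat <- b0 + b1 * assign_vals + b2 * assign_vals^2

## Slope of the fitted curve: b1 + 2 * b2 * x. It grows by
## 2 * b2 (about 0.32) for every extra assignment mark.
slope <- b1 + 2 * b2 * assign_vals

data.frame(Assign = assign_vals, Exam_hat = exam_hat, slope = slope)
```

The fitted slope is slightly negative (about -0.68) at `Assign` = 0 and rises to about 5.76 exam marks per assignment mark at `Assign` = 20.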