Ex1.19

(a)

setwd("C:/Users/user/Desktop/政大/TA/迴歸分析TA/講義/111-2/0410")
GPA <- read.table("Grade_Point_Average.txt")
head(GPA)
     V1 V2
1 3.897 21
2 3.885 14
3 3.778 28
4 2.540 22
5 3.028 21
6 3.865 31
colnames(GPA) <- c("Y", "X")
head(GPA)
      Y  X
1 3.897 21
2 3.885 14
3 3.778 28
4 2.540 22
5 3.028 21
6 3.865 31
lm(Y ~ X, data = GPA)

Call:
lm(formula = Y ~ X, data = GPA)

Coefficients:
(Intercept)            X  
    2.11405      0.03883  

The least squares estimate of \(\beta_0\) is 2.11405 and that of \(\beta_1\) is 0.03883.

The estimated regression function is \(\hat{Y} = 2.11405 + 0.03883 X\).
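As a check, these estimates can be computed from the closed-form formulas \(b_1 = \sum(X_i - \bar{X})(Y_i - \bar{Y}) / \sum(X_i - \bar{X})^2\) and \(b_0 = \bar{Y} - b_1 \bar{X}\); a minimal sketch using the GPA data frame defined above:

# Closed-form least squares estimates; these should match the lm() output
b1 <- sum((GPA$X - mean(GPA$X)) * (GPA$Y - mean(GPA$Y))) / sum((GPA$X - mean(GPA$X))^2)
b0 <- mean(GPA$Y) - b1 * mean(GPA$X)
c(b0, b1)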

(b)

attach(GPA)
plot(X, Y)
abline(lm(Y ~ X, data = GPA), col = 'blue', lwd = 3)

From the plot above, the estimated regression function appears to fit the data well.

(c)

hat_y <- 2.11405 + 0.03883 * 30
hat_y
[1] 3.27895
GPA.lm <- lm(Y ~ X, data = GPA)
predict(GPA.lm, data.frame(X = 30))
       1 
3.278863 

The point estimate of the mean freshman GPA for students with ACT test score \(X = 30\) is 3.278863 (3.27895 when the rounded coefficients are used, as in the hand calculation above).

(d)

The point estimate of the change in the mean response when the entrance test score increases by one point is \(\hat{\beta}_1 = 0.03883\).
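Numerically, this is the difference between the fitted mean responses at two test scores one point apart; a quick check using GPA.lm from (c):

# The change in the fitted mean response for a one-point increase in X equals the slope
diff(predict(GPA.lm, data.frame(X = c(30, 31))))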

Ex2.23

(a)

GPA.lm <- lm(Y ~ X, data = GPA)
summary.aov(GPA.lm)
             Df Sum Sq Mean Sq F value  Pr(>F)   
X             1   3.59   3.588    9.24 0.00292 **
Residuals   118  45.82   0.388                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(b)

Hypothesis Test:

\(H_0: \beta_1 = 0\)

\(H_1: \beta_1 \neq 0\)

Significance level: \(\alpha = 0.01\)

Decision: From (a), the p-value of the \(F\) test is \(0.00292 < \alpha = 0.01\), so we reject \(H_0\).

Conclusion: \(X\) and \(Y\) have a linear association.
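The decision can also be verified directly from the ANOVA table in (a), since \(F^* = MSR/MSE = 3.588/0.388 \approx 9.24\); a sketch of the numerical check:

# F statistic, critical value, and p-value from the ANOVA quantities
3.588 / 0.388                         # F* = MSR / MSE
qf(0.99, df1 = 1, df2 = 118)          # critical value F(0.99; 1, 118), about 6.86
pf(9.24, 1, 118, lower.tail = FALSE)  # p-value, about 0.00292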

(c)

GPA.lm <- lm(Y ~ X, data = GPA)
summary(GPA.lm)

Call:
lm(formula = Y ~ X, data = GPA)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.74004 -0.33827  0.04062  0.44064  1.22737 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.11405    0.32089   6.588  1.3e-09 ***
X            0.03883    0.01277   3.040  0.00292 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6231 on 118 degrees of freedom
Multiple R-squared:  0.07262,   Adjusted R-squared:  0.06476 
F-statistic:  9.24 on 1 and 118 DF,  p-value: 0.002917

The absolute magnitude of the reduction in the variation of \(Y\) when \(X\) is introduced into the regression model is \(SSR = 3.588\). The relative reduction is \(R^2 = SSR/SSTO = 3.588/(3.588 + 45.82) \approx 0.0726\), the coefficient of determination reported by summary().
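Both reductions can be extracted from the fitted model; a minimal sketch using anova():

# SSR, SSE, and the relative reduction (coefficient of determination)
aov.tab <- anova(GPA.lm)
SSR <- aov.tab["X", "Sum Sq"]
SSE <- aov.tab["Residuals", "Sum Sq"]
SSR / (SSR + SSE)  # R^2, matching the 0.07262 in summary()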

Ex3.16

(a)

concentration <- read.table("Solution_concentration.csv", sep = ",")
# concentration <- read.csv("Solution_concentration.csv", header = F)
colnames(concentration) <- c("Y", "X")
head(concentration)
     Y X
1 0.07 9
2 0.09 9
3 0.08 9
4 0.16 7
5 0.17 7
6 0.21 7
attach(concentration)
plot(X, Y)

concentration.lm <- lm(Y ~ X, data = concentration)
summary(concentration.lm)

Call:
lm(formula = Y ~ X, data = concentration)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.5333 -0.4043 -0.1373  0.4157  0.8487 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.5753     0.2487  10.354 1.20e-07 ***
X            -0.3240     0.0433  -7.483 4.61e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4743 on 13 degrees of freedom
Multiple R-squared:  0.8116,    Adjusted R-squared:  0.7971 
F-statistic: 55.99 on 1 and 13 DF,  p-value: 4.611e-06
par(mfrow = c(2, 2))
plot(concentration.lm)

From the output, we see unequal error variances and nonnormality of the error terms. To remedy these departures from the simple linear regression model, we need a transformation on \(Y\), since the shapes and spreads of the distributions of \(Y\) need to be changed.

(b)

library(MASS)
library(ALSM)
boxcox(concentration.lm, lambda = seq(-2, 2, by = 0.1))

boxcox.sse(X, Y, l = seq(-2, 2, by = 1))

  lambda         SSE
1     -2 68.84280491
2     -1  3.16846767
5      0  0.03897303
3      1  2.92465333
4      2 64.15599383

From the output, SSE is minimized at \(\lambda = 0\), so we take \(Y' = \log_e Y\).
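The SSE column reported by boxcox.sse() can be reproduced by hand from the standardized Box-Cox transformation \(W = K_1(Y^\lambda - 1)\) for \(\lambda \neq 0\) and \(W = K_2 \log_e Y\) for \(\lambda = 0\), where \(K_2\) is the geometric mean of the \(Y_i\) and \(K_1 = 1/(\lambda K_2^{\lambda - 1})\); a sketch using the concentration data from above:

# SSE of the regression of the standardized transformed Y on X, for each lambda
K2 <- exp(mean(log(concentration$Y)))  # geometric mean of Y
sse.lambda <- function(lambda) {
  W <- if (lambda == 0) K2 * log(concentration$Y)
       else (concentration$Y^lambda - 1) / (lambda * K2^(lambda - 1))
  sum(resid(lm(W ~ concentration$X))^2)
}
sapply(c(-2, -1, 0, 1, 2), sse.lambda)  # smallest SSE at lambda = 0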

(c)

concentration.log.lm <- lm(log(Y) ~ X, data = concentration)
summary(concentration.log.lm)

Call:
lm(formula = log(Y) ~ X, data = concentration)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.19102 -0.10228  0.01569  0.07716  0.19699 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.50792    0.06028   25.01 2.22e-12 ***
X           -0.44993    0.01049  -42.88 2.19e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.115 on 13 degrees of freedom
Multiple R-squared:  0.993, Adjusted R-squared:  0.9924 
F-statistic:  1838 on 1 and 13 DF,  p-value: 2.188e-15

The least squares estimate of \(\beta_0\) is 1.50792 and that of \(\beta_1\) is \(-0.44993\).

The estimated regression function is \(\hat{Y}' = 1.50792 - 0.44993 X\).

(d)

par(mfrow = c(1, 1))
plot(X, log(Y), xlab = "X", ylab = "Transformed Y")
abline(concentration.log.lm)

From the plot, the estimated regression line fits the transformed data very well.

(e)

ei <- concentration.log.lm$residuals
yhat <- concentration.log.lm$fitted.values
par(mfrow = c(1, 2))
plot(yhat, ei, xlab = "Fitted Values", ylab = "Residuals")  # residuals against fitted values
stdei <- rstandard(concentration.log.lm)
qqnorm(stdei, ylab = "Standardized Residuals", xlab = "Normal Scores", main = "QQ Plot") 
qqline(stdei, col = "steelblue", lwd = 2)

From the output, the error variances appear constant and the errors appear approximately normally distributed.
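As a supplementary check, the standardized residuals from rstandard() are \(r_i = e_i / \sqrt{MSE(1 - h_{ii})}\), where \(h_{ii}\) is the \(i\)th diagonal element of the hat matrix; a sketch recomputing them by hand:

# Recompute the standardized residuals from the hat values and residual SE
h <- hatvalues(concentration.log.lm)
s <- sigma(concentration.log.lm)
all.equal(ei / (s * sqrt(1 - h)), rstandard(concentration.log.lm))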

(f)

Since \(\hat{Y}' = 1.50792 - 0.44993 X\) with \(Y' = \log_e Y\), transforming back gives \(\hat{Y} = \exp(1.50792 - 0.44993 X)\).
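To see the back-transformed fit in the original units, the curve can be overlaid on the raw data; a minimal sketch:

# Back-transformed regression function on the original scale
plot(concentration$X, concentration$Y, xlab = "X", ylab = "Y")
curve(exp(1.50792 - 0.44993 * x), add = TRUE, col = "blue", lwd = 2)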