setwd("C:/Users/user/Desktop/政大/TA/迴歸分析TA/講義/111-2/0410")
GPA <- read.table("Grade_Point_Average.txt")
head(GPA)
     V1 V2
1 3.897 21
2 3.885 14
3 3.778 28
4 2.540 22
5 3.028 21
6 3.865 31
colnames(GPA) <- c("Y", "X")
head(GPA)
      Y  X
1 3.897 21
2 3.885 14
3 3.778 28
4 2.540 22
5 3.028 21
6 3.865 31
lm(Y ~ X, data = GPA)
Call:
lm(formula = Y ~ X, data = GPA)
Coefficients:
(Intercept)            X
    2.11405      0.03883
The least squares estimate of \(\beta_0\) is 2.11405 and the least squares estimate of \(\beta_1\) is 0.03883.
The estimated regression function is \(\hat{Y} = 2.11405 + 0.03883 X\).
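As a check, the same estimates follow from the closed-form formulas \(b_1 = \sum(X_i - \bar{X})(Y_i - \bar{Y}) \big/ \sum(X_i - \bar{X})^2\) and \(b_0 = \bar{Y} - b_1 \bar{X}\); a minimal sketch:
Sxy <- sum((GPA$X - mean(GPA$X)) * (GPA$Y - mean(GPA$Y)))
Sxx <- sum((GPA$X - mean(GPA$X))^2)
b1 <- Sxy / Sxx
b0 <- mean(GPA$Y) - b1 * mean(GPA$X)
c(b0, b1)  # reproduces 2.11405 and 0.03883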
attach(GPA)
plot(X, Y)
abline(lm(Y ~ X, data = GPA), col = 'blue', lwd = 3)
From the plot above, the estimated regression function appears to fit the data well.
hat_y <- 2.11405 + 0.03883 * 30
hat_y
[1] 3.27895
GPA.lm <- lm(Y ~ X, data = GPA)
predict(GPA.lm, data.frame(X = 30))
       1
3.278863
The point estimate of the mean freshman GPA for students with ACT test score \(X = 30\) is 3.278863 (the hand calculation gives 3.27895 because it uses the rounded coefficients).
The point estimate of the change in the mean response when the entrance test score increases by one point is \(\hat{\beta}_1 = 0.03883\).
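Using the full-precision coefficients stored in the fitted model reproduces the predict() value; a minimal sketch:
coef(GPA.lm)                  # full-precision estimates
sum(coef(GPA.lm) * c(1, 30))  # 3.278863, matching predict()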
GPA.lm <- lm(Y ~ X, data = GPA)
summary.aov(GPA.lm)
             Df Sum Sq Mean Sq F value  Pr(>F)
X             1   3.59   3.588    9.24 0.00292 **
Residuals   118  45.82   0.388
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hypothesis Test:
\(H_0: \beta_1 = 0\)
\(H_1: \beta_1 \neq 0\)
Significance level: \(\alpha = 0.01\)
Decision: the p-value of the \(F\)-test in the ANOVA table above is \(0.00292 < \alpha = 0.01\), so we reject \(H_0\).
Conclusion: there is a linear association between \(X\) and \(Y\).
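The \(F\) statistic and its p-value can be recomputed from the ANOVA table as \(F^* = \mathrm{MSR}/\mathrm{MSE}\) with 1 and 118 degrees of freedom; a minimal sketch:
a <- anova(GPA.lm)
F.star <- a["X", "Mean Sq"] / a["Residuals", "Mean Sq"]  # 3.588 / 0.388 = 9.24
pf(F.star, df1 = 1, df2 = 118, lower.tail = FALSE)       # about 0.00292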
GPA.lm <- lm(Y ~ X, data = GPA)
summary(GPA.lm)
Call:
lm(formula = Y ~ X, data = GPA)
Residuals:
     Min       1Q   Median       3Q      Max
-2.74004 -0.33827  0.04062  0.44064  1.22737
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.11405    0.32089   6.588  1.3e-09 ***
X            0.03883    0.01277   3.040  0.00292 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6231 on 118 degrees of freedom
Multiple R-squared: 0.07262, Adjusted R-squared: 0.06476
F-statistic: 9.24 on 1 and 118 DF, p-value: 0.002917
The reduction in the variation of \(Y\) when \(X\) is introduced into the regression model is given by SSR, which is 3.588.
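anova() recovers the sums of squares from the fitted model, and SSR connects the ANOVA table to the coefficient of determination via \(R^2 = \mathrm{SSR}/\mathrm{SSTO}\); a minimal sketch:
a <- anova(GPA.lm)
SSR <- a["X", "Sum Sq"]          # 3.588
SSE <- a["Residuals", "Sum Sq"]  # 45.82
SSR / (SSR + SSE)                # 0.07262, the Multiple R-squared above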
concentration <- read.table("Solution_concentration.csv", sep = ",")
# concentration <- read.csv("Solution_concentration.csv", header = F)
colnames(concentration) <- c("Y", "X")
head(concentration)
     Y X
1 0.07 9
2 0.09 9
3 0.08 9
4 0.16 7
5 0.17 7
6 0.21 7
attach(concentration)
plot(X, Y)
concentration.lm <- lm(Y ~ X, data = concentration)
summary(concentration.lm)
Call:
lm(formula = Y ~ X, data = concentration)
Residuals:
    Min      1Q  Median      3Q     Max
-0.5333 -0.4043 -0.1373  0.4157  0.8487
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.5753     0.2487  10.354 1.20e-07 ***
X            -0.3240     0.0433  -7.483 4.61e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4743 on 13 degrees of freedom
Multiple R-squared: 0.8116, Adjusted R-squared: 0.7971
F-statistic: 55.99 on 1 and 13 DF, p-value: 4.611e-06
par(mfrow = c(2, 2))
plot(concentration.lm)
From the diagnostic plots, we can see unequal error variances and nonnormality of the error terms. To remedy these departures from the simple linear regression model, we need a transformation of \(Y\), since the shapes and spreads of the distributions of \(Y\) need to be changed.
library(MASS)
library(ALSM)
boxcox(concentration.lm, lambda = seq(-2, 2, by = 0.1))
boxcox.sse(X, Y, l = seq(-2, 2, by = 1))
  lambda         SSE
1     -2 68.84280491
2     -1  3.16846767
5      0  0.03897303
3      1  2.92465333
4      2 64.15599383
From the output, the SSE is minimized at \(\lambda = 0\), so we take the transformation \(Y' = \log_e Y\).
concentration.log.lm <- lm(log(Y) ~ X, data = concentration)
summary(concentration.log.lm)
Call:
lm(formula = log(Y) ~ X, data = concentration)
Residuals:
     Min       1Q   Median       3Q      Max
-0.19102 -0.10228  0.01569  0.07716  0.19699
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.50792    0.06028   25.01 2.22e-12 ***
X           -0.44993    0.01049  -42.88 2.19e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.115 on 13 degrees of freedom
Multiple R-squared: 0.993, Adjusted R-squared: 0.9924
F-statistic: 1838 on 1 and 13 DF, p-value: 2.188e-15
The least squares estimate of \(\beta_0\) is 1.50792 and the least squares estimate of \(\beta_1\) is \(-0.44993\).
The estimated regression function is \(\hat{Y}' = 1.50792 - 0.44993 X\).
par(mfrow = c(1, 1))
plot(X, log(Y), xlab = "X", ylab = "Transformed Y")
abline(concentration.log.lm)
From the plot, the regression line fits the transformed data very well (\(R^2 = 0.993\)).
ei <- concentration.log.lm$residuals
yhat <- concentration.log.lm$fitted.values
par(mfrow = c(1, 2))
plot(yhat, ei, xlab = "Fitted Values", ylab = "Residuals")
stdei <- rstandard(concentration.log.lm)
qqnorm(stdei, ylab = "Standardized Residuals", xlab = "Normal Scores", main = "QQ Plot")
qqline(stdei, col = "steelblue", lwd = 2)
From the plots, we see that the error variances are constant and the errors are approximately normally distributed.
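The graphical impressions can be checked with formal tests; a minimal sketch using shapiro.test() from base R for normality and, assuming the lmtest package is installed, bptest() for constant error variance:
shapiro.test(ei)              # H0: the errors are normally distributed
library(lmtest)
bptest(concentration.log.lm)  # Breusch-Pagan test; H0: constant error variance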
Since \(\hat{Y}' = \log_e \hat{Y} = 1.50792 - 0.44993X\), the estimated regression function on the original scale is \(\hat{Y} = \exp(1.50792 - 0.44993X)\).
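Fitted values on the original scale are then obtained by back-transforming the predictions from the log model with exp(); the X values below are hypothetical, chosen only for illustration:
new.X <- data.frame(X = c(1, 5, 9))        # hypothetical X values
exp(predict(concentration.log.lm, new.X))  # estimated concentrations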