- Create the scatter plot of bac vs. beers.
> setwd("c:/workspace")
> df1 <- read.csv("beer-bac.txt")
> df1
   beers   bac
1      5 0.100
2      2 0.030
3      9 0.190
4      8 0.120
5      3 0.040
6      7 0.095
7      3 0.070
8      5 0.060
9      3 0.020
10     5 0.050
11     4 0.070
12     6 0.100
13     5 0.085
14     7 0.090
15     1 0.010
16     4 0.050
> plot(df1$beers, df1$bac, xlab="Number of Beers",
+ ylab="Blood Alcohol concentration")
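The scatterplot shows a positive, roughly linear relationship between the dependent
variable (bac) and the independent variable (beers). The points form a roughly
elliptical cloud, which is consistent with a bivariate normal distribution, although
16 observations are too few to confirm it.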
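As a numerical complement to the plot, the sample correlation can be computed with cor(). The sketch below re-enters the 16 observations listed above so that it runs without beer-bac.txt; the names beers, bac, and r_xy are chosen here only for illustration.

```r
# Data re-entered from the listing above (assumed to match beer-bac.txt)
beers <- c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4)
bac   <- c(0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
           0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050)

# Pearson correlation: the sign gives the direction of the linear
# relationship, the magnitude gives its strength
r_xy <- cor(beers, bac)
print(round(r_xy, 4))
```

A value close to +1 is consistent with the strong positive linear trend seen in the plot.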
- Find the linear regression equation for predicting bac from beers. Answer:
> model1 <- lm(bac ~ beers, data=df1)
> summary(model1)
Call:
lm(formula = bac ~ beers, data = df1)

Residuals:
      Min        1Q    Median        3Q       Max
-0.027118 -0.017350  0.001773  0.008623  0.041027

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.012701   0.012638  -1.005    0.332
beers        0.017964   0.002402   7.480 2.97e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02044 on 14 degrees of freedom
Multiple R-squared: 0.7998, Adjusted R-squared: 0.7855
F-statistic: 55.94 on 1 and 14 DF, p-value: 2.969e-06
The regression equation is
bac = 0.017964 * beers - 0.012701
- Find the R-squared value for this equation. Interpret it. Answer:
Multiple R-squared: 0.7998. This means that about 80% of the variability of the
dependent variable (bac) is explained by its linear relationship with the
independent variable (beers).
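To make the "share of variability" reading concrete, R-squared can be recomputed from the sums of squares as 1 - SSE/SST. This is a minimal sketch that rebuilds df1 from the listed data rather than reading the file; sse, sst, and r2 are illustrative names.

```r
# Data re-entered from the listing above (assumed to match beer-bac.txt)
df1 <- data.frame(
  beers = c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4),
  bac   = c(0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
            0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050)
)
model1 <- lm(bac ~ beers, data = df1)

sse <- sum(resid(model1)^2)               # variability left unexplained by the line
sst <- sum((df1$bac - mean(df1$bac))^2)   # total variability of bac around its mean
r2  <- 1 - sse / sst                      # explained share of the variability
print(round(r2, 4))
```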
- Create the boxplot of the residuals.
> r <- resid(model1)
> boxplot(r, xlab="Residuals", main="Boxplot of Residuals")
The boxplot shows no outliers.
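The visual check can be backed up with boxplot.stats(), which returns the points that boxplot() would draw beyond the whiskers under its default rule. The sketch below rebuilds the model from the listed data; flagged is an illustrative name.

```r
# Data re-entered from the listing above (assumed to match beer-bac.txt)
df1 <- data.frame(
  beers = c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4),
  bac   = c(0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
            0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050)
)
model1 <- lm(bac ~ beers, data = df1)
r <- resid(model1)

# $out holds the values lying more than 1.5 times the hinge spread
# beyond the hinges, i.e. the points a default boxplot marks as outliers
flagged <- boxplot.stats(r)$out
print(length(flagged))
```

An empty result agrees with the boxplot: no residual falls outside the whiskers.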
- Create the scatterplot of the residuals vs. the predicted values. Interpret it.
Answer:
> p <- predict(model1)
> plot(p, r, xlab="Predicted Values", ylab="Residuals",
+ main="Residual Plot")
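The residuals are scattered around zero with no obvious pattern (fairly unbiased),
and their spread looks roughly constant across the predicted values (homoscedastic).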
- Create the normal plot of the residuals. Interpret it. Answer:
> qqnorm(r)
> qqline(r, col="red")
The residuals in the normal plot are fairly close to a straight line, so they
are approximately normally distributed.
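A formal test can supplement the visual judgment. The sketch below applies the Shapiro-Wilk test (shapiro.test() from the stats package) to the residuals. This test is not part of the original exercise, and with only 16 residuals it has little power, so a large p-value should be read as "no evidence against normality" rather than as proof of it.

```r
# Data re-entered from the listing above (assumed to match beer-bac.txt)
df1 <- data.frame(
  beers = c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4),
  bac   = c(0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
            0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050)
)
model1 <- lm(bac ~ beers, data = df1)
r <- resid(model1)

# Null hypothesis: the residuals come from a normal distribution
sw <- shapiro.test(r)
print(sw)
```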
- Let y = ax + b. Perform a t-test of the null hypothesis that the true value
of the slope a is 0.
Answer: Look at the p-value in the beers row of the coefficient table, which
tests whether the slope coefficient a is zero (t = 7.480, p = 2.97e-06).
Because this p-value is very small, and certainly less than 0.05,
we reject the null hypothesis that a = 0.
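The t value and p-value reported by summary() can be reproduced by hand: the statistic is the estimate divided by its standard error, compared against a t distribution with the residual degrees of freedom. A minimal sketch, rebuilding the model from the listed data; ctab, t_slope, and p_slope are illustrative names.

```r
# Data re-entered from the listing above (assumed to match beer-bac.txt)
df1 <- data.frame(
  beers = c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4),
  bac   = c(0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
            0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050)
)
model1 <- lm(bac ~ beers, data = df1)

ctab    <- summary(model1)$coefficients   # matrix of estimates, SEs, t and p values
t_slope <- ctab["beers", "Estimate"] / ctab["beers", "Std. Error"]
df_res  <- df.residual(model1)            # n - 2 = 14 for this model
p_slope <- 2 * pt(-abs(t_slope), df = df_res)   # two-sided p-value
print(c(t = t_slope, df = df_res, p = p_slope))
```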
- Let y = ax + b. Perform a t-test of the null hypothesis that the true value
of the intercept b is 0.
Answer: The p-value for testing whether b is 0 (the (Intercept) row of the
coefficient table, t = -1.005, p = 0.332) is greater than 0.05, so we do not have
enough evidence to reject the null hypothesis that b = 0.
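An equivalent way to read both t-tests is through 95% confidence intervals: a coefficient is significantly different from 0 at the 5% level exactly when its interval excludes 0. The sketch below uses confint() on a model rebuilt from the listed data; ci is an illustrative name.

```r
# Data re-entered from the listing above (assumed to match beer-bac.txt)
df1 <- data.frame(
  beers = c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4),
  bac   = c(0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
            0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050)
)
model1 <- lm(bac ~ beers, data = df1)

# One row per coefficient; columns are the lower and upper 95% limits
ci <- confint(model1, level = 0.95)
print(ci)
```

The interval for the intercept straddles 0, while the interval for the slope lies entirely above 0, which matches the two test conclusions.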
- For this example, if the number of beers consumed is 4, what is the predicted
blood alcohol level?
Answer: Substitute 4 for the variable beers in the regression equation:
bac = 0.017964 * beers - 0.012701 = 0.017964 * 4 - 0.012701 = 0.059155
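The same prediction can be obtained from the fitted model with predict(), which avoids retyping rounded coefficients. A minimal sketch, rebuilding the model from the listed data; pred is an illustrative name.

```r
# Data re-entered from the listing above (assumed to match beer-bac.txt)
df1 <- data.frame(
  beers = c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4),
  bac   = c(0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
            0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050)
)
model1 <- lm(bac ~ beers, data = df1)

# newdata must contain a column named like the predictor in the formula
pred <- predict(model1, newdata = data.frame(beers = 4))
print(round(unname(pred), 6))
```

Because predict() uses the unrounded coefficients, the result can differ from the hand calculation in the last digit.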