
IT 223 -- Mar 11, 2026

Review Exercises

  1. A company wants to evaluate a new training program for its sales team. They test 6 employees before the training and again after the training to see if there is a significant improvement in their scores. Here are the before and after scores:
    Before: 71 80 75 85 93 69
    After:  75 82 73 89 96 75
    
    What kind of t-test should you use? Conduct the test by hand using a calculator or R to do the arithmetic.  Then use the R t.test function to verify your calculations.
    Answer: Here are the five steps of the paired sample t-test:
    1. First compute the differences after - before:
      > setwd("c:/workspace")
      > before <- scan()
      1: 71 80 75 85 93 69
      7:
      Read 6 items
      > after <- scan()
      1: 75 82 73 89 96 75
      7:
      Read 6 items
      > diff <- after - before
      > diff
      [1] 4 2 -2 4 3 6
      > mean(diff)
      [1] 2.833333
      > sd(diff)
      [1] 2.71416
      
      State the null and alternative hypotheses:
            H0: μ_diff = 0      H1: μ_diff ≠ 0
    2. Compute the test statistic:
      t = (mean(diff) - μ) / (sd(diff)/√n) = (2.833333 - 0) / (2.71416/√6) = 2.557042
    3. Write down a 0.95 confidence interval using the t-table. Use the column 0.025 upper tail probability and the row n - 1 = 6 - 1 = 5 degrees of freedom: I = (-2.571, 2.571).
    4. t ∈ I, so we accept the null hypothesis; we do not have enough evidence to reject it.
    5. Let R compute the p-value. Recall that if p < 0.05, reject H0; if p ≥ 0.05, accept H0. Here is the R output using the t.test function:
      > t.test(after, before, paired=TRUE)
      
              Paired t-test
      
      data:  after and before
      t = 2.557, df = 5, p-value = 0.05083
      alternative hypothesis: true mean difference is not equal to 0
      95 percent confidence interval:
       -0.01500332  5.68166999
      sample estimates:
      mean difference 
             2.833333 
      
      Notice that the t statistic and degrees of freedom match our calculations. Also, the p-value = 0.05083 > 0.05, so accept H0.
      Here is the output from the one-sample t-test using diff as the statistic:
      > t.test(diff, mu=0)
      
              One Sample t-test
      
      data:  diff
      t = 2.557, df = 5, p-value = 0.05083
      alternative hypothesis: true mean is not equal to 0
      95 percent confidence interval:
       -0.01500332  5.68166999
      sample estimates:
      mean of x 
       2.833333 
      
      We obtain the same result as we did with the paired t-test.
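The hand computation in step 2 can also be cross-checked outside of R. Here is a minimal sketch in Python (standard library only; the differences are copied from the exercise) that reproduces the paired t statistic:

```python
import math
from statistics import mean, stdev

# Differences (after - before) from the exercise
diff = [4, 2, -2, 4, 3, 6]

n = len(diff)
d_bar = mean(diff)    # sample mean of the differences, about 2.833333
s_d = stdev(diff)     # sample standard deviation of the differences, about 2.71416

# Paired t statistic: t = (d_bar - mu0) / (s_d / sqrt(n)), with mu0 = 0 under H0
t = (d_bar - 0) / (s_d / math.sqrt(n))
print(round(t, 6))    # about 2.557042, matching the R output
```

This is the same computation the one-sample t.test(diff, mu=0) call performs internally.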
  2. The blood alcohol level for a random sample of college students is tested after they drink a few beers. The data file beer-bac.txt contains two columns: (a) the number of beers (beers) consumed and (b) their blood alcohol levels (bac) after they drink the beers. Use R to obtain the regression model:
    > model1 <- lm(bac ~ beers, data=df1)
    
    Analyze the regression model and graphs of the resulting data.
    1. Create the scatter plot of bac vs. beers.
      > setwd("c:/workspace")
      > df1 <- read.csv("beer-bac.txt")
      > df1
         beers   bac
       1     5 0.100
       2     2 0.030
       3     9 0.190
       4     8 0.120
       5     3 0.040
       6     7 0.095
       7     3 0.070
       8     5 0.060
       9     3 0.020
      10     5 0.050
      11     4 0.070
      12     6 0.100
      13     5 0.085
      14     7 0.090
      15     1 0.010
      16     4 0.050
      > plot(df1$beers, df1$bac, xlab="Number of Beers", 
      + ylab="Blood Alcohol Concentration")
      
      The scatterplot shows a positive linear relationship between the independent variable (beers) and the dependent variable (bac). It also shows that the data form an ellipse-shaped bivariate normal point cloud.
    2. Find the linear regression equation for predicting bac from beers. Answer:
      > model1 <- lm(bac ~ beers, data=df1)
      > summary(model1)
      
      Call:
      lm(formula = bac ~ beers, data = df1)
      
      Residuals:
            Min        1Q    Median        3Q       Max 
      -0.027118 -0.017350  0.001773  0.008623  0.041027 
      
      Coefficients:
                   Estimate Std. Error  t value Pr(>|t|) 
      (Intercept) -0.012701   0.012638   -1.005    0.332 
      beers        0.017964   0.002402    7.480 2.97e-06 ***
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 0.02044 on 14 degrees of freedom
      Multiple R-squared: 0.7998, Adjusted R-squared: 0.7855 
      F-statistic: 55.94 on 1 and 14 DF, p-value: 2.969e-06
      
      The regression equation is
            bac = 0.017964 * beers - 0.012701
    3. Find the R-squared value for this equation. Interpret it. Answer:
      Multiple R-squared: 0.7998. This means that about 80% of the variability of the dependent variable (bac) is due to the variability of the independent variable (beers).
    4. Create the boxplot of the residuals.
      > r <- resid(model1)
      > boxplot(r, xlab="Residuals", main="Boxplot of Residuals")
      
      The boxplot shows no outliers.
    5. Create the scatterplot of the residuals vs. the predicted values. Interpret it. Answer:
      > p <- predict(model1)
      > plot(p, r, xlab="Predicted Values", ylab="Residuals", 
      + main="Residual Plot")
      
      The residuals are fairly unbiased and homoscedastic.
    6. Create the normal plot of the residuals. Interpret it. Answer:
      > qqnorm(r)
      > qqline(r, col="red")
      
      The residuals in the normal plot are fairly close to a straight line, so they are approximately normally distributed.
    7. Assume the model y = ax + b. Perform a t-test of the null hypothesis that the true value of the slope a is 0.
      Answer: Look at the p-value for testing that the slope coefficient a is zero. Because the corresponding p-value (2.97e-06) is very small, and certainly less than 0.05, we reject the null hypothesis that a = 0.
    8. Assume the model y = ax + b. Perform a t-test of the null hypothesis that the true value of the intercept b is 0.
      Answer: The p-value for testing whether b is 0 (0.332) is greater than 0.05, so we do not have enough evidence to reject the null hypothesis that b = 0.
    9. For this example, if the number of beers consumed is 4, what is the predicted blood alcohol level?
      Answer: substitute 4 for the variable beers in the regression equation.
             bac = 0.017964 * beers - 0.012701 = 0.017964 * 4 - 0.012701 = 0.059155.
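The slope, intercept, R-squared, and prediction for this exercise can all be recomputed by hand from the usual least-squares formulas. Here is a minimal cross-check sketch in Python (standard library only; the data is copied from the df1 listing above):

```python
from statistics import mean

# Data copied from the beer-bac.txt listing above
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.100, 0.030, 0.190, 0.120, 0.040, 0.095, 0.070, 0.060,
       0.020, 0.050, 0.070, 0.100, 0.085, 0.090, 0.010, 0.050]

x_bar, y_bar = mean(beers), mean(bac)
sxx = sum((x - x_bar) ** 2 for x in beers)   # sum of squares for beers
syy = sum((y - y_bar) ** 2 for y in bac)     # sum of squares for bac
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac))

slope = sxy / sxx                    # about 0.017964
intercept = y_bar - slope * x_bar    # about -0.012701
r_squared = sxy ** 2 / (sxx * syy)   # about 0.7998

# Predicted bac for 4 beers, as in step 9
pred = slope * 4 + intercept         # about 0.059155
```

The values agree with the coefficients, Multiple R-squared, and prediction in the summary(model1) output.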
  3. Look at the dataset multi-reg-sales.txt, which contains three columns: TV Advertising (tv), Radio Advertising (radio), and Sales (sales). The units for all three variables are thousands of dollars.
    1. Use R to find the multiple regression equation for predicting sales from TV and radio advertising:
      model2 = lm(sales~tv+radio, data=df2)
      
      > # Answer:
      > setwd("c:/workspace")
      > df2 <- read.csv("multi-reg-sales.txt")
      > df2
           tv radio sales
      1 230.1  37.8  22.1
      2  44.5  39.3  10.4
      3  17.2  45.9   9.3
      4 151.5  41.3  18.5
      5 180.8  10.8  12.9
      6   8.7  48.9   7.2
      7  57.5  32.8  11.8
      8 120.2  19.6  14.8
      > model2 <- lm(sales~tv+radio, data=df2)
      > summary(model2)
      
      Call:
      lm(formula = sales ~ tv + radio, data = df2)
      
      Residuals:
             1        2        3        4        5        6        7        8 
      -0.11933  0.10117 -0.09692  0.94653 -2.49597 -2.04224  1.51978  2.18698 
      
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)   
      (Intercept)  2.14721    3.00413   0.715  0.50674   
      tv           0.06531    0.01041   6.275  0.00151 **
      radio        0.13347    0.06474   2.062  0.09421 . 
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      
      Residual standard error: 1.92 on 5 degrees of freedom
      Multiple R-squared: 0.8921, Adjusted R-squared: 0.8489 
      F-statistic: 20.67 on 2 and 5 DF, p-value: 0.003826
      
    2. Use R to create the residual plot and normal plot of residuals. Answer:
      > p <- predict(model2)
      > r <- resid(model2)
      > plot(p, r, xlab="Predicted Values", ylab="Residuals",
      + main="Residual Plot")
      > qqnorm(r)
      > qqline(r)
      
      The residual plot shows that the residuals are fairly unbiased and homoscedastic, as much as can be determined from a small dataset. The normal plot shows that the residuals are fairly normally distributed because the points generally follow a straight line.
    3. Use the resulting regression equation to predict sales when tv=80 and radio=30. Answer:
             sales = 0.06531 * tv + 0.13347 * radio + 2.14721
                   = 0.06531 * 80 + 0.13347 * 30 + 2.14721
                   = 11.3761
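As a quick check of this arithmetic, here is a minimal Python sketch; the coefficient values are copied from the summary(model2) output above, and predict_sales is just a hypothetical helper name for plugging into the fitted equation:

```python
# Coefficients copied from the summary(model2) output above
intercept = 2.14721
b_tv = 0.06531
b_radio = 0.13347

def predict_sales(tv, radio):
    """Predicted sales (thousands of dollars) from the fitted regression equation."""
    return intercept + b_tv * tv + b_radio * radio

sales = predict_sales(80, 30)
print(round(sales, 4))   # 11.3761
```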

Review for Final Exam