IT 223 -- Feb 11, 2026

Review Exercises

  1. What is the difference between the true regression equation and the estimated regression equation?
    Answer: Recall the ideal measurement model: xi = μ + ei (actual measurement = true value + random error). The true value μ is usually unknown and is estimated by the sample mean: μ̂ = x̄.

    The true regression equation is yi = a * xi + b, where the slope a and the intercept b are unknown and must be estimated by â and b̂, which are determined from the estimated regression equation:
          ŷ - ȳ = (r * SD+y / SD+x)(x - x̄)
    which is rewritten as
          ŷ = â * x + b̂.
  2. Look at the Regression Document. What is the RMSE for a regression model?
    Ans: The root mean squared error (RMSE) is the standard deviation of the residuals. In particular, RMSE is the SD of the residuals in a thin vertical rectangle that contains a specific x value. RMSE is defined as SD+y * √(1 - r²).
  3. A law school finds this relationship between LSAT scores (independent variable x) and first-year scores (dependent variable y). The data are bivariate normal. Here are the summary variables:
           x̄ = 162    SDx = 6
           ȳ = 68    SDy = 10
           r = 0.6
    1. About what percentage of the students have first-year scores over 75? Because the data are bivariate normal, the first-year scores are normally distributed, so we can use the normal table.
      Answer: z = (y - ȳ) / SDy = (75 - 68) / 10 = 0.7. The area of the bin (-∞, 0.7] is 0.7580 ≈ 76%. Therefore the percentage of students with first-year scores over 75 is about 100% - 76% = 24%.
    2. Of those students who scored 165 on the LSAT, about what percentage have first-year scores over 75?  Visualize these scores as lying in a thin vertical rectangle centered at LSAT = 165. The observations in the thin vertical rectangle centered over x=165 are normally distributed.
      Answer: The regression equation is
            ŷ - 68 = (0.6 * 10 / 6) (x - 162)
            ŷ - 68 = 1 (x - 162)
            ŷ = x - 94
      so the predicted value for the students in the thin vertical rectangle centered at x = 165 is ŷ = x - 94 = 165 - 94 = 71.

      The RMSE for those students in the thin vertical rectangle centered at x = 165 is
            RMSE = √(1 - r²) * SDy = √(1 - 0.6²) * 10 = 0.8 * 10 = 8
      Then z = (y - ŷ) / RMSE = (75 - 71) / 8 = 0.5. The area of the bin (-∞, 0.5] is 0.6915 ≈ 69%. Therefore, of the students with an LSAT score of 165, the percentage with first-year scores over 75 is about 100% - 69% = 31%.
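
The arithmetic in Exercise 3 can be checked with a short Python sketch. This is only an illustration, not part of the course materials: the variable names are made up here, and Python's standard-library NormalDist stands in for the normal table, so the percentages come out unrounded (about 24.2% and 30.9% instead of the table-rounded 24% and 31%).

```python
from statistics import NormalDist

# Summary statistics from Exercise 3 (x = LSAT score, y = first-year score).
x_bar, sd_x = 162, 6
y_bar, sd_y = 68, 10
r = 0.6

std_normal = NormalDist()  # standard normal curve: mean 0, SD 1

# Part 1: percentage of all students with first-year scores over 75.
z_all = (75 - y_bar) / sd_y
pct_all = 100 * (1 - std_normal.cdf(z_all))

# Part 2: regression line y_hat = slope * x + intercept.
slope = r * sd_y / sd_x
intercept = y_bar - slope * x_bar
y_hat = slope * 165 + intercept      # predicted first-year score at LSAT = 165

# RMSE is the SD of the residuals inside the thin vertical rectangle.
rmse = (1 - r**2) ** 0.5 * sd_y
z_165 = (75 - y_hat) / rmse
pct_165 = 100 * (1 - std_normal.cdf(z_165))

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"prediction at x = 165: {y_hat:.1f}, RMSE = {rmse:.1f}")
print(f"all students over 75: {pct_all:.1f}%")
print(f"LSAT = 165 students over 75: {pct_165:.1f}%")
```

Part 2 differs from Part 1 in only two ways: the center moves from ȳ to the prediction ŷ, and the spread shrinks from SDy to the RMSE.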

Project 3

The Regression Fallacy

Learning Outcomes for Probability

Probability

Practice Problems

  1. What is wrong with this argument? Either the Bears will win the Super Bowl in 2027 or they won't. Therefore the probability that the Bears will win the Super Bowl in 2027 is 50%.
    Ans: Just because there are two outcomes doesn't mean they are equally likely. Some outcomes are very likely; they have probabilities close to 1. Some outcomes are unlikely; they have probabilities close to 0. Other outcomes have probabilities close to 0.5. In this case it does not make sense to use the a priori probabilities of 0.5 for winning the Super Bowl or not winning it in 2027.
  2. What is wrong with this strategy? Double down after each loss. Eventually you win and recoup your losses. For example:
    -1 - 2 - 4 - 8 - 16 + 32 = 1.
    Now start over with 1 and repeat the double down strategy.
    Ans: The problem with this strategy is that, eventually, you will either reach the casino betting limit or you will run out of money.
  3. A bookmaker offers 20 to 1 odds that the Bears will win the Super Bowl in 2026. If this is a fair bet, what is the probability p that the Bears will win the Super Bowl?
    Answer: The expected amount that you win is 20p + (-1)(1 - p); this expression is 0 because we are assuming that it is a fair bet. Now solve for p:
         20p + (-1)(1 - p) = 0
         20p - 1 + p = 0
         21p = 1
         p = 1 / 21 = 0.0476 = 4.8%.
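
Practice Problems 2 and 3 both reduce to small calculations, sketched below in Python. The function names fair_win_probability and doubling_losses are hypothetical helpers introduced only for this illustration. The first solves the fair-bet equation from Problem 3 exactly with fractions; the second adds up the losses in the doubling strategy from Problem 2 to show how quickly the required bet grows.

```python
from fractions import Fraction


def fair_win_probability(odds_against):
    """Win probability p that makes a bet paying odds_against-to-1 fair.

    Solves odds_against * p + (-1) * (1 - p) = 0, giving p = 1 / (odds_against + 1).
    """
    return Fraction(1, odds_against + 1)


def doubling_losses(n_losses):
    """Total lost after n_losses straight losses (bets 1, 2, 4, ...),
    together with the size of the next bet the strategy requires."""
    total_lost = sum(2**k for k in range(n_losses))
    next_bet = 2**n_losses
    return total_lost, next_bet


p = fair_win_probability(20)
expected_win = 20 * p + (-1) * (1 - p)   # should be exactly 0 for a fair bet
print(f"p = {p} = {float(p):.4f}, expected win = {expected_win}")

for n in (5, 10, 20):
    lost, bet = doubling_losses(n)
    print(f"after {n} losses: lost {lost}, next bet must be {bet}")
```

After n straight losses the next bet is always one unit more than everything lost so far, which is why a win nets exactly 1 and why a betting limit or a finite bankroll eventually stops the strategy.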

Random Variables