IT 403 -- Oct 26, 2016

Review Exercises

  1. Use the standard normal table to find these two-sided confidence intervals:
          a. 99%     b. 99.9%     c. 80%
    Ans: a. If the area under the normal curve over the interval [-z, z] is 99% = 0.99, then the areas corresponding to (-∞, -z] and [z, ∞) are both 0.005. This means that the area corresponding to (-∞, -z] ∪ [-z, z] = (-∞, z] is 0.99 + 0.005 = 0.995. Look up 0.995 in the body of the standard normal table and find the corresponding z-score: 2.575. Therefore the area corresponding to [-2.575, 2.575] is 0.99 = 99%.
    b. The 99.9% two-sided confidence interval is [-3.29, 3.29].
    c. The 80% two-sided confidence interval is [-1.28, 1.28].
    Note: one-sided confidence intervals of the form (-∞, z] and [z, ∞) are also possible. We will discuss them later.
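The table lookup in Problem 1 can be cross-checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the function name `two_sided_z` is made up for this illustration and is not part of the course materials.

```python
from statistics import NormalDist

def two_sided_z(confidence):
    """Return z such that the area under the standard normal curve over [-z, z] equals `confidence`."""
    tail = (1 - confidence) / 2             # area in each tail, e.g. 0.005 for 99%
    return NormalDist().inv_cdf(1 - tail)   # z-score with area 1 - tail to its left

for level in (0.99, 0.999, 0.80):
    print(f"{level:.1%}: z = {two_sided_z(level):.3f}")
```

The computed values should agree with the table answers 2.575, 3.29, and 1.28 up to rounding.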
  2. For a Bernoulli random variable x with probability of success p, what are the expected value E(x) and the theoretical standard deviation σx?
    Ans: For a Bernoulli random variable: E(x) = p and σx = √(p(1 - p)).
  3. For a binomial random variable S with n trials and probability of success p, what are E(S) and σS?
    Ans: For a binomial random variable: E(S) = np and σS = √(np(1 - p)).
  4. To forecast an election, an interviewer asks a random sample of people "Will you vote for Candidate X?" Here are the results:
          n = 2,500     S = 1,376.
    Find a 95% confidence interval for the true probability that a random person will vote for Candidate X. What is the sampling error?
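One way to set up the computation for Problem 4 is sketched below. It assumes the usual large-sample interval p̂ ± z·SE with z = 1.96 and SE = √(p̂(1 - p̂)/n), and it takes "sampling error" to mean the half-width z·SE; if the course defines sampling error as the standard error itself, read `se` instead of `margin`. The variable names are chosen for this illustration.

```python
from math import sqrt

n, S = 2500, 1376     # sample size and number of "yes" answers, from the problem
z = 1.96              # two-sided 95% z-value from the standard normal table

p_hat = S / n                         # sample proportion
se = sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error of the proportion
margin = z * se                       # half-width of the interval (assumed meaning of "sampling error")
ci = (p_hat - margin, p_hat + margin)

print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}, margin = {margin:.4f}")
print(f"95% CI: [{ci[0]:.4f}, {ci[1]:.4f}]")
```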
  5. What is a test of hypothesis? What are the null and alternative hypotheses?
    Ans: A test of hypothesis is used to determine if there is a real difference between a random sample and the population. The null hypothesis, denoted by H0, states that there is no real difference; the difference is just random variation. The alternative hypothesis, denoted by H1, states that the difference is real.
  6. Use SPSS to simulate the following situations. Use the random number generator Rv.Binom(n, p).
    1. 1,000 outcomes of a Bernoulli random variable with p = 0.5.
    2. 10,000 outcomes of a Bernoulli random variable with p = 0.7.
    3. 100,000 outcomes of a Bernoulli random variable with p = 0.9.

    Ans: In SPSS, set up a dataset with two scale variables n and p like this:
             n    p
          1000   .5
         10000   .7
        100000   .9

    Then perform these computations with Transform >> Compute Variable:
    Target Variable    Numeric Expression       Description
    S                  RV.BINOM(n, p)           Generate binomial random outcomes.
    SE_S               sqrt(n * p * (1 - p))    Compute standard error of the sum.
    z                  (S - n * p) / SE_S       Compute test statistic z.

    Here are the results:
             n    p        S    SE_S         z
          1000   .5      498   15.81    -.1265
         10000   .7     6949   45.83   -1.1129
        100000   .9    90067   94.87     .7062

    In the Variable View, the number of decimal places is set to 0, 1, 0, 2, and 4 for the variables n, p, S, SE_S, and z, respectively.
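For readers without SPSS, the same experiment can be approximated in Python. This is only a sketch: it replaces RV.BINOM with a sum of Bernoulli draws from the standard-library `random` module, and the helper name `simulate` and the seed are arbitrary choices for this illustration. The simulated S and z values will differ from the SPSS results above because the random draws differ.

```python
import random
from math import sqrt

def simulate(n, p, rng):
    """Return (S, SE_S, z) for the sum S of n Bernoulli(p) outcomes."""
    s = sum(rng.random() < p for _ in range(n))   # count successes in n Bernoulli trials
    se_s = sqrt(n * p * (1 - p))                  # standard error of the sum
    z = (s - n * p) / se_s                        # test statistic
    return s, se_s, z

rng = random.Random(403)                          # arbitrary seed, for repeatability
cases = [(1000, 0.5), (10000, 0.7), (100000, 0.9)]
rows = [simulate(n, p, rng) for n, p in cases]

for (n, p), (s, se_s, z) in zip(cases, rows):
    inside = abs(z) <= 1.96                       # is z in the 95% interval [-1.96, 1.96]?
    print(f"n={n:6d} p={p:.1f} S={s:6d} SE_S={se_s:6.2f} z={z:8.4f} in [-1.96, 1.96]: {inside}")
```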
  7. For each of the cases in Problem 6, test whether the random number generator is fair for the given probability.
    Ans: The standard normal 95% two-sided confidence interval is I = [-1.96, 1.96]. In each case, z ∈ I, so we accept the null hypothesis that the binomial random number generator is fair in all cases.
  8. If the probability of success is p, let x be the "time to failure" random variable, that is, the number of successes of a Bernoulli random variable obtained before the first failure. This random variable x is said to have a geometric distribution.
    1. What is the probability distribution of x?
      Ans: Set S = Success, F = Failure, x = number of successes before a failure:
      Outcome    x     P(x)
      F          0     1 - p
      SF         1     p(1 - p)
      SSF        2     p^2(1 - p)
      SSSF       3     p^3(1 - p)
      ...        ...   ...

      To be a legitimate probability distribution, the sum of all the probabilities must be 1. To check, first let's find the sum of the geometric series:
            S = 1 + p + p^2 + p^3 + ...        Equation 1.
      Multiply Equation 1 by p to obtain:
            pS = p + p^2 + p^3 + p^4 + ...     Equation 2.
      Now if we subtract Equation 1 minus Equation 2, the terms involving p go away and we obtain
            S - pS = 1.
      Solve for S to obtain S = 1 / (1 - p). Now we can sum the terms in the P(x) column of the probability distribution:
            (1 - p) + p(1 - p) + p^2(1 - p) + p^3(1 - p) + ...
        = (1 + p + p^2 + p^3 + ... )(1 - p) = [1 / (1 - p)](1 - p) = 1.
    2. What is E(x)? Ans:
      E(x) = 0P(0) + 1P(1) + 2P(2) + 3P(3) + ...
            = 0(1 - p) + 1p(1 - p) + 2p^2(1 - p) + 3p^3(1 - p) + 4p^4(1 - p) + ...
            = p - p^2 + 2p^2 - 2p^3 + 3p^3 - 3p^4 + 4p^4 - ...
            = p + p^2 + p^3 + p^4 + ...
            = p(1 + p + p^2 + p^3 + ... ) = p[1 / (1 - p)] = p / (1 - p).
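The two results of Problem 8 (the probabilities sum to 1, and E(x) = p / (1 - p)) can be checked numerically by truncating the infinite sums. The sketch below assumes that 2,000 terms are enough for the tail to be negligible, which holds for the value p = 0.7 used here; the function name `geometric_check` is made up for this illustration.

```python
def geometric_check(p, terms=2000):
    """Return (sum of P(x), sum of x * P(x)) over x = 0, 1, ..., terms - 1."""
    probs = [p**x * (1 - p) for x in range(terms)]    # P(x) = p^x (1 - p)
    total = sum(probs)                                # should be close to 1
    mean = sum(x * q for x, q in enumerate(probs))    # should be close to p / (1 - p)
    return total, mean

p = 0.7
total, mean = geometric_check(p)
print(f"sum of probabilities = {total:.6f}")
print(f"E(x) = {mean:.6f}, p / (1 - p) = {p / (1 - p):.6f}")
```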

Quiz 6

The Central Limit Theorem

Practice Problems

  1. Use the CLT to estimate the following probabilities for the number of heads obtained from a fair coin:
    1. Obtaining 13 to 16 heads out of 25 tosses. (Because the normal curve is continuous, use 12.5 to 16.5 heads.)
      Ans: SE_S = √(np(1 - p)) = √(25(0.5)(1 - 0.5)) = 2.5.
      Then z1 = (S - np) / SE_S = (12.5 - 25(0.5)) / 2.5 = 0,
      z2 = (S - np) / SE_S = (16.5 - 25(0.5)) / 2.5 = 1.6;
      the area under the standard normal curve for the bin [0, 1.6] is 0.9452 - 0.5000 = 0.4452 = 44.5%.
      The exact value obtained computing P(13) + P(14) + P(15) + P(16) is 0.4461 = 44.6%.
    2. Obtaining between 60 and 75 heads out of 100 tosses. (Use 59.5 to 75.5 heads.)
      Ans: SE_S = √(np(1 - p)) = √(100(0.5)(1 - 0.5)) = 5.
      Then z1 = (S - np) / SE_S = (59.5 - 100(0.5)) / 5 = 1.9,
      z2 = (S - np) / SE_S = (75.5 - 100(0.5)) / 5 = 5.1;
      the area under the normal curve for the bin [1.9, 5.1] is 1 - 0.9713 = 0.0287.
      The exact value obtained using the binomial formula is 0.0284.
    3. Obtaining exactly 30 heads out of 60 tosses. (Use 29.5 to 30.5 heads.)
      Ans: SE_S = √(np(1 - p)) = √(60(0.5)(1 - 0.5)) = 3.87.
      z1 = (S - np) / SE_S = (29.5 - 60(0.5)) / 3.87 = -0.129,
      z2 = (S - np) / SE_S = (30.5 - 60(0.5)) / 3.87 = 0.129;
      the area under the normal curve for the bin [-0.129, 0.129] is 0.5513 - 0.4487 = 0.1026 ≈ 10%.
      The exact value obtained using the binomial formula is 0.1026.
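The comparison between the CLT estimate and the exact binomial value in the three problems above can be reproduced with a short script. This is a sketch, assuming Python 3.8 or later (for `math.comb` and `statistics.NormalDist`); the function names are made up for this illustration. The normal estimate applies the same half-unit continuity correction used in the answers.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx(lo, hi, n, p=0.5):
    """CLT estimate of P(lo <= S <= hi), widening the bin by 0.5 on each side."""
    se = sqrt(n * p * (1 - p))
    z1 = (lo - 0.5 - n * p) / se
    z2 = (hi + 0.5 - n * p) / se
    nd = NormalDist()
    return nd.cdf(z2) - nd.cdf(z1)

def exact_binomial(lo, hi, n, p=0.5):
    """Exact P(lo <= S <= hi) from the binomial formula."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

for lo, hi, n in [(13, 16, 25), (60, 75, 100), (30, 30, 60)]:
    print(f"{lo}..{hi} heads of {n}: normal = {normal_approx(lo, hi, n):.4f}, "
          f"exact = {exact_binomial(lo, hi, n):.4f}")
```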

Tests of Hypothesis in General

The z-test

More about p-values

Practice Problems

We will discuss the following two practice problems on November 3.

  1. In 1999, it was reported that the mean serum cholesterol level for female undergraduates was 168 mg/dl. A recent study at Baylor University collected the following data for cholesterol levels for females:
          n = 71     x̄ = 173.7     SD+ = 27
    Is there a real difference between the women in the Baylor study and the reported value in 1999? (Example 6.15 from textbook). Perform the test at the 90%-level.
    1. H0: μ = 168     H1: μ ≠ 168
    2. z = (x̄ - μ) / SE_ave = (173.7 - 168) / (27 / √71) = 1.78
    3. A 90% confidence interval for z is [-1.64,1.64].
    4. 1.78 ∉ [-1.64,1.64], so reject the null hypothesis.
    5. The p-value is the area outside the bin [-1.78, 1.78]: 2 × 0.0375 = 0.0750.
  2. Claim: if all high school seniors in California took the SAT test, the mean score would be equal to 450. To test this claim, take a sample of 400 high school seniors and give them the test. Here are the data:
    n = 400     x̄ = 461     SD+ = 100
    Is this result for the sample significantly different from 450 or is it just chance variation? Perform the test at the 99%-level.
    Ans: Here are the steps of the z-test:
    1. H0: μ = 450      H1: μ ≠ 450
    2. z = (x̄ - μ) / SE_ave = (461 - 450) / (100 / √400) = 2.2.
    3. A 99% confidence interval for z is [-2.58, 2.58].
    4. 2.2 ∈ [-2.58, 2.58], so accept the null hypothesis.
    5. The p-value is the probability of obtaining a z-value as extreme or more extreme than the one actually obtained. It is the area outside the bin [-2.2, 2.2]: 2 × 0.0139 = 0.0278. Because 0.0278 > 0.01, the result is not significant at the 99%-level.
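The five steps of the z-test can be collected into one small function. This is a sketch rather than the course's own procedure: the function name `z_test` and its return layout are assumptions made for this illustration, and the critical value is computed with `statistics.NormalDist` instead of being read from the table. It is applied here to the SAT data of Problem 2.

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu0, sd, n, level):
    """Two-sided z-test of H0: mu = mu0. Returns (z, z_crit, p_value, reject)."""
    nd = NormalDist()
    se_ave = sd / sqrt(n)                        # standard error of the average
    z = (x_bar - mu0) / se_ave                   # test statistic
    z_crit = nd.inv_cdf(1 - (1 - level) / 2)     # end point of the confidence interval for z
    p_value = 2 * (1 - nd.cdf(abs(z)))           # area outside [-|z|, |z|]
    return z, z_crit, p_value, abs(z) > z_crit   # reject H0 only if z falls outside the interval

z, z_crit, p_value, reject = z_test(x_bar=461, mu0=450, sd=100, n=400, level=0.99)
print(f"z = {z:.2f}, interval = [-{z_crit:.2f}, {z_crit:.2f}], "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```

Calling the function with a different `level` (for example 0.95) shows how the decision depends on the chosen confidence level while the p-value stays the same.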

The t-test

Degrees of Freedom

The Paired Sample t-test