IT 223 -- Jan 7, 2026

Review Exercises

  1. What is the proper filename for project submissions?
    Answer: proj1Smith.docx (replace with your last name)
  2. Matching:
    a. Fisher      1. First stated the Central Limit Theorem.
    b. Cotes       2. Tried to find the "ideal man" that nature was trying to produce.
    c. Tukey       3. Was the first demographer to use statistics.
    d. de Moivre   4. Coined the term "exploratory data analysis."
    e. Pascal      5. Father of modern statistics.
    f. Gauss       6. First described the least squares method.
    g. Graunt      7. First studied the theory of errors applied to astronomy.
    h. Galton      8. First applied the theory of probability to gambling.
    i. Quetelet    9. Introduced the concept of correlation.

    Answer: a. 5; b. 7; c. 4; d. 1; e. 8; f. 6; g. 3; h. 9; i. 2.
  3. What is a lurking variable?
    Answer: A variable that is not included in the dataset, but should be. Another name for a lurking variable is a confounding variable.
  4. Why is it important for a clinical trial to be randomized and double blind?
    Answer: A clinical trial should be randomized to minimize the effect of lurking variables. Randomization ensures that effects due to variables not included in the dataset are similarly distributed between the treatment group and the control group. A clinical trial should be double blind so that psychological effects that result from the patient knowing whether he or she has received the treatment or the placebo do not influence the results. The doctor treating the patient should also not know whether that patient has received the treatment or the placebo, so that the doctor's expectations do not influence the care given or the evaluation of the outcome.
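As a rough illustration of random assignment, the sketch below uses R's sample function to shuffle group labels across ten hypothetical subjects. The subject IDs, the group sizes, and the seed are all assumptions made for this example and do not come from any real trial.

```r
# Hypothetical example: randomly assign 10 subjects to two equal groups.
set.seed(223)                        # arbitrary seed so the shuffle is repeatable
subjects <- paste0("S", 1:10)        # made-up subject IDs S1..S10
labels <- rep(c("treatment", "placebo"), each = 5)
group <- sample(labels)              # random permutation of the labels
assignment <- data.frame(subject = subjects, group = group)
print(assignment)
table(assignment$group)              # 5 subjects in each group
```

Because the labels are shuffled by chance alone, any lurking variable is expected to be spread about evenly across the two groups.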
  5. What is the difference between a controlled experiment and an observational study?
    Answer: In a controlled experiment, each subject is randomly assigned the treatment or placebo, to reduce or eliminate the effect of lurking variables. In an observational study, no experiment is performed. The data is just reported and analyzed "as is."
  6. Give examples of categorical, ordinal, and continuous variables.
    Answer: categorical: gender (M, F, X (non-binary)); ordinal: year in college (1, 2, 3, 4); continuous: height.
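One way these three kinds of variables might be represented in R is sketched below: a factor for the categorical variable, an ordered factor for the ordinal variable, and a plain numeric vector for the continuous variable. The four students and their values are made up for illustration.

```r
# Hypothetical data for four students.
gender <- factor(c("F", "M", "X", "F"))                       # categorical: no order
year <- factor(c(1, 3, 2, 4), levels = 1:4, ordered = TRUE)   # ordinal: ordered levels
height <- c(165.1, 180.3, 172.7, 158.4)                       # continuous: numeric (cm)

is.ordered(gender)   # FALSE
is.ordered(year)     # TRUE
year[1] < year[2]    # TRUE, because year 1 comes before year 3
mean(height)         # arithmetic makes sense for a continuous variable
```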
  7. Which R function is used to create a vector?
    Answer: the c function, which means combine. For example:
    x <- c(4, 2, 7, 5)
    
  8. What is the R assignment operator?
    Answer: <-, for example,
    x <- c(4, 2, 7, 5)
    
    -> is also an assignment operator:
    c(4, 2, 7, 5) -> x
    
    but -> is not often used.
  9. True or False. In the R language, vector indices are zero-based.
    Answer: False. R vectors are one-based, which means that the indices of the vector elements start at 1, not 0. Most other modern programming languages use zero-based indices, partly because a zero-based index equals the element's offset from the start of the array, which is simple to compute. Examples of languages that use zero-based indices are Python, Java, and C#.
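A quick check at the R console, reusing the vector x from the earlier exercises, shows the one-based behavior:

```r
x <- c(4, 2, 7, 5)
x[1]           # 4, the first element
x[length(x)]   # 5, the last element
x[0]           # numeric(0): index 0 selects nothing rather than the first element
```

Note that x[0] is not an error in R. It quietly returns an empty vector, which can hide off-by-one mistakes in code ported from a zero-based language.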
  10. Use R to compute Q0, Q1, Q2, Q3, and Q4 for these Celsius temperatures:
    38 54 52 49 65 58 103 12 70
    
    Answer:
    > temps <- c(38, 54, 52, 49, 65, 58, 103, 12, 70)
    > summary(temps)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      12.00   49.00   54.00   55.67   65.00  103.00 
    > quantile(temps, probs=c(0.0, 0.25, 0.5, 0.75, 1.0))
      0%  25%  50%  75% 100% 
      12   49   54   65  103 
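To see where these values come from, the sketch below reproduces them by hand. It assumes R's default quantile method (type 7), which places the quantile for probability p at position 1 + (n - 1)p in the sorted data. The variable names s, pos, and q are illustrative only.

```r
temps <- c(38, 54, 52, 49, 65, 58, 103, 12, 70)
s <- sort(temps)                 # 12 38 49 52 54 58 65 70 103
n <- length(s)                   # 9
p <- c(0, 0.25, 0.5, 0.75, 1)
pos <- 1 + (n - 1) * p           # 1 3 5 7 9
q <- s[pos]                      # 12 49 54 65 103
names(q) <- paste0("Q", 0:4)
q
```

Here every position is a whole number, so no interpolation is needed. For other sample sizes the position can fall between two sorted values, and quantile then interpolates between them.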
    
  11. Use R to create the stemplot, histogram, and boxplot of the Celsius temperatures in the previous exercise.
    # R statements and console output
    > stem(temps)
    
      The decimal point is 1 digit(s) to the right of the |
    
       0 | 2
       2 | 8
       4 | 9248
       6 | 50
       8 | 
      10 | 3
    
    > hist(temps)
    > boxplot(temps)
    
    [Figure: Histogram for Celsius Temps]
    [Figure: Boxplot for Celsius Temps]
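The plots can be given the titles "Histogram for Celsius Temps" and "Boxplot for Celsius Temps" with the main argument, as in the sketch below. The axis labels are assumptions added for readability. The last lines call hist with plot = FALSE to inspect the bins that R chooses without drawing anything.

```r
temps <- c(38, 54, 52, 49, 65, 58, 103, 12, 70)
hist(temps, main = "Histogram for Celsius Temps", xlab = "Temperature (Celsius)")
boxplot(temps, main = "Boxplot for Celsius Temps", ylab = "Temperature (Celsius)")

# Inspect the bins without drawing anything.
h <- hist(temps, plot = FALSE)
h$breaks   # bin boundaries chosen by R
h$counts   # number of temperatures in each bin
```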

Quartiles

Boxplots

Histograms

Practice Problems

  1. Match the descriptions with the histograms below:
    1. The gender of all persons in a college class (male = 0, female = 1).
    2. The handedness of all persons in a college class (left handed = 0, right handed = 1).
    3. The heights of all married persons counted separately.
    4. The heights of all persons in families where both parents are 28 years old or less.
    5. The heights of all automobiles.
    6. The incomes of all persons in the U.S.

    Ans: 1 iv, 2 iii, 3 i, 4 ii, 5 v, 6 vi.

Using R

Project 2

More Practice Problems

  1. Draw the histogram for each of tables (a), (b), and (c). [a,b) denotes an interval that is closed on the left (includes a) and open on the right (does not include b).

    Caution: what does it mean for histograms (b) and (c) to have bins of different widths?
    Answer: If the bars of a histogram have unequal widths, the area of a bar represents the frequency (the number of observations in the interval under the bar).  The height of each bar is then frequency per horizontal unit. For unequal bin widths, the height of a bar is called the density.

    Table (a)        Table (b)        Table (c)
    Bin    Count     Bin    Count     Bin      Count
    [0,1)  1         [0,1)  3         [0,1)    2
    [1,2)  3         [1,2)  5         [1,2)    4
    [2,3)  5         [2,4]  2         [2,2.5)  3
    [3,4]  1                          [2.5,3]  1
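As a worked example of the area rule stated in the caution, the sketch below computes the bar heights for tables (b) and (c) as count divided by bin width. The variable names are illustrative.

```r
# Tables (b) and (c) from the problem: bin boundaries and counts.
breaks_b <- c(0, 1, 2, 4);      counts_b <- c(3, 5, 2)
breaks_c <- c(0, 1, 2, 2.5, 3); counts_c <- c(2, 4, 3, 1)

# Bar height = count / bin width, so that bar area = count.
height_b <- counts_b / diff(breaks_b)   # 3 5 1
height_c <- counts_c / diff(breaks_c)   # 2 4 6 2

height_b
height_c
```

In table (b) the wide bin [2,4] holds 2 observations but gets height 1, because its count is spread over 2 horizontal units. Note that R's own hist function reports density on a different scale, with the total area equal to 1, so its values would be these heights divided by the total count of 10.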
  2. Compute the median for each histogram in the preceding problem by using interpolation in the bar that contains the median.
    Ans: Problems 1a, 1b, and 1c.
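A minimal sketch of the interpolation, assuming the observations are spread evenly within each bin: find the bin where the cumulative count first reaches half the total, then move into that bin in proportion to how many more observations are needed. The function interp_median is a hypothetical helper written for this example, not a built-in R function, and the values in the comments come from this sketch rather than from the linked solutions.

```r
# Median of binned data by linear interpolation inside the median bin.
# interp_median is a hypothetical helper, not a built-in R function.
interp_median <- function(breaks, counts) {
  half <- sum(counts) / 2
  cum <- cumsum(counts)
  i <- which(cum >= half)[1]                # first bin whose cumulative count reaches half
  below <- if (i == 1) 0 else cum[i - 1]    # observations in bins to the left
  width <- breaks[i + 1] - breaks[i]
  breaks[i] + (half - below) / counts[i] * width
}

interp_median(c(0, 1, 2, 3, 4), c(1, 3, 5, 1))     # table (a): 2.2
interp_median(c(0, 1, 2, 4), c(3, 5, 2))           # table (b): 1.4
interp_median(c(0, 1, 2, 2.5, 3), c(2, 4, 3, 1))   # table (c): 1.75
```

For table (a), for example, half of the 10 observations is 5, the first two bins hold 4, and the bin [2,3) holds 5, so the median is 2 + (5 - 4)/5 = 2.2.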