IT 223 -- Jan 26, 2026
Review Exercises
- Work problems from the
Practice Problems on the Area under the Normal Curve.
Answer: we worked problems 1s, 1e, and 2d in class today. The answers are
shown on the page of practice problems.
- Use R to create a data vector with entries from 1 to 100. Answer:
v <- seq(1, 100, 1)
# or
v <- seq(1, 100)
# or
v <- 1:100
- Use R to create a data vector with entries that start at 1, end at 3,
and increase by 0.01 from one entry to the next. Answer:
> v <- seq(1, 3, 0.01)
- What is a z-score? How do you use R to compute the z-scores for a data
vector?
Ans: A z-score for an individual observation is computed as z = (x - x̄) /
SD. It tells you how many standard deviations the observation is away
from the mean. The z-score can also be computed for the sample mean:
z = (x̄ - μ) / SDave. If you knew the population mean μ,
this z would tell you how many standard errors the sample mean is from the population mean.
However, μ is usually unknown, so this z can be used to obtain a confidence interval for μ.
The z for individual observations is used to look up areas under the standard normal curve.
Answer: a z-score for an observation tells you how many standard
deviations away from the sample mean the observation is. You compute the
z-scores of a data vector like this:
z = (x - mean(x)) / sd(x)
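As a quick check of this formula, here is a minimal sketch on a small made-up data vector (the values are for illustration only and are not from class). It also compares the result with the built-in scale function, assuming its default centering and scaling.

```r
# Hypothetical data vector, for illustration only
x <- c(2, 4, 4, 5, 7, 8)

# z-scores: distance from the sample mean in units of the sample SD
z <- (x - mean(x)) / sd(x)
print(z)

# The z-scores of a data vector have mean 0 and SD 1
print(mean(z))
print(sd(z))

# scale() returns the same values as a one-column matrix
z2 <- as.vector(scale(x))
print(all.equal(z, z2))
```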
- We discussed this document last week.
- What proportion of observations are contained in each of these bins under the standard normal curve?
[-1, 1] [-2, 2] [-3, 3]
Answer: 68% for [-1,1]; 95% for [-2,2]; 99.7% for [-3,3].
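These percentages are rounded. A short sketch of how they could be checked with the pnorm function (described in the next section), using its default mean=0 and sd=1:

```r
# Area under the standard normal curve within k SDs of the mean
k <- 1:3
area <- pnorm(k) - pnorm(-k)
print(round(area, 4))
# [1] 0.6827 0.9545 0.9973
```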
Some R functions:
dnorm, pnorm, qnorm, rnorm
- dnorm(x, mean, sd) -- The height at x of the normal density
curve with μ and σ specified by the mean and
sd arguments.
Example:
Draw a plot of the normal density of SAT scores, where μ=1500 and σ=300.
Use seq(0, 2500, 1) for the x-values. Set the title of the plot (main argument)
to "Density of SAT Scores".
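One possible answer, as a sketch: the axis labels and the choice of a line plot (type="l") are assumptions, since the exercise only specifies the x-values and the title.

```r
# x-values covering the range of SAT scores
x <- seq(0, 2500, 1)

# Height of the normal density with mean 1500 and SD 300 at each x
y <- dnorm(x, 1500, 300)

# Draw the density as a line with the requested title
plot(x, y, type="l", main="Density of SAT Scores",
     xlab="SAT Score", ylab="Density")
```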
- pnorm(x, mean, sd) -- The area under the normal
density in the interval (-∞, x].
Example: if μ=1500 and σ=300 for SAT
scores, find the proportion of scores that are greater than 1950. Answer:
> pnorm(1950, 1500, 300)
[1] 0.9331928
However, we don't want area (-∞, 1950], we want area (1950, ∞) = 1 -
area (-∞, 1950]. The answer we want is
> 1 - pnorm(1950, 1500, 300)
[1] 0.0668072
- qnorm(p, mean, sd) -- The p quantile for the normal
density.
Example: If μ=1500 and σ=300 for SAT scores, what is the 0.95
quantile or 95th percentile for SAT scores?
Answer:
> qnorm(0.95, 1500, 300)
[1] 1993.456
- rnorm(n, mean, sd) -- Generate n normally
distributed random numbers with the specified mean and standard deviation.
Example: Generate 100 normally distributed random values with mean=1500 and
sd=300. Then create the histogram, the box plot, and the plot of the x-values vs
observation number.
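x <- 1:100                   # observation numbers
y <- rnorm(100, 1500, 300)   # 100 random values with mean 1500 and sd 300
hist(y)                      # histogram
boxplot(y)                   # box plot
plot(x, y)                   # values vs. observation number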
Biased vs. Unbiased; Heteroscedastic vs. Homoscedastic for Graphs
- The following scatterplots are plots of x_i
(measurement) vs. i (observation number) with the sample mean marked
with a red horizontal line. The measurement is plotted on the
vertical axis; the observation number is plotted on the horizontal axis.
What does each plot tell you? Describe each plot using these terms:
- Unbiased The average of the observations in every thin
vertical strip is the same all the way across the scatterplot.
- Biased The average of the observations changes, depending on
which thin vertical strip you pick.
- Homoscedastic The variation (SD+) of the observations
is the same in every thin vertical strip all the way across the
scatterplot.
- Heteroscedastic The variation (SD+) of the observations
in a thin vertical strip changes, depending on which vertical strip you pick.

Ans: (a) unbiased and homoscedastic,
(b) unbiased and heteroscedastic,
(c) biased and homoscedastic,
(d) biased and heteroscedastic,
(e) unbiased and heteroscedastic,
(f) biased and homoscedastic.
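To see what the four combinations of these terms can look like, here is a sketch that simulates one dataset of each kind with rnorm. These are hypothetical datasets, not plots (a) through (f) from class; the seed, the sample size, and the mean and SD formulas are arbitrary choices for illustration.

```r
set.seed(223)   # arbitrary seed so the plots are reproducible
n <- 200
i <- 1:n        # observation number

# Four hypothetical datasets; mean and sd may vary with i
sims <- list(
  "Unbiased, homoscedastic"   = rnorm(n, mean=10,        sd=1),
  "Unbiased, heteroscedastic" = rnorm(n, mean=10,        sd=i/50),
  "Biased, homoscedastic"     = rnorm(n, mean=10 + i/20, sd=1),
  "Biased, heteroscedastic"   = rnorm(n, mean=10 + i/20, sd=i/50)
)

par(mfrow=c(2, 2))   # 2 x 2 grid of plots
for (name in names(sims)) {
  x <- sims[[name]]
  plot(i, x, main=name, xlab="Observation number", ylab="Measurement")
  abline(h=mean(x), col="red")   # sample mean as a red horizontal line
}
```

In the biased plots the points drift upward from left to right; in the heteroscedastic plots the vertical spread widens from left to right.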
Normal Plots
We will finish discussing normal plots on Wednesday, Jan 28.
- Normal plots can be used to
determine if a dataset is approximately normal, or how a dataset deviates
from normality.
Practice Problems
We will discuss this section on Wednesday, Jan 28.
- Compute normal scores (Van der Waerden's method) for a dataset of size 9.
- Construct the normal plot by hand for this dataset:
81 95 97 101 112 125
129 167 220
- Create the normal plot for this dataset with R.
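A sketch of how these three problems could be approached in R, assuming that Van der Waerden's method means taking the i/(n+1) quantiles of the standard normal curve for i = 1, ..., n. Which variable goes on which axis is also an assumption here; follow the convention used in class.

```r
# Dataset from the practice problem
x <- c(81, 95, 97, 101, 112, 125, 129, 167, 220)
n <- length(x)

# Normal scores: standard normal quantiles at i/(n+1)
ns <- qnorm((1:n) / (n + 1))
print(round(ns, 2))
# [1] -1.28 -0.84 -0.52 -0.25  0.00  0.25  0.52  0.84  1.28

# Normal plot: sorted observations vs. normal scores
plot(ns, sort(x), main="Normal Plot",
     xlab="Normal Score", ylab="Observed Value")
```

The built-in qqnorm(x) function draws a similar plot, but it uses slightly different plotting positions, so its normal scores will not match these exactly.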
Random Variable Simulation
- A random variable is the process of choosing a random
number.
- R can generate many different types of random numbers,
including from a normal distribution with a specified μ and σ.
- Example 1: Generate a vector of 200 normal random numbers with
μ=6.7 and σ=2.5 using the R rnorm function with arguments 200,
mean=6.7 and sd=2.5.
Create the histogram and boxplot of these random numbers.
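A minimal sketch for Example 1. The variable name v and the call to set.seed are assumptions added so the output is reproducible; drop set.seed to get different numbers on each run.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# 200 normal random numbers with mean 6.7 and SD 2.5
v <- rnorm(200, mean=6.7, sd=2.5)

hist(v)      # should look roughly bell shaped, centered near 6.7
boxplot(v)   # should look roughly symmetric about the median
```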
- A uniform random variable with range [a,b) is a value drawn
from the interval [a,b); every value in this interval is equally
likely to be chosen.
- Example 2: Generate a vector of 200 uniform random numbers
from the interval [1, 3.14) using the runif function
with arguments 200, min=1 and max=3.14. Create the histogram and boxplot of this
vector.
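A minimal sketch for Example 2. The variable name u and the call to set.seed are assumptions added so the output is reproducible.

```r
set.seed(2)   # arbitrary seed, for reproducibility only

# 200 uniform random numbers from the interval [1, 3.14)
u <- runif(200, min=1, max=3.14)

hist(u)      # bars should be roughly the same height across the interval
boxplot(u)   # box should sit near the middle of the interval
```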
Project 2BCD