Jan 26, 2026

IT 223 -- Jan 26, 2026

Work problems from the Practice Problems on the Area under the Normal Curve.
Answer: we worked problems 1s, 1e, and 2d in class today. The answers are shown on the page of practice problems.
Use R to create a data vector with entries from 1 to 100. Answer:
```
v <- seq(1, 100, 1)
# or
v <- seq(1, 100)
# or
v <- 1:100
```
Use R to create a data vector with entries that start at 1, end at 3, and increase by 0.01 from one entry to the next. Answer:
```
> v <- seq(1, 3, 0.01)
```
What is a z-score? How do you use R to compute the z-scores for a data vector?
Ans: A z-score for an individual observation is computed as z = (x - x̄) / SD, where x̄ is the sample mean. It tells you how many
standard deviations the observation is away from the mean. The z-score can also be computed for the sample mean:
z = (x̄ - μ) / SD_ave. If you knew the population mean μ, this z would tell you how many standard errors the sample mean is from the population mean. However, μ is usually unknown, so this z is used to obtain a confidence interval for μ.
The z for individual observations is used to look up areas under the standard normal curve.

z = (x - mean(x)) / sd(x)
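
As a minimal sketch of the formula above, the following computes z-scores for a small data vector. The values in `x` are made up for illustration and are not course data. Note that R's `sd` divides by n - 1; the built-in `scale` function is used only as a cross-check.

```r
# Hypothetical data vector (illustrative values only)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# z-scores: distance of each observation from the mean, in SD units
z <- (x - mean(x)) / sd(x)

# Cross-check with scale(), which by default centers by the mean
# and divides by sd()
z2 <- as.vector(scale(x))

print(round(z, 3))
print(all.equal(z, z2))
```

The z-scores of any data vector computed this way have mean 0 and standard deviation 1.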

We discussed this document last week.
What proportion of observations are contained in each of these bins under the standard normal curve?
```
[-1, 1]  [-2, 2]  [-3, 3]
```
Answer: 68% for [-1,1]; 95% for [-2,2]; 99.7% for [-3,3].
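
These percentages can be checked with `pnorm`, which with its default arguments (mean 0, sd 1) gives areas under the standard normal curve. A short sketch; the variable names `k` and `area` are chosen here for illustration:

```r
# Area under the standard normal curve within k SDs of the mean
k <- 1:3
area <- pnorm(k) - pnorm(-k)
print(round(area, 4))
# [1] 0.6827 0.9545 0.9973
```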

dnorm(x, mean, sd) -- The height of a normal density with μ and σ specified by the mean and sd arguments.
Example: draw a plot of the normal density of SAT scores, where μ=1500 and σ=300. Use seq(0, 2500, 1) for the x-values. Set the title of the plot (main argument) to "Density of SAT Scores".
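One possible way to draw this plot is sketched below. The line type (`type = "l"`) and the axis labels are choices made here for readability; the example statement does not specify them.

```r
# x-values from the example statement
x <- seq(0, 2500, 1)

# Height of the normal density at each x-value
y <- dnorm(x, mean = 1500, sd = 300)

# Draw the density as a line; axis labels are an assumption
plot(x, y, type = "l", main = "Density of SAT Scores",
     xlab = "SAT score", ylab = "Density")
```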
pnorm(x, mean, sd) -- The area under the normal density in the interval (-∞, x].
Example: if μ=1500 and σ=300 for SAT scores, find the proportion of scores that are greater than 1950. Answer: pnorm gives the area to the left of 1950:
```
> pnorm(1950, 1500, 300)
[1] 0.9331928
```
However, we don't want the area over (-∞, 1950]; we want the area over (1950, ∞) = 1 - area over (-∞, 1950]. The answer we want is
```
> 1 - pnorm(1950, 1500, 300)
[1] 0.0668072
```
qnorm(p, mean, sd) -- The p quantile for the normal density.
Example: If μ=1500 and σ=300 for SAT scores, what is the 0.95 quantile or 95th percentile for SAT scores?
Answer:
```
> qnorm(0.95, 1500, 300)
[1] 1993.456
```
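Since `qnorm` is the inverse of `pnorm`, the result can be checked by feeding it back into `pnorm`. The sketch below also shows the same quantile obtained from the standard normal quantile via x = μ + z·σ; the variable names are illustrative only.

```r
# 0.95 quantile of SAT scores
q <- qnorm(0.95, mean = 1500, sd = 300)

# Feeding the quantile back into pnorm recovers the probability 0.95
p <- pnorm(q, mean = 1500, sd = 300)

# Same quantile from the standard normal: x = mu + z * sigma
q_from_z <- 1500 + 300 * qnorm(0.95)

print(c(q, p, q_from_z))
```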
rnorm(n, mean, sd) -- Generate n normally distributed random numbers with the specified mean and standard deviation.
Example: Generate 100 normally distributed random values with mean=1500 and sd=300. Then create the histogram, the box plot, and the plot of the x-values vs observation number.
```
n <- 100
x <- 1:n                    # observation numbers
y <- rnorm(n, 1500, 300)    # n normal random values
hist(y)
boxplot(y)
plot(x, y)                  # values vs. observation number
```

We will finish discussing normal plots on Wednesday, Jan 28.

Normal plots can be used to determine if a dataset is approximately normal, or how a dataset deviates from normality.

We will discuss this section on Wednesday, Jan 28.

Compute normal scores (Van der Waerden's method) for a dataset of size 9.
Construct the normal plot of this dataset by hand:

81 95 97 101 112 125 129 167 220
Create the normal plot for this dataset with R.
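
A sketch of these steps in R, assuming the normal scores are the i/(n+1) quantiles of the standard normal curve (Van der Waerden's method). The axis orientation and labels are choices made here. R's built-in `qqnorm` uses slightly different plotting positions, so its points will not match the hand-computed scores exactly.

```r
x <- c(81, 95, 97, 101, 112, 125, 129, 167, 220)
n <- length(x)

# Van der Waerden normal scores: the i/(n+1) quantiles
# of the standard normal curve, for i = 1, ..., n
ns <- qnorm((1:n) / (n + 1))
print(round(ns, 2))

# Normal plot: sorted data against the normal scores
plot(ns, sort(x), xlab = "Normal score", ylab = "Sorted data",
     main = "Normal Plot")

# Built-in normal plot with a reference line
qqnorm(x)
qqline(x)
```

If the points fall close to a straight line, the dataset is approximately normal; curvature indicates how it deviates from normality.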

A random variable is the process of choosing a random number.
R can generate many different types of random numbers, including from a normal distribution with a specified μ and σ.
Example 1: Generate a vector of 200 normal random numbers with μ=6.7 and σ=2.5 using the R rnorm function with arguments 200, mean=6.7, and sd=2.5. Create the histogram and boxplot of these random numbers.
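A possible solution to Example 1 is sketched below. The `set.seed` call and its value are an addition made here so the output is reproducible; they are not part of the exercise.

```r
set.seed(223)  # arbitrary seed, only for reproducibility

# 200 normal random numbers with mean 6.7 and SD 2.5
x <- rnorm(200, mean = 6.7, sd = 2.5)

hist(x)
boxplot(x)
```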
A uniform random variable with range [a,b) is a value drawn from the interval [a,b); every value in this interval is equally likely to be chosen.
Example 2: Generate a vector of 200 uniform random numbers from the interval [1, 3.14) using the runif function with arguments 200, min=1 and max=3.14. Create the histogram and boxplot of this vector.
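A possible solution to Example 2, in the same style. Again, the `set.seed` call is an addition for reproducibility and is not part of the exercise.

```r
set.seed(223)  # arbitrary seed, only for reproducibility

# 200 uniform random numbers from the interval [1, 3.14)
u <- runif(200, min = 1, max = 3.14)

hist(u)
boxplot(u)
```

The histogram should look roughly flat across the interval, in contrast to the bell shape of a normal sample.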