Feb 9, 2026

IT 223 -- Feb 9, 2026

Review Exercises

Try out the R scan function. Answer:

> x <- scan( )
1: 32 43 57 32 11
6: 43 22 128 9
10: 
Read 9 items

Press Enter twice in a row to terminate the scan input (the second Enter submits a blank line). Other ways to enter data in R:

> # 1. Use the c function.
> x <- c(0, 2, 1, 3)
>
> # 2. Use the seq function.
> x <- seq(-4, 4, 0.005)
>
> # 3. Use the : operator.
> x <- 1:100
>
> # 4. Read a data frame from a CSV file.
> xDf <- read.csv("x-data-vector.txt")
>
> # 5. Create a data frame from data vectors.
> xDf <- data.frame(X=0:3, Y=c(0, 2, 1, 3))
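
The scan session at the top of these exercises can also be run without typing at the prompt. The following is a minimal sketch, assuming the values are available as a character string; it uses the `text` argument of `scan`, and the name `input_text` is only illustrative.

```r
# The nine values from the interactive scan session, as one string.
input_text <- "32 43 57 32 11 43 22 128 9"

# scan(text = ...) reads the numbers from the string instead of the keyboard.
x <- scan(text = input_text)

# Check how many items were read.
length(x)
```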

List statistics for estimating the center and spread of a univariate dataset. Give pros and cons for each. How do you compute them with R?
Answer:
sample mean xbar: most accurate when the dataset is normally distributed.
median: more accurate than the sample mean when the dataset has outliers.
trimmed mean: a compromise between the sample mean and the sample median.
SD+: most accurate for estimating the spread of a normal histogram.
Median absolute deviation (MAD): more accurate than SD+ when the dataset has outliers. Note that the R function mad computes the median (not the mean) absolute deviation, scaled by 1.4826.
IQR: more descriptive than SD+ for non-normal histograms.
range = max - min: simple to compute but not very accurate, especially when there are outliers. Note that the R function range returns the minimum and the maximum, not their difference.
Here are R statements with output illustrating these statistics:
```
> x <- c(1, 2, 3, 4, 10)
> x
[1]  1  2  3  4 10
> mean(x)
[1] 4
> median(x)
[1] 3
> mean(x, trim=0.2)
[1] 3
> sd(x)
[1] 3.535534
> mad(x)
[1] 1.4826
> IQR(x)
[1] 2
> range(x)
[1] 1 10
```
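Two details of the output above can be checked directly. This is a minimal sketch in base R: `range(x)` returns the minimum and maximum rather than a single number, so the range statistic is obtained with `diff(range(x))`, and `mad(x)` reports a scaled median absolute deviation, which is a different quantity from the mean absolute deviation about the sample mean. The names `range_width`, `median_abs_dev`, and `mean_abs_dev` are illustrative.

```r
x <- c(1, 2, 3, 4, 10)

# range(x) returns c(min, max); the range statistic is max - min.
range_width <- diff(range(x))

# mad() is the median absolute deviation about the median, multiplied
# by 1.4826 by default; constant = 1 removes the scaling.
median_abs_dev <- mad(x, constant = 1)

# Mean absolute deviation about the sample mean, computed by hand.
mean_abs_dev <- mean(abs(x - mean(x)))

print(c(range_width, median_abs_dev, mean_abs_dev))
```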
Which R graphical parameters are used to create an R scatterplot?
```
main  sub  xlab  ylab  xlim  ylim  pch    
```
The sub parameter sets the subtitle for the plot. It appears below the plot.
How do you add a line or lines to a scatterplot?
Answer: use the R function lines to add lines to a plot.
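As a rough illustration of the two answers above, the sketch below draws a scatterplot with the listed graphical parameters and then adds lines to it. The data values, titles, and axis limits are made up for the example; `abline` is shown as one more base R option for adding a straight line.

```r
# Illustrative data (the same values as the data frame example earlier).
x <- 0:3
y <- c(0, 2, 1, 3)

# Scatterplot using the graphical parameters listed above.
plot(x, y,
     main = "Y vs. X", sub = "Example data",
     xlab = "X", ylab = "Y",
     xlim = c(0, 4), ylim = c(0, 4),
     pch = 16)

# lines() joins the points (x[i], y[i]) in order with line segments.
lines(x, y, lty = 2)

# abline() adds a straight line; here a horizontal line at the mean of y.
abline(h = mean(y))
```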

Give examples of problems that you would use each of these R functions to solve.

dnorm  pnorm  qnorm  rnorm

Answer:

> # Plot the standard normal density
> x <- seq(-4, 4, 0.001)
> plot(x, dnorm(x), type="l")
> # Find the proportion of persons with IQ > 120
> 1 - pnorm(120, mean=100, sd=15)
[1] 0.09121122
> # Find the 95th percentile of IQ scores
> qnorm(0.95, mean=100, sd=15)
[1] 124.6728
> # Generate 20 simulated random IQ scores
> scores <- rnorm(20, mean=100, sd=15)
> mean(scores)
[1] 101.4545
> sd(scores)
[1] 13.61754

What is the IQR for a standard normal dataset? Answer:
```
> qnorm(0.75) - qnorm(0.25)
[1] 1.34898
```
What are the inner fences in a boxplot? Answer:
Inner Fence 1 = Q1 - 1.5 * IQR
Inner Fence 2 = Q3 + 1.5 * IQR
What are two methods of defining outliers? Answer:
Method 1: use the boxplot. Outliers are points less than the inner fence 1 or greater than the inner fence 2.
Method 2: use z-scores. Outliers are points with z-score either less than -2 or greater than 2.
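The two methods above do not always agree. The following sketch applies both to the small dataset used earlier in these exercises. It assumes Q1 and Q3 are taken from `quantile` with its default settings; `boxplot` itself uses hinges, which can differ slightly for some sample sizes. The variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 10)

# Method 1: inner fences from the quartiles.
q1 <- unname(quantile(x, 0.25))
q3 <- unname(quantile(x, 0.75))
iqr <- q3 - q1
fence1 <- q1 - 1.5 * iqr   # lower inner fence
fence2 <- q3 + 1.5 * iqr   # upper inner fence
boxplot_outliers <- x[x < fence1 | x > fence2]

# Method 2: z-scores less than -2 or greater than 2.
z <- (x - mean(x)) / sd(x)
z_outliers <- x[z < -2 | z > 2]

print(c(fence1, fence2))
print(boxplot_outliers)
print(z_outliers)
```

Here the value 10 lies beyond the upper fence, but its z-score is below 2, because the outlier itself inflates sd(x).
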
What are expected normal scores for a data vector of length n?
Answer: they are points that divide the normal density into n+1 equal areas. For example, if x is a dataset of size 4, you can find the expected normal scores like this:
```
> scores <- qnorm(c(0.2, 0.4, 0.6, 0.8))
> scores
[1] -0.8416212 -0.2533471  0.2533471  0.8416212
```
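The same idea extends to any n: the i-th expected normal score is the normal quantile at probability i/(n+1). The helper below is a small sketch of that definition; the function name `expected_normal_scores` is hypothetical. Note that the built-in `qqnorm` uses slightly different plotting positions (see `ppoints`), so its scores will not match these exactly.

```r
# Expected normal scores under the n+1 equal areas definition.
expected_normal_scores <- function(n) {
  qnorm(seq_len(n) / (n + 1))
}

# For n = 4 this uses probabilities 0.2, 0.4, 0.6, 0.8, as in the example above.
scores4 <- expected_normal_scores(4)
print(scores4)
```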
How do you use normal scores to create a normal plot?
Answer: plot the actual data points vs. the expected normal scores:
```
> scores <- qnorm(c(0.2, 0.4, 0.6, 0.8))
> x <- c(1, 3, 7, 35)
> plot(scores, x, main="Homemade Normal Plot") 
```
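The example above works because x is already in increasing order. For unsorted data, the values have to be sorted before they are paired with the expected normal scores. A sketch of a general version follows; the function name `homemade_normal_plot` and the axis labels are hypothetical.

```r
# Plot sorted data against expected normal scores (n+1 equal areas).
homemade_normal_plot <- function(x) {
  n <- length(x)
  scores <- qnorm(seq_len(n) / (n + 1))
  sorted_x <- sort(x)   # pair the smallest value with the smallest score
  plot(scores, sorted_x, main = "Homemade Normal Plot",
       xlab = "Expected Normal Scores", ylab = "Sorted Data")
  # Return the plotted points invisibly so they can be inspected.
  invisible(data.frame(scores = scores, x = sorted_x))
}

# The same data as above, deliberately out of order.
pts <- homemade_normal_plot(c(35, 1, 7, 3))
```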
What does the following normal plot tell you about the dataset?

Answer: It tells you that the data have fat tails. The data are further from the mean than expected on both the left and the right.
Create a vector x of 100 simulated normal IQ scores (mean=100 and sd=15). Then create these graphs: histogram, boxplot, normal plot, and a plot of x vs. observation number. Interpret the plots.
```
> x <- rnorm(100, mean=100, sd=15)
> hist(x)
> boxplot(x)
> qqnorm(x)
> qqline(x)
> plot(1:100, x, xlab="Observation Number", ylab="x variable")
```
Use these normal tables or R to solve the following problems:
1. Human male weights are normally distributed with mean=75kg and sd=16kg. What proportion of these weights are between 51 and 99 kg? Answer:
```
> pnorm(99, mean=75, sd=16) - pnorm(51, mean=75, sd=16)
[1] 0.8663856
```
2. What proportion of these weights are greater than 131kg? Answer:
```
> 1 - pnorm(131, mean=75, sd=16)
[1] 0.0002326291
```
3. What is the 80th percentile for these weights? Answer:
```
> qnorm(0.8, mean=75, sd=16)
[1] 88.46594
```
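When working from a standard normal table instead of the mean and sd arguments, each weight is first converted to a z-score, z = (x - mean) / sd. The sketch below redoes the three problems that way and should agree with the answers above; the variable names are illustrative.

```r
mu <- 75      # mean weight in kg
sigma <- 16   # SD of weights in kg

# Problem 1: convert 51 and 99 to z-scores, then use the standard normal.
z_lo <- (51 - mu) / sigma   # -1.5
z_hi <- (99 - mu) / sigma   #  1.5
p_between <- pnorm(z_hi) - pnorm(z_lo)

# Problem 2: upper tail beyond 131 kg.
z_131 <- (131 - mu) / sigma   # 3.5
p_above <- 1 - pnorm(z_131)

# Problem 3: convert the standard normal 80th percentile back to kg.
p80 <- mu + qnorm(0.8) * sigma

print(c(p_between, p_above, p80))
```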
At the University of Northern South Dakota, the LSAT scores for first year students have mean xbar=162 and SD_x=6. The first year scores for these students have mean ybar=68 and SD_y=10. The correlation between LSAT scores and first year scores is r=0.6.
1. Compute the regression equation for predicting first year score from LSAT score. Answer:
```
y - ybar = (r SDy / SDx) (x - xbar)
y - 68 = (0.6 * 10 / 6) * (x - 162)
y - 68 = 1 * (x - 162)
y = x - 162 + 68
y = x - 94
```
2. For a first year student who has an LSAT score of 172, what is the predicted first year score? Answer:
```
y = x - 94
y = 172 - 94 = 78
```
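The hand calculation above can be checked in R from the summary statistics alone. This is a minimal sketch: the slope is r * SDy / SDx and the intercept is ybar - slope * xbar. The function name `predict_first_year` is hypothetical.

```r
# Summary statistics from the problem statement.
xbar <- 162; sd_x <- 6    # LSAT scores
ybar <- 68;  sd_y <- 10   # first year scores
r <- 0.6

# Regression line y = intercept + slope * x.
slope <- r * sd_y / sd_x
intercept <- ybar - slope * xbar

# Predicted first year score for a given LSAT score.
predict_first_year <- function(lsat) {
  intercept + slope * lsat
}

print(c(slope, intercept, predict_first_year(172)))
```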

Review for Midterm

Project 3

Look at the project description for Project 3.