IT 223 -- Feb 9, 2026

Review Exercises

  1. Try out the R scan function. Answer:
    > x <- scan( )
    1: 32 43 57 32 11
    6: 43 22 128 9
    10: 
    Read 9 items
    
    Press Enter twice in a row (that is, enter an empty line) to end the scan input. Other ways to create R input vectors:
    > # 1. Use the c function.
    > x <- c(0, 2, 1, 3)
    >
    > # 2. Use the seq function.
    > x <- seq(-4, 4, 0.005)
    >
    > # 3. Use the : operator.
    > x <- 1:100
    >
    > # 4. Read a data frame from a CSV file.
    > xDf <- read.csv("x-data-vector.txt")
    >
    > # 5. Create a data frame from data vectors.
    > xDf <- data.frame(X=0:3, Y=c(0, 2, 1, 3))
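    When a script must run without typing at the console, scan can also read its values from a character string through its text argument. A minimal sketch that reuses the nine values typed above:

```r
# Read whitespace-separated numbers from a string instead of the keyboard.
# quiet = TRUE suppresses the "Read 9 items" message.
x <- scan(text = "32 43 57 32 11 43 22 128 9", quiet = TRUE)

# The result is an ordinary numeric vector, the same as one built with c().
y <- c(32, 43, 57, 32, 11, 43, 22, 128, 9)
print(identical(x, y))
```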
    
  2. List statistics for estimating the center and spread of a univariate dataset. Give pros and cons for each. How do you compute them with R?
    Answer:
    sample mean xbar: most accurate when the dataset is normally distributed, but sensitive to outliers.
    median: more accurate than the sample mean when the dataset has outliers.
    trimmed mean: a compromise between the sample mean and the sample median.
    SD+: most accurate for estimating the spread of a normal histogram.
    median absolute deviation (MAD): more accurate than SD+ when the dataset has outliers. R's mad function scales it by 1.4826 so that it estimates the SD for normal data.
    IQR: more descriptive than SD+ for non-normal histograms.
    range = max - min: simple to compute but not very accurate, especially when there are outliers. Note that R's range function returns the min and the max; diff(range(x)) gives their difference.
    Here are R statements with output illustrating these statistics:
    > x <- c(1, 2, 3, 4, 10)
    > x
    [1] 1 2 3 4 10
    > mean(x)
    [1] 4
    > median(x)
    [1] 3
    > mean(x, trim=0.2)
    [1] 3
    > sd(x)
    [1] 3.535534
    > mad(x)
    [1] 1.4826
    > IQR(x)
    [1] 2
    > range(x)
    [1] 1 10
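    To see these pros and cons in action, the sketch below compares the statistics for the vector above and for a hypothetical variant whose largest value is changed from 10 to 100. The helper name center_spread is illustrative only.

```r
x1 <- c(1, 2, 3, 4, 10)
x2 <- c(1, 2, 3, 4, 100)   # hypothetical: same data, more extreme outlier

# Hypothetical helper: collect the center and spread statistics for a vector.
center_spread <- function(v) {
  c(mean = mean(v), median = median(v), trim = mean(v, trim = 0.2),
    sd = sd(v), mad = mad(v), iqr = IQR(v))
}

comparison <- rbind(x1 = center_spread(x1), x2 = center_spread(x2))
print(comparison)
# The mean and SD change a lot; the median, trimmed mean, MAD, and IQR do not.
```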
    
  3. Which R graphical parameters are used to create an R scatterplot?
    main  sub  xlab  ylab  xlim  ylim  pch    
    
    The sub parameter sets the subtitle for the plot. It appears below the plot.
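    A sketch that sets each of these parameters in one plot call, using the X and Y values from the data frame example in Exercise 1 as hypothetical data:

```r
x <- 0:3
y <- c(0, 2, 1, 3)
plot(x, y,
     main = "Main Title",   # title above the plot
     sub  = "Subtitle",     # subtitle below the plot
     xlab = "X variable",   # x-axis label
     ylab = "Y variable",   # y-axis label
     xlim = c(-1, 4),       # x-axis range
     ylim = c(-1, 4),       # y-axis range
     pch  = 19)             # plotting character: 19 is a solid circle
```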
  4. How do you add a line or lines to a scatterplot?
    Answer: use the R function lines to add lines to a plot.
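    A minimal sketch with hypothetical data. Besides lines, which connects points in the order given, R's abline function adds straight lines: horizontal or vertical ones through its h and v arguments, or the line from a fitted lm model.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)   # hypothetical data
plot(x, y, main = "Adding Lines to a Scatterplot")

lines(x, y, col = "blue")      # connect the points in order of x
abline(h = mean(y), lty = 2)   # dashed horizontal line at the mean of y

fit <- lm(y ~ x)               # least-squares regression of y on x
abline(fit, col = "red")       # add the fitted regression line
print(coef(fit))
```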
  5. Give an example of a problem that you would use each of these R functions to solve.
    dnorm  pnorm  qnorm  rnorm
    
    Answer:
    > # Plot the standard normal density
    > x <- seq(-4, 4, 0.001)
    > plot(x, dnorm(x), type="l")
    > # Find the proportion of persons with IQ > 120
    > 1 - pnorm(120, mean=100, sd=15)
    [1] 0.09121122
    > # Find the 95th percentile of IQ scores
    > qnorm(0.95, mean=100, sd=15)
    [1] 124.6728
    > # Generate 20 simulated random IQ scores
    > scores <- rnorm(20, mean=100, sd=15)
    > mean(scores)
    [1] 101.4545
    > sd(scores)
    [1] 13.61754
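    Two properties of these functions are worth checking directly: qnorm undoes pnorm, and dnorm returns the height of the density curve rather than a probability. Because rnorm output changes on every run (the mean and SD above will differ each time), set.seed makes a simulation reproducible; the seed value 223 below is arbitrary.

```r
# qnorm and pnorm are inverses: the 95th percentile fed back into pnorm gives 0.95.
p <- pnorm(qnorm(0.95, mean = 100, sd = 15), mean = 100, sd = 15)
print(p)

# dnorm gives the height of the density curve, not a probability.
# At z = 0 the standard normal height is 1 / sqrt(2 * pi), about 0.3989.
h <- dnorm(0)
print(h)

# Resetting the same seed makes rnorm return the same simulated scores.
set.seed(223)
scores1 <- rnorm(20, mean = 100, sd = 15)
set.seed(223)
scores2 <- rnorm(20, mean = 100, sd = 15)
print(identical(scores1, scores2))
```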
  6. What is the IQR for a standard normal dataset? Answer:
    > qnorm(0.75) - qnorm(0.25)
    [1] 1.34898
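    A simulation gives a quick sanity check of this value: the IQR of a large standard normal sample should be close to 1.349. This is also why, for roughly normal data, IQR / 1.349 can serve as an estimate of the SD. A sketch with an arbitrary seed:

```r
set.seed(1)                 # arbitrary seed for reproducibility
z <- rnorm(100000)          # large standard normal sample
sim_iqr <- IQR(z)
exact_iqr <- qnorm(0.75) - qnorm(0.25)
print(c(simulated = sim_iqr, exact = exact_iqr))
```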
    
  7. What are the inner fences in a boxplot? Answer:
    Inner Fence 1 = Q1 - 1.5 * IQR
    Inner Fence 2 = Q3 + 1.5 * IQR
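    A sketch computing the fences for the data from Exercise 2 with quantile. R's boxplot function takes its quartiles from fivenum (Tukey's hinges), which can differ slightly from quantile for some datasets, though not for this one.

```r
x <- c(1, 2, 3, 4, 10)
q <- quantile(x, c(0.25, 0.75))   # Q1 and Q3
iqr <- q[[2]] - q[[1]]
fence1 <- q[[1]] - 1.5 * iqr      # inner fence 1
fence2 <- q[[2]] + 1.5 * iqr      # inner fence 2
print(c(fence1 = fence1, fence2 = fence2))
```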
  8. What are two methods of defining outliers? Answer:
    Method 1: use the boxplot. Outliers are points less than inner fence 1 or greater than inner fence 2.
    Method 2: use z-scores. Outliers are points with z-score either less than -2 or greater than 2.
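    The two methods do not always agree. In this sketch on the Exercise 2 data, the value 10 lies beyond inner fence 2, but it also inflates the SD enough that its own z-score stays below 2, so the z-score rule flags nothing. boxplot.stats reports the points that boxplot would draw as outliers.

```r
x <- c(1, 2, 3, 4, 10)

# Method 1: points outside the inner fences.
q <- quantile(x, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
box_out <- x[x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr]
print(box_out)
print(boxplot.stats(x)$out)   # what boxplot itself would flag

# Method 2: points with z-scores below -2 or above 2.
z <- (x - mean(x)) / sd(x)
z_out <- x[abs(z) > 2]
print(z_out)
```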
  9. What are expected normal scores for a data vector of length n?
    Answer: they are points that divide the normal density into n+1 equal areas. For example, if x is a dataset of size 4, you can find the expected normal scores like this:
    > scores <- qnorm(c(0.2, 0.4, 0.6, 0.8))
    > scores
    [1] -0.8416212 -0.2533471 0.2533471 0.8416212
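    The same rule works for any n: the areas to the left of the scores are 1/(n+1), 2/(n+1), ..., n/(n+1). A sketch with a hypothetical helper name. R's built-in qqnorm uses slightly different plotting positions (see ppoints), so its scores differ a little from these.

```r
# Hypothetical helper: expected normal scores for a dataset of size n,
# dividing the normal density into n + 1 equal areas.
expected_normal_scores <- function(n) {
  qnorm((1:n) / (n + 1))
}
print(expected_normal_scores(4))
```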
    
  10. How do you use normal scores to create a normal plot?
    Answer: plot the actual data points vs. the expected normal scores:
    > scores <- qnorm(c(0.2, 0.4, 0.6, 0.8))
    > x <- c(1, 3, 7, 35)
    > plot(scores, x, main="Homemade Normal Plot") 
    
    [Figure: Homemade Normal Plot]
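    The data values must be sorted before they are paired with the expected normal scores; the example above already lists x in increasing order. A sketch with the same data entered in a hypothetical unsorted order, followed by R's built-in normal plot for comparison:

```r
x <- c(35, 1, 7, 3)                # same data, hypothetical unsorted order
n <- length(x)
scores <- qnorm((1:n) / (n + 1))   # expected normal scores
plot(scores, sort(x), main = "Homemade Normal Plot")   # sort before plotting

qqnorm(x)   # built-in normal plot; matches values to scores by rank
qqline(x)   # reference line through the first and third quartiles
```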
  11. What does the following normal plot tell you about the dataset?

    [Figure: Normal Plot]
    Answer: It tells you that the data have fat tails. The data are further from the mean than expected on both the left and the right.
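    To produce a fat-tailed normal plot of your own, a sketch can simulate data from a t distribution with 3 degrees of freedom, which has fatter tails than the normal. With the data on the vertical axis, the points at the left end should fall below the line and the points at the right end should rise above it. The seed is arbitrary.

```r
set.seed(11)               # arbitrary seed for reproducibility
heavy <- rt(200, df = 3)   # t distribution: fatter tails than the normal
qqnorm(heavy, main = "Normal Plot of Fat-Tailed Data")
qqline(heavy)
```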
  12. Create a vector x of 100 simulated normal IQ scores (mean=100 and sd=15). Then create these graphs: a histogram, a boxplot, a normal plot, and a plot of x vs. observation number. Interpret the plots.
    > x <- rnorm(100, mean=100, sd=15)
    > hist(x)
    > boxplot(x)
    > qqnorm(x)
    > qqline(x)
    > plot(1:100, x, xlab="Observation Number", ylab="x variable")
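    To compare the four graphs side by side, par(mfrow=...) can place them in a 2 x 2 grid. A sketch, with an arbitrary seed so the same scores come back on every run:

```r
set.seed(100)                 # arbitrary seed for reproducibility
x <- rnorm(100, mean = 100, sd = 15)
old <- par(mfrow = c(2, 2))   # 2 x 2 grid; par returns the old settings
hist(x)
boxplot(x)
qqnorm(x)
qqline(x)
plot(1:100, x, xlab = "Observation Number", ylab = "x variable")
par(old)                      # restore the previous layout
```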
    
  13. Use these normal tables or R to solve the following problems:
    1. Human male weights are normally distributed with mean=75kg and sd=16kg. What proportion of these weights are between 51 and 99 kg? Answer:
      > pnorm(99, mean=75, sd=16) - pnorm(51, mean=75, sd=16)
      [1] 0.8663856
      
    2. What proportion of these weights are greater than 131kg? Answer:
      > 1 - pnorm(131, mean=75, sd=16)
      [1] 0.0002326291
      
    3. What is the 80th percentile for these weights? Answer:
      > qnorm(0.8, mean=75, sd=16)
      [1] 88.46594
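    With a normal table you first convert to z-scores, z = (x - mean) / sd, and look up areas for the standard normal. A sketch showing that this route gives the same answers as parts 1 and 3:

```r
mu <- 75
sigma <- 16
z_low  <- (51 - mu) / sigma   # -1.5
z_high <- (99 - mu) / sigma   #  1.5
between <- pnorm(z_high) - pnorm(z_low)   # area between the two z-scores
print(between)

# Part 3 in reverse: find z for area 0.8, then convert back to kg.
pct80 <- mu + qnorm(0.8) * sigma
print(pct80)
```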
      
  14. At the University of Northern South Dakota, the LSAT scores for first year students have xbar=162 and SDx=6. The first year scores for these students have ybar=68 and SDy=10. The correlation between LSAT scores and first year scores is r=0.6.
    1. Compute the regression equation for predicting first year score from LSAT score. Answer:
      y - ybar = (r * SDy / SDx) * (x - xbar)
      y - 68 = (0.6 * 10 / 6) * (x - 162)
      y - 68 = 1 * (x - 162)
      y = x - 162 + 68
      y = x - 94
      
    2. For a first year student that has an LSAT score of 172, what is the predicted first year score? Answer:
      y = x - 94
      y = 172 - 94 = 78
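    The same arithmetic can be checked in R. A sketch using the summary statistics given in the problem; the variable names are illustrative:

```r
xbar <- 162; sdx <- 6    # LSAT scores
ybar <- 68;  sdy <- 10   # first year scores
r <- 0.6

slope <- r * sdy / sdx              # 0.6 * 10 / 6 = 1
intercept <- ybar - slope * xbar    # 68 - 162 = -94
predicted <- intercept + slope * 172
print(c(slope = slope, intercept = intercept, predicted = predicted))
```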
      

Review for Midterm

Project 3