IT 223 -- Jan 21, 2016

Review Exercises

  1. What percentage of observations are in the interval [200, 350) for this density histogram? The label for the vertical axis is percent per horizontal unit.
    0.5 +
        |
    0.4 +           +--------+
        |           |        |
    0.3 +           |        |
        |           |        |
    0.2 +     +-----+        |
        |     |     |        |
    0.1 +     |     |        +-----------+
        |     |     |        |           |
    0.0 +-----+-----+-----+--+--+-----+--+--+
          0  100   200   300   400   500   600
                    Horizontal Units
    
    The percentage of observations in a bin is represented by the area of the histogram bar. For the bar over the interval [200, 350), the area is
    (350 - 200) * 0.4 = 150 * 0.4 = 60%
    
    The vertical units of the histogram are percent per horizontal unit, so multiplying a bar's height by its width in horizontal units gives a percentage.
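
    The area calculation for the bar over [200, 350) can be checked in R. This is a minimal sketch; the variable names left, right, height, and pct are illustrative and not part of the original exercise.

    ```r
    # Bar over [200, 350): the height is read off the vertical axis,
    # in percent per horizontal unit.
    left   <- 200
    right  <- 350
    height <- 0.4

    # Area of the bar = width * height = percentage of observations in the bin.
    pct <- (right - left) * height
    print(pct)
    ```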
  2. What is a parsimonious description of a histogram?
    Answer: it means the histogram can be described succinctly, with very few descriptors. A normal histogram can be described with only two descriptors, the sample mean and the sample standard deviation. We can't use fewer than two descriptors, because both the center and the spread of the histogram must be described.
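
    As a rough illustration, the two descriptors can be computed in R with mean and sd. The data vector x below is made up for this sketch and does not come from the notes.

    ```r
    # Hypothetical sample, chosen only for illustration.
    x <- c(2, 4, 4, 4, 5, 5, 7, 9)

    # Center of the histogram: the sample mean.
    m <- mean(x)

    # Spread of the histogram: the sample standard deviation
    # (sd divides by n - 1).
    s <- sd(x)

    print(m)
    print(s)
    ```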
  3. What does the R operator $ do?
    Answer: it selects a column out of a dataframe and returns it as a vector.
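
    A short sketch of the $ operator, assuming a small dataframe built from the height data used elsewhere in these exercises. The names df and h are illustrative.

    ```r
    # Small dataframe with two columns.
    df <- data.frame(
        Name   = c("Susan", "David", "Julie"),
        Height = c(1.56, 1.78, 1.65))

    # $ selects the Height column and returns it as a vector.
    h <- df$Height
    print(h)
    print(is.vector(h))
    ```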
  4. How do you create a dataframe from a CSV file?
    Answer: Create the file ht-wt.txt in the directory C:/workspace. Then use these R statements:
    > setwd("C:/workspace")
    > getwd()
    [1] "C:/workspace"
    > htWtDf <- read.csv("ht-wt.txt")
    > print(htWtDf)
       Name Height Weight
    1 Susan   1.56     61
    2 David   1.78     84
    3 Julie   1.65     51
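
    The notes do not show the contents of ht-wt.txt. The sketch below assumes it is a comma-separated file with a header line, which is consistent with the printed dataframe. To stay self-contained, it writes those assumed contents to a temporary file (csvPath is a hypothetical name) instead of C:/workspace.

    ```r
    # Assumed file contents, inferred from the printed dataframe.
    csvLines <- c(
        "Name,Height,Weight",
        "Susan,1.56,61",
        "David,1.78,84",
        "Julie,1.65,51")

    # Write the lines to a temporary file, then read it as a CSV file.
    csvPath <- tempfile(fileext = ".txt")
    writeLines(csvLines, csvPath)

    # read.csv treats the first line as column names by default.
    htWtDf <- read.csv(csvPath)
    print(htWtDf)
    ```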
    
  5. How do you create a dataframe named htWtDf from these R vectors without using a CSV file?
    n <- c("Susan", "David", "Julie")
    h <- c(1.56, 1.78, 1.65)
    w <- c(61, 84, 51)
    
    Answer:
    > htWtDf <- data.frame(Name=n, Height=h, Weight=w)
    > print(htWtDf)
       Name Height Weight
    1 Susan   1.56     61
    2 David   1.78     84
    3 Julie   1.65     51
    
    You can also input the vectors directly into the dataframe without creating variables for them:
    htWtDf <- data.frame(
        Name=  c("Susan", "David", "Julie"),
        Height=c(1.56, 1.78, 1.65), 
        Weight=c(61, 84, 51))
    

The Standard Deviation

z-scores

Project 2a

The Normal Distribution