- Try out the R scan function. Answer:
> x <- scan( )
1: 32 43 57 32 11
6: 43 22 128 9
10:
Read 9 items
Press Enter two consecutive times to terminate the scan input.
Other ways to create R vectors and data frames:
> # 1. Use the c function.
> x <- c(0, 2, 1, 3)
>
> # 2. Use the seq function.
> x <- seq(-4, 4, 0.005)
>
> # 3. Use the : operator.
> x <- 1:100
>
> # 4. Read a data frame from a CSV file.
> xDf <- read.csv("x-data-vector.txt")
>
> # 5. Create a data frame from data vectors.
> xDf <- data.frame(X=0:3, Y=c(0, 2, 1, 3))
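The interactive scan session above can also be run without the keyboard. The following is a minimal sketch, assuming you are happy to supply the numbers as a string through scan's text argument; the rep call is an extra vector constructor not listed above.

```r
# Read the same nine numbers from a string instead of from the keyboard.
x <- scan(text = "32 43 57 32 11 43 22 128 9")
length(x)                      # number of items read

# rep() builds a vector by repeating values: 0 1 0 1 0 1
y <- rep(c(0, 1), times = 3)
```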
- List statistics for estimating the center and spread of
a univariate dataset. Give pros and cons for each. How do you compute them with R?
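Answer:
sample mean xbar: most accurate when the dataset is normally distributed, but
sensitive to outliers.
median: more accurate than the sample mean when the dataset has outliers.
trimmed mean: a compromise between the sample mean and the sample median.
SD+: the sample standard deviation, computed by sd in R. Most accurate for
estimating the spread of a normal histogram.
Median absolute deviation (MAD): more accurate than SD+ when the dataset has outliers.
The R function mad computes the median absolute deviation from the median, multiplied
by 1.4826 so that it is comparable to SD+ for normal data.
IQR: more descriptive than SD+ for non-normal histograms.
range = max - min: simple to compute but not very accurate, especially when there
are outliers. Note that the R function range returns the minimum and maximum rather
than their difference.
Here are R statements with output illustrating these statistics: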
> x <- c(1, 2, 3, 4, 10)
> x
[1] 1 2 3 4 10
> mean(x)
[1] 4
> median(x)
[1] 3
> mean(x, trim=0.2)
[1] 3
> sd(x)
[1] 3.535534
> mad(x)
[1] 1.4826
> IQR(x)
[1] 2
> range(x)
[1] 1 10
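As a small follow-up sketch on the same data vector, the statements below compute the range as a single number and the unscaled median absolute deviation, then recompute the center statistics with the largest value removed. The variable names (range_width, raw_mad, x_clean) are made up for this illustration.

```r
x <- c(1, 2, 3, 4, 10)

range_width <- diff(range(x))      # max - min = 9
raw_mad <- mad(x, constant = 1)    # median of |x - median(x)| = 1, no scaling

# Drop the largest value to see how much each center statistic moves.
x_clean <- x[x != max(x)]
c(mean = mean(x), median = median(x))
c(mean = mean(x_clean), median = median(x_clean))
```

The mean drops from 4 to 2.5 when the value 10 is removed, while the median only moves from 3 to 2.5, which is the sense in which the median is less sensitive to outliers.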
- Which R graphical parameters are used to create an R scatterplot?
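Answer: main sub xlab ylab xlim ylim pch
The main parameter sets the plot title, xlab and ylab set the axis labels, xlim and
ylim set the axis limits, and pch sets the plotting symbol. The sub parameter sets
the subtitle for the plot. It appears below the plot.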
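Here is a minimal sketch that uses all seven parameters in one plot call. The data are the X and Y columns from the data frame example earlier; the title, labels, and limits are made-up values chosen only for illustration.

```r
x <- 0:3
y <- c(0, 2, 1, 3)

plot(x, y,
     main = "Y vs. X",               # title above the plot
     sub  = "Four example points",   # subtitle below the plot
     xlab = "X", ylab = "Y",         # axis labels
     xlim = c(0, 4), ylim = c(0, 4), # axis limits
     pch  = 19)                      # solid circle plotting symbol
```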
- How do you add a line or lines to a scatterplot?
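Answer: after calling plot, use the R function lines to add connected line segments
to the existing plot. The related function abline adds a single straight line.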
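A short sketch of adding lines to an existing scatterplot. It assumes the small made-up dataset below; the least-squares fit with lm is only one possible source for a straight line and is an addition for illustration.

```r
x <- 0:3
y <- c(0, 2, 1, 3)

plot(x, y, pch = 19)
lines(x, y, lty = 2)           # dashed segments connecting the points in order

fit <- lm(y ~ x)               # least-squares line for these points
abline(fit, col = "red")       # straight line from the fitted intercept and slope
abline(h = mean(y), lty = 3)   # horizontal reference line at the mean of y
```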
- Give an example of a problem that you would solve with each of these R functions:
dnorm pnorm qnorm rnorm
Answer:
> # Plot the standard normal density
> x <- seq(-4, 4, 0.001)
> plot(x, dnorm(x), type="l")
> # Find the proportion of persons with IQ > 120
> 1 - pnorm(120, mean=100, sd=15)
[1] 0.09121122
> # Find the 95th percentile of IQ scores
> qnorm(0.95, mean=100, sd=15)
[1] 124.6728
> # Generate 20 simulated random IQ scores
> scores <- rnorm(20, mean=100, sd=15)
> mean(scores)
[1] 101.4545
> sd(scores)
[1] 13.61754
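The four functions are related to one another, and a quick check can make that concrete. The sketch below reuses the IQ example (mean=100, sd=15); the seed and the simulation size are arbitrary choices for illustration.

```r
# pnorm and qnorm are inverses of each other.
p <- pnorm(120, mean = 100, sd = 15)
q <- qnorm(p, mean = 100, sd = 15)          # recovers 120

# pnorm is the area under dnorm to the left of a point.
area <- integrate(dnorm, lower = -Inf, upper = 120,
                  mean = 100, sd = 15)$value

# rnorm draws values whose proportions approximate the pnorm probabilities.
set.seed(1)
sim <- rnorm(100000, mean = 100, sd = 15)
prop_above <- mean(sim > 120)               # should be close to 1 - p
```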
- What is the IQR for a standard normal dataset? Answer:
> qnorm(0.75) - qnorm(0.25)
[1] 1.34898
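Because normal quantiles scale with the standard deviation, the same calculation works for any normal dataset. A brief sketch, using the IQ parameters (mean=100, sd=15) as an assumed example:

```r
iqr_std <- qnorm(0.75) - qnorm(0.25)                      # standard normal IQR
iqr_iq  <- qnorm(0.75, mean = 100, sd = 15) -
           qnorm(0.25, mean = 100, sd = 15)               # IQR for IQ scores

iqr_iq / 15    # same value as iqr_std: the IQR is about 1.349 standard deviations
```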
- What are the inner fences in a boxplot? Answer:
Inner Fence 1 = Q1 - IQR * 1.5
Inner Fence 2 = Q3 + IQR * 1.5
- What are two methods of defining outliers? Answer:
Method 1: use the boxplot. Outliers are points less than the
inner fence 1 or greater than the inner fence 2.
Method 2: use z-scores. Outliers are points with z-score
either less than -2 or greater than 2.
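Both methods can be sketched in a few lines of R. This example assumes the quartiles come from quantile with its default settings; the boxplot function itself uses hinges, which can differ slightly from these quartiles for some datasets.

```r
x <- c(1, 2, 3, 4, 10)

# Method 1: inner fences.
q <- unname(quantile(x, c(0.25, 0.75)))
iqr <- q[2] - q[1]
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr
box_outliers <- x[x < lower_fence | x > upper_fence]

# Method 2: z-scores.
z <- (x - mean(x)) / sd(x)
z_outliers <- x[abs(z) > 2]
```

For this dataset the fences are -1 and 7, so Method 1 flags the value 10. Method 2 flags nothing, because the value 10 inflates the standard deviation and its z-score is only about 1.7.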
- What are expected normal scores for a data vector of length n?
Answer: they are the points that divide the area under the standard normal density
into n+1 equal areas. For example, if x is a dataset of size 4, the cut points are
the 0.2, 0.4, 0.6, and 0.8 quantiles, so you can find the
expected normal scores like this:
> scores <- qnorm(c(0.2, 0.4, 0.6, 0.8))
> scores
[1] -0.8416212 -0.2533471 0.2533471 0.8416212
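The same idea extends to any n. The helper below is a hypothetical function written for this illustration (normal_scores is not a built-in R function); it evaluates qnorm at the probabilities 1/(n+1), 2/(n+1), ..., n/(n+1).

```r
# Hypothetical helper: expected normal scores for a vector of length n.
normal_scores <- function(n) qnorm((1:n) / (n + 1))

normal_scores(4)        # the same four values as qnorm(c(0.2, 0.4, 0.6, 0.8))
normal_scores(9)[5]     # middle score for an odd n is 0

# For comparison, the built-in ppoints function uses slightly different
# probabilities, so these scores are not identical to the ones above.
r_scores <- qnorm(ppoints(4))
```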
- How do you use normal scores to create a normal plot?
Answer: plot the actual data points vs. the expected normal scores:
> scores <- qnorm(c(0.2, 0.4, 0.6, 0.8))
> x <- c(1, 3, 7, 35)
> plot(scores, x, main="Homemade Normal Plot")
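The data vector in the example above is already in increasing order. For unsorted data, the values need to be sorted before pairing them with the scores. A minimal sketch, assuming the same four values in a shuffled order and made-up axis labels:

```r
x <- c(35, 1, 7, 3)                   # same values, deliberately unsorted
n <- length(x)
scores <- qnorm((1:n) / (n + 1))      # expected normal scores for n points

# Pair the smallest data value with the smallest score, and so on.
plot(scores, sort(x), main = "Homemade Normal Plot",
     xlab = "Expected normal score", ylab = "Sorted data")
```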
- What does the following normal plot tell you about the dataset?

Answer: It tells you that the data have fat tails. The data are further from
the mean than expected on both the left and the right.
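One way to see the fat-tail pattern for yourself is to simulate data from a distribution with heavier tails than the normal. The sketch below assumes a t distribution with 3 degrees of freedom as the example; the seed, sample size, and title are arbitrary.

```r
set.seed(42)
fat <- rt(200, df = 3)     # t distribution with 3 df has fatter tails than the normal

qqnorm(fat, main = "Normal Plot of Simulated Fat-Tailed Data")
qqline(fat)                # reference line through the quartiles
```

With data like these, the points at the left end typically fall below the reference line and the points at the right end fall above it.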
- Create a vector x of 100 simulated normal IQ scores (mean=100 and sd=15). Then create these graphs: histogram,
boxplot, normal plot, and a plot of x vs. observation number. Interpret the plots. Answer:
> x <- rnorm(100, mean=100, sd=15)
> hist(x)
> boxplot(x)
> qqnorm(x)
> qqline(x)
> plot(1:100, x, xlab="Observation Number", ylab="x variable")
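To compare the four graphs side by side, they can be drawn on one page. This is a sketch under assumed settings: the seed is arbitrary and is there only to make the simulated data reproducible.

```r
set.seed(123)
x <- rnorm(100, mean = 100, sd = 15)

op <- par(mfrow = c(2, 2))      # 2 x 2 grid of plots
hist(x)
boxplot(x)
qqnorm(x)
qqline(x)
plot(seq_along(x), x, xlab = "Observation Number", ylab = "x variable")
par(op)                         # restore the previous layout
```

For simulated normal data you would typically expect a roughly bell-shaped histogram centered near 100, a roughly symmetric boxplot with few or no points beyond the whiskers, normal plot points lying close to the reference line, and no trend in the plot against observation number.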
- Use normal tables or R to solve the following problems:
- Human male weights are normally distributed with mean=75kg and sd=16kg. What proportion of
these weights are between 51 and 99 kg? Answer:
> pnorm(99, mean=75, sd=16) - pnorm(51, mean=75, sd=16)
[1] 0.8663856
- What proportion of these weights are greater than 131kg? Answer:
> 1 - pnorm(131, mean=75, sd=16)
[1] 0.0002326291
- What is the 80th percentile for these weights? Answer:
> qnorm(0.8, mean=75, sd=16)
[1] 88.46594
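When working from normal tables instead of R, the weights are first converted to z-scores. The sketch below mirrors that table-based approach in R; the variable names are made up for this illustration.

```r
mu <- 75
sigma <- 16

# Proportion between 51 and 99 kg, via z-scores.
z_lo <- (51 - mu) / sigma            # -1.5
z_hi <- (99 - mu) / sigma            #  1.5
p_between <- pnorm(z_hi) - pnorm(z_lo)

# 80th percentile: find the z-score first, then convert back to kg.
z80 <- qnorm(0.8)
w80 <- mu + sigma * z80
```

Both results agree with the answers computed directly from pnorm and qnorm with the mean and sd arguments.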
- At the University of Northern South Dakota, the LSAT scores for first-year
students have xbar=162 and SDx=6. The first-year
scores for these students have
ybar=68 and SDy=10. The correlation between LSAT
scores and first-year scores is r=0.6.
- Compute the regression equation for predicting first year score from LSAT score. Answer:
y - ybar = (r SDy / SDx) (x - xbar)
y - 68 = (0.6 * 10 / 6) * (x - 162)
y - 68 = 1 * (x - 162)
y = x - 162 + 68
y = x - 94
- For a first-year student who has an LSAT score of 172, what is the
predicted first-year score? Answer:
y = x - 94
y = 172 - 94 = 78
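The hand calculation can be checked in R from the summary statistics alone. In this sketch the variable names and the predict_score helper are hypothetical, introduced only for illustration.

```r
xbar <- 162; sdx <- 6      # LSAT scores
ybar <- 68;  sdy <- 10     # first-year scores
r <- 0.6

slope <- r * sdy / sdx               # 1
intercept <- ybar - slope * xbar     # -94

# Hypothetical helper: predicted first-year score for a given LSAT score.
predict_score <- function(lsat) intercept + slope * lsat
predict_score(172)                   # 78
```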