38.4 57.6 46.2 55.5 62.5 49.5 38.0 40.9 62.8 44.3 33.9 93.8 50.4 47.9 35.0 69.2 52.8 46.2 60.1 56.3 55.1
> times <- scan( )
1: 38.4 57.6 46.2 55.5 62.5 49.5 38.0
8: 40.9 62.8 44.3 33.9 93.8 50.4 47.9
15: 35.0 69.2 52.8 46.2 60.1 56.3 55.1
22:
Read 21 items
> boxplot(times, main="Maze Completion Times")
> qqnorm(times, main="Maze Completion Times")
> qqline(times, col="red")
> t.test(times, mu=60)
One Sample t-test
data: times
t = -2.6319, df = 20, p-value = 0.01598
alternative hypothesis: true mean is not equal to 60
95 percent confidence interval:
46.03498 58.38406
sample estimates:
mean of x
52.20952
The p-value (0.01598) is less than 0.05, so reject the null hypothesis that the
true mean time to complete the maze is 60 seconds.
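The t statistic reported above can be reproduced by hand, which shows what t.test computes. The following is a minimal sketch using the same 21 maze times; the names mu0, t_stat, and p_val are illustrative and do not appear in the original session.

```r
# Maze completion times (seconds), as entered above.
times <- c(38.4, 57.6, 46.2, 55.5, 62.5, 49.5, 38.0,
           40.9, 62.8, 44.3, 33.9, 93.8, 50.4, 47.9,
           35.0, 69.2, 52.8, 46.2, 60.1, 56.3, 55.1)

mu0 <- 60             # hypothesized mean (illustrative name)
n   <- length(times)  # sample size

# One-sample t statistic: (sample mean - mu0) / standard error of the mean
t_stat <- (mean(times) - mu0) / (sd(times) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

print(c(t = t_stat, df = n - 1, p.value = p_val))
```

The printed values should agree with the t, df, and p-value lines of the t.test output.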
> # A negative index deletes the value
> # at that position (here the outlier, 93.8).
> times <- times[-12]
> # Perform t-test again:
> t.test(times, mu=60)
One Sample t-test
data: times
t = -4.4568, df = 19, p-value = 0.0002705
alternative hypothesis: true mean is not equal to 60
95 percent confidence interval:
45.49475 54.76525
sample estimates:
mean of x
50.13
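To see how much the single large time (93.8, the 12th value) affects the test, one can compare the mean, standard deviation, and standard error with and without it. This is a sketch; the names trimmed and summary_tab are illustrative.

```r
# Maze completion times (seconds), as entered above.
times <- c(38.4, 57.6, 46.2, 55.5, 62.5, 49.5, 38.0,
           40.9, 62.8, 44.3, 33.9, 93.8, 50.4, 47.9,
           35.0, 69.2, 52.8, 46.2, 60.1, 56.3, 55.1)

trimmed <- times[-12]  # drop the 12th value (93.8)

# Side-by-side summary of the two samples
summary_tab <- data.frame(
  data = c("all 21 times", "outlier removed"),
  mean = c(mean(times), mean(trimmed)),
  sd   = c(sd(times), sd(trimmed)),
  se   = c(sd(times) / sqrt(length(times)),
           sd(trimmed) / sqrt(length(trimmed)))
)
print(summary_tab)
```

A smaller standard error in the second row means the same distance from 60 produces a larger t statistic in magnitude.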
The p-value (0.0002705) is much smaller after deleting the outlier because the standard deviation
of the times is smaller and the sample mean moves farther from 60.
29.3 28.2 29.1 28.7 28.9 28.5
> weights <- scan( )
1: 29.3 28.2 29.1 28.7 28.9 28.5
7:
Read 6 items
> qqnorm(weights)
> qqline(weights, col="red")
> t.test(weights, mu=28.3)
One Sample t-test
data: weights
t = 2.9445, df = 5, p-value = 0.03209
alternative hypothesis: true mean is not equal to 28.3
95 percent confidence interval:
28.36138 29.20529
sample estimates:
mean of x
28.78333
The p-value (0.03209) is less than 0.05, so reject the null hypothesis that the true mean is 28.3.
SoleMaterialA: 13.2 8.2 10.9 14.3 10.7 6.6 9.5 10.8 8.8 13.3
SoleMaterialB: 14.0 8.8 11.2 14.2 11.8 6.4 9.8 11.3 9.3 13.6
Load this data into the two data vectors materialA and materialB. Then perform an independent two-sample t-test with R.
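> t.test(materialA, materialB, var.equal=TRUE)
This test uses the pooled test statistic
$$t = \frac{\bar{x}_A - \bar{x}_B}{s_p \sqrt{1/n_A + 1/n_B}}, \qquad s_p^2 = \frac{(n_A - 1)\,s_A^2 + (n_B - 1)\,s_B^2}{n_A + n_B - 2}.$$
> t.test(materialA, materialB, var.equal=FALSE)
Computational details are not shown here, but here is the R output: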
Welch Two Sample t-test
data: materialA and materialB
t = -0.36891, df = 17.987, p-value = 0.7165
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.745046 1.925046
sample estimates:
mean of x mean of y
10.63 11.04
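The Welch test above treats the two samples as independent. If the measurements are paired (the i-th value of A and the i-th value of B coming from the same subject, which is how the data are laid out), a paired test on the differences is an alternative. The following is a sketch under that pairing assumption; the names paired, diffs, and one_sample are illustrative.

```r
# Wear measurements for the two sole materials, as listed above.
materialA <- c(13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3)
materialB <- c(14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6)

# Paired t-test: compares A and B within each pair
paired <- t.test(materialA, materialB, paired = TRUE)
print(paired)

# Equivalent formulation: one-sample t-test on the differences A - B
diffs <- materialA - materialB
one_sample <- t.test(diffs, mu = 0)

# The spread of the differences is what the paired test uses
print(c(sd_A = sd(materialA), sd_B = sd(materialB), sd_diff = sd(diffs)))
```

With these data the standard deviation of the differences is much smaller than the standard deviation within either group, and the paired p-value falls below 0.05.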
For this independent two-sample t-test, the result is not significant because
p = 0.7165. The variability is greatly increased because we are ignoring the
pairing between the two groups.
Early Bird: 23 28 27 33 26 30 22 25 26
Night Owl: 26 10 20 19 26 18 12 25
We name the two data vectors morning and night.
> morning <- scan( )
1: 23 28 27 33 26 30 22 25 26
10:
Read 9 items
> night <- scan( )
1: 26 10 20 19 26 18 12 25
9:
Read 8 items
> sd(morning)
[1] 3.391165
> sd(night)
[1] 6.141196
> boxplot(morning, night, names=c("morning", "night"),
+ main="Boxplots for the Two Groups")

> t.test(morning, night, paired=FALSE, var.equal=FALSE)
Welch Two Sample t-test
data: morning and night
t = 2.9277, df = 10.626, p-value = 0.01422
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.755705 12.577629
sample estimates:
mean of x mean of y
26.66667 19.50000
Since the p-value (0.01422) is less than 0.05, we conclude that the two
group means are significantly different.
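The Welch statistic and its fractional degrees of freedom can be reproduced from the group means and variances. This is a minimal sketch using the standard Welch-Satterthwaite formula; the names v1, v2, t_welch, df_welch, and p_welch are illustrative.

```r
# Scores for the two groups, as entered above.
morning <- c(23, 28, 27, 33, 26, 30, 22, 25, 26)
night   <- c(26, 10, 20, 19, 26, 18, 12, 25)

# Squared standard error of each group mean
v1 <- var(morning) / length(morning)
v2 <- var(night) / length(night)

# Welch t statistic: difference in means over the combined standard error
t_welch <- (mean(morning) - mean(night)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(morning) - 1) + v2^2 / (length(night) - 1))

# Two-sided p-value
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

print(c(t = t_welch, df = df_welch, p.value = p_welch))
```

The printed values should agree with the t, df, and p-value lines of the Welch output.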
> setwd("c:/workspace")
> df1 <- read.csv("chem-reaction.txt")
> print(df1)
mass time
1 5 40
2 7 120
3 12 180
4 16 210
5 20 240
> plot(df1$mass, df1$time, xlab="Mass (grams)", ylab="Time (sec)",
+ main="Reaction Time vs. Mass")

> model1 <- lm(time ~ mass, data=df1)
> summary(model1)
Call:
lm(formula = time ~ mass, data = df1)
Residuals:
1 2 3 4 5
-32.545 23.039 22.000 3.169 -15.662
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.506 29.687 0.388 0.7242
mass 12.208 2.245 5.437 0.0122 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 27.86 on 3 degrees of freedom
Multiple R-squared: 0.9079, Adjusted R-squared: 0.8771
F-statistic: 29.56 on 1 and 3 DF, p-value: 0.01222
The regression equation is time = 11.506 + 12.208 * mass.
> r <- resid(model1)
> boxplot(r, main="Boxplot of Residuals")

> r <- resid(model1)
> p <- predict(model1)
> plot(p, r, xlab="Predicted Values",
+ ylab="Residuals", main="Residual Plot")

> qqnorm(r, main="Normal Plot of Residuals")
> qqline(r, col="red")
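The fitted coefficients can be checked with the least-squares formulas, and the model can then be used for prediction. The sketch below rebuilds the data frame inline from the printed values rather than reading chem-reaction.txt; the value new_mass = 10 is a hypothetical input chosen only for illustration.

```r
# Data as printed from chem-reaction.txt above.
df1 <- data.frame(mass = c(5, 7, 12, 16, 20),
                  time = c(40, 120, 180, 210, 240))

model1 <- lm(time ~ mass, data = df1)

# Least-squares estimates by hand:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
slope     <- cov(df1$mass, df1$time) / var(df1$mass)
intercept <- mean(df1$time) - slope * mean(df1$mass)
print(c(intercept = intercept, slope = slope))
print(coef(model1))  # should match the hand calculation

# Predicted reaction time for a hypothetical mass of 10 grams
new_mass <- 10
pred <- predict(model1, newdata = data.frame(mass = new_mass))
print(pred)
```

With only five points and a residual standard error of 27.86, any single prediction carries considerable uncertainty.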

> setwd("c:/workspace")
> # Read data frame from input file.
> df <- read.csv("pendulum.txt")
> print(df)
LengthIn TimeFor15
1 5 10.87
2 10 15.42
3 15 18.66
4 20 21.65
5 25 24.21
6 30 26.60
7 35 28.14
8 40 29.99
9 45 32.34
10 50 33.61
> # Convert length in inches to meters
> # There are 2.54 cm per inch
> # and 0.01 m per cm.
> len <- df$LengthIn * 2.54 * 0.01
> print(len)
[1] 0.127 0.254 0.381 0.508 0.635 0.762
[7] 0.889 1.016 1.143 1.270
> # Divide by 15 to obtain time for one period in sec.
> per <- df$TimeFor15 / 15
> print(per)
[1] 0.7246667 1.0280000 1.2440000 1.4433333
[5] 1.6140000 1.7733333 1.8760000 1.9993333
[9] 2.1560000 2.2406667
> model1 <- lm(per ~ len)
> summary(model1)
Call:
lm(formula = per ~ len)
Residuals:
Min 1Q Median 3Q Max
-0.155067 -0.020467 0.004333 0.067533 0.085200
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.71747 0.05761 12.45 1.61e-06 ***
len 1.27769 0.07311 17.48 1.17e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.08433 on 8 degrees of freedom
Multiple R-squared: 0.9745, Adjusted R-squared: 0.9713
F-statistic: 305.4 on 1 and 8 DF, p-value: 1.172e-07
The R-squared value is 0.9745, which is very good. However, we can do better. The plot of per vs. len shows a curvilinear relationship between these variables. Furthermore, the physics formula for the period of a pendulum, T = 2*pi*sqrt(L/g), shows that √L (sqrtlen = square root of len) is a better independent variable to use for predicting period.
> sqrtlen <- sqrt(len)
> model2 <- lm(per ~ sqrtlen)
> summary(model2)
Call:
lm(formula = per ~ sqrtlen)
Residuals:
Min 1Q Median 3Q Max
-0.0189833 -0.0114976 -0.0008825 0.0103954 0.0210963
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.03227 0.01631 1.979 0.0832 .
sqrtlen 1.97034 0.01951 100.975 1.03e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01478 on 8 degrees of freedom
Multiple R-squared: 0.9992, Adjusted R-squared: 0.9991
F-statistic: 1.02e+04 on 1 and 8 DF, p-value: 1.033e-13
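The R-squared value for Model 2 is 0.9992, which is better than
the R-squared = 0.9745 from Model 1. Rewrite the physics formula
for predicting period from length like this:
T = a * sqrt(L) + b, where a = 2*pi/sqrt(g) and b = 0.
The table compares these theoretical coefficients with the estimates from model2: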
| Coefficient | Physics | model2 |
|---|---|---|
| a (slope) | 2*pi/sqrt(g) = 2.006 | 1.97034 |
| b (intercept) | 0 | 0.03227 |
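The comparison in the table can be turned around: solving a = 2*pi/sqrt(g) for g gives an estimate of the gravitational acceleration from the fitted slope. The sketch below rebuilds the data inline from the printed values instead of reading pendulum.txt. The value g = 9.81 m/s^2 is an assumption, chosen because it reproduces the 2.006 shown in the table, and the names a_hat, a_theory, and g_hat are illustrative.

```r
# Data as printed from pendulum.txt above.
length_in   <- seq(5, 50, by = 5)  # pendulum length in inches
time_for_15 <- c(10.87, 15.42, 18.66, 21.65, 24.21,
                 26.60, 28.14, 29.99, 32.34, 33.61)  # seconds for 15 periods

len     <- length_in * 2.54 * 0.01  # inches -> meters
per     <- time_for_15 / 15         # seconds per period
sqrtlen <- sqrt(len)

model2 <- lm(per ~ sqrtlen)
a_hat  <- unname(coef(model2)["sqrtlen"])  # fitted slope

g_assumed <- 9.81                      # m/s^2 (assumed standard value)
a_theory  <- 2 * pi / sqrt(g_assumed)  # theoretical slope

# Invert a = 2*pi/sqrt(g) to estimate g from the fitted slope
g_hat <- (2 * pi / a_hat)^2

print(c(a_hat = a_hat, a_theory = a_theory, g_hat = g_hat))
```

Because the fitted slope is slightly below the theoretical one, the implied estimate of g comes out a little above 9.81, at roughly 10.2 m/s^2.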