library(tidyverse) # provides read_csv(), filter(), and select()
canines <- read_csv("https://figshare.com/ndownloader/files/15070175")
8 Testing means with \(t\)-tests
8.1 Introduction
Previously, we talked about normal distributions as a method for comparing samples to overall populations or comparing individuals to overall populations. However, small sample sizes introduce additional error, and oftentimes we do not have access to an entire population. In these situations, we need a test that can account for this changing error and the effect of different sample sizes. This is especially important when comparing two samples to each other. We may have a small sample from one population and a small sample from another, and we want to determine as effectively as possible whether they came from the same overall population.
The distribution that we commonly refer to as a \(t\)-distribution is also sometimes known as a “Student’s \(t\)-distribution” as it was first published by a man with the pseudonym of “Student”. Student was in fact William Sealy Gosset, an employee of the Guinness corporation who was barred from publishing by his employer to ensure that trade secrets were not made known to their competitors. Knowing that his work regarding statistics was important, Gosset opted to publish his research anyway under his pseudonym.
8.2 Dataset
For all of the examples on this page, we will be using a dataset on the morphology of canine teeth for identification of predators killing livestock (Courtenay 2019).
We want to set up some of these columns as “factors” to make them easier to process and parse in R. We will look at the column OA for these examples. Unfortunately, it is unclear what exactly OA stands for since this paper is not published at the present time.
canines$Sample <- as.factor(canines$Sample)

# we will be examining the column "OA"
canines$OA <- as.numeric(canines$OA)

summary(canines)
  Sample        WIS              WIM              WIB
 Dog :34   Min.   :0.1323   Min.   :0.1020   Min.   :0.03402
 Fox :41   1st Qu.:0.5274   1st Qu.:0.3184   1st Qu.:0.11271
 Wolf:28   Median :1.1759   Median :0.6678   Median :0.25861
           Mean   :1.6292   Mean   :1.0233   Mean   :0.44871
           3rd Qu.:2.4822   3rd Qu.:1.5194   3rd Qu.:0.74075
           Max.   :4.8575   Max.   :3.2423   Max.   :1.51721
       D                 RDC               LDC                OA
 Min.   :0.005485   Min.   :0.05739   Min.   :0.02905   Min.   :100.7
 1st Qu.:0.034092   1st Qu.:0.28896   1st Qu.:0.22290   1st Qu.:139.2
 Median :0.182371   Median :0.61777   Median :0.55985   Median :149.9
 Mean   :0.250188   Mean   :0.88071   Mean   :0.84615   Mean   :148.4
 3rd Qu.:0.361658   3rd Qu.:1.26417   3rd Qu.:1.26754   3rd Qu.:158.0
 Max.   :1.697461   Max.   :3.02282   Max.   :3.20533   Max.   :171.5
8.3 \(t\)-distribution
When we are testing a sample mean, whether against a fixed value or against the mean of another sample, we use a \(t\)-distribution. A \(t\)-distribution is a modified normal distribution that has been adjusted to account for the number of individuals being sampled. Specifically, a \(t\)-distribution with infinite degrees of freedom is the same as a normal distribution, while fewer degrees of freedom give the distribution heavier tails (a more leptokurtic shape) to account for the added error and uncertainty of small samples. The probability density of a \(t\)-distribution with \(\nu\) degrees of freedom is:
\[ f(t) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi \nu}\,\Gamma(\frac{\nu}{2})}\left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} \]
These \(t\)-distributions can be visualized as follows:
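One way to draw this comparison ourselves is sketched below using base R graphics. The degrees of freedom chosen here (1, 3, 10, and 30) and the object names `t_vals` and `dfs` are arbitrary choices for illustration, not values from the dataset.

```r
# Grid of t values on which to evaluate the densities
t_vals <- seq(-4, 4, length.out = 401)

# Degrees of freedom to compare (arbitrary illustrative choices)
dfs <- c(1, 3, 10, 30)

# Start with the standard normal curve as the reference
plot(t_vals, dnorm(t_vals), type = "l", lwd = 2,
     xlab = "t", ylab = "Density",
     main = "t-distributions compared with the normal")

# Overlay one dashed t density per choice of degrees of freedom
for (i in seq_along(dfs)) {
  lines(t_vals, dt(t_vals, df = dfs[i]), col = i + 1, lty = 2)
}

legend("topright",
       legend = c("Normal", paste("df =", dfs)),
       col = c(1, seq_along(dfs) + 1),
       lty = c(1, rep(2, length(dfs))),
       lwd = c(2, rep(1, length(dfs))))
```

The curves with few degrees of freedom sit lower in the middle and higher in the tails than the normal curve, and they converge on the normal curve as the degrees of freedom increase.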
For all \(t\)-tests, we calculate the degrees of freedom based on the number of observations in our samples. If comparing a single sample to a value, we use \(df = n - 1\). If we are comparing two sample means using a pooled variance, then we have \(df = n_1 + n_2 - 2\).
Importantly, we are testing to see if the means of the two distributions are equal in a \(t\)-test. Thus, our hypotheses are as follows:
\(H_0: \mu_1 = \mu_2\) or \(H_0: \mu_1 - \mu_2 = 0\)
\(H_A: \mu_1 \ne \mu_2\) or \(H_A: \mu_1 - \mu_2 \ne 0\)
When asked about hypotheses, remember the above as the statistical hypotheses that are being directly tested.
In R, we have the following functions to help with \(t\) distributions:
- dt: density function of a \(t\)-distribution
- pt: finding our \(p\) value from a specific \(t\) in a \(t\)-distribution
- qt: finding a particular \(t\) from a specific \(p\) in a \(t\)-distribution
- rt: random values from a \(t\)-distribution
All of the above functions require the degrees of freedom to be declared. Unlike the normal distribution functions, these cannot be given a mean and standard deviation to match your data; tests on your data must be performed using t.test.
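As a quick sketch of how these four functions behave, the example below uses 10 degrees of freedom and an observed \(t\) of \(-2.228\). Both numbers are chosen purely for illustration.

```r
df <- 10  # degrees of freedom must always be supplied

# Height of the density curve at t = 0
dt(0, df = df)

# Lower-tail probability P(T <= -2.228)
lower_tail <- pt(-2.228, df = df)

# Two-tailed p value for an observed t of -2.228 (very close to 0.05)
p_two_tailed <- 2 * pt(-abs(-2.228), df = df)

# Critical t that leaves 2.5% in the upper tail
t_crit <- qt(0.975, df = df)

# Five random draws from the distribution
set.seed(1)
rt(5, df = df)
```

Note that pt and qt are inverses of one another: qt(0.975, df = 10) returns the \(t\) whose lower-tail probability from pt is 0.975.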
8.4 \(t\)-tests
We have three major types of \(t\)-tests:
- One-sample \(t\)-tests: a single sample is being compared to a value, or vice versa
- Two-sample \(t\)-tests: two samples are being compared to one another to see if they come from the same population
- Paired \(t\)-tests: before-and-after measurements of the same individuals are being compared. This is necessary to account for a repeat in the individuals being measured, and different potential baselines at initiation. In this case, we are looking to see if the difference between before and after is equal to zero.
We also have what we call a “true” \(t\)-test and “Welch’s” \(t\)-test. The formula for a “true” \(t\) is as follows:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \]
Where \(s_p\) is based on the “pooled variance” between the samples. This can be calculated as follows:
\[ s_p = \sqrt{\frac{(n_1-1)(s_1^2)+(n_2-1)(s_2^2)}{n_1+n_2 -2}} \]
Whereas the equation for a “Welch’s” \(t\) is:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \]
Welch’s \(t\) also varies with respect to the degrees of freedom, calculated by:
\[ df = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(\frac{s_1^2}{n_1})^2}{n_1-1}+\frac{(\frac{s_2^2}{n_2})^2}{n_2-1}} \]
OK, so why the difference?
A “true” \(t\)-test works well under a certain set of assumptions, including equal variance between samples and roughly equal sample sizes. A Welch’s \(t\)-test is better for scenarios with unequal variances and unequal sample sizes. If sample sizes and variances are equal, the two \(t\)-tests should perform the same.
Because of this, some argue that “Welch’s” should be the default \(t\)-test, and in R, Welch’s is the default \(t\)-test. If you want to specify a “regular” \(t\)-value, you will have to set the option var.equal = TRUE. (The default is var.equal = FALSE.)
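To see how the two formulas relate to what t.test reports, here is a minimal sketch that computes both statistics by hand. The samples `x1` and `x2` are simulated for illustration only (they are not from the canines data), with deliberately unequal sizes and variances.

```r
# Simulated samples for illustration only (not the canines data)
set.seed(42)
x1 <- rnorm(12, mean = 10, sd = 1)
x2 <- rnorm(30, mean = 11, sd = 3)

n1 <- length(x1); n2 <- length(x2)
v1 <- var(x1);    v2 <- var(x2)

# Pooled ("true") t: one shared standard deviation for both samples
s_p <- sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
t_pooled <- (mean(x1) - mean(x2)) / (s_p * sqrt(1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2

# Welch's t: each sample keeps its own variance
se_welch <- sqrt(v1 / n1 + v2 / n2)
t_welch <- (mean(x1) - mean(x2)) / se_welch
# Numerator is the squared sum of the two variance terms
df_welch <- se_welch^4 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Compare against t.test
pooled_fit <- t.test(x1, x2, var.equal = TRUE)
welch_fit  <- t.test(x1, x2)  # var.equal = FALSE by default

c(by_hand = t_pooled, t.test = unname(pooled_fit$statistic))
c(by_hand = t_welch,  t.test = unname(welch_fit$statistic))
c(by_hand = df_welch, t.test = unname(welch_fit$parameter))
```

Each pair of values should match. The Welch degrees of freedom are generally not a whole number, and they never exceed \(n_1 + n_2 - 2\).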
8.4.1 One-sample \(t\)-tests
Let’s look at the values of all of the dog samples in our canines dataset.
dogs <- canines |>
  filter(Sample == "Dog") |>
  select(Sample, OA)

xbar <- mean(dogs$OA)
sd_dog <- sd(dogs$OA)
n <- nrow(dogs)
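Now we have stored all of our information on our dog dataset. Let’s say that the overall population of dogs has a mean OA score of \(143\) with a \(\sigma = 1.5\). Is our sample different than the overall population? Note that t.test estimates the standard deviation from the sample itself, so only the population mean is supplied below.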
t.test(x = dogs$OA,
alternative = "two.sided",
mu = 143)
One Sample t-test
data: dogs$OA
t = -0.74339, df = 33, p-value = 0.4625
alternative hypothesis: true mean is not equal to 143
95 percent confidence interval:
138.4667 145.1070
sample estimates:
mean of x
141.7869
As we can see above, \(p = 0.4625\), so at \(\alpha = 0.05\) we fail to reject the null hypothesis that the mean of our sample is equal to the overall mean for dogs.
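The numbers in a one-sample t.test output can be reproduced by hand from the sample mean, standard deviation, and size. The sketch below uses a simulated vector `x` (an assumption for illustration, not the dog OA values) so that it runs on its own; the same arithmetic applies to any numeric vector.

```r
# Simulated measurements for illustration only (not the dog OA values)
set.seed(10)
x <- rnorm(34, mean = 142, sd = 9)
mu0 <- 143  # hypothesised population mean

# t = (sample mean - hypothesised mean) / standard error
se <- sd(x) / sqrt(length(x))
t_stat <- (mean(x) - mu0) / se
df <- length(x) - 1

# Two-tailed p value from the t-distribution
p_val <- 2 * pt(-abs(t_stat), df = df)

# 95% confidence interval for the mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df = df) * se

# Each comparison should return TRUE
fit <- t.test(x, mu = mu0)
all.equal(t_stat, unname(fit$statistic))
all.equal(p_val, fit$p.value)
all.equal(ci, as.numeric(fit$conf.int))
```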
8.4.2 Two-sample \(t\)-tests
Now let’s say we want to compare foxes and dogs to each other. Since we have all of our data in the same data frame, we will have to subset our data to ensure we are doing this properly.
# already got dogs
dog_oa <- dogs$OA

foxes <- canines |>
  filter(Sample == "Fox") |>
  select(Sample, OA)

fox_oa <- foxes$OA
Now, we are ready for the test!
t.test(dog_oa, fox_oa)
Welch Two Sample t-test
data: dog_oa and fox_oa
t = -6.3399, df = 72.766, p-value = 1.717e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-19.62289 -10.23599
sample estimates:
mean of x mean of y
141.7869 156.7163
As we can see, the dogs and the foxes significantly differ in their OA measurement, so we reject the null hypothesis that \(\mu_{dog} = \mu_{fox}\).
8.4.3 Paired \(t\)-tests
I will do a highly simplified version of a paired \(t\)-test here just for demonstration’s sake. Remember that you want to use paired tests when you are looking at the same individuals at different points in time.
# create two random distributions
# DEMONSTRATION ONLY

# make repeatable
set.seed(867)

t1 <- rnorm(20,0,1)
t2 <- rnorm(20,2,1)
Now we can compare these using paired = TRUE.
t.test(t1, t2, paired = TRUE)
Paired t-test
data: t1 and t2
t = -7.5663, df = 19, p-value = 3.796e-07
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-3.107787 -1.760973
sample estimates:
mean difference
-2.43438
As we can see, we reject the null hypothesis that the mean difference between these samples is zero in this case. Let’s see how this changes though if we set paired = FALSE.
t.test(t1, t2)
Welch Two Sample t-test
data: t1 and t2
t = -8.1501, df = 37.48, p-value = 8.03e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.039333 -1.829428
sample estimates:
mean of x mean of y
-0.07258938 2.36179080
This value differs because, in a paired test, we are looking to see if the mean of the differences within each pair is \(0\), while in the independent (standard) test we are comparing the means of the two samples as though their observations were unrelated.
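One way to convince ourselves of this is to note that a paired \(t\)-test is the same as a one-sample \(t\)-test on the per-individual differences. The sketch below uses hypothetical `before` and `after` vectors in which each individual genuinely has its own baseline (unlike the independent random draws used above).

```r
# Simulated before/after data (hypothetical; each individual has its own baseline)
set.seed(123)
before <- rnorm(15, mean = 50, sd = 10)
after  <- before + rnorm(15, mean = 2, sd = 1)  # each individual shifts by about 2

# Paired test and one-sample test on the differences
paired_fit <- t.test(before, after, paired = TRUE)
diff_fit   <- t.test(before - after, mu = 0)

# Both comparisons should return TRUE
all.equal(unname(paired_fit$statistic), unname(diff_fit$statistic))
all.equal(paired_fit$p.value, diff_fit$p.value)

# Ignoring the pairing leaves the large between-individual spread in the error term
unpaired_fit <- t.test(before, after)
c(paired = paired_fit$p.value, unpaired = unpaired_fit$p.value)
```

When individuals vary a lot from one another but change consistently, the paired test is far more sensitive because the between-individual variation cancels out of the differences.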
8.5 Wilcoxon tests
When data (and the differences among data) are non-normal, they violate the assumptions of a \(t\)-test. In these cases, we can use a Wilcoxon test instead. In R, the command wilcox.test performs both the Mann-Whitney \(U\) test (also called the Wilcoxon rank sum test) for unpaired data and the Wilcoxon signed rank test for paired data.
8.5.1 Mann-Whitney \(U\)
For this test, we would perform the following procedures to figure out our statistics:
- Rank the pooled dataset from smallest to largest, assigning each value its rank
- Sum the ranks for the first sample and for the second sample
- Compute \(U_1\) and \(U_2\), and compare the smaller value to a Mann-Whitney \(U\) table.
The equations for these statistics are as follows, where \(R\) represents the sum of the ranks for that sample:
\[ U_1 = n_1n_2+\frac{n_1(n_1+1)}{2}-R_1 \]
\[ U_2 = n_1n_2 + \frac{n_2(n_2+1)}{2} - R_2 \]
In R, this looks like so:
wilcox.test(t1, t2, paired = FALSE)
Wilcoxon rank sum exact test
data: t1 and t2
W = 11, p-value = 2.829e-09
alternative hypothesis: true location shift is not equal to 0
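The ranking procedure above can be sketched by hand as follows. The vectors `a` and `b` are small made-up samples with no ties, used only for illustration. Note that wilcox.test reports W as the rank sum of the first sample minus \(n_1(n_1+1)/2\), which works out to the same number as \(U_2\) under the formulas above.

```r
# Small made-up samples with no ties (illustration only)
a <- c(1.2, 3.4, 2.2, 5.1, 4.0)
b <- c(6.3, 7.7, 5.9, 8.1, 4.8, 9.0)
n1 <- length(a); n2 <- length(b)

# Rank the pooled data, then sum the ranks within each sample
pooled_ranks <- rank(c(a, b))
R1 <- sum(pooled_ranks[seq_len(n1)])       # 16
R2 <- sum(pooled_ranks[n1 + seq_len(n2)])  # 50

# U statistics from the formulas above
U1 <- n1 * n2 + n1 * (n1 + 1) / 2 - R1     # 29
U2 <- n1 * n2 + n2 * (n2 + 1) / 2 - R2     # 1
U <- min(U1, U2)                           # 1

# R's W statistic equals R1 - n1(n1 + 1)/2, which matches U2
W <- unname(wilcox.test(a, b)$statistic)   # 1
```

As a check, \(U_1 + U_2\) always equals \(n_1 n_2\), which is 30 here.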
8.5.2 Wilcoxon signed rank test
For paired samples, we want to do the Wilcoxon signed rank test. This is performed by:
- Finding the difference between sampling events for each sampling unit.
- Ranking the differences by their absolute value, keeping track of the sign of each difference.
- Summing the ranks of the positive differences and the ranks of the negative differences.
- Taking the smaller of the two sums as your \(W\) statistic.
In R, this test looks as follows:
wilcox.test(t1, t2, paired = TRUE)
Wilcoxon signed rank exact test
data: t1 and t2
V = 0, p-value = 1.907e-06
alternative hypothesis: true location shift is not equal to 0
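A by-hand sketch of these steps is shown below. The vectors `pre` and `post` are made-up paired measurements with no ties and no zero differences, used only for illustration. R labels its statistic V and reports the sum of the positive ranks, which is not always the smaller of the two sums.

```r
# Made-up paired measurements with no ties or zero differences
pre  <- c(10.1, 12.3, 9.8, 11.5, 13.0, 10.7)
post <- c(11.0, 12.1, 11.2, 13.4, 13.3, 12.9)

# Difference for each sampling unit
d <- pre - post

# Rank the differences by absolute value
abs_ranks <- rank(abs(d))

# Sum the ranks separately for positive and negative differences
pos_sum <- sum(abs_ranks[d > 0])  # 1
neg_sum <- sum(abs_ranks[d < 0])  # 20

# The smaller sum is the W statistic to compare against a table
W_stat <- min(pos_sum, neg_sum)   # 1

# R's V is the sum of the positive ranks
V <- unname(wilcox.test(pre, post, paired = TRUE)$statistic)  # 1
```

Here only one difference is positive and it has the smallest absolute value, so the positive rank sum is 1 and happens to coincide with the smaller of the two sums.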
8.6 Confidence intervals
In \(t\) tests, we are looking at the difference between the means. Oftentimes, we are looking at a confidence interval for the difference between these means. This can be determined by:
\[ (\bar{x}_1-\bar{x}_2) \pm t_{crit}\sqrt{\frac{s_p^2}{n_1}+\frac{s_p^2}{n_2}} \]
This is very similar to the CI we calculated with the \(Z\) statistic. Remember that we can use the following function to find our desired \(t\), which requires degrees of freedom to work:
qt(0.975, df = 10)
[1] 2.228139
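Putting the formula and qt together, the confidence interval for a difference in means can be sketched by hand as below. The groups `g1` and `g2` are simulated for illustration only, and the result is compared against t.test with var.equal = TRUE since the formula above uses the pooled variance.

```r
# Simulated groups for illustration only
set.seed(7)
g1 <- rnorm(15, mean = 20, sd = 2)
g2 <- rnorm(18, mean = 22, sd = 2)
n1 <- length(g1); n2 <- length(g2)

# Pooled variance s_p^2
sp2 <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)

# Critical t for a 95% interval with n1 + n2 - 2 degrees of freedom
t_crit <- qt(0.975, df = n1 + n2 - 2)

# Difference in means plus or minus the margin of error
half_width <- t_crit * sqrt(sp2 / n1 + sp2 / n2)
ci_hand <- (mean(g1) - mean(g2)) + c(-1, 1) * half_width

# Compare with the interval reported by t.test (should return TRUE)
ci_r <- as.numeric(t.test(g1, g2, var.equal = TRUE)$conf.int)
all.equal(ci_hand, ci_r)
```

If the interval does not contain \(0\), the corresponding two-tailed pooled \(t\)-test rejects the null hypothesis at the matching \(\alpha\).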
8.7 Homework: Chapter 9
For Chapter 9, complete problems 9.1, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, and 9.10. For problems 9.3 - 9.8, be sure to state the null and alternative hypotheses and whether the test is one- or two-tailed.
8.8 Homework: Chapter 10
Two-sample means are practiced in Chapter 10. Please see Canvas for more information.