12  Pick the test

Author

University of Nebraska at Kearney Biology

12.1 Picking the test

A major component of the final will be picking the correct test to run on some data. Here, I cover the specific ways in which you can determine what test to use.

Please use this web page as an interactive way to pick the test.

12.2 Exceptions

Below is a walk through for the most common statistical analyses, but keep in mind there are a few “less common” ones that we are using as well:

  • Binomial test - if we are looking at something with discrete outcomes - like coin tosses, die rolls, etc. - we are doing a binomial test to determine the probability of a specific outcome. You can do this with binom.test.

  • Poisson test - if we are looking at the probability of obtaining certain counts over events - specifically, looking at the probability of rare events - we will use a Poisson. You can do this with poisson.test.

12.3 Overview of picking the test

Below is an overview of picking basic statistical tests. There are more complex tests, but these are the main ones to consider for exams in this course.

Which test to pick for which combinations of variables.
Categorical Explanatory Continuous Explanatory
Categorical Response
  • \(\chi^2\) tests, especially if count data for category

    • Yate’s correction auto-applied for 2x2 tables
  • Fisher test is a 2x2 table

  • Null is that counts match a known proportion (goodness-of-fit) or counts match each other (independence)

  • Logistic regression, modeling categorical responses across continuous variables.

    • Not covered in BIOL 305 at present
Continuous Response
  • Single mean to population

    • This is a \(Z\)-score comparison, when the population parameters are known

      • \(H_0: \bar{x} = \mu\)
    • Almost always better to resort to \(t\)-test comparison if unsure is a population - \(t\) approaches a \(Z\) as \(df \rightarrow \infty\)

      • \(H_0: \mu_1 = \mu_2\)
  • Two measurements

    • Remember - all \(t\)-tests default to Welch’s in R, assuming unequal variance

      • When \(\sigma^2_1 = \sigma^2_2\), Welch’s \(t\)-test is the same as a standard \(t\)-test

      • Usually better to use Welch’s, but justify reasoning

    • One-sample \(t\)-test, comparing the two groups

      • \(H_0: \mu_1 = \mu_2\)
  • Two samples

    • Two-sample \(t\)-test, comparing the two samples

      • \(H_0: \mu_1 = \mu_2\)
  • Paired samples

    • Special case - used when there are repeated measurements of the same sampling units

      • Need to set paired = TRUE

      • \(H_0: \mu_d = 0\)

  • \(\ge\) 3 samples

    • ANOVA - analysis of variance across multiple samples, computing variance within samples and between samples

      • For all ANOVA, \(H_0: \mu_1 = \mu_2 = ... \mu_i\)

      • Single type of measurement is a one-way ANOVA

      • When things are “blocked” into categories - like litters - that are unique per row, use a randomized block ANOVA (remember to put factor() around anything you are worried won’t be read as categorical!)

        • aov(response ~ explanatory + block, data)
      • When things are individuals being repeatedly measured, use a repeated measures ANOVA (mathematically like randomized block ANOVA)

        • aov(response ~ explanatory + individual, data)
      • When we are looking at more than one variable at once, you are doing an interactive ANOVA

        • aov(response ~ explanatory1 + explanatory2 + explanatory1*explanatory2)
  • When looking at if there is a relationship, use correlation

    • \(H_0: \rho = 0\)

    • Evaluated using a \(t\) statistic

  • When looking at what the relationship is, use linear regression

    • Line formula is \(y = \beta x + \alpha + \epsilon_i\)

    • Very similar to your high-school \(y = mx +b\)

    • \(H_0: \beta = 0\)

  • Remember - linear regression works like an ANOVA, and has similar outputs

    • Check the ANOVA page for more information!

12.4 Another method - checklist

Feel free to go through the following headings to also help you pick a test.

12.4.1 Explanatory Variable

The explanatory variable is continuous and numeric. Pick this one for a continuous measurement variable, for example, such as Longitude, concentration, or other ratio or integer data.

The explanatory variable is discontinuous and categorical. In these cases, the explanatory variable is a condition, like a control and a treatment.

12.4.2 Continuous explanatory variable

12.4.2.1 Continuous response variable

If you have a continuous response variable, then you need to see what the question is asking:

  • Is there a relationship?

If you are looking at a problem and it is simply asking if there is a relationship, you are looking at a correlation analysis.

  • What is the relationship?

If you are asking what the relationship is or looking to be able to predict a value based on what you know, you are doing a linear regression. Note there are other kinds of regression, but in this class, we focus on linear regression.

If we have multiple response variables, we can do multiple regression, which we do not cover here.

12.4.2.2 Discrete response variable

If you are looking at a discrete response variable, such as a state of 1 or 0 in response to a certain amount of stimulus, then you are doing a logistic regression. We did not cover this analysis in this class.

12.4.3 Discrete explanatory variable

For discrete explanatory variables, we are often looking at categorical treatments or distinct groups, like species or geographic locations.

12.4.3.1 Continuous response variable

For a continuous response variable, we need to ask ourselves how many categories we are dealing with.

  • If we are dealing with two categories or two measurements from the same individual, we are using a \(t\)-test.

Make sure you check the t-test page to understand what kind of t-test is being performed. For repeated measurements from the same individuals or populations under different conditions, we have the paired t-test. Otherwise, we have the Welch’s t-test as the default in R. NOTE that the default assumes unequal variance; you must set var.equal = TRUE to perform a “true” t-test. Make sure you familiarize yourself with why our code is assuming variances are unequal.

If we know what the population is, we will use a \(Z\)-score, but bear in mind that we will almost always use a \(t\)-test to account for error. A \(t\)-test with infinite degrees of freedom is the same as a \(Z\)-score, so it is better to default to a \(t\)-test.

  • If we have three or more categories or treatments then we need to perform an ANOVA.

If we are simply comparing multiple groups, we are performing a one-way ANOVA. We also need to make sure we aren’t doing some sort of factorial ANOVA, repeated-measures ANOVA, or interactive ANOVA. Please read the ANOVA page to ensure you are using the correct format.

Remember to label ANOVA plots with letters to indicate the separate groups.

If we have multiple response variables, we can use a MANOVA; we do not cover that in this class.

12.4.3.2 Discrete response variable

If our response variable and our explanatory variable are discrete (i.e., categorical or nominal), then we are doing a \(\chi^2\) test. This tests looks at counts in different categories. For example, looking at proportions of men and women who do and do not smoke would be a classic \(\chi^2\) test.