library(tidyverse)
<- read_csv("https://raw.github.com/jacobccooper/biol305_unk/main/datasets/cinnyris.csv")
cinnyris
<- cinnyris |>
male_cinnyris mutate(Genus = as.factor(Genus)) |>
mutate(Species = as.factor(Species)) |>
mutate(Subspecies = as.factor(Subspecies)) |>
filter(Age == "Adult") |>
filter(Sex == "Male") |>
select(-`Collection Name`, -Locality, -Country, -Age,
-`Notes / Existing Damage Precluding Measurements`) |>
na.omit()
15 Multivariate methods
15.1 Multivariate analyses
Multivariate analyses are commonly used to help understand the variation that exists in large, multidimensional datasets. These analyses can help us understand relationships among individuals and help us determine how many groups (and what groups) exist in a dataset.
For this particular walkthrough, we will be using data on the Northern Double-collared Sunbird Cinnyris reichenowi sensu lato complex from equatorial Afromontane regions (Cooper et al. 2021a, b).
15.2 Principal components
Click here for an overview of PCA
A very common method of visualizing variation in population is using principal components analyses (PCA). Principal components help collapse multidimensional data into a few dimensions that explain the most variation possible across the dataset.
For principal components, we want to use the library vegan
.
library(vegan)
Loading required package: permute
The function for principal components in vegan
is rda
, which will only work on a whole dataframe with no non-numeric data. Thus, we must create a new object of only numerical data and then re-merge it with the original dataframe. The row order is maintained, so this will not be an issue.
<- male_cinnyris |>
male.cinnyris.data select(-Genus, -Species, -Subspecies,
-Collection, -Catalog, -Locality2, -Sex)
<- rda(male.cinnyris.data)
cinnyris.pca
cinnyris.pca
Call: rda(X = male.cinnyris.data)
Inertia Rank
Total 23
Unconstrained 23 6
Inertia is variance
Eigenvalues for unconstrained axes:
PC1 PC2 PC3 PC4 PC5 PC6
15.620 4.443 1.940 0.877 0.076 0.040
Above, we can see that we now have a complex object that contains our PCA. We can use the function biplot
to see how our data look and how our variables contributed to the principal components.
biplot(cinnyris.pca)
ordihull(cinnyris.pca,
group = male_cinnyris$Subspecies)
We can also do a ggplot
to color our data and see how it works. Let’s combine our datasets so we can do this.
<- cbind(male_cinnyris,
cinnyris.male.pca # extract PC values
$CA$u)
cinnyris.pca
ggplot(cinnyris.male.pca, aes(x = PC1, y = PC2)) +
geom_point() +
theme_classic()
To the naked eye, these data seem to fall into two major clusters. This can be examined again, this time assigning colors to one of our explanatory variables: subspecies.
ggplot(cinnyris.male.pca, aes(x = PC1, y = PC2, colour = Subspecies)) +
geom_point() +
stat_ellipse() +
theme_classic()
Too few points to calculate an ellipse
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_path()`).
As we can see, many of these birds fall into one of the two groups. Only one population - genderuensis - seems to really bridge both groups.
The following shows how each variable contributed to each principal component:
$CA$v cinnyris.pca
PC1 PC2 PC3
Right.wing.chord -0.61936877 0.498711666 0.60193981
Tail.length -0.69190828 -0.709947486 -0.12739287
Culmen.length -0.33598591 0.419621089 -0.62835637
Bill.depth..base.of.feathers.on.mandible. -0.01633872 0.004353418 -0.01231269
Bill.width..base.of.feathers.on.maxilla. -0.05865394 0.061714372 -0.09962101
Left.Tarsus -0.14509132 0.259522190 -0.46532983
PC4 PC5
Right.wing.chord 0.07304690 -0.001241684
Tail.length 0.02988534 -0.008716903
Culmen.length -0.53855805 -0.161236784
Bill.depth..base.of.feathers.on.mandible. -0.05951994 0.216257889
Bill.width..base.of.feathers.on.maxilla. -0.08219923 0.962867346
Left.Tarsus 0.83272259 0.006646209
PC6
Right.wing.chord -0.0002699604
Tail.length -0.0062803042
Culmen.length -0.0125622441
Bill.depth..base.of.feathers.on.mandible. 0.9742958451
Bill.width..base.of.feathers.on.maxilla. -0.2212610835
Left.Tarsus 0.0399226008
We can also see the percent variance explained by each axis, which is contained within the eigenvectors of the PCA:
$CA$eig/sum(cinnyris.pca$CA$eig) cinnyris.pca
PC1 PC2 PC3 PC4 PC5 PC6
0.679234583 0.193214694 0.084367157 0.038143954 0.003291025 0.001748588
The first principal component explains 68% of the variation, and the second 19% of the variation.