Comparison of Finite and Infinite Mixture Models for Capturing Compositional Heterogeneity Across Sites

Abstract
  • Phylogenetic modelling of evolutionary processes across sites in sequence alignments has received increasing attention over the last few decades. One approach takes the view that heterogeneity across observations arises because the data set was generated by several different models, each drawn from an unknown distribution. Finite mixture models discretize this unknown distribution into a set of sub-models, or components. The level of discretization is chosen through a set of likelihood-based model comparisons. We use Bayesian cross-validation to compare a range of finite mixture models with the infinite mixture modelling approach known as 'CAT' and with the gamma-distributed rates-across-sites approach. Using simulations and real alignments, we find that the improvement in model fit from finite mixture models is attained when the number of components is between 20 and 60. The magnitude of the improvement depends on whether or not the gamma approach is invoked.
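
As a rough illustration of what "a finite mixture of components" means for per-site likelihoods, the sketch below may help. It is a toy under strong assumptions and is not the thesis's method: it ignores the tree and the substitution process entirely and treats the states in an alignment column as independent draws from a component's frequency profile. The function names, weights, and profiles are all hypothetical.

```python
import math

def site_log_likelihood(column, weights, profiles):
    """Log-likelihood of one alignment column under a finite mixture.

    Toy assumption (not from the thesis): states in the column are
    independent draws from a component's frequency profile, with no
    tree and no substitution process.
    """
    per_component = []
    for w, profile in zip(weights, profiles):
        # log of (component weight * probability of the column under it)
        log_p = math.log(w) + sum(math.log(profile[s]) for s in column)
        per_component.append(log_p)
    # Sum over components in log space (log-sum-exp) for stability.
    m = max(per_component)
    return m + math.log(sum(math.exp(v - m) for v in per_component))

def alignment_log_likelihood(columns, weights, profiles):
    """Sites are treated as independent, so log-likelihoods add."""
    return sum(site_log_likelihood(c, weights, profiles) for c in columns)

# Hypothetical two-letter alphabet and three alignment columns.
columns = ["AAAA", "GGGG", "AGAG"]

# One-component model: a single uniform profile for every site.
single = alignment_log_likelihood(columns, [1.0], [{"A": 0.5, "G": 0.5}])

# Two-component mixture: one A-rich and one G-rich profile.
mixed = alignment_log_likelihood(
    columns,
    [0.5, 0.5],
    [{"A": 0.9, "G": 0.1}, {"A": 0.1, "G": 0.9}],
)
```

On this toy data the two-component mixture attains a higher log-likelihood than the single uniform profile, because the A-rich and G-rich columns are each explained well by one component. Comparing models with different numbers of components on held-out sites, rather than on the training data, is the role cross-validation plays in the abstract.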

Rights Notes
  • Copyright © 2018 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.
Date Created
  • 2018

