Coalescent theory

 

In genetics, coalescent theory is a retrospective model of population genetics. Starting from a sample of individuals, it traces all alleles of a gene shared by the members of a population back to a single ancestral copy, known as the most recent common ancestor (MRCA; sometimes also termed the co-ancestor to emphasize the coalescent relationship). The inheritance relationships between alleles are typically represented as a gene genealogy, similar in form to a phylogenetic tree. This gene genealogy is also known as the coalescent; understanding its statistical properties under different assumptions forms the basis of coalescent theory. The coalescent runs models of genetic drift backward in time to investigate the genealogy of antecedents. In the simplest case, coalescent theory assumes no recombination, no natural selection, and no gene flow or population structure. Advances in coalescent theory, however, allow the basic coalescent to be extended to include recombination, selection, and virtually any arbitrarily complex evolutionary or demographic model in population genetic analysis. The mathematical theory of the coalescent was originally developed in the early 1980s by John Kingman.

 

Theory

Consider two distinct haploid organisms that differ at a single nucleotide. Tracing the ancestry of these two individuals backwards in time, one eventually reaches their most recent common ancestor (MRCA), the point at which the two lineages coalesce.

 

Time to coalescence

A useful analysis based on coalescent theory seeks to predict the amount of time that has elapsed between the introduction of a mutation and the appearance of a particular allele or gene distribution in a population. This period equals the time that has passed since the most recent common ancestor existed.

The probability that two lineages coalesce in the immediately preceding generation is the probability that they share a parent. In a diploid population of constant size with 2N copies of each locus, there are 2N "potential parents" in the previous generation, so the probability that two alleles share a parent is 1/(2N) and, correspondingly, the probability that they do not coalesce is 1 − 1/(2N).
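
Applying the per-generation argument above repeatedly, the number of generations until two lineages first coalesce follows a geometric distribution with success probability 1/(2N), so its mean is 2N generations. The Python sketch below checks this numerically under the same assumptions (constant size, 2N copies per locus, parents chosen uniformly at random). The function names and parameter values are illustrative and do not come from any particular library.

```python
import random


def prob_coalesce_at(t, n_diploid):
    """Probability that two lineages first share a parent exactly t generations back."""
    p = 1.0 / (2 * n_diploid)          # chance of sharing a parent in one generation
    return (1.0 - p) ** (t - 1) * p    # t - 1 generations apart, then coalescence


def simulate_coalescence_time(n_diploid, rng):
    """Trace two lineages backwards until both pick the same parental copy."""
    copies = 2 * n_diploid
    t = 0
    while True:
        t += 1
        # Each lineage independently picks one of the 2N parental copies.
        if rng.randrange(copies) == rng.randrange(copies):
            return t


def mean_simulated_time(n_diploid, replicates, seed=1):
    """Average coalescence time over many independent replicates."""
    rng = random.Random(seed)
    total = sum(simulate_coalescence_time(n_diploid, rng) for _ in range(replicates))
    return total / replicates
```

For example, mean_simulated_time(25, 5000) should return a value close to 2N = 50 generations, up to sampling noise.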

 

Graphical representation

Coalescents can be visualised using dendrograms, which show how the branches of the population relate to each other. The point where two branches meet indicates a coalescent event.
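
As a minimal sketch of how such a genealogy might be generated and written down as a dendrogram, the Python example below simulates the basic coalescent for a small sample under the simplest assumptions (constant population size, no recombination, selection or population structure). While k lineages remain, the waiting time to the next coalescent event is drawn from an exponential distribution with rate k(k − 1)/2, with time measured in units of 2N generations, and a randomly chosen pair of lineages is merged. The function name, the sample labels s1, s2, ... and the choice of Newick text as output are illustrative assumptions.

```python
import random


def simulate_coalescent_newick(sample_size, seed=None):
    """Simulate a basic coalescent genealogy; return (newick_string, tmrca).

    Time is measured in coalescent units (2N generations, diploid population).
    """
    if sample_size < 2:
        raise ValueError("need at least two sampled lineages")
    rng = random.Random(seed)
    # Each active lineage is (newick_subtree, height_of_its_root_node).
    lineages = [(f"s{i + 1}", 0.0) for i in range(sample_size)]
    now = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / 2.0            # number of pairs that could coalesce
        now += rng.expovariate(rate)        # waiting time to the next event
        i, j = rng.sample(range(k), 2)      # pick the pair that coalesces
        (left, left_h), (right, right_h) = lineages[i], lineages[j]
        # Branch lengths run from each child node up to the new ancestor.
        merged = f"({left}:{now - left_h:.4f},{right}:{now - right_h:.4f})"
        lineages = [lin for idx, lin in enumerate(lineages) if idx not in (i, j)]
        lineages.append((merged, now))
    tree, tmrca = lineages[0]
    return tree + ";", tmrca
```

The returned string can be pasted into any tree viewer that reads Newick; each internal node corresponds to one coalescent event. Under these assumptions the expected time to the MRCA of n sampled lineages is 2(1 − 1/n) coalescent units, which averaging the second return value over many runs should approximate.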

 

Applications

Disease gene mapping

The utility of coalescent theory in disease gene mapping is slowly gaining appreciation. Although the application of the theory is still in its infancy, a number of researchers are actively developing algorithms that use coalescent theory to analyse human genetic data.

 

History

Coalescent theory is a natural extension of the more classical population genetics concept of neutral evolution and is an approximation to the Fisher-Wright (or Wright-Fisher) model for large populations. It was discovered independently by several researchers in the 1980s, but the definitive formalisation is attributed to Kingman. Major contributions to the development of coalescent theory have been made by Peter Donnelly, Robert Griffiths, Richard R. Hudson and Simon Tavaré. These include incorporating variations in population size, recombination and selection. In 1999 Jim Pitman and Serik Sagitov independently introduced coalescent processes with multiple collisions of ancestral lineages. Shortly afterwards, the full class of exchangeable coalescent processes with simultaneous multiple mergers of ancestral lineages was discovered by Martin Möhle and Serik Sagitov, and by Jason Schweinsberg.