Genetic draft, selective interference, and population genetics of rapid adaptation


Coalescence with positive selection

FFPopSim can be used to track the genealogy of the population at specified loci. This feature allows to explore how the shape of genealogical trees depends on population parameters. The script does exactly that (download it here). Typical output of the script is reproduced below and discussed briefly. The script uses the infinite sites mode of FFPopSim, in which L loci are kept polymorphic by introducing a new mutation whenever the previous mutation fixed or was lost. The crucial parts of the script are the following:
L = 1000   	#number of segregating sites
s = 1e-2 	#single site effect
N = 2000 	#population size
r = 0.0  	#outcrossing rate

sample_size=30	#number of individuals whose genealogy is looked at
nsamples = 3	#number of trees
burnin = 2000 	#a few times the expected coalescence time
dt = 1000 		#time between samples
These lines set the parameters. In this case the genome has length 1000 and every mutation at any locus increase fitness by 0.01. The population size is 2000 and the population is asexual (no outcrossing). Furthermore, it is specified that we will take 3 samples of size 30 after a burnin of 2000 generations. Different samples are separated in time by 1000 generations. Following this, the population is set up as usual. We draw attention to the set-up of the fitness landscape.
#set the effect sizes of the mutations that are injected (the same at each site in this case)
The additive effects of each locus are set by passing an array of length L to the function set_fitness_additive(). Internally, FFPopSim denotes the alleles at a locus by +/- 1. Hence the effects sizes need to be smaller by a factor 0.5 to correspond to s in a 0/1 model. Next, we need to specify which locus to track. In this case, we track only the locus at L/2
#track the genealogy at a central locus L/2 (which one doesn't matter in the asexual case)
The population needs to equilibrate by simply evolving it for enough generations, which is not shown here. The more interesting part is retrieving and drawing the trees.
for si in xrange(nsamples):
	#evolve a while before sampling the next tree

	#draw a sample from the population, convert to a BioPython tree object and plot
	tree = pop.genealogy.get_tree(L/2)
	subtree = tree.create_subtree_from_keys(rd.sample(tree.leafs,sample_size)).to_Biopython_tree()
	Phylo.draw(subtree,label_func=lambda x:"")

These few lines of code access the tree at locus L/2, which is stored in the genealogy object of the population pop. This is the tree of the entire population. Each tree has a method that allows the user to produce a subtree corresponding to a subset of the leafs, specified by their keys. This subset of size sample_size is chosen at random from all leafs. Subsequently, the tree is converted to a BioPython tree and BioPython is used to ladderize the tree and draw it. The entire script can be viewed genealogies_with_selection.html or downloaded

Strong positive selection

To demonstrate how selection speeds up coalescence, we simulate a population of size N=10000 with 1000 loci, each of which contributes a selection differential of s=0.01.

The most recent common ancestor is found in roughly 200 generations, despite the fact that the population size is N=10000. Furthermore, the shape of the trees is rather different. There are long terminal branches and often very skewed branching. Sometimes, several lineages coalesce almost at the same time, corresponding to approximate multiple mergers.

Neutral comparison

The following shows three trees sampled from an almost neutrally evolving population of size 200.

As expected for approximately neutral coalescence, most coalescence happens early and the trees are rather balanced. The time to the most recent common ancestor is of the order of the population size. Open the script in your favorite text editor, change parameters, and rerun to explore.