haploid_highd Class Reference ¶

haploid_highd offers a number of methods, listed below. Global functions related to haploid_highd are shown after the methods.

Contents

haploid_highd Class Reference
- Class methods
- Global functions related to haploid_highd

Class methods ¶

class haploid_highd(L=0, rng_seed=0, number_of_traits=1, all_polymorphic=False)¶

Class for high-dimensional population genetics (genomes larger than ~20 loci).

This class is the main object for simulating the evolution of populations with many loci (more than ~20). The class offers a number of functions, but an example will explain the basic idea:

######################################
#  EXAMPLE SCRIPT FOR HAPLOID_HIGHD  #
######################################
import numpy as np
import matplotlib.pyplot as plt
import FFPopSim as h
c = h.haploid_highd(300)       # 300 loci
pop.set_wildtype(1000)         # start with 1000 wildtype individuals
pop.mutation_rate = 1e-4       # mutation rate per site per generation
pop.outcrossing_rate = 1e-1    # probability of sexual reproduction per gen
pop.crossover_rate = 1e-2      # probability of crossover per site per gen
pop.evolve(100)                # evolve for 100 generations
c.plot_divergence_histogram()
plt.show()
######################################

Populations can have a number of phenotypic traits that contribute to the fitness of each individual. The function that calculates fitness from the phenotype identifies fitness with the first trait only by default. The user is, however, free to subclass haploid_highd in C++ (as it is done in hivpopulation) and implement their own phenotype -> fitness function.

In addition, the trait landscapes describe the genotype -> phenotype maps. These can be set directly from Python (since the genotypic space has a finite number of elements).

Note: fitness is not a phenotypic trait directly, but rather a function of all phenotypic traits together.

__init__(L=0, rng_seed=0, number_of_traits=1, all_polymorphic=False)¶

Construct a high-dimensional population with certain parameters.

Parameters:

L: number of loci
rng_seed: seed for the random generator. If zero (default) pick a random number
number_of_traits: number of phenotypic traits, defaults to one
all_polymorphic: option to use an infinite-sites model tracking ancestral alleles

(only available with a single phenotypic trait and zero mutation rate)

__str__() <==> str(x)¶

__repr__() <==> repr(x)¶

copy(rng_seed=0)¶

Copy population into new instance.

Parameters:

rng_seed: random number to initialize the new population

dump(filename, format='bz2', include_genealogy=False)¶

Dump a population to binary file, for later use.

Parameters:

filename: the path to the file where to store the information
format: one of ‘bz2’ or ‘plain’. Choose the former if you want compression.
include_genealogy: if True, the multi_locus_genealogy is stored as well (if present).

Note

The population can be reloaded using the function FFPopSim.load_haploid_highd.

status()¶: Print a status list of the population parameters

L¶: Number of loci (read-only)

number_of_loci¶: Number of loci (read-only)

N¶: Population size (read-only)

population_size¶: Population size (read-only)

generation¶: Current generation (read-only)

number_of_clones¶: Number of non-empty clones (read-only)

number_of_traits¶: Number of traits (read-only)

max_fitness¶: Maximal fitness in the population (read-only)

participation_ratio¶: Participation ratio (read-only)

circular¶: is the genome circular?

carrying_capacity¶: current carrying capacity of the environment

recombination_model¶

Model of recombination to use

Available values:

FFPopSim.FREE_RECOMBINATION: free reassortment of all loci between parents
FFPopSim.CROSSOVERS: linear chromosome with crossover probability per locus

outcrossing_rate¶: outcrossing rate (probability of sexual reproduction per generation)

crossover_rate¶: crossover rate (probability of crossover per site per generation)

mutation_rate¶: mutation rate (per site per generation)

trait_weights¶: weight of each trait on fitness

Note

Fitness is updated automatically when the weights are changed.

get_clone(n)¶

Get a single clone

Parameters:

n: index of the clone

Returns:

clone: the n-th clone in the population

set_wildtype(N)¶

Initialize a population of wildtype individuals

Parameters:

N: the number of individuals

Note

the carrying capacity is set to the same value if still unset.

set_allele_frequencies(frequencies, N)¶

Initialize the population according to the given allele frequencies in linkage equilibrium.

Parameters:

frequencies: an array of length L with all allele frequencies
N: set the population size and, if still unset, the carrying capacity to this value

set_genotypes(genotypes, counts)¶

Initialize population with fixed counts for specific genotypes.

Parameters:

genotypes: list of genotypes to set. Genotypes are lists of alleles, e.g. [[0,0,1,0], [0,1,1,1]] for genotypes 0010 and 0111
counts: list of the number at which each of those genotypes it to be present

Note

the population size and, if unset, the carrying capacity will be set as the sum of the counts.

Example: if you want to initialize 200 individuals with genotype 001 and: 300 individuals with genotype 110, you can use set_genotypes([[0,0,1], [1,1,0]], [200, 300])

evolve(gen=1)¶

Evolve for some generations.

Parameters:

gen: number of generations, defaults to one

bottleneck(size_of_bottleneck)¶

Make the population undergo a bottleneck

Parameters:

size_of_bottleneck: the number of individuals at the bottleneck

flip_single_locus(locus)¶

Take a random clone, flip the allele at the selected locus and create a new clone.

Parameters:

locus: locus to flip

Returns:

index: index of the new clone with the flipped locus

calc_stat()¶: Calculate trait and fitness statistics for the population

get_divergence_statistics(n_sample)¶

Get the mean and variance of the divergence in the population.

Parameters:

n_sample: number of individuals to sample at random from the population

Returns:

stat: structure with mean and variance of divergence in the population

get_diversity_statistics(n_sample)¶

Get the mean and variance of the diversity in the population.

Parameters:

n_sample: number of individuals to sample at random from the population

Returns:

stat: structure with mean and variance of diversity in the population

get_trait_statistics(t)¶

Get the mean and variance of a trait in the population.

Parameters:

t: number of the trait whose statistics are to be computed

Returns:

stat: structure with mean and variance of the trait in the population

get_fitness_statistics()¶

Get the mean and variance of the fitness in the population.

Returns:

stat: structure with mean and variance of the fitness in the population

get_trait_covariance(t1, t2)¶

Get the covariance of two traits in the population.

Parameters:

t1: first trait
t2: second trait

Returns:

cov: the covariance of the two traits

get_allele_frequency(locus)¶

Get the frequency of the + allele at the selected locus

Parameters:

locus: locus whose frequency of the + allele is to be returned

Returns:

frequency: allele frequency in the population

get_allele_frequencies()¶: Get all allele frequencies

get_pair_frequency(locus1, locus2)¶

Get the joint frequency of two + alleles

Parameters:

locus1: first locus
locus2: second locus

Returns:

the joint frequency of the + alleles

get_chi(locus)¶

Get \(\chi_i\) of an allele

Parameters:

locus: locus whose chi is to be computed

Returns:

the chi of that allele, \(\chi_i := \left<s_i\right>\), where \(s_i \in \{\pm1\}\).

get_chi2(locus1, locus2)¶

Get \(\chi_{ij}\)

Parameters:

locus1: first locus
locus2: second locus

Returns:

the linkage disequilibiurm between them, i.e. \(\chi_{ij} := \left<s_i s_j\right> - \chi_i \cdot \chi_j\).

get_LD(locus1, locus2)¶

Get linkage disequilibrium

Parameters:

locus1: first locus
locus2: second locus

Returns:

the linkage disequilibiurm between them, i.e. \(LD := 1 / 4 \left[\left<s_i s_j\right> - \chi_i \cdot \chi_j\right]\).

get_moment(locus1, locus2)¶

Get moment of two alleles in the -/+ basis

Parameters:

locus1: first locus
locus2: second locus

Returns:

the second moment, i.e. \(\left<s_i s_j\right>\), where \(s_i, s_j \in \{-1, 1\}\).

add_genotype(genotype, n=1)¶

Add new individuals to the population with certain genotypes

Parameters:

genotype: genotype to add to the population (Boolean list)
n: number of new individuals carrying that genotype

get_trait_additive(t)¶

Get an array with the additive coefficients of all loci of a trait.

Parameters:

t: number of the trait

Returns:

coefficients: array of additive coefficients for the selected trait

clear_trait(t)¶

Clear a trait landscape.

Parameters:

t: number of the trait to be cleared

clear_traits()¶: Clear all trait landscapes

clear_fitness()¶: Shortcut for clear_trait when there is only one trait

set_trait_additive(coefficients, t)¶

Set the additive part of a trait

Parameters:

coefficients: array of coefficients for the trait (of length L). All previous additive coefficents are erased
t: number of the trait to set

set_fitness_additive(coefficients)¶: Shortcut for set_trait_additive when there is only one trait

add_trait_coefficient(value, loci, t=0)¶

Add a coefficient to the trait landscape.

Parameters:

value: value of the coefficient
loci: array/list of loci indexed by the coefficient.
t: number of the trait to be changed

Example: to set a second-order epistatic term \(t_{ij} = 0.1\), use add_trait_coefficient(0.1, [i, j]).

Warning

the -/+ basis is used throughout the library. If you are used to the 0/1 basis, keep in mind that the interaction series-expansion is different.

add_fitness_coefficient(value, loci)¶: Shortcut for add_trait_coefficient when there is only one trait

get_trait_epistasis(t=0)¶

Get the epistatic terms of the genotype/phenotype map of the chosen trait.

Parameters:

t: trait number

Returns:

coefficients: tuple of coefficients, with a value and a tuple of loci

Note

This function is designed to work well in conjunction with add_trait_coefficient.

set_random_trait_epistasis(epistasis_std, t=0)¶

Set a random epistatic term in the genotype-phenotype map. This is meant as an approximation to multi-locus epistasis to which many locus sets contribute. It assigns to each genotype a reprodrucible fitness component drawn from a Gaussian distribution.

Parameters:

epistasis_std: standard deviation of the random epistatic terms
t: trait number

Note

the epistatic terms will be Gaussian distributed around zero with the given standard deviation.

set_random_epistasis(epistasis_std)¶: Shortcut for set_random_trait_epistasis when there is only one trait

get_fitness(n)¶

Get the fitness of an individual

Parameters:

n: index of the clone whose fitness is to be computed

Returns:

fitness: fitness value of that clone

get_fitnesses()¶: Get the fitness of all clones.

get_trait(n, t=0)¶

Get a trait of an individual

Parameters:

n: index of the clone whose trait is to be computed
t: trait to be computed

Returns:

trait: value of that trait for that clone

get_traits()¶: Get all traits from all clones

get_clone_size(n)¶

Get the size of a clone

Parameters:

n: index of the clone

Returns:

size: size of the selected clone

get_clone_sizes()¶: Get the size of all clones.

get_genotype(n)¶: get_genotype(haploid_highd self, int n) -> boost::dynamic_bitset< >

get_genotypes()¶

Get all genotypes of the population.

Return:

genotypes: boolean 2D array with the genotypes

Note

this function does not return the sizes of each clone.

unique_clones()¶

Recompress the clone structure

During its evolution, identical clones might be generated by different routes at different times. This function merges any such duplicates into unique clones with the size equal to the sum of the sizes of the duplicates.

distance_Hamming(clone_gt1, clone_gt2, chunks=None, every=1)¶

Calculate the Hamming distance between two genotypes

Parameters:

clone_gt1: index of the clone corresponding to the first genotype
clone_gt2: index of the clone corresponding to the second genotype
chunks: list of pairs delimiting the genetic areas to include
every: do the comparison only on certain sites

Example: to calculate the distance between the first two clones limited to third codon positions between locus 90 and 200, use: distance_Hamming(0, 1, chunks=[92, 200], every=3).

random_genomes(n)¶

Get a sample of random genomes from the population

Parameters:

n: number of random genomes to compute

Returns:

gts: (n x L) bool matrix with the n genotypes

random_clone()¶

Get a random clone

Returns:

clone: index of the random clone

random_clones(n)¶

Get random clones

Parameters:

n: number of random clones to return

Returns:

clones: clone indices

get_fitness_histogram(n_sample=1000, **kwargs)¶

Calculate the fitness histogram of a population sample.

Parameters:

n_sample: number of individuals to sample

Returns:

h: numpy.histogram of fitness in the population

plot_fitness_histogram(axis=None, n_sample=1000, **kwargs)¶

Plot a distribution of fitness of a population sample.

Parameters:

axis: an axis to use. A new figure is created by default
n_sample: number of individuals to sample
kwargs: further optional keyword arguments to matplotlib.pyplot.hist

Returns:

return value of axis.hist(...)

get_divergence_histogram(bins=10, chunks=None, every=1, n_sample=1000, **kwargs)¶

Get the divergence histogram restricted to those chunks of the genome.

Parameters:

bins: number or array of bins to be used in the histogram (see also numpy.histogram)
chunks: restrict analysis to some chunk in the genome. It must be an n x 2 matrix with

the initial and (final+1) positions of the chunks
every: restrict analysis to every X positions. For instance, if every third site is neutral,

this argument can be used to only look at those neutral sites
n_sample: number of individuals to sample
kwargs: further optional keyword arguments to numpy.histogram

Returns:

h: numpy.histogram of divergence in the population

plot_divergence_histogram(axis=None, n_sample=1000, **kwargs)¶

Plot the divergence histogram of a population sample.

Parameters:

axis: an axis to use. A new figure is created by default
n_sample: number of individuals to sample
kwargs: further optional keyword arguments to matplotlib.pyplot.hist

Returns:

return value of axis.hist(...)

get_diversity_histogram(bins=10, chunks=None, every=1, n_sample=1000, **kwargs)¶

Get the diversity histogram restricted to those chunks of the genome.

Parameters:

bins: number or array of bins to be used in the histogram (see also numpy.histogram)
chunks: restrict analysis to some chunk in the genome. It must be an n x 2 matrix with

the initial and (final+1) positions of the chunks
every: restrict analysis to every X positions. For instance, if every third site is neutral,

this argument can be used to only look at those neutral sites
n_sample: number of individuals to sample
kwargs: further optional keyword arguments to numpy.histogram

Returns:

h: numpy.histogram of diversity in the population

plot_diversity_histogram(axis=None, n_sample=1000, **kwargs)¶

Plot the diversity histogram of a population sample.

Parameters:

axis: an axis to use. A new figure is created by default
n_sample: number of individuals to sample
kwargs: further optional keyword arguments to matplotlib.pyplot.hist

Returns:

return value of axis.hist(...)

genealogy¶: Genealogy of the tracked loci.

Note

This attribute is read-only.

track_locus_genealogy(loci)¶

Track the genealogy of some loci.

Parameters:

loci: sites whose genealogy is being stored

Returns:

zero if successful

haploid_highd Class Reference ¶

Class methods ¶

Table Of Contents

Previous topic

Next topic

This Page

Navigation

haploid_highd Class Reference¶

Class methods¶

Global functions related to haploid_highd¶

Table Of Contents

Previous topic

Next topic

This Page

Quick search

Navigation

haploid_highd Class Reference ¶

Class methods ¶

Global functions related to haploid_highd ¶