haploid_highd offers a number of methods, listed below. Global functions related to haploid_highd are shown after the methods.
-
class haploid_highd(L=0, rng_seed=0, number_of_traits=1, all_polymorphic=False)
Class for high-dimensional population genetics (genomes larger than ~20 loci).
This class is the main object for simulating the evolution of populations with
many loci (more than ~20). The class offers a number of functions, but an
example will explain the basic idea:
######################################
# EXAMPLE SCRIPT FOR HAPLOID_HIGHD #
######################################
import numpy as np
import matplotlib.pyplot as plt
import FFPopSim as h
c = h.haploid_highd(300) # 300 loci
pop.set_wildtype(1000) # start with 1000 wildtype individuals
pop.mutation_rate = 1e-4 # mutation rate per site per generation
pop.outcrossing_rate = 1e-1 # probability of sexual reproduction per gen
pop.crossover_rate = 1e-2 # probability of crossover per site per gen
pop.evolve(100) # evolve for 100 generations
c.plot_divergence_histogram()
plt.show()
######################################
Populations can have a number of phenotypic traits that contribute to the fitness
of each individual. The function that calculates fitness from the phenotype
identifies fitness with the first trait only by default. The user is, however,
free to subclass haploid_highd in C++ (as it is done in hivpopulation) and
implement their own phenotype -> fitness function.
In addition, the trait landscapes describe the genotype -> phenotype maps.
These can be set directly from Python (since the genotypic space has a finite
number of elements).
Note: fitness is not a phenotypic trait directly, but rather a function of all
phenotypic traits together.
-
__init__(L=0, rng_seed=0, number_of_traits=1, all_polymorphic=False)
Construct a high-dimensional population with certain parameters.
- Parameters:
L: number of loci
rng_seed: seed for the random generator. If zero (default) pick a random number
number_of_traits: number of phenotypic traits, defaults to one
- all_polymorphic: option to use an infinite-sites model tracking ancestral alleles
(only available with a single phenotypic trait and zero mutation rate)
-
__str__() <==> str(x)
-
__repr__() <==> repr(x)
-
copy(rng_seed=0)
Copy population into new instance.
- Parameters:
- rng_seed: random number to initialize the new population
-
dump(filename, format='bz2', include_genealogy=False)
Dump a population to binary file, for later use.
- Parameters:
- filename: the path to the file where to store the information
- format: one of ‘bz2’ or ‘plain’. Choose the former if you want compression.
- include_genealogy: if True, the multi_locus_genealogy is stored as well (if present).
Note
The population can be reloaded using the function FFPopSim.load_haploid_highd.
-
status()
Print a status list of the population parameters
-
L
Number of loci (read-only)
-
number_of_loci
Number of loci (read-only)
-
N
Population size (read-only)
-
population_size
Population size (read-only)
-
generation
Current generation (read-only)
-
number_of_clones
Number of non-empty clones (read-only)
-
number_of_traits
Number of traits (read-only)
-
max_fitness
Maximal fitness in the population (read-only)
-
participation_ratio
Participation ratio (read-only)
-
circular
is the genome circular?
-
carrying_capacity
current carrying capacity of the environment
-
recombination_model
Model of recombination to use
- Available values:
- FFPopSim.FREE_RECOMBINATION: free reassortment of all loci between parents
- FFPopSim.CROSSOVERS: linear chromosome with crossover probability per locus
-
outcrossing_rate
outcrossing rate (probability of sexual reproduction per generation)
-
crossover_rate
crossover rate (probability of crossover per site per generation)
-
mutation_rate
mutation rate (per site per generation)
-
trait_weights
weight of each trait on fitness
Note
Fitness is updated automatically when the weights are changed.
-
get_clone(n)
Get a single clone
- Parameters:
-
- Returns:
- clone: the n-th clone in the population
-
set_wildtype(N)
Initialize a population of wildtype individuals
- Parameters:
- N: the number of individuals
Note
the carrying capacity is set to the same value if still unset.
-
set_allele_frequencies(frequencies, N)
Initialize the population according to the given allele frequencies in linkage equilibrium.
- Parameters:
- frequencies: an array of length L with all allele frequencies
- N: set the population size and, if still unset, the carrying
capacity to this value
-
set_genotypes(genotypes, counts)
Initialize population with fixed counts for specific genotypes.
- Parameters:
- genotypes: list of genotypes to set. Genotypes are lists of alleles,
e.g. [[0,0,1,0], [0,1,1,1]] for genotypes 0010 and 0111
- counts: list of the number at which each of those genotypes it to be present
Note
the population size and, if unset, the carrying capacity will be set
as the sum of the counts.
- Example: if you want to initialize 200 individuals with genotype 001 and
- 300 individuals with genotype 110, you can use
set_genotypes([[0,0,1], [1,1,0]], [200, 300])
-
evolve(gen=1)
Evolve for some generations.
- Parameters:
- gen: number of generations, defaults to one
-
bottleneck(size_of_bottleneck)
Make the population undergo a bottleneck
- Parameters:
- size_of_bottleneck: the number of individuals at the bottleneck
-
flip_single_locus(locus)
Take a random clone, flip the allele at the selected locus and create a new clone.
- Parameters:
-
- Returns:
- index: index of the new clone with the flipped locus
-
calc_stat()
Calculate trait and fitness statistics for the population
-
get_divergence_statistics(n_sample)
Get the mean and variance of the divergence in the population.
- Parameters:
- n_sample: number of individuals to sample at random from the population
- Returns:
- stat: structure with mean and variance of divergence in the population
-
get_diversity_statistics(n_sample)
Get the mean and variance of the diversity in the population.
- Parameters:
- n_sample: number of individuals to sample at random from the population
- Returns:
- stat: structure with mean and variance of diversity in the population
-
get_trait_statistics(t)
Get the mean and variance of a trait in the population.
- Parameters:
- t: number of the trait whose statistics are to be computed
- Returns:
- stat: structure with mean and variance of the trait in the population
-
get_fitness_statistics()
Get the mean and variance of the fitness in the population.
- Returns:
- stat: structure with mean and variance of the fitness in the population
-
get_trait_covariance(t1, t2)
Get the covariance of two traits in the population.
- Parameters:
- t1: first trait
- t2: second trait
- Returns:
- cov: the covariance of the two traits
-
get_allele_frequency(locus)
Get the frequency of the + allele at the selected locus
- Parameters:
- locus: locus whose frequency of the + allele is to be returned
- Returns:
- frequency: allele frequency in the population
-
get_allele_frequencies()
Get all allele frequencies
-
get_pair_frequency(locus1, locus2)
Get the joint frequency of two + alleles
- Parameters:
- locus1: first locus
- locus2: second locus
- Returns:
- the joint frequency of the + alleles
-
get_chi(locus)
Get \(\chi_i\) of an allele
- Parameters:
- locus: locus whose chi is to be computed
- Returns:
- the chi of that allele, \(\chi_i := \left<s_i\right>\), where \(s_i \in \{\pm1\}\).
-
get_chi2(locus1, locus2)
Get \(\chi_{ij}\)
- Parameters:
- locus1: first locus
- locus2: second locus
- Returns:
- the linkage disequilibiurm between them, i.e. \(\chi_{ij} := \left<s_i s_j\right> - \chi_i \cdot \chi_j\).
-
get_LD(locus1, locus2)
Get linkage disequilibrium
- Parameters:
- locus1: first locus
- locus2: second locus
- Returns:
- the linkage disequilibiurm between them, i.e. \(LD := 1 / 4 \left[\left<s_i s_j\right> - \chi_i \cdot \chi_j\right]\).
-
get_moment(locus1, locus2)
Get moment of two alleles in the -/+ basis
- Parameters:
- locus1: first locus
- locus2: second locus
- Returns:
- the second moment, i.e. \(\left<s_i s_j\right>\), where \(s_i, s_j \in \{-1, 1\}\).
-
add_genotype(genotype, n=1)
Add new individuals to the population with certain genotypes
- Parameters:
- genotype: genotype to add to the population (Boolean list)
- n: number of new individuals carrying that genotype
-
get_trait_additive(t)
Get an array with the additive coefficients of all loci of a trait.
- Parameters:
-
- Returns:
- coefficients: array of additive coefficients for the selected trait
-
clear_trait(t)
Clear a trait landscape.
- Parameters:
- t: number of the trait to be cleared
-
clear_traits()
Clear all trait landscapes
-
clear_fitness()
Shortcut for clear_trait when there is only one trait
-
set_trait_additive(coefficients, t)
Set the additive part of a trait
- Parameters:
- coefficients: array of coefficients for the trait (of length L). All previous additive coefficents are erased
- t: number of the trait to set
-
set_fitness_additive(coefficients)
Shortcut for set_trait_additive when there is only one trait
-
add_trait_coefficient(value, loci, t=0)
Add a coefficient to the trait landscape.
- Parameters:
- value: value of the coefficient
- loci: array/list of loci indexed by the coefficient.
- t: number of the trait to be changed
Example: to set a second-order epistatic term \(t_{ij} = 0.1\), use add_trait_coefficient(0.1, [i, j]).
Warning
the -/+ basis is used throughout the library. If you are used to the 0/1 basis, keep in mind that the interaction series-expansion is different.
-
add_fitness_coefficient(value, loci)
Shortcut for add_trait_coefficient when there is only one trait
-
get_trait_epistasis(t=0)
Get the epistatic terms of the genotype/phenotype map of the chosen trait.
- Parameters:
-
- Returns:
- coefficients: tuple of coefficients, with a value and a tuple of loci
Note
This function is designed to work well in conjunction with add_trait_coefficient.
-
set_random_trait_epistasis(epistasis_std, t=0)
Set a random epistatic term in the genotype-phenotype map. This is meant as an approximation to multi-locus epistasis to which many locus sets contribute. It assigns to each genotype a reprodrucible fitness component drawn from a Gaussian distribution.
- Parameters:
- epistasis_std: standard deviation of the random epistatic terms
- t: trait number
Note
the epistatic terms will be Gaussian distributed around zero with the given standard deviation.
-
set_random_epistasis(epistasis_std)
Shortcut for set_random_trait_epistasis when there is only one trait
-
get_fitness(n)
Get the fitness of an individual
- Parameters:
- n: index of the clone whose fitness is to be computed
- Returns:
- fitness: fitness value of that clone
-
get_fitnesses()
Get the fitness of all clones.
-
get_trait(n, t=0)
Get a trait of an individual
- Parameters:
- n: index of the clone whose trait is to be computed
- t: trait to be computed
- Returns:
- trait: value of that trait for that clone
-
get_traits()
Get all traits from all clones
-
get_clone_size(n)
Get the size of a clone
- Parameters:
-
- Returns:
- size: size of the selected clone
-
get_clone_sizes()
Get the size of all clones.
-
get_genotype(n)
get_genotype(haploid_highd self, int n) -> boost::dynamic_bitset< >
-
get_genotypes()
Get all genotypes of the population.
- Return:
- genotypes: boolean 2D array with the genotypes
Note
this function does not return the sizes of each clone.
-
unique_clones()
Recompress the clone structure
During its evolution, identical clones might be generated by different routes at
different times. This function merges any such duplicates into unique clones with
the size equal to the sum of the sizes of the duplicates.
-
distance_Hamming(clone_gt1, clone_gt2, chunks=None, every=1)
Calculate the Hamming distance between two genotypes
- Parameters:
- clone_gt1: index of the clone corresponding to the first genotype
- clone_gt2: index of the clone corresponding to the second genotype
- chunks: list of pairs delimiting the genetic areas to include
- every: do the comparison only on certain sites
Example: to calculate the distance between the first two clones
limited to third codon positions between locus 90 and 200, use:
distance_Hamming(0, 1, chunks=[92, 200], every=3).
-
random_genomes(n)
Get a sample of random genomes from the population
- Parameters:
- n: number of random genomes to compute
- Returns:
- gts: (n x L) bool matrix with the n genotypes
-
random_clone()
Get a random clone
- Returns:
- clone: index of the random clone
-
random_clones(n)
Get random clones
- Parameters:
- n: number of random clones to return
- Returns:
-
-
get_fitness_histogram(n_sample=1000, **kwargs)
Calculate the fitness histogram of a population sample.
- Parameters:
- n_sample: number of individuals to sample
- Returns:
- h: numpy.histogram of fitness in the population
-
plot_fitness_histogram(axis=None, n_sample=1000, **kwargs)
Plot a distribution of fitness of a population sample.
- Parameters:
- axis: an axis to use. A new figure is created by default
- n_sample: number of individuals to sample
- kwargs: further optional keyword arguments to matplotlib.pyplot.hist
- Returns:
- return value of axis.hist(...)
-
get_divergence_histogram(bins=10, chunks=None, every=1, n_sample=1000, **kwargs)
Get the divergence histogram restricted to those chunks of the genome.
- Parameters:
bins: number or array of bins to be used in the histogram (see also numpy.histogram)
- chunks: restrict analysis to some chunk in the genome. It must be an n x 2 matrix with
the initial and (final+1) positions of the chunks
- every: restrict analysis to every X positions. For instance, if every third site is neutral,
this argument can be used to only look at those neutral sites
n_sample: number of individuals to sample
kwargs: further optional keyword arguments to numpy.histogram
- Returns:
- h: numpy.histogram of divergence in the population
-
plot_divergence_histogram(axis=None, n_sample=1000, **kwargs)
Plot the divergence histogram of a population sample.
- Parameters:
- axis: an axis to use. A new figure is created by default
- n_sample: number of individuals to sample
- kwargs: further optional keyword arguments to matplotlib.pyplot.hist
- Returns:
- return value of axis.hist(...)
-
get_diversity_histogram(bins=10, chunks=None, every=1, n_sample=1000, **kwargs)
Get the diversity histogram restricted to those chunks of the genome.
- Parameters:
bins: number or array of bins to be used in the histogram (see also numpy.histogram)
- chunks: restrict analysis to some chunk in the genome. It must be an n x 2 matrix with
the initial and (final+1) positions of the chunks
- every: restrict analysis to every X positions. For instance, if every third site is neutral,
this argument can be used to only look at those neutral sites
n_sample: number of individuals to sample
kwargs: further optional keyword arguments to numpy.histogram
- Returns:
- h: numpy.histogram of diversity in the population
-
plot_diversity_histogram(axis=None, n_sample=1000, **kwargs)
Plot the diversity histogram of a population sample.
- Parameters:
- axis: an axis to use. A new figure is created by default
- n_sample: number of individuals to sample
- kwargs: further optional keyword arguments to matplotlib.pyplot.hist
- Returns:
- return value of axis.hist(...)
-
genealogy
Genealogy of the tracked loci.
Note
This attribute is read-only.
-
track_locus_genealogy(loci)
Track the genealogy of some loci.
- Parameters:
- loci: sites whose genealogy is being stored
- Returns:
-