haploid_highd Class Reference

haploid_highd offers a number of methods, listed below. Global functions related to haploid_highd are shown after the methods.

Class methods

class haploid_highd(L=0, rng_seed=0, number_of_traits=1, all_polymorphic=False)

Class for high-dimensional population genetics (genomes larger than ~20 loci).

This class is the main object for simulating the evolution of populations with many loci (more than ~20). The class offers a number of functions, but an example will explain the basic idea:

######################################
#  EXAMPLE SCRIPT FOR HAPLOID_HIGHD  #
######################################
import numpy as np
import matplotlib.pyplot as plt
import FFPopSim as h
c = h.haploid_highd(300)       # 300 loci
pop.set_wildtype(1000)         # start with 1000 wildtype individuals
pop.mutation_rate = 1e-4       # mutation rate per site per generation
pop.outcrossing_rate = 1e-1    # probability of sexual reproduction per gen
pop.crossover_rate = 1e-2      # probability of crossover per site per gen
pop.evolve(100)                # evolve for 100 generations
c.plot_divergence_histogram()
plt.show()
######################################

Populations can have a number of phenotypic traits that contribute to the fitness of each individual. The function that calculates fitness from the phenotype identifies fitness with the first trait only by default. The user is, however, free to subclass haploid_highd in C++ (as it is done in hivpopulation) and implement their own phenotype -> fitness function.

In addition, the trait landscapes describe the genotype -> phenotype maps. These can be set directly from Python (since the genotypic space has a finite number of elements).

Note: fitness is not a phenotypic trait directly, but rather a function of all phenotypic traits together.

__init__(L=0, rng_seed=0, number_of_traits=1, all_polymorphic=False)

Construct a high-dimensional population with certain parameters.

Parameters:
  • L: number of loci

  • rng_seed: seed for the random generator. If zero (default) pick a random number

  • number_of_traits: number of phenotypic traits, defaults to one

  • all_polymorphic: option to use an infinite-sites model tracking ancestral alleles

    (only available with a single phenotypic trait and zero mutation rate)

__str__() <==> str(x)
__repr__() <==> repr(x)
copy(rng_seed=0)

Copy population into new instance.

Parameters:
  • rng_seed: random number to initialize the new population
dump(filename, format='bz2', include_genealogy=False)

Dump a population to binary file, for later use.

Parameters:
  • filename: the path to the file where to store the information
  • format: one of ‘bz2’ or ‘plain’. Choose the former if you want compression.
  • include_genealogy: if True, the multi_locus_genealogy is stored as well (if present).

Note

The population can be reloaded using the function FFPopSim.load_haploid_highd.

status()

Print a status list of the population parameters

L

Number of loci (read-only)

number_of_loci

Number of loci (read-only)

N

Population size (read-only)

population_size

Population size (read-only)

generation

Current generation (read-only)

number_of_clones

Number of non-empty clones (read-only)

number_of_traits

Number of traits (read-only)

max_fitness

Maximal fitness in the population (read-only)

participation_ratio

Participation ratio (read-only)

circular

is the genome circular?

carrying_capacity

current carrying capacity of the environment

recombination_model

Model of recombination to use

Available values:
  • FFPopSim.FREE_RECOMBINATION: free reassortment of all loci between parents
  • FFPopSim.CROSSOVERS: linear chromosome with crossover probability per locus
outcrossing_rate

outcrossing rate (probability of sexual reproduction per generation)

crossover_rate

crossover rate (probability of crossover per site per generation)

mutation_rate

mutation rate (per site per generation)

trait_weights

weight of each trait on fitness

Note

Fitness is updated automatically when the weights are changed.

get_clone(n)

Get a single clone

Parameters:
  • n: index of the clone
Returns:
  • clone: the n-th clone in the population
set_wildtype(N)

Initialize a population of wildtype individuals

Parameters:
  • N: the number of individuals

Note

the carrying capacity is set to the same value if still unset.

set_allele_frequencies(frequencies, N)

Initialize the population according to the given allele frequencies in linkage equilibrium.

Parameters:
  • frequencies: an array of length L with all allele frequencies
  • N: set the population size and, if still unset, the carrying capacity to this value
set_genotypes(genotypes, counts)

Initialize population with fixed counts for specific genotypes.

Parameters:
  • genotypes: list of genotypes to set. Genotypes are lists of alleles, e.g. [[0,0,1,0], [0,1,1,1]] for genotypes 0010 and 0111
  • counts: list of the number at which each of those genotypes it to be present

Note

the population size and, if unset, the carrying capacity will be set as the sum of the counts.

Example: if you want to initialize 200 individuals with genotype 001 and
300 individuals with genotype 110, you can use set_genotypes([[0,0,1], [1,1,0]], [200, 300])
evolve(gen=1)

Evolve for some generations.

Parameters:
  • gen: number of generations, defaults to one
bottleneck(size_of_bottleneck)

Make the population undergo a bottleneck

Parameters:
  • size_of_bottleneck: the number of individuals at the bottleneck
flip_single_locus(locus)

Take a random clone, flip the allele at the selected locus and create a new clone.

Parameters:
  • locus: locus to flip
Returns:
  • index: index of the new clone with the flipped locus
calc_stat()

Calculate trait and fitness statistics for the population

get_divergence_statistics(n_sample)

Get the mean and variance of the divergence in the population.

Parameters:
  • n_sample: number of individuals to sample at random from the population
Returns:
  • stat: structure with mean and variance of divergence in the population
get_diversity_statistics(n_sample)

Get the mean and variance of the diversity in the population.

Parameters:
  • n_sample: number of individuals to sample at random from the population
Returns:
  • stat: structure with mean and variance of diversity in the population
get_trait_statistics(t)

Get the mean and variance of a trait in the population.

Parameters:
  • t: number of the trait whose statistics are to be computed
Returns:
  • stat: structure with mean and variance of the trait in the population
get_fitness_statistics()

Get the mean and variance of the fitness in the population.

Returns:
  • stat: structure with mean and variance of the fitness in the population
get_trait_covariance(t1, t2)

Get the covariance of two traits in the population.

Parameters:
  • t1: first trait
  • t2: second trait
Returns:
  • cov: the covariance of the two traits
get_allele_frequency(locus)

Get the frequency of the + allele at the selected locus

Parameters:
  • locus: locus whose frequency of the + allele is to be returned
Returns:
  • frequency: allele frequency in the population
get_allele_frequencies()

Get all allele frequencies

get_pair_frequency(locus1, locus2)

Get the joint frequency of two + alleles

Parameters:
  • locus1: first locus
  • locus2: second locus
Returns:
  • the joint frequency of the + alleles
get_chi(locus)

Get \(\chi_i\) of an allele

Parameters:
  • locus: locus whose chi is to be computed
Returns:
  • the chi of that allele, \(\chi_i := \left<s_i\right>\), where \(s_i \in \{\pm1\}\).
get_chi2(locus1, locus2)

Get \(\chi_{ij}\)

Parameters:
  • locus1: first locus
  • locus2: second locus
Returns:
  • the linkage disequilibiurm between them, i.e. \(\chi_{ij} := \left<s_i s_j\right> - \chi_i \cdot \chi_j\).
get_LD(locus1, locus2)

Get linkage disequilibrium

Parameters:
  • locus1: first locus
  • locus2: second locus
Returns:
  • the linkage disequilibiurm between them, i.e. \(LD := 1 / 4 \left[\left<s_i s_j\right> - \chi_i \cdot \chi_j\right]\).
get_moment(locus1, locus2)

Get moment of two alleles in the -/+ basis

Parameters:
  • locus1: first locus
  • locus2: second locus
Returns:
  • the second moment, i.e. \(\left<s_i s_j\right>\), where \(s_i, s_j \in \{-1, 1\}\).
add_genotype(genotype, n=1)

Add new individuals to the population with certain genotypes

Parameters:
  • genotype: genotype to add to the population (Boolean list)
  • n: number of new individuals carrying that genotype
get_trait_additive(t)

Get an array with the additive coefficients of all loci of a trait.

Parameters:
  • t: number of the trait
Returns:
  • coefficients: array of additive coefficients for the selected trait
clear_trait(t)

Clear a trait landscape.

Parameters:
  • t: number of the trait to be cleared
clear_traits()

Clear all trait landscapes

clear_fitness()

Shortcut for clear_trait when there is only one trait

set_trait_additive(coefficients, t)

Set the additive part of a trait

Parameters:
  • coefficients: array of coefficients for the trait (of length L). All previous additive coefficents are erased
  • t: number of the trait to set
set_fitness_additive(coefficients)

Shortcut for set_trait_additive when there is only one trait

add_trait_coefficient(value, loci, t=0)

Add a coefficient to the trait landscape.

Parameters:
  • value: value of the coefficient
  • loci: array/list of loci indexed by the coefficient.
  • t: number of the trait to be changed

Example: to set a second-order epistatic term \(t_{ij} = 0.1\), use add_trait_coefficient(0.1, [i, j]).

Warning

the -/+ basis is used throughout the library. If you are used to the 0/1 basis, keep in mind that the interaction series-expansion is different.

add_fitness_coefficient(value, loci)

Shortcut for add_trait_coefficient when there is only one trait

get_trait_epistasis(t=0)

Get the epistatic terms of the genotype/phenotype map of the chosen trait.

Parameters:
  • t: trait number
Returns:
  • coefficients: tuple of coefficients, with a value and a tuple of loci

Note

This function is designed to work well in conjunction with add_trait_coefficient.

set_random_trait_epistasis(epistasis_std, t=0)

Set a random epistatic term in the genotype-phenotype map. This is meant as an approximation to multi-locus epistasis to which many locus sets contribute. It assigns to each genotype a reprodrucible fitness component drawn from a Gaussian distribution.

Parameters:
  • epistasis_std: standard deviation of the random epistatic terms
  • t: trait number

Note

the epistatic terms will be Gaussian distributed around zero with the given standard deviation.

set_random_epistasis(epistasis_std)

Shortcut for set_random_trait_epistasis when there is only one trait

get_fitness(n)

Get the fitness of an individual

Parameters:
  • n: index of the clone whose fitness is to be computed
Returns:
  • fitness: fitness value of that clone
get_fitnesses()

Get the fitness of all clones.

get_trait(n, t=0)

Get a trait of an individual

Parameters:
  • n: index of the clone whose trait is to be computed
  • t: trait to be computed
Returns:
  • trait: value of that trait for that clone
get_traits()

Get all traits from all clones

get_clone_size(n)

Get the size of a clone

Parameters:
  • n: index of the clone
Returns:
  • size: size of the selected clone
get_clone_sizes()

Get the size of all clones.

get_genotype(n)

get_genotype(haploid_highd self, int n) -> boost::dynamic_bitset< >

get_genotypes()

Get all genotypes of the population.

Return:
  • genotypes: boolean 2D array with the genotypes

Note

this function does not return the sizes of each clone.

unique_clones()

Recompress the clone structure

During its evolution, identical clones might be generated by different routes at different times. This function merges any such duplicates into unique clones with the size equal to the sum of the sizes of the duplicates.

distance_Hamming(clone_gt1, clone_gt2, chunks=None, every=1)

Calculate the Hamming distance between two genotypes

Parameters:
  • clone_gt1: index of the clone corresponding to the first genotype
  • clone_gt2: index of the clone corresponding to the second genotype
  • chunks: list of pairs delimiting the genetic areas to include
  • every: do the comparison only on certain sites

Example: to calculate the distance between the first two clones limited to third codon positions between locus 90 and 200, use: distance_Hamming(0, 1, chunks=[92, 200], every=3).

random_genomes(n)

Get a sample of random genomes from the population

Parameters:
  • n: number of random genomes to compute
Returns:
  • gts: (n x L) bool matrix with the n genotypes
random_clone()

Get a random clone

Returns:
  • clone: index of the random clone
random_clones(n)

Get random clones

Parameters:
  • n: number of random clones to return
Returns:
  • clones: clone indices
get_fitness_histogram(n_sample=1000, **kwargs)

Calculate the fitness histogram of a population sample.

Parameters:
  • n_sample: number of individuals to sample
Returns:
  • h: numpy.histogram of fitness in the population
plot_fitness_histogram(axis=None, n_sample=1000, **kwargs)

Plot a distribution of fitness of a population sample.

Parameters:
  • axis: an axis to use. A new figure is created by default
  • n_sample: number of individuals to sample
  • kwargs: further optional keyword arguments to matplotlib.pyplot.hist
Returns:
  • return value of axis.hist(...)
get_divergence_histogram(bins=10, chunks=None, every=1, n_sample=1000, **kwargs)

Get the divergence histogram restricted to those chunks of the genome.

Parameters:
  • bins: number or array of bins to be used in the histogram (see also numpy.histogram)

  • chunks: restrict analysis to some chunk in the genome. It must be an n x 2 matrix with

    the initial and (final+1) positions of the chunks

  • every: restrict analysis to every X positions. For instance, if every third site is neutral,

    this argument can be used to only look at those neutral sites

  • n_sample: number of individuals to sample

  • kwargs: further optional keyword arguments to numpy.histogram

Returns:
  • h: numpy.histogram of divergence in the population
plot_divergence_histogram(axis=None, n_sample=1000, **kwargs)

Plot the divergence histogram of a population sample.

Parameters:
  • axis: an axis to use. A new figure is created by default
  • n_sample: number of individuals to sample
  • kwargs: further optional keyword arguments to matplotlib.pyplot.hist
Returns:
  • return value of axis.hist(...)
get_diversity_histogram(bins=10, chunks=None, every=1, n_sample=1000, **kwargs)

Get the diversity histogram restricted to those chunks of the genome.

Parameters:
  • bins: number or array of bins to be used in the histogram (see also numpy.histogram)

  • chunks: restrict analysis to some chunk in the genome. It must be an n x 2 matrix with

    the initial and (final+1) positions of the chunks

  • every: restrict analysis to every X positions. For instance, if every third site is neutral,

    this argument can be used to only look at those neutral sites

  • n_sample: number of individuals to sample

  • kwargs: further optional keyword arguments to numpy.histogram

Returns:
  • h: numpy.histogram of diversity in the population
plot_diversity_histogram(axis=None, n_sample=1000, **kwargs)

Plot the diversity histogram of a population sample.

Parameters:
  • axis: an axis to use. A new figure is created by default
  • n_sample: number of individuals to sample
  • kwargs: further optional keyword arguments to matplotlib.pyplot.hist
Returns:
  • return value of axis.hist(...)
genealogy

Genealogy of the tracked loci.

Note

This attribute is read-only.

track_locus_genealogy(loci)

Track the genealogy of some loci.

Parameters:
  • loci: sites whose genealogy is being stored
Returns:
  • zero if successful

Table Of Contents

Previous topic

haploid_lowd Class Reference

Next topic

hivgene Class reference

This Page