with an introduction to parallelisation and High Performance Computing
Olivia Angelin-Bonnet, New Zealand eScience Infrastructure
21-22 September 2022
Model: a mathematical or statistical representation of a system or phenomenon (cell, ecosystem, solar system, etc.).
Simulation: data about the system generated using a mathematical or statistical model.
Models and simulations allow us to:
Explain experimental data (through model fitting and refinement);
Test hypotheses:
without going through expensive and time-consuming experiments;
that wouldn’t be ethical or even feasible to test in real life.
Make predictions about new interventions/scenarios;
Communicate knowledge.
Modelling and simulations are used in many fields of science, e.g.:
Epidemiology: modelling of infectious diseases (see this talk on COVID-19 modelling);
Ecology: modelling of ecosystems, prediction of species abundance, evaluation of conservation policies;
Medicine: construction of organ models, prediction of drug-target binding and drug efficiency;
Chemistry, molecular biology: models of molecular interactions;
Astrophysics: modelling of planet formation, galaxy mergers (see this example);
and many more!
Systems biology: the study of the interactions between biological entities through modelling and simulations
Interdisciplinary field: builds on physics, chemistry, biology, computer science, statistics, mathematics, etc.
Interest in understanding the emerging properties of biological systems arising from local interactions between molecular components
Construction of a whole-cell computational model of the human pathogen Mycoplasma genitalium: Karr et al., Cell (2012)
An overview of gene expression:
Regulation of gene expression:
Cells adapt to changes in environment by modulating the expression of their genes
Gene expression regulated by different types of molecules:
proteins
regulatory non-coding RNAs
small metabolites
Expression of a target gene can be controlled in different ways:
Regulation of transcription (regulatory proteins called transcription factors or TFs)
Regulation of translation
Regulation of gene products’ decay (RNAs and proteins)
Post-translational regulation (modification of sequence or shape of target proteins)
A Gene Regulatory Network:
Not all genes linked to all others: relationship between regulator and target usually very specific:
most targets controlled only by a few regulators;
most regulators controlling only a few targets;
some “hub” or “master” regulators.
Why simulate GRNs?
to test hypotheses about the GRN (e.g. by comparing simulations to experimental data);
to predict the response to a specific condition;
to predict the response of the system to a modification of the GRN;
to understand the emerging properties of the system;
to evaluate the performance of statistical tools used for GRN reconstruction from gene expression data.
Building blocks of a GRN model:
A list of regulatory interactions between the genes (often represented as a graph);
A set of rules to convert the regulations into a mathematical or statistical model;
(optional, depends on the model): A set of numerical parameters specifying the rate of the different reactions in the model.
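As an illustration of the first building block, the list of regulatory interactions can be stored as signed edges and turned into a graph. This is only a minimal sketch: the gene names (`G1` to `G4`), the sign convention and the helper functions are hypothetical and not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical regulatory interactions: (regulator, target, sign),
# where +1 denotes activation and -1 denotes repression.
interactions = [
    ("G1", "G2", +1),
    ("G1", "G3", -1),
    ("G1", "G4", +1),
    ("G2", "G4", -1),
]


def build_graph(edges):
    """Return {regulator: {target: sign}} for a list of signed edges."""
    graph = defaultdict(dict)
    for regulator, target, sign in edges:
        graph[regulator][target] = sign
    return dict(graph)


def out_degree(graph):
    """Number of targets controlled by each regulator."""
    return {regulator: len(targets) for regulator, targets in graph.items()}


grn = build_graph(interactions)
degrees = out_degree(grn)  # G1 regulates three targets, G2 regulates one
```

Counting targets per regulator in this way is one simple route to spotting candidate hub regulators in a larger network.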
Each type of model has its own advantages and drawbacks:
Some tools available to simulate GRNs (not an exhaustive list!)
GeneNetWeaver: for deterministic or semi-stochastic modelling;
CaiNet: mix of probabilistic and deterministic equations;
MeSCoT: stochastic modelling with time-delay;
In this workshop, we’ll focus on discrete and stochastic models:
A stochastic model consists of:
Reactions represented with a stoichiometry matrix:
System state represented as vector of species abundance at a given time point:
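A minimal sketch of these two objects, assuming a hypothetical one-gene toy system (not necessarily the example used in the workshop) with four reactions: transcription, translation, RNA decay and protein decay. All names below are illustrative.

```python
# Hypothetical toy system: one gene producing RNA and protein.
# Species order: [RNA, Protein].
species = ["RNA", "Protein"]
reactions = ["transcription", "translation", "RNA decay", "protein decay"]

# Stoichiometry matrix: one row per species, one column per reaction.
# Entry [i][j] is the net change in species i when reaction j fires once.
stoichiometry = [
    [+1, 0, -1, 0],  # RNA
    [0, +1, 0, -1],  # Protein
]


def fire(state, reaction_index):
    """Return the new state after one firing of the given reaction."""
    return [
        count + row[reaction_index]
        for count, row in zip(state, stoichiometry)
    ]


state = [5, 20]  # 5 RNA molecules and 20 proteins at the current time point
state = fire(state, reactions.index("translation"))  # one protein is produced
```

Firing a reaction amounts to adding the corresponding column of the stoichiometry matrix to the state vector.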
To simulate the evolution of the system over time, we need to know which reaction will fire when.
Propensity: the probability that the reaction occurs in the next (small) unit of time
Propensity depends on:
constant rate of the reaction
state of the system at the current time point:
few reactants present in the system \(\rightarrow\) low chance of the reaction occurring;
reactants abundant in the system \(\rightarrow\) high chance of the reaction occurring.
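The dependence of the propensity on the rate constant and on the system state can be sketched as follows, assuming mass-action kinetics and a hypothetical one-gene system (state = [RNA, Protein]). The rate values are made up for illustration.

```python
def propensities(state, rates):
    """Mass-action propensities for a hypothetical one-gene toy system."""
    rna, protein = state
    return [
        rates["transcription"],            # no reactant: constant propensity
        rates["translation"] * rna,        # one RNA molecule as reactant
        rates["rna_decay"] * rna,          # one RNA molecule as reactant
        rates["protein_decay"] * protein,  # one protein molecule as reactant
    ]


# Illustrative (made-up) rate constants.
rates = {
    "transcription": 2.0,
    "translation": 1.0,
    "rna_decay": 0.5,
    "protein_decay": 0.1,
}

few = propensities([1, 0], rates)     # a single RNA molecule present
many = propensities([100, 0], rates)  # 100 RNA molecules present
```

With 100 times more RNA in the system, the propensities of translation and RNA decay are 100 times higher, while the propensity of transcription, which has no reactant here, is unchanged.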
In our example:
Basic concept of the Stochastic Simulation Algorithm:
1. Compute reaction propensities, based on current system state.
2. Randomly generate time increment \(\tau\) during which next reaction occurs.
3. Randomly select which reaction will occur between \(t\) and \(t + \tau\).
4. Update time to \(t + \tau\) and system state based on which reaction occurred.
5. Repeat steps 1 to 4, until \(t \geq t_{max}\).
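The steps above can be sketched in a few lines of Python. This is a minimal illustration of the direct-method SSA on a hypothetical one-gene system (state = [RNA, Protein]; reactions: transcription, translation, RNA decay, protein decay) with made-up rate constants. It is not the implementation used by any of the tools mentioned in this workshop.

```python
import random

# Rows: RNA, Protein. Columns: transcription, translation,
# RNA decay, protein decay (hypothetical toy system).
STOICHIOMETRY = [
    [+1, 0, -1, 0],
    [0, +1, 0, -1],
]


def propensities(state, rates):
    """Mass-action propensities of the four reactions."""
    rna, protein = state
    return [rates[0], rates[1] * rna, rates[2] * rna, rates[3] * protein]


def ssa(state, rates, t_max, rng):
    """Direct-method SSA; returns a list of (time, state) pairs."""
    t = 0.0
    trajectory = [(t, list(state))]
    while True:
        # Step 1: propensities for the current state.
        a = propensities(state, rates)
        a_total = sum(a)
        if a_total == 0:
            break  # no reaction can fire any more
        # Step 2: exponentially distributed time to the next reaction.
        tau = rng.expovariate(a_total)
        if t + tau > t_max:
            break  # the next reaction would occur after the end time
        # Step 3: choose reaction j with probability a[j] / a_total.
        j = rng.choices(range(len(a)), weights=a)[0]
        # Step 4: advance time and apply the stoichiometry of reaction j.
        t += tau
        state = [x + row[j] for x, row in zip(state, STOICHIOMETRY)]
        trajectory.append((t, state))
    return trajectory


trajectory = ssa([0, 0], rates=[2.0, 1.0, 0.5, 0.1], t_max=10.0,
                 rng=random.Random(1))
```

Passing a seeded `random.Random` instance makes a run reproducible; using a different seed gives a different stochastic trajectory of the same system.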
An example of one SSA iteration:
Advantage of SSA: every single reaction is simulated!
Downside: if many reactions have a high propensity, each time increment will be very small
\(\rightarrow\) it will take a long time to get to the end of the simulation
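To see why, consider the standard (direct-method) formulation of the SSA, in which the time increment is exponentially distributed with a rate equal to the sum of all propensities. The notation \(a_j\) for the propensity of reaction \(j\) and \(a_0\) for their sum is assumed here for illustration:

\[
a_0(x) = \sum_j a_j(x), \qquad \tau \sim \mathrm{Exp}\big(a_0(x)\big), \qquad \mathbb{E}[\tau] = \frac{1}{a_0(x)}
\]

For instance, if the propensities sum to 1000 per time unit, each iteration advances the simulation by 0.001 time units on average, so reaching \(t_{max}\) takes roughly \(a_0 \, t_{max}\) iterations when \(a_0\) stays approximately constant.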
There exist many variations of the SSA:
exact versions: simulate the occurrence of every single reaction;
approximate versions: trade-off between accuracy and computational burden.
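One common approximate variant is tau-leaping: instead of simulating each reaction individually, time advances by a fixed step and the number of firings of each reaction within the step is drawn from a Poisson distribution. The sketch below is a deliberately basic version on a hypothetical one-gene system (state = [RNA, Protein]) with made-up rates. The fixed step size and the clamping of negative counts to zero are simplifications chosen for this example.

```python
import math
import random

# Rows: RNA, Protein. Columns: transcription, translation,
# RNA decay, protein decay (hypothetical toy system).
STOICHIOMETRY = [
    [+1, 0, -1, 0],
    [0, +1, 0, -1],
]


def propensities(state, rates):
    """Mass-action propensities of the four reactions."""
    rna, protein = state
    return [rates[0], rates[1] * rna, rates[2] * rna, rates[3] * protein]


def poisson(mean, rng):
    """Poisson draw with Knuth's method; adequate for small means."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def tau_leap(state, rates, t_max, tau, rng):
    """Basic fixed-step tau-leaping; returns the state at t_max."""
    n_steps = round(t_max / tau)
    for _ in range(n_steps):
        a = propensities(state, rates)
        # Number of firings of each reaction during this step.
        firings = [poisson(a_j * tau, rng) for a_j in a]
        # Apply all firings at once; clamp at zero (crude safeguard).
        state = [
            max(0, x + sum(k * s for k, s in zip(firings, row)))
            for x, row in zip(state, STOICHIOMETRY)
        ]
    return state


final_state = tau_leap([0, 0], rates=[2.0, 1.0, 0.5, 0.1], t_max=10.0,
                       tau=0.1, rng=random.Random(1))
```

The number of iterations here depends only on `t_max / tau`, not on the propensities, which is where the computational saving comes from. The price is that propensities are treated as constant within each step, so a step that is too large degrades accuracy.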
Several implementations of the SSA:
sismonr was developed for the purpose of generating benchmark datasets to assess the performance of network reconstruction methods
Objectives:
include the effect of small genetic mutations in the GRN to mimic genetic variability between individuals;
allow the simulation of polyploid systems (i.e. more than 2 copies of each gene present in the system);
model post-transcriptional regulation;
transparent model, customisable by the user;
generate random but plausible GRNs or use GRN provided by user.
R can be slow for intensive computations \(\rightarrow\) sismonr uses Julia under the hood!
Socket connection initialised on a random port \(\rightarrow\) can get messy when working on an HPC…
Some sismonr abbreviations:
Abbreviation | Meaning |
---|---|
TC | Transcription |
TL | Translation |
RD | RNA decay |
PD | Protein decay |
PTM | Post-translational modification |
PC | Protein-coding |
NC | Noncoding |
R | RNA |
P | Protein |
Pm | Modified protein |
C | Regulatory complex |
Instructions to log in to NeSI Mahuika Jupyter in the Supplementary Material.