Keynote by Henning Hermjakob

Reproducibility in Systems Biology Modelling — Sometimes

Reproducibility of results is a key element of science and of its credibility. The lack of reproducibility across many scientific fields has emerged as an important concern. In the context of the BioModels database of mathematical models of biological systems, we have systematically attempted to reproduce 455 kinetic models published in peer-reviewed research articles from 152 journals. About half (49%) of the models could not be reproduced using the information provided in the published manuscripts. With further effort, an additional 12% of the models could be reproduced, either by empirical correction or with support from the authors. The remaining 37% stayed non-reproducible; 99 models (22% of the 455) could not be reproduced for three main reasons: inconsistencies in model structure, missing initial concentrations, and missing parameter values. We will describe the study and discuss suggestions for improvement at the individual and community levels. As a concrete, simple measure, we propose an 8-point reproducibility scorecard that modellers, reviewers and journal editors can use to assess models and address the reproducibility crisis.

Key reference:

Tiwari K, Kananathan S, Roberts MG, et al. Reproducibility in systems biology modelling. bioRxiv; 2020.

Keynote by Jonathan Karr

Novel technologies for data-intensive mechanistic models of whole cells

Computational models have great potential to help bioengineers design biological machines and to help physicians precisely treat patients. Several years ago, we reported one of the first models to demonstrate that such models are feasible through a combination of modeling, data integration, and software engineering. However, it remains challenging to construct models that are sufficiently comprehensive and detailed to guide genome engineering and medicine. The data needed for more comprehensive models is scattered across many databases and articles, and we lack the computational tools needed to build, describe, and simulate substantially larger models that span multiple biochemical subsystems and scales. In addition, it remains difficult for large teams of investigators to work together to construct complex models. Recently, we have developed several technologies to address these issues, including Datanator, an integrated database of the molecular data needed for biochemical modeling, with tools for discovering data relevant to a specific modeling project; WC-Lang, a language for describing models of single cells; WC-Rules, a language for modeling the combinatorial complexity of biochemistry; BpForms and BcForms, grammars for precisely describing the macromolecules represented by models; SSA-FBA and WC-Sim, hybrid algorithms for efficiently simulating multiple scales; and BioSimulators and BioSimulations, standards and repositories that facilitate collaboration. Together, we anticipate these technologies will accelerate the development of more comprehensive and predictive models.

PEtab – Interoperable Specification of Parameter Estimation Problems in Systems Biology

Daniel Weindl

Reproducibility and reusability of the results of data-based modeling studies are essential. Yet, there has been – so far – no broadly supported format for the specification of parameter estimation problems in systems biology. Therefore, we developed PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as parameters to be estimated.
We have already implemented PEtab support in eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validating and modifying PEtab problems, as well as example parameter estimation problems based on recent studies.
The PEtab specification, the PEtab Python library, links to examples, and all supporting software tools are publicly available at github.com/PEtab-dev/PEtab.
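
To make the idea of tab-separated problem files concrete, the following minimal sketch writes and sanity-checks a small parameter table using only the Python standard library. The column names follow the PEtab version 1 parameter table as we understand it; the parameter IDs, bounds, and values are hypothetical, and the checks are a small illustrative subset rather than the validation performed by the PEtab Python library.

```python
import csv
import io

# Column names as used in the PEtab v1 parameter table (assumption: v1 layout).
COLUMNS = ["parameterId", "parameterScale", "lowerBound", "upperBound",
           "nominalValue", "estimate"]

# Hypothetical parameters of an illustrative binding model.
ROWS = [
    {"parameterId": "k_on", "parameterScale": "log10", "lowerBound": 1e-3,
     "upperBound": 1e3, "nominalValue": 1.0, "estimate": 1},
    {"parameterId": "k_off", "parameterScale": "log10", "lowerBound": 1e-3,
     "upperBound": 1e3, "nominalValue": 0.1, "estimate": 1},
    {"parameterId": "volume", "parameterScale": "lin", "lowerBound": 0.5,
     "upperBound": 2.0, "nominalValue": 1.0, "estimate": 0},
]


def write_parameter_table(rows):
    """Serialize parameter rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def check_parameter_table(text):
    """Return a list of human-readable problems found in the table text."""
    problems = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        pid = row["parameterId"]
        lower = float(row["lowerBound"])
        upper = float(row["upperBound"])
        nominal = float(row["nominalValue"])
        if row["parameterScale"] not in {"lin", "log", "log10"}:
            problems.append(f"{pid}: unknown parameterScale")
        if not lower <= nominal <= upper:
            problems.append(f"{pid}: nominalValue outside bounds")
        if row["estimate"] not in {"0", "1"}:
            problems.append(f"{pid}: estimate must be 0 or 1")
    return problems


table = write_parameter_table(ROWS)
problems = check_parameter_table(table)
print(table)
print("problems:", problems)
```

A complete PEtab problem combines such a table with an SBML model and further tables for observables, experimental conditions, and measurements; for real projects, the validation provided by the PEtab Python library should be preferred over hand-written checks like these.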

Benchmarking of numerical integration methods for ODE models of biological systems

Paul Stapor, Yannik Schälte and Leonard Schmiester

Ordinary differential equation (ODE) models are a key tool for understanding complex mechanisms in systems biology. These models are studied using various approaches, including stability and bifurcation analysis, but most frequently by numerical simulation. The number of required simulations is often large, e.g., when unknown parameters need to be inferred. This makes efficient and reliable numerical integration methods essential. However, these methods depend on various hyperparameters, which strongly affect the computed ODE solution. Despite this, and although hundreds of published ODE models are freely available in public databases, a thorough study quantifying the impact of solver hyperparameters on accuracy and computation time has so far been missing. In a recent study, we investigated which choices of algorithms and hyperparameters are generally favorable for ODE models arising from biological processes. To ensure a representative evaluation, we considered 142 published models. Our study provided evidence that most ODEs in computational biology are stiff. Furthermore, we derived guidelines for the choice of algorithms and hyperparameters. We think that these results will help researchers in systems biology choose appropriate numerical methods when dealing with ODE models.
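
Why stiffness matters for the choice of integrator can be illustrated with a toy example. The sketch below is not part of the benchmark study; it assumes a hypothetical linear reaction chain A -> B -> C with one fast and one slow rate constant, and compares explicit and implicit Euler at the same fixed step size. All names and values are illustrative assumptions.

```python
import math

K_FAST = 1000.0  # hypothetical rate constant of A -> B (fast time scale)
K_SLOW = 1.0     # hypothetical rate constant of B -> C (slow time scale)


def explicit_euler(a, b, h, n_steps):
    """Explicit Euler for da/dt = -K_FAST*a, db/dt = K_FAST*a - K_SLOW*b."""
    for _ in range(n_steps):
        a, b = a + h * (-K_FAST * a), b + h * (K_FAST * a - K_SLOW * b)
    return a, b


def implicit_euler(a, b, h, n_steps):
    """Implicit Euler; the linear update equations are solved in closed form."""
    for _ in range(n_steps):
        a = a / (1.0 + h * K_FAST)
        b = (b + h * K_FAST * a) / (1.0 + h * K_SLOW)
    return a, b


def exact_b(t):
    """Analytical solution for b(t) with a(0) = 1 and b(0) = 0."""
    scale = K_FAST / (K_FAST - K_SLOW)
    return scale * (math.exp(-K_SLOW * t) - math.exp(-K_FAST * t))


h = 0.01       # step size chosen for the slow time scale
n_steps = 100  # integrate up to t = 1
a_exp, b_exp = explicit_euler(1.0, 0.0, h, n_steps)
a_imp, b_imp = implicit_euler(1.0, 0.0, h, n_steps)

print("explicit Euler b(1):", b_exp)
print("implicit Euler b(1):", b_imp)
print("exact b(1):         ", exact_b(1.0))
```

With this step size the explicit scheme is unstable because the product of step size and fast rate constant exceeds the stability limit, whereas the implicit scheme stays close to the analytical solution. Production solvers use adaptive step sizes and higher-order methods, so this only conveys the qualitative point.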

CobraMod: A pathway‑centric curation tool for constraint‑based metabolic models

Stefano Camborda La Cruz and Nadine Töpfer

Genome-scale metabolic models (GEMs) and their analysis by constraint-based metabolic modeling techniques are a popular means of studying metabolic systems at large scale. Several software tools for Constraint-Based Reconstruction and Analysis (COBRA) are available, such as COBRApy(1), which is based on the popular programming language Python. These tools offer a wide range of functionalities for model modification and metabolic flux prediction. However, they all require manual addition of biochemical information, such as metabolites, reactions, and gene IDs, which is time-consuming and error-prone. Here we present CobraMod, a COBRApy extension for pathway-centric modification and curation of GEMs. The open-source package enables extending GEMs with biochemical data from various databases such as BiGG, BioCyc, and KEGG. Our tool automatically identifies biochemical information and transforms it into the corresponding sets of metabolites and reactions. These sets can either be added separately or included as groups, termed pathways. CobraMod curates each new pathway, taking into consideration chemical formulas, duplicate elements, the reversibility and mass balance of reactions, and the capability of newly added reactions to carry non-zero fluxes. It also tracks changes and can add extra databases for data retrieval. Our package integrates the software package Escher(2) for visualizing pathways and their corresponding flux distributions and thus enables a pathway-centric and user-friendly analysis of the model. CobraMod uses Python standards and aims for stability, uniformity and speed. We exemplify these functionalities in a case study in which we calculate the additional metabolic cost of synthesizing new metabolites in Arabidopsis thaliana.

1. Ebrahim, et al. (2013) BMC Syst Biol 7, 74
2. King ZA, et al. (2015) PLOS Comp Bio 11(8)
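
The mass-balance check mentioned in the CobraMod abstract can be sketched in a few lines of plain Python. This is not CobraMod's or COBRApy's implementation or API; it is a minimal stand-alone illustration, assuming simple formulas without parentheses or charges, with hypothetical metabolite IDs and a hexokinase-like reaction as the example.

```python
import re
from collections import defaultdict

# One element symbol followed by an optional count, e.g. "C6" or "P".
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

# Illustrative formulas (assumption: protonation states as often used in GEMs).
FORMULAS = {
    "glc": "C6H12O6",
    "atp": "C10H12N5O13P3",
    "g6p": "C6H11O9P",
    "adp": "C10H12N5O10P2",
    "h": "H",
}


def parse_formula(formula):
    """Count atoms per element in a flat formula such as 'C6H12O6'."""
    counts = defaultdict(int)
    for element, number in FORMULA_TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def mass_imbalance(stoichiometry, formulas):
    """Return net atoms per element (products minus substrates), nonzero only.

    Substrates carry negative and products positive coefficients, so an
    empty result means the reaction is mass balanced.
    """
    net = defaultdict(int)
    for metabolite, coefficient in stoichiometry.items():
        for element, count in parse_formula(formulas[metabolite]).items():
            net[element] += coefficient * count
    return {element: value for element, value in net.items() if value != 0}


# glc + atp -> g6p + adp + h
balanced = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
# Same reaction with the proton forgotten.
unbalanced = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1}

print("balanced reaction:  ", mass_imbalance(balanced, FORMULAS))
print("unbalanced reaction:", mass_imbalance(unbalanced, FORMULAS))
```

A curation tool would typically run such a check on every newly added reaction and report the offending elements, so that missing protons or water molecules can be fixed before flux simulations are attempted.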