Mini-Review | Volume 28, Issue 3, P53-64, April 2023

Using chemical and biological data to predict drug toxicity

Open Access. Published: January 10, 2023. DOI: https://doi.org/10.1016/j.slasd.2022.12.003

      Abstract

      Various sources of information can be used to better understand and predict compound activity and safety-related endpoints, including biological data such as gene expression and cell morphology. In this review, we first introduce the types of chemical, in vitro and in vivo information that can be used to describe compounds and adverse effects. We then explore how compound descriptors based on chemical structure or biological perturbation response can be used to predict safety-related endpoints, and how biological data in particular can help us to better understand adverse effects mechanistically. Overall, the described applications demonstrate how large-scale biological information presents new opportunities to anticipate and understand the biological effects of compounds, and how this can support predictive toxicology and drug discovery projects.

      Introduction

      Understanding the risk of potential adverse effects of compounds is crucial across all chemistry-related industries, especially in the pharmaceutical industry, and has hence also been studied across a broad chemical space [
      • Alves VM
      • Muratov EN
      • Zakharov A
      • Muratov NN
      • Andrade CH
      • Tropsha A.
      Chemical toxicity prediction for major classes of industrial chemicals: is it possible to develop universal models covering cosmetics, drugs, and pesticides?.
      ]. Adverse effects manifest across different levels of biological organization. While we can model the binding of compounds at the molecular level relatively well, effects at higher levels are more difficult to predict, as they can generally be induced through multiple molecular initiating events and depend on the broader cell and tissue context. Molecular initiating events are the early interactions between a molecule and a biological system that are causally linked to the later outcome. Our understanding of higher-level effects, and our ability to measure them, has advanced comparatively slowly. Generally, effects are first induced at the cellular level, through either direct or indirect action of the compound, and then lead to effects at the tissue or organ level that influence physiological function. These can eventually result in clinically relevant phenotypes, such as liver failure or other adverse outcomes (Fig. 1). Across all levels of biological complexity, there are generally multiple lower-level events that can induce a given higher-level event. As an example, compounds can cause mitochondrial dysfunction through interaction with multiple protein targets, and this is in turn only one of the cellular mechanisms leading to hepatotoxicity [
      • Labbe G
      • Pessayre D
      • Fromenty B.
      Drug-induced liver injury through mitochondrial dysfunction: mechanisms and detection during preclinical safety studies.
      ].
      Fig. 1. Compounds induce systems-level phenotypes through effects across multiple biological scales which can be studied using different model systems.
      In this review, we will discuss in vitro models and how these can help us to anticipate the in vivo adverse effects of new compounds. We will first introduce the types of information that can be derived from chemical structure, in vitro and in vivo models, and will subsequently introduce different types of in vitro assays and the rationale for employing them in the context of toxicology. We will discuss how in vitro readouts can be modelled in combination with chemical and/or in vivo information to understand the observed properties of compounds or to anticipate biological properties of yet unseen compounds.

      Types of information

      Chemical structures: If we assume that the toxicity (or safety-related properties) of a molecule is linked to its structure, we may be able to predict it from the chemical structure. However, this requires a numerical, machine learning-friendly representation obtained by some logical or mathematical procedure. Such machine-friendly chemical features can encode a wide range of information about a molecule, from atomic to structural, geometrical, and even physicochemical properties. A useful feature retains the necessary and sufficient information from the structure, takes different values for dissimilar molecules, and is distinguishable from other features. In the following, we discuss various ways to encode chemical information into features.
      Chemical structure can be described in different dimensions, ranging from 1D, such as a compound's molecular weight, to 2D, which describes the connectivity between atoms, to 3D, which additionally accounts for the orientation of atoms and bonds [
      • Glen RC
      • Bender A
      • Arnby CH
      • Carlsson L
      • Boyer S
      • Smith J.
      Circular fingerprints: flexible molecular descriptors with applications from physical chemistry to ADME.
      ]. Descriptor vectors can be composed of bits, counts or continuous variables, such as computationally predicted physicochemical properties; examples are summarised in Table 1. Descriptors can thus capture various types of information on the molecule, which may be relevant in different contexts. For example, physicochemical properties such as partition coefficient, water solubility, and topological polar surface area are highly relevant for the pharmacokinetics of a chemical. In contrast, chemical structure and fragments are more commonly used in current quantitative structure-activity relationship (QSAR) studies. Although 3D molecular descriptors and quantum mechanical properties can be used to represent a compound, studies have shown that 2D fragments were overall more popular owing to their better predictivity and lower computational cost [
      • Tropsha A.
      Predictive quantitative structure–activity relationship modeling.
      ,
      • Brown RD
      • Martin YC.
      The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding.
      ]. More recently, chemical structures have also been represented as molecular graphs, which can outperform molecular fingerprints by leveraging deep learning techniques [
      • Wu Z
      • Ramsundar B
      • Feinberg EN
      • et al.
      MoleculeNet: a benchmark for molecular machine learning.
      ].
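The simplest (1D) descriptors mentioned above can be derived from the molecular formula alone. The following is a minimal sketch, assuming formulas without brackets, charges or isotopes; the function names and the small atomic-weight table are illustrative choices, not part of any cited method.

```python
import re

# Rounded standard atomic weights; only a few elements are listed for illustration.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06, "Cl": 35.45}


def atom_counts(formula: str) -> dict:
    """Count atoms in a simple formula such as 'CH4' or 'C2H6O' (no brackets or charges)."""
    counts = {}
    # Each match is an element symbol followed by an optional count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula: str) -> float:
    """Sum atomic weights over all atoms; raises KeyError for elements not in the table."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in atom_counts(formula).items())


methane = atom_counts("CH4")          # atom counts as a 1D descriptor
methane_mw = molecular_weight("CH4")  # molecular weight as a 1D descriptor
print(methane, round(methane_mw, 3))
```

Such formula-based descriptors cannot distinguish isomers, which is one reason why connectivity-based (2D) descriptors are usually preferred for modelling.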
      Table 1. Dimensionality of molecular descriptors used to represent chemical structures in chemoinformatics. 3D view created using https://molview.org/
      Molecule dimensions | Methane example | Descriptors | Examples
      1D | CH4 | Based on the molecular formula | Molecular weight, atom counts, etc.
      2D | (image) | Based on the connectivity table, e.g. based on structure or fragment counts | Extended Connectivity Fingerprint (ECFP) [Rogers D, Hahn M. Extended-connectivity fingerprints.], Atom pair fingerprints [Carhart RE, Smith DH, Venkataraghavan R. Atom pairs as molecular features in structure-activity studies: definition and applications.], MACCS fingerprints [Durant JL, Leland BA, Henry DR, Nourse JG. Reoptimization of MDL keys for use in drug discovery.]
      3D | (image) | Based on the 3D geometry and pharmacophore | Extended Three-Dimensional Fingerprint (E3FP) [Axen SD, Huang XP, Cáceres EL, Gendelev L, Roth BL, Keiser MJ. A simple representation of three-dimensional molecular structure.], Moments of inertia, Electronic descriptors, Weighted holistic invariant molecular (WHIM) descriptors [Todeschini R, Gramatica P, Provenzani R, Marengo E. Weighted holistic invariant molecular descriptors. Part 2. Theory development and applications on modeling physicochemical properties of polyaromatic hydrocarbons.]
      Once chemical descriptors have been derived, the general assumption is that structurally similar compounds are likely to show similar properties and may exhibit the same pharmacological effects, e.g. by binding to the same target protein; hence, chemical information can be valuable for anticipating compound properties [
      • Bender A
      • Glen RC.
      Molecular similarity: a key technique in molecular informatics.
      ]. Using compound descriptors, we can calculate the similarity between two compounds, as shown in Fig. 2, and can predict the relevant compound properties.
      Fig. 2. Principal Component Analysis (PCA) of a subset of compounds in Tox21 based on fragment-based fingerprints in DataWarrior (https://openmolecules.org/datawarrior/). The PCA shows the global structure of the chemical space covered, while the Tanimoto similarity describes the similarity of each compound to the query compound.
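The Tanimoto similarity between two bit fingerprints is the number of shared "on" bits divided by the number of bits set in either fingerprint. The following is a minimal sketch, assuming fingerprints are stored as sets of on-bit indices; the compound names and bit indices are hypothetical toy values, not real fingerprints.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical fingerprints given as sets of on-bit indices.
query = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9},  # identical to the query
    "cmpd_B": {1, 4, 8},     # shares two bits with the query
    "cmpd_C": {2, 3},        # shares no bits with the query
}

# Rank library compounds by decreasing similarity to the query.
ranked = sorted(((tanimoto(query, fp), name) for name, fp in library.items()), reverse=True)
for similarity, name in ranked:
    print(f"{name}: {similarity:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit; the set-based form is used here only to keep the arithmetic visible.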
      However, one potential limitation of models built with chemical information is that they are restricted by the limited variation of the chemical space of the training data, which defines the model's applicability domain. For example, a model trained on one or several analogue series, irrespective of the dataset size, is unlikely to be predictive for compounds that have a different scaffold from the training data. Instead, only structurally similar compounds are likely to be predicted with confidence. However, even two structurally very similar chemicals can display large differences in binding affinity, which is referred to as an “activity cliff”. This may result from the lack of invariance of chemical space, which may be associated with the selected molecular descriptors and the training data [
      • Maggiora GM.
      On outliers and activity cliffs–why QSAR often disappoints.
      ], but can also simply be a consequence of the complexity of biological processes. For example, adding a single methyl group to the hepatotoxic compound sudoxicam yields meloxicam, which is metabolised via a different pathway and is not associated with the same hepatotoxicity [
      • Kalgutkar AS.
      Designing around structural alerts in drug discovery.
      ]. For these kinds of problems, information on the biological response to perturbation may be required to reveal the mode of action of a chemical and to predict the effects of compounds in vivo.
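The two limitations discussed above, the applicability domain and activity cliffs, can both be expressed in terms of fingerprint similarity. The following is a rough sketch under stated assumptions: the training compounds, fingerprints and pIC50 values are hypothetical, and the thresholds (`min_sim`, `min_delta`) are arbitrary illustrative choices rather than established cut-offs.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical training set: name -> (fingerprint on-bits, pIC50).
training = {
    "t1": ({1, 2, 3, 4}, 8.1),
    "t2": ({1, 2, 3, 5}, 5.0),
    "t3": ({10, 11, 12}, 6.2),
}


def in_domain(fp: set, training: dict, min_sim: float = 0.5) -> bool:
    """Nearest-neighbour check: is the query similar enough to any training compound?"""
    nearest = max(tanimoto(fp, train_fp) for train_fp, _ in training.values())
    return nearest >= min_sim


def activity_cliffs(training: dict, min_sim: float = 0.5, min_delta: float = 2.0) -> list:
    """Return pairs of similar compounds whose activities differ strongly."""
    cliffs = []
    for (n1, (fp1, a1)), (n2, (fp2, a2)) in combinations(training.items(), 2):
        if tanimoto(fp1, fp2) >= min_sim and abs(a1 - a2) >= min_delta:
            cliffs.append((n1, n2))
    return cliffs


print(in_domain({1, 2, 3, 6}, training))  # close analogue of t1 and t2
print(in_domain({20, 21}, training))      # unrelated scaffold
print(activity_cliffs(training))
```

A query inside the domain can still sit next to a cliff, which is why similarity to the training data is a necessary rather than a sufficient condition for a reliable prediction.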
      In vitro information: One approach to identifying functional similarity is to derive compound representations that are biologically more relevant than chemical structure. From the experimental side, this can be achieved by characterising the response of biological systems to perturbation with compounds in vitro using various readouts. In this context, different model systems are used with different levels of physiological relevance, and the underlying assumption is that these can ideally capture early convergent effects despite different molecular interactions with the biological system [
      • Duran-Frigola M
      • Pauls E
      • Guitart-Pla O
      • et al.
      Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker.
      ].
      The most commonly used in vitro assays are ligand binding assays which measure the binding of the compound to a target molecule, which can e.g. be a protein or mRNA, and can give insights into which biomolecules a compound likely interacts with [
      • Pollard TD.
      A guide to simple and informative binding assays.
      ,
      • Croston GE.
      The utility of target-based discovery.
      ]. If the given target has a strong mechanistic link to a phenotype, the assay can then help to identify potentially unsafe compounds earlier. However, in the case of more complex phenotypes, such as systems-level adverse effects, the information may be limited as binding alone does not imply that the ligand has a functional effect, and perturbation of an individual protein might simply not be sufficiently informative. An alternative approach to measuring individual targets is hence to detect intermediate phenotypes in simple model systems. The simplest model systems for large-scale perturbations are immortalised cell lines which are well-established, scalable and cost-efficient. However, the relevance of these cell cultures is limited by the fact that both the cells and their surrounding conditions are artificial [
      • Hartung T.
      Food for thought ... on cell culture.
      ] and, for instance, lack metabolic capacity, meaning that effects mediated by drug metabolites cannot be detected [
      • Coecke S
      • Ahr H
      • Blaauboer BJ
      • et al.
      Metabolism: a bottleneck in in vitro toxicological test development. The report and recommendations of ECVAM workshop 54.
      ]. At the same time, more complex in vitro model systems exist and are actively being developed; these mimic physiological properties more closely and are hence promising tools for deriving more relevant readouts. For instance, HepaRG cells, which retain drug metabolism capabilities, were found to be better at detecting hepatotoxic compounds than HepG2 cells, which lack these, while both immortalised liver-derived cell lines were less sensitive than primary human hepatocytes [
      • Gerets HHJ
      • Tilmant K
      • Gerin B
      • et al.
      Characterization of primary human hepatocytes, HepG2 cells, and HepaRG cells at the mRNA level and CYP activity in response to inducers and their predictivity for the detection of human hepatotoxins.
      ].
      Beyond monocultures, more advanced in vitro systems aim to emulate whole tissues or organs. The two main directions in this regard are organoids, which aim to mimic organ function in vitro with high accuracy, and microfluidics-based “organ-on-a-chip” models, which instead aim to capture relevant microphysiological aspects in a miniaturised manner [
      • Marx U
      • Akabane T
      • Andersson TB
      • et al.
      Biology-inspired microphysiological systems to advance patient benefit and animal welfare in drug development.
      ,
      • Skardal A
      • Aleman J
      • Forsythe S
      • et al.
      Drug compound screening in single and integrated multi-organoid body-on-a-chip systems.
      ]. As an intermediate between in vitro and in vivo models, patient-derived organoids allow the same analyses as in vitro models but use cells and tissues derived from patients, and hence closely resemble the in vivo physiology. As an example, patient-derived tumour organoids have been used to screen sensitivity to 240 drugs in 5 days and can guide decision-making for the given patient and tumour shortly after surgery [
      • Phan N
      • Hong JJ
      • Tofig B
      • et al.
      A simple high-throughput approach identifies actionable drug sensitivities in patient-derived tumor organoids.
      ]. However, these are of course only able to capture information up to the organ level, which may still be insufficient to describe in vivo phenotypes such as adverse effects which manifest over longer periods and on a whole-organism level. Organ-on-a-chip models are particularly interesting for compound screening due to their focus on scalability. Assays to study drug-induced toxicity are being developed for multiple organ systems, e.g. for liver, kidney and heart, and are being further extended to multi-organ or whole-body human-on-a-chip assays [
      • Cong Y
      • Han X
      • Wang Y
      • et al.
      Drug toxicity evaluation based on organ-on-a-chip technology: a review.
      ]. Beyond drug safety, these models can also help to estimate a compound's pharmacokinetic properties, and can help to study the spatiotemporal effects of a compound at higher resolution than in vivo models as well as simpler in vitro models [
      • Ma C
      • Peng Y
      • Li H
      • Chen W.
      Organ-on-a-chip: a new paradigm for drug development.
      ]. While the prospects and development in this field are encouraging, no larger-scale perturbation screens are yet available, which is why these organ-on-a-chip models will not be further discussed in the sections below.
      From the introduced model systems, different types of readouts can be derived. These can give a broad overview of changes in biological processes, e.g. gene expression and imaging, or can be aimed at specific physiological phenotypes, such as calcium transients, which are used as an indicator of cardiomyocyte contractility [
      • Watanabe H
      • Honda Y
      • Deguchi J
      • Yamada T
      • Bando K.
      Usefulness of cardiotoxicity assessment using calcium transient in human induced pluripotent stem cell-derived cardiomyocytes.
      ] (Table 2). The currently available large-scale perturbation screens will be introduced below. A general limitation of these in vitro models is that it is difficult to extrapolate perturbation response to in vivo systems given that exposure is critical but generally not characterised in vitro. While there are efforts to better estimate the biologically effective dose in vitro [
      • Armitage JM
      • Wania F
      • Arnot JA.
      Application of mass balance models and the chemical activity concept to facilitate the use of in vitro toxicity data for risk assessment.
      ,
      • Fischer FC
      • Henneberger L
      • König M
      • et al.
      Modeling exposure in the Tox21 in vitro bioassays.
      ,
      • Proença S
      • Escher BI
      • Fischer FC
      • et al.
      Effective exposure of chemicals in in vitro cell systems: a review of chemical distribution models.
      ], these are not currently widely employed. Consequently, most in vitro models generally only characterise hazard, the potential to cause harm, instead of risk, the corresponding likelihood. However, more and more advanced model systems are being developed which address this challenge of in vitro-in vivo translation, including multi-organ chips which model physiological pharmacokinetics, or more precisely first-pass drug absorption, metabolism and excretion through linked liver, gut and kidney chambers [
      • Herland A
      • Maoz BM
      • Das D
      • et al.
      Quantitative prediction of human pharmacokinetic responses to drugs via fluidically coupled vascularized organ chips.
      ].
      Table 2. Data types to characterise the biological response to perturbation in vitro or in vivo.
      Data type | Information level | Descriptor | Advantages | Disadvantages
      in vitro information
      Bioactivity assays | Targets (MIE), Cellular (KE) | Binding affinity; activity readout | Provides a ligand bioactivity space | Extrapolation of in vitro data to in vivo is challenging; activity is highly dependent on concentration both in vivo and in vitro
      Morphology | Cellular, Tissue (KE) | Microscope images; cell/organelle features, e.g. size/shape/… | Provides versatile cell-level features of a system | Computed features are highly correlated and difficult to interpret from an in vivo biological perspective [Bender A, Cortes-Ciriano I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data.]
      in vitro and in vivo information
      Transcriptomics | Cellular, Tissue (KE) | Differential expression | Cost-efficient, increasing resolution (single-cell, spatial transcriptomics) | Often weak correlation between expression and protein level [Gry M, Rimini R, Strömberg S, et al. Correlations between RNA and protein expression profiles in 23 human cell lines.]
      Proteomics | Cellular, Tissue (KE) | Differential protein levels | Global or targeted detection of specific proteins, e.g. possible focus on post-translational modification or interaction partners | More expensive and lower-throughput than transcriptomics
      Metabolomics | Cellular, Tissue (KE) | Differential metabolite levels | Can detect metabolites within a sample globally, but can also be targeted at specific subtypes | Relatively expensive and low-throughput
      in vivo information
      Clinical chemistry and blood count | Organism (AO) | Measured marker levels | Can be measured non-invasively and reflects in vivo response in real time | Commonly measured markers are only informative for a few phenotypes
      Histopathology | Tissue (KE/AO) | Microscopic images of tissues, e.g. after H&E staining | Spatial information on lesions, such as severity and frequency | Requires expert-driven evaluation and annotation, which can be difficult to harmonize [Pinches MD, Thomas R, Porter R, Camidge L, Briggs K. Curation and analysis of clinical pathology parameters and histopathologic findings from eTOXsys, a large database project (eTOX) for toxicologic studies.]
      Adverse events | Population (AO) | Unstructured data from reported adverse events | Covers many drugs and large populations | Influenced by known biases in reporting [de Boissieu P, Kanagaratnam L, et al. Notoriety bias in a database of spontaneous reports: the example of osteonecrosis of the jaw under bisphosphonate therapy in the French national pharmacovigilance database.; Moore N, Hall G, Sturkenboom M, Mann R, Lagnaoui R, Begaud B. Biases affecting the proportional reporting ratio (PRR) in spontaneous reports pharmacovigilance databases: the example of sertindole.]
      Clinical trial data | | | Provides detailed information on patient response to drug | Very costly
      In vivo information: In toxicology as well as other research areas, most in vivo information is derived from studies in animals, while this is complemented by clinical studies in the pharmaceutical industry. From the respective subjects, a wide range of information can be derived including information on the response to compound treatment from the cellular to the whole-organism level. The DrugMatrix [
      • Gusenleitner D
      • Auerbach SS
      • Melia T
      • Gómez HF
      • Sherr DH
      • Monti S.
      Genomic models of short-term exposure accurately predict long-term chemical carcinogenicity and identify putative mechanisms of action.
      ] and Open TG-GATEs [
      • Igarashi Y
      • Nakatsu N
      • Yamashita T
      • et al.
      Open TG-GATEs: a large-scale toxicogenomics database.
      ] databases for instance contain transcriptomics, clinical chemistry, haematology and histopathology data from multiple organs in rats treated with 627 and 170 compounds, respectively, across multiple timepoints and doses (Table 2). These are invaluable resources in the field of toxicogenomics [
      • Alexander-Dann B
      • Pruteanu LL
      • Oerton E
      • et al.
      Developments in toxicogenomics: understanding and predicting compound-induced toxicity from gene expression data.
      ,
      • Chen M
      • Zhang M
      • Borlak J
      • Tong W.
      A decade of toxicogenomic research and its contribution to toxicological science.
      ] as they contain multiple types of information within the same animal through which it is possible to derive mechanistic links, e.g. between gene expression and histopathology, while accounting for the variability between biological replicates.
      Given that in vivo studies are costly and need to be ethically justified [
      • Sneddon LU
      • Halsey LG
      • Bury NR.
      Considering aspects of the 3Rs principles within experimental animal biology.
      ,
      • Nardini C.
      The ethics of clinical trials.
      ], these extensive studies across larger sets of compounds and data types are generally rare. One approach to tackle this is to integrate data from distinct studies, e.g. ToxRefDB [
      • Watford S
      • Pham LL
      • Wignall J
      • Shin R
      • Martin MT
      • Friedman KP.
      ToxRefDB version 2.0: Improved utility for predictive and retrospective toxicology analyses.
      ] focussed on in vivo toxicity endpoints while the eTOX project [
      • Sanz F
      • Pognan F
      • Steger-Hartmann T
      • et al.
      Legacy data sharing to improve drug safety assessment: the eTOX project.
      ] aimed to integrate different types of pre-clinical data from multiple pharma companies. Additionally, various databases exist which gather omics data, including in vivo perturbation data, such as Gene Expression Omnibus (GEO) [
      • Edgar R
      • Domrachev M
      • Lash AE.
      Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.
      ] and ArrayExpress [
      • Athar A
      • Füllgrabe A
      • George N
      • et al.
      ArrayExpress update - from bulk to single-cell expression data.
      ]. An alternative strategy is to extract phenotypic information from real-world data, e.g. from the FDA Adverse Event Reporting System (FAERS) [
      • Banda JM
      • Evans L
      • Vanguri RS
      • Tatonetti NP
      • Ryan PB
      • Shah NH.
      A curated and standardized adverse drug event resource to accelerate drug safety research.
      ], which describes much larger populations but contains known biases [
      • de Boissieu P
      • Kanagaratnam L
      • et al.
      Notoriety bias in a database of spontaneous reports: the example of osteonecrosis of the jaw under bisphosphonate therapy in the French national pharmacovigilance database.
      ,
      • Moore N
      • Hall G
      • Sturkenboom M
      • Mann R
      • Lagnaoui R
      • Begaud B.
      Biases affecting the proportional reporting ratio (PRR) in spontaneous reports pharmacovigilance databases: the example of sertindole.
      ]. These efforts are often hampered by a lack of standardisation of expert-guided annotations, with respect to both terminology and severity scales, as well as by batch effects in measured readouts. However, if these challenges are overcome, the integration of studies might provide better information on phenotypic effects at the population level.
      One general challenge in all in vivo studies is that there is no single response pattern to a compound; instead, the response is highly dependent on the route of administration [
      • Benet LZ.
      Effect of route of administration and distribution on drug action.
      ] as well as on other host factors. For instance, sex [
      • Karlsson Lind L
      • von Euler M
      • Korkmaz S
      • Schenck-Gustafsson K
      Sex differences in drugs: the development of a comprehensive knowledge base to improve gender awareness prescribing.
      ], ethnicity, individual genetic variants [
      • van der Wouden CH
      • Cambon-Thomsen A
      • Cecchin E
      • et al.
      Implementing pharmacogenomics in Europe: design and implementation strategy of the ubiquitous pharmacogenomics consortium.
      ] and the microbiome [
      • Abdelsalam NA
      • Ramadan AT
      • ElRakaiby MT
      • Aziz RK.
      Toxicomicrobiomics: the human microbiome vs. pharmaceutical, dietary, and environmental xenobiotics.
      ] are known to affect drug response in patients, and behavioural and environmental factors can also confound drug response. With respect to mapping to chemical space, this means that some level of abstraction is necessary to summarise the biological response into something that can be modelled and that ideally accounts for biological variability at the same time. One outstanding example of this is the FDA's work curating information on Drug-Induced Liver Injury (DILI). This resulted in the DILIst [
      • Thakkar S
      • Li T
      • Liu Z
      • Wu L
      • Roberts R
      • Tong W.
      Drug-induced liver injury severity and toxicity (DILIst): binary classification of 1279 drugs by human hepatotoxicity.
      ] and DILIrank [
      • Chen M
      • Suzuki A
      • Thakkar S
      • Yu K
      • Hu C
      • Tong W.
      DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans.
      ] databases which have explicitly been established to allow better evaluation of predictive models. However, it is debatable whether such complex information can meaningfully be reduced to a classification problem [
      • Liu A
      • Walter M
      • Wright P
      • et al.
      Prediction and mechanistic analysis of Drug-Induced Liver Injury (DILI) based on chemical structure.
      ,
      • Vall A
      • Sabnis Y
      • Shi J
      • Class R
      • Hochreiter S
      • Klambauer G.
      The promise of AI for DILI prediction.
      ]. Hence, efforts to model effects on a more graded scale are also noteworthy, such as ToxScores [
      • Sutherland JJ
      • Webster YW
      • Willy JA
      • et al.
      Toxicogenomic module associations with pathogenesis: a network-based approach to understanding drug toxicity.
      ] which summarise frequency and severity of histopathological findings across replicates as well as previous work by Galeano et al. who modelled drug side effect frequencies demonstrating that this semi-quantitative information can be modelled as ordered classes [

      Galeano D, Li S, Gerstein M et al. Predicting the frequencies of drug side effects. Nat Commun 2020;11,4575.

      ].

      In vitro approaches to characterise biological perturbation response

      In vitro assays can be separated into two main classes based on whether the assay aims to measure a specific readout of interest, and is hence hypothesis-based, or instead measures general changes in the model system, and is hence hypothesis-free (Fig. 3). Hypothesis-based assays are currently more established in drug discovery and toxicology, as the readout is clearly indicative of, e.g., binding to a specific target or a specific cellular phenotype with established or at least anticipated in vivo relevance. In contrast, hypothesis-free assays have the potential to be informative for a wide range of endpoints, but the usefulness of the assay for a given application first needs to be evaluated.
      Fig. 3. Hypothesis-based and hypothesis-free assays. Hypothesis-based assays aim to measure a known, and often low-dimensional, readout which is mechanistically linked to an in vivo endpoint and hence generally interpretable. In contrast, hypothesis-free assays measure biological response broadly. Consequently, their interpretation is not straightforward; however, they can potentially be informative for a wide range of endpoints.

      Hypothesis-based assays

      Compound profiling using different bioassays is commonly used to predict the potential risk of adverse effects as well as to evaluate potentially desired effects, such as in target-based drug discovery [
      • Croston GE.
      The utility of target-based discovery.
      ]. For instance, secondary pharmacology screening is conducted in the early stages of drug discovery to identify potential off-targets of the compound and generally includes proteins associated with clinical adverse drug reactions [
      • Lynch JJ
      • Van Vleet TR
      • Mittelstadt SW
      • Blomme EAG.
      Potential functional and pathological side effects related to off-target pharmacological activity.
      ,
      • Lounkine E
      • Keiser MJ
      • Whitebread S
      • et al.
      Large-scale prediction and testing of drug activity on side-effect targets.
      ,
      • Deaton AM
      • Fan F
      • Zhang W
      • Nguyen PA
      • Ward LD
      • Nioi P.
      Rationalizing secondary pharmacology screening using human genetic and pharmacological evidence.
      ,
      • Bendels S
      • Bissantz C
      • Fasching B
      • et al.
      Safety screening in early drug discovery: an optimized assay panel.
      ]. A total of 44 targets are suggested as a minimum panel for safety screening by AstraZeneca, GlaxoSmithKline, Novartis and Pfizer [
      • Bowes J
      • Brown AJ
      • Hamon J
      • et al.
      Reducing safety-related drug attrition: the use of in vitro pharmacological profiling.
      ], and these can be classified into five categories. 1) G protein-coupled receptors (GPCRs) are targeted by approximately one third of all FDA-approved drugs [
      • Hauser AS
      • Chavali S
      • Masuho I
      • et al.
      Pharmacogenomics of GPCR drug targets.
      ] and adenosine, adrenergic, cannabinoid, dopamine, opioid, muscarinic, 5-hydroxytryptamine and vasopressin receptors are prominently represented on secondary pharmacology panels. 2) Ion channels such as hERG have long been associated with undesired effects as they control cellular excitability and hence neuron and muscle function. Other potassium voltage-gated channels (Kv7.1, MinK), the sodium channel Nav1.5 and calcium channel Cav1.2 are also screened as they are related to cardiovascular risks. 3) Enzymes such as cyclooxygenase 2 (COX2) and monoamine oxidase (MAO) are associated with cardiovascular risks and hence are included in the screening panel [
      • Finckh A
      • Aronson MD.
      Cardiovascular risks of cyclooxygenase-2 inhibitors: where we stand now.
      ]. 4) Transporters form a large protein family and are becoming increasingly important as we gain more insight into their pathophysiological roles [
      • Lin L
      • Yee SW
      • Kim RB
      • Giacomini KM.
      SLC transporters as therapeutic targets: emerging opportunities.
      ]. The minimum panel includes dopamine, noradrenaline and serotonin transporters, which have a high hit rate and are associated with abnormal blood pressure and abuse liability. 5) Nuclear receptors include two targets, the androgen receptor and glucocorticoid receptor.
      Because the mechanisms of adverse effects are not fully understood and target-based assays may be insufficient to estimate the biological effects of a compound, a variety of hypothesis-based phenotypic assays are also employed to profile the biological response to compound treatment. For example, cytotoxicity [
      • Kepp O
      • Galluzzi L
      • Lipinski M
      • Yuan J
      • Kroemer G.
      Cell death assays for drug discovery.
      ], which may be indicative of neurodegenerative disorders, stroke, myocardial infarction etc., cellular stress response assays, genotoxicity and carcinogenicity assays [
      • Tcheremenskaia O
      • Battistelli CL
      • Giuliani A
      • Benigni R
      • Bossa C.
      In silico approaches for prediction of genotoxic and carcinogenic potential of cosmetic ingredients.
      ] are frequently implemented to anticipate compound toxicity [
      • Simmons SO
      • Fan CY
      • Ramabhadran R.
      Cellular stress response pathway system as a sentinel ensemble in toxicological screening.
      ].

      Hypothesis-free assays

      Hypothesis-free assays measure general changes in the model system and are not targeted at a specific readout. This means that the applicability of the assay needs to be evaluated for each endpoint. As for all in vitro models, the extrapolation of hypothesis-free assay results to in vivo effects is not a trivial task, as effects can depend on the exact model system used, dose, timepoint of administration and other factors. While modelling of phenotypic screening data will be discussed in section 3, we will first introduce the background of common types of large-scale technologies for hypothesis-free compound profiling.
      One class of hypothesis-free assays, focussing on “-omics”, measures the abundance of specific biological entities (Fig. 4). Thereby, different technologies measure different types of biomolecules and these are continuously improving in terms of coverage, scalability and resolution [
      • Herholt A
      • Galinski S
      • Geyer PE
      • Rossner MJ
      • Wehr MC.
      Multiparametric assays for accelerating early drug discovery.
      ]. One advantage of omics technologies is that there is already a large pool of public data, e.g. signatures characterising diseases, drugs or targeted perturbations of individual genes or pathways, using which it is possible to identify similar or complementary signatures [
      • Iorio F
      • Rittman T
      • Ge H
      • Menden M
      • Saez-Rodriguez J.
      Transcriptional data: a new gateway to drug repositioning?.
      ], and knowledge about the measured entities, e.g. in the form of pathways or interaction networks, in particular for genes and gene products. Consequently, these can be easily mapped to a broader biological context providing mechanistic insight into the cellular state. Furthermore, the same methodology can be used to characterise different cell-based model systems, as well as living organisms, allowing versatile comparisons not only of perturbation response between compounds but also between biological systems. This enables studies on in vitro to in vivo extrapolation (IVIVE) which can provide insight into how well a model system resembles the in vivo response it tries to mimic [
      • Harrill J
      • Shah I
      • Setzer RW
      • et al.
      Considerations for strategic use of high-throughput transcriptomics chemical screening data in regulatory decisions.
      ]. For instance, it was shown that cancer cell lines showed similar transcriptomic responses to primary hepatocytes, which are frequently used as in vitro model for DILI, while concordance was poorer in comparison to liver gene expression profiles derived from repeat-dose studies in rats [
      • Liu Z
      • Zhu L
      • Thakkar S
      • Roberts R
      • Tong W.
      Can transcriptomic profiles from cancer cell lines be used for toxicity assessment?.
      ].
      Fig. 4. Modalities of cellular response to compound perturbation and hypothesis-free assays which measure it.
      While different kinds of omics can be derived from all kinds of cell-based systems (Table 2, Table 3), we here focus on transcriptomics from in vitro cell cultures, as these approaches are particularly cost-efficient and hence particularly suited for large-scale compound screening. The two key technologies in this regard are microarrays, which quantify the abundance of a predefined set of mRNAs, and RNA sequencing (RNA-Seq), where captured mRNAs are instead sequenced so that in theory all sequences can be detected [
      • Lowe R
      • Shirley N
      • Bleackley M
      • Dolan S
      • Shafee T.
      Transcriptomics technologies.
      ]. While microarrays were the more popular technology in the early 2000s, RNA sequencing has overtaken them more recently due to continuously decreasing costs. For both technologies, cheaper high-throughput versions have been developed which are employed for compound screening. In this regard, the array-based L1000 platform should be highlighted, which measures the expression of 978 landmark genes and has been used to characterise 19,811 compounds [
      • Subramanian A
      • Narayan R
      • Corsello SM
      • et al.
      A next generation connectivity map: l1000 platform and the first 1,000,000 profiles.
      ], making it the most comprehensive resource for uniformly generated transcriptomic data on chemical perturbations. One major limitation, however, is the limited number of measured genes, although these were picked to be non-redundant so that the expression of 9,196 genes can be predicted with high fidelity. More recently, multiple sequencing-based platforms have been published, including the DRUG-seq platform, which overcomes this limitation and measures whole-transcriptome changes for $2-4 per sample [
      • Ye C
      • Ho DJ
      • Neri M
      • et al.
      DRUG-seq for miniaturized high-throughput transcriptome profiling in drug discovery.
      ], as well as targeted sequencing technologies such as TempoSeq [
      • Yeakley JM
      • Shepard PJ
      • Goyena DE
      • VanSteenhouse HC
      • McComb JD
      • Seligmann BE.
      A trichostatin A expression signature identified by TempO-Seq targeted whole transcriptome profiling.
      ]. Beyond increasing throughput and coverage, omics technologies are also improving in terms of resolution, providing measurements of single cells and in a spatial context, which is particularly interesting for more complex systems modelling cell-cell interactions. While this is as yet less explored in the context of chemical perturbations, the sci-Plex method should be mentioned, which allows affordable measurement of single-cell transcriptomics from multiple experimental conditions through massive multiplexing and has been used to characterise 188 perturbations [
      • Srivatsan SR
      • McFaline-Figueroa JL
      • Ramani V
      • et al.
      Massively multiplex chemical transcriptomics at single-cell resolution.
      ]. Hence, transcriptomics is now an affordable and widely implemented technology which allows profiling at a wide range of resolution and throughput levels.
      Table 3. Summary of datasets from different platforms used to profile compounds on different biological levels.
      Readout | Dataset | Technology | Model system | nCompounds | Replicates | Doses (without vehicle) | Time
      Transcriptomics | TG-GATEs [Igarashi Y, Nakatsu N, Yamashita T, et al. Open TG-GATEs: a large-scale toxicogenomics database.] | Microarray | Rats | 170 | 3 | 3 doses | 3h, 6h, 9h, 12h, 24h, 4d, 8d, 15d, 29d
      Transcriptomics | DrugMatrix [Chen M, Zhang M, Borlak J, Tong W. A decade of toxicogenomic research and its contribution to toxicological science.] | Microarray | Rats | 627 | 3 | Mostly 1 or 2 doses | Mostly 1, 3, 5 and 0.25 days
      Transcriptomics | DRUG-Seq [Ye C, Ho DJ, Neri M, et al. DRUG-seq for miniaturized high-throughput transcriptome profiling in drug discovery.] | Targeted RNA-seq | U2OS cells | 433 | 3 | 8 (10, 3.2, 1, 0.32 and 0.1 μM, 32, 10 and 3.2 nM) | 12h
      Transcriptomics | LINCS [Subramanian A, Narayan R, Corsello SM, et al. A next generation connectivity map: l1000 platform and the first 1,000,000 profiles.] | Targeted Microarray (L1000) | 71 cell lines in total, mostly VCAP, MCF7, PC3, A549 | 19,811 | Variable | Mostly 5 and 10 µM | Mostly 24h and 6h
      Transcriptomics | CMAP [Lamb J, Crawford ED, Peck D, et al. The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease.] | Microarray | Mostly MCF7, also PC3, HL60, SKMEL5, ssMCF7 | 1,309 | Mostly 1-2 | Mostly 10 μM | Mostly 6h, also 12h
      Transcriptomics | sci-Plex [Srivatsan SR, McFaline-Figueroa JL, Ramani V, et al. Massively multiplex chemical transcriptomics at single-cell resolution.] | scRNA-seq | A549, K562, MCF7 | 188 | 2 (∼ 100 - 200 cells each) | 4 (10 and 100 nM, 1 and 10 μM) | 24 h
      Cell imaging | CellPainting [Bray MA, Gustafsdottir SM, Rohban MH, et al. A dataset of images and morphological profiles of 30 000 small-molecule treatments using the cell painting assay.] | Single microscopy-based assay | U2OS | 30,616 | 1-8 replicates | Mostly 3, 5 and 10 µM | 24 h and 48 h
      Cell imaging | Janssen [Cox MJ, Jaensch S, Van de Waeter J, et al. Tales of 1,008 small molecules: phenomic profiling through live-cell imaging in a panel of reporter cell lines. doi:10.1101/2020.03.13.990093] | Single microscopy-based assay | 15 reporter cell lines based on A549, HepG2, and WPMY1 | 1,000+ | 2 | 4 (0.3, 1, 3, and 9 µM) | 24 h
      Bioactivity assays | ToxCast [Richard AM, Judson RS, Houck KA, et al. ToxCast chemical landscape: paving the road to 21st century toxicology.] | Varies according to individual tox assay | A vast array of cell lines | 100 - 8,000 per assay | One assay hit call | - | -
      Bioactivity assays | Klaeger et al. [Klaeger S, Heinzlmeir S, Wilhelm M, et al. The target landscape of clinical kinase drugs.] | Binding measured by Kinobeads/Mass Spectrometry | - | 243 | - | - | 
      Cell Imaging | NCI60 [Shoemaker RH. The NCI60 human tumour cell line anticancer drug screen.] | Growth inhibition measured by absorbance | 60 cell lines | 284,176 | - | 1 or 5 doses for dose response | 48 h
      A more recent type of cell profiling that is distinct from omics is derived from microscopy imaging of cell cultures, which represents changes in cell morphology (Table 3). One assay for deriving high-dimensional cell morphology data which has been heavily researched is the Cell Painting assay used by the Broad Institute [
      • Bray MA
      • Singh S
      • Han H
      • et al.
      Cell painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes.
      ] for screening compound perturbations in human osteosarcoma (U2OS) cells. In this assay, the effects on the nucleus, nucleoli, cytoplasmic RNA, endoplasmic reticulum, Golgi apparatus, plasma membrane and the actin cytoskeleton are measured using live cell staining with respect to neutral DMSO control.
      As shown in Fig. 5, the Cell Painting assay output can be thought of as a versatile descriptor of cellular response and characterises compounds based on the phenotypic changes in the stained cell organelles. The retrieved morphological traits include shape and adjacency statistics, as well as intensity, texture, microenvironment, and context features. For example, texture features identify periodic variations in cell intensity and are used, for instance, for single-cell analysis and to assess the intensity regularity in images.
      Fig. 5. Diverse phenotypes across compounds as captured in U2OS cells in the Cell Painting assay compared to neutral control DMSO. In the Cell Painting assay, cells are perturbed with chemicals in a multi-well plate setup and the key organelles and sub-compartments are stained using a combination of six well-characterised fluorescent dyes: Hoechst 33342 (DNA), concanavalin A (endoplasmic reticulum), SYTO 14 (nucleoli and cytoplasmic RNA), phalloidin (actin), WGA (Golgi and plasma membrane), and MitoTracker Deep Red (mitochondria). For example, microtubule stabilisers podophyllotoxin and paclitaxel show giant, multinucleated cells. These images were extracted from https://idr.openmicroscopy.org/webclient/?show=screen-1952.
      However, the high dimensionality, noise, and redundancy of cell image features pose some limitations when used directly as features for modelling [
      • Chandrasekaran SN
      • Ceulemans H
      • Boyd JD
      • Carpenter AE.
      Image-based profiling for drug discovery: due for a machine-learning upgrade?.
      ]. Furthermore, since these features are computationally derived from images (correlations, adjacency, etc. between objects and images), their relevance to biological processes is not very well understood. There is some work towards interpretability; for example, it was found that features related to radial distribution and intensity in the mitochondria object could be related to mitochondrial death [
      • Seal S
      • Carreras-Puigvert J
      • Trapotsi MA
      • Yang H
      • Spjuth O
      • Bender A.
      Integrating cell morphology with gene expression and chemical structure to aid mitochondrial toxicity detection.
      ].
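      The redundancy among image-derived features mentioned above is often reduced before modelling. The following is a minimal sketch of one common option, a greedy correlation filter; it is not a procedure taken from the cited studies. The feature names, the toy values and the threshold of 0.9 are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_redundant(features, threshold=0.9):
    """Greedily keep a feature only if its absolute correlation with every
    feature kept so far does not exceed the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-well feature values (one value per well), for illustration only.
features = {
    "nucleus_area": [1.0, 2.0, 3.0, 4.0],
    "nucleus_perimeter": [2.1, 4.0, 6.2, 8.1],  # almost proportional to the area
    "mito_intensity": [3.0, 1.0, 4.0, 1.5],
}
kept = drop_redundant(features)
```

      In this toy example the perimeter feature is dropped because it is almost perfectly correlated with the area feature. The result of a greedy filter depends on the order of the features, so in practice the ordering (for example by variance or by biological interpretability) is itself a design choice.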
      As the assays are generated to address a wide range of applications instead of having a clear target hypothesis in mind, an additional challenge is to determine how much signal a given readout contains for a property or a compound of interest. The applications section on hypothesis-free assays below discusses how these assays can and have been applied to model diverse endpoints.

      Computational approaches to model biological response

      Data and knowledge resources: Mapping between chemical structure, in vitro and in vivo readouts can be helpful to anticipate the properties of unseen compounds with predictive models, and to understand the critical events linked to the phenotype with mechanistic models, which will both be discussed in the subsequent sections. In either case, it is necessary to integrate data types describing different levels of information (Table 2). As the compound coverage between these data sources varies widely, we provide a summary of the overlaps between compounds, compared with standardised InChI, in Fig. 6 for the datasets previously introduced in Table 3.
      Fig. 6. Compound overlaps between in vitro and in vivo datasets. (A) Compound overlap between bioactivity assays from Klaeger et al.
      [
      • Klaeger S
      • Heinzlmeir S
      • Wilhelm M
      • et al.
      The target landscape of clinical kinase drugs.
      ]
      and NCI60
      [
      • Shoemaker RH.
      The NCI60 human tumour cell line anticancer drug screen.
      ]
      ; cell morphology datasets derived from Cell Painting
      [
      • Bray MA
      • Gustafsdottir SM
      • Rohban MH
      • et al.
      A dataset of images and morphological profiles of 30 000 small-molecule treatments using the cell painting assay.
      ]
      and by Janssen
      [

      Cox MJ, Jaensch S, Van de Waeter J, et al. Tales of 1,008 small molecules: phenomic profiling through live-cell imaging in a panel of reporter cell lines. doi:10.1101/2020.03.13.990093

      ]
      ; transcriptomics from LINCS
      [
      • Subramanian A
      • Narayan R
      • Corsello SM
      • et al.
      A next generation connectivity map: l1000 platform and the first 1,000,000 profiles.
      ]
      , sci-Plex
      [
      • Srivatsan SR
      • McFaline-Figueroa JL
      • Ramani V
      • et al.
      Massively multiplex chemical transcriptomics at single-cell resolution.
      ]
      and DRUG-Seq
      [
      • Ye C
      • Ho DJ
      • Neri M
      • et al.
      DRUG-seq for miniaturized high-throughput transcriptome profiling in drug discovery.
      ]
      , and in vivo data from DrugMatrix
      [
      • Chen M
      • Zhang M
      • Borlak J
      • Tong W.
      A decade of toxicogenomic research and its contribution to toxicological science.
      ]
      , TG-GATEs
      [
      • Igarashi Y
      • Nakatsu N
      • Yamashita T
      • et al.
      Open TG-GATEs: a large-scale toxicogenomics database.
      ]
      (B) Intersections between 3 or more of the introduced datasets with at least 50 compounds.
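      Overlaps of this kind can be computed with plain set operations once every dataset has been reduced to a set of standardised compound identifiers. The sketch below assumes that standardisation to InChI has already been done elsewhere; the dataset contents are hypothetical placeholders, not the real compound lists.

```python
from itertools import combinations


def dataset_overlaps(datasets, min_size=1):
    """Return {(name_a, name_b): shared identifiers} for every pair of datasets
    that shares at least `min_size` compounds."""
    overlaps = {}
    for a, b in combinations(sorted(datasets), 2):
        shared = datasets[a] & datasets[b]
        if len(shared) >= min_size:
            overlaps[(a, b)] = shared
    return overlaps


# Hypothetical toy data: placeholder strings stand in for standardised InChI.
toy = {
    "LINCS": {"cpd_A", "cpd_B", "cpd_C"},
    "CellPainting": {"cpd_B", "cpd_C", "cpd_D"},
    "TG-GATEs": {"cpd_C", "cpd_E"},
}
pairwise = dataset_overlaps(toy)
in_all = set.intersection(*toy.values())  # compounds present in every dataset
```

      Comparing on a standardised identifier rather than on compound names matters here, because the same compound is frequently registered under different names or salt forms in different resources.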
      Besides compound-specific data, it is additionally possible to take advantage of prior knowledge on biological entities and the relations between them to put the available data into a broader biological context. For instance, on the cellular level, the Gene Ontology [
      • Ashburner M
      • Ball CA
      • Blake JA
      • et al.
      Gene ontology: tool for the unification of biology. The gene ontology consortium.
      ] as well as pathway databases [
      • Jassal B
      • Matthews L
      • Viteri G
      • et al.
      The reactome pathway knowledgebase.
      ,
      • Martens M
      • Ammar A
      • Riutta A
      • et al.
      WikiPathways: connecting communities.
      ,
      • Liberzon A
      • Subramanian A
      • Pinchback R
      • Thorvaldsdóttir H
      • Tamayo P
      • Mesirov JP.
      Molecular signatures database (MSigDB) 3.0.
      ,
      • Kanehisa M
      • Furumichi M
      • Tanabe M
      • Sato Y
      • Morishima K.
      KEGG: new perspectives on genomes, pathways, diseases and drugs.
      ] provide further insight into a protein's function while others provide information on protein family and domains [
      • Blum M
      • Chang HY
      • Chuguransky S
      • et al.
      The InterPro protein families and domains database: 20 years on.
      ]. Also, physical interactions between proteins [
      • Türei D
      • Korcsmáros T
      • Saez-Rodriguez J.
      OmniPath: guidelines and gateway for literature-curated signaling pathway resources.
      ], co-expression [
      • Langfelder P
      • Horvath S.
      WGCNA: an R package for weighted correlation network analysis.
      ] or text-mining [
      • Szklarczyk D
      • Gable AL
      • Nastou KC
      • et al.
      The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets.
      ] can provide useful information on shared biological functions which can be formalized as networks. While our understanding of higher biological scales is less complete, some databases link these to the molecular and cellular level based on known or statistical associations [
      • Ochoa D
      • Hercules A
      • Carmona M
      • et al.
      Open targets platform: supporting systematic drug–target identification and prioritisation.
      ,
      • Mattingly CJ
      • Colby GT
      • Forrest JN
      • Boyer JL.
      The comparative toxicogenomics database (CTD).
      ,
      • Kuhn M
      • Letunic I
      • Jensen LJ
      • Bork P.
      The SIDER database of drugs and side effects.
      ,
      • Tatonetti NP
      • Ye PP
      • Daneshjou R
      • Altman RB.
      Data-driven prediction of drug effects and interactions.
      ,
      • Piñero J
      • Bravo À
      • Queralt-Rosinach N
      • et al.
      DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants.
      ], as well as ontologies relating different tissue- and organ-level phenotypes to each other [
      • Pastor M
      • Quintana J
      • Sanz F.
      Development of an infrastructure for the prediction of biological endpoints in industrial environments. Lessons learned at the eTOX project.
      ,

      Smalheiser NR, Bonifield G. Two similarity metrics for medical subject headings (MeSH): an aid to biomedical text mining and author name disambiguation. doi:10.1101/039008

      ]. Each of the described relations, as well as a combination thereof, can be modelled as networks [
      • Vulliard L
      • Menche J.
      Complex networks in health and disease.
      ,
      • Schaffer LV
      • Ideker T.
      Mapping the multiscale structure of biological systems.
      ] which can be a prior knowledge-driven approach to quantify compound similarity or, e.g. in the context of drug combinations, also complementarity.

      Predicting toxicity and other endpoints: applications of hypothesis-based assays

      If enough compounds have been measured using a given assay, this information can be used to predict the activity of novel compounds using so-called Quantitative Structure-Activity Relationship (QSAR) models, which are based on supervised learning (Fig. 7). In these models, the chemical descriptors, as outlined in Table 1, are used to characterise the compound and the already existing data is used to train a predictive model. While this approach is generally feasible for any kind of assay and endpoint, it works particularly well for common assays with abundant public data, such as the assays used in secondary pharmacology screening. The advantage of QSAR is that only the chemical structure is required to make a prediction for a new compound, and the computational cost is far lower than for approaches based on molecular modelling and simulation. Furthermore, various kinds of properties can be modelled, e.g. chemical structure can not only be linked to target proteins but also to more complex readouts characterising biological responses such as those described in Table 2 [
      • Bertoni M
      • Duran-Frigola M
      • Badia-I-Mompel P
      • et al.
      Bioactivity descriptors for uncharacterized chemical compounds.
      ]. However, there are also drawbacks to QSAR models, which are fundamentally linked to the fact that the model only predicts what was already seen in the training data: for instance, data imbalance might affect model performance [
      • Bender A
      • Cortes-Ciriano I.
      Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data.
      ], and only modes of action seen in the training set can be successfully predicted.
      Fig. 7. Supervised learning of properties (Y) from compound descriptors (X). Machine learning algorithms, such as Random Forests or Support Vector Machines, can learn how to predict compound properties (Y), e.g. in vivo phenotypes or hypothesis-based assays, from sets of descriptors (X), e.g. based on chemical structure or assay readouts.
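      The supervised setting in Fig. 7 can be illustrated with a deliberately simple stand-in for the algorithms named above: a k-nearest-neighbour classifier on Tanimoto similarity, written in plain Python instead of a Random Forest or Support Vector Machine. The fingerprints (sets of on-bit indices), the labels and the value of k are hypothetical; this is a sketch of the idea of learning Y from X, not a recommended model.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(query, training, k=3):
    """Estimate the probability of activity as the fraction of actives among
    the k training compounds most similar to the query."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Hypothetical descriptors X (on-bit sets) with binary activity labels Y.
train = [
    ({1, 2, 3, 4}, 1),
    ({1, 2, 3, 5}, 1),
    ({2, 3, 4, 6}, 1),
    ({10, 11, 12}, 0),
    ({10, 11, 13}, 0),
    ({11, 12, 14}, 0),
]
p_active = knn_predict({1, 2, 3, 6}, train, k=3)
p_inactive = knn_predict({10, 12, 14}, train, k=3)
```

      The example also makes the main drawback of such models visible: a query compound that shares no bits with any training compound has similarity zero to all of them, so the prediction is then determined by arbitrary tie-breaking rather than by evidence.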
      In addition to QSAR models, which use chemical structure to model compound activity against a certain protein, it is also possible to use already known chemical-protein associations to predict new potential connections [
      • Wu Z
      • Li W
      • Liu G
      • Tang Y.
      Network-based methods for prediction of drug-target interactions.
      ]. An advantage of these kinds of approaches is that negative samples are not compulsory because chemical-protein associations are sparse (most of the bioactivities between the pairs are missing) and we can assume that most compounds are inactive for most target proteins. The basic concept underlying this method class is that if two compounds have similar bioactivities in target assays, it is likely that they have more common targets than those already measured. This approach is particularly useful when the bioactivity of a compound is already well-known, e.g. from previous screening panels. However, one limitation of the current approaches is that they cannot quantitatively assess the binding affinity, which may reduce the impact of the models compared to QSAR-based models.
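      The basic concept described above can be sketched as a neighbourhood score: a target not yet linked to a compound is ranked by the summed similarity of all other compounds that are known to hit it, where similarity is computed on the known target profiles. This is a simplified illustration and not the specific algorithm of the cited work; the compound and target names are hypothetical placeholders, and unmeasured compound-target pairs are implicitly treated as inactive.

```python
def profile_similarity(targets_a, targets_b):
    """Jaccard similarity between two sets of known targets."""
    union = targets_a | targets_b
    return len(targets_a & targets_b) / len(union) if union else 0.0


def score_new_targets(compound, known):
    """Rank targets not yet linked to `compound`: each candidate target
    accumulates the profile similarity of every other compound that hits it."""
    own = known[compound]
    scores = {}
    for other, targets in known.items():
        if other == compound:
            continue
        sim = profile_similarity(own, targets)
        for target in targets - own:
            scores[target] = scores.get(target, 0.0) + sim
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical known compound-target associations (positives only).
known = {
    "cpd_1": {"target_A", "target_B"},
    "cpd_2": {"target_A", "target_B", "target_C"},
    "cpd_3": {"target_D"},
}
ranking = score_new_targets("cpd_1", known)
```

      The output is a ranking, not an affinity estimate, which mirrors the limitation noted above that such approaches do not quantify binding affinity.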
      Also, bioactivity readouts themselves can be used as compound descriptors if multiple targets are known to be linked to the phenotype. For example, Liu et al. used high-throughput screening data from ToxCast to predict the five most frequent chronic organ-level adverse outcomes (liver, kidney, adrenal gland, and lung) and found that bioactivity descriptors produced more accurate classifications than chemical descriptors in supervised models [
      • Liu J
      • Patlewicz G
      • Williams AJ
      • Thomas RS
      • Shah I.
      Predicting organ toxicity using in vitro bioactivity data and chemical structure.
      ]. However, it should be noted that bioactivity data needs to be generated in the first place, and may not be available, especially for new chemicals. In this case, computationally predicted bioactivity can be used to fill in the missing values [
      • Liu A
      • Walter M
      • Wright P
      • et al.
      Prediction and mechanistic analysis of Drug-Induced Liver Injury (DILI) based on chemical structure.
      ]. This integration of computational bioactivity fingerprints with QSAR was for instance previously found to be beneficial for scaffold hopping when compared to chemical descriptors [
      • Xiong GL
      • Zhao Y
      • Liu L
      • et al.
      Computational bioactivity fingerprint similarities to navigate the discovery of novel scaffolds.
      ].

      Predicting toxicity and other endpoints: applications of hypothesis-free assays

      Biological features from hypothesis-free assays can be used for a wide variety of purposes in the context of compound profiling, including unsupervised identification of functionally similar compounds and supervised prediction of specific properties. While the overall aim in computational toxicology is to predict adverse effects, in vivo data is generally rare which also limits how well the predictivity of an assay or descriptor can be evaluated. In contrast, in vitro data, e.g. from secondary pharmacology screening, is much more abundant and can hence provide a useful proxy to establish signals in assays, e.g. in comparison to chemical structure. We will hence discuss how hypothesis-free assays introduced above, namely transcriptomics and image-based cell profiling, compare to ligand-based descriptors and how these have been applied to model in vitro and in vivo endpoints.
      Among transcriptomic signatures from compound perturbations, a significant similarity was found in 20% of cases for pairs of structurally similar chemicals [
      • Chen B
      • Greenside P
      • Paik H
      • Sirota M
      • Hadley D
      • Butte AJ.
      Relating chemical structure to cellular response: an integrative analysis of gene expression, bioactivity, and structural data across 11,000 compounds.
      ]. This shows that while chemical similarity indeed indicates a more likely similar functional effect, there are discrepancies which are not fully understood and may be a result of both technical noise and biological signal in the data. It was previously found that a drug-drug similarity network based on transcriptional signatures was able to identify ATC class enriched modules indicating that gene expression response is informative for a compound's mechanism of action [
      • Musa A
      • Tripathi S
      • Dehmer M
      • Yli-Harja O
      • Kauffman SA
      • Emmert-Streib F.
      Systems pharmacogenomic landscape of drug similarities from LINCS data: drug association networks.
      ]. Furthermore, it was found that gene expression performed better compared to chemical structure in 25% of target prediction tasks [
      • Baillif B
      • Wichard J
      • Méndez-Lucio O
      • Rouquié D.
      Exploring the use of compound-induced transcriptomic data generated from cell lines to predict compound activity toward molecular targets.
      ], and some studies also found that the combination of both achieved even better performance in the context of adverse event prediction [
      • Wang Z
      • Clark NR
      • Ma'ayan A.
      Drug-induced adverse events prediction with the LINCS L1000 data.
      ,
      • Gardiner LJ
      • Carrieri AP
      • Wilshaw J
      • Checkley S
      • Pyzer-Knapp EO
      • Krishna R
      Using human in vitro transcriptome analysis to build trustworthy machine learning models for prediction of animal drug toxicity.
      ].
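      Comparisons of transcriptomic signatures such as those discussed in this paragraph rely on a similarity measure between expression profiles. As a minimal sketch, the example below ranks compounds by cosine similarity of their signatures; the choice of cosine similarity, the compound names and the expression values are assumptions made for illustration and are not taken from the cited studies, which use a range of different scores.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(query, signatures):
    """Rank all other compounds by the similarity of their signature to the query."""
    q = signatures[query]
    others = [(name, cosine(q, sig)) for name, sig in signatures.items() if name != query]
    return sorted(others, key=lambda kv: -kv[1])


# Hypothetical differential expression values over the same four genes.
signatures = {
    "cpd_A": [2.0, -1.0, 0.5, 0.0],
    "cpd_B": [1.8, -1.2, 0.4, 0.1],    # similar response to cpd_A
    "cpd_C": [-2.0, 1.0, -0.5, 0.0],   # opposite response to cpd_A
}
ranked = most_similar("cpd_A", signatures)
```

      Thresholding such pairwise similarities yields a drug-drug similarity network, while strongly negative scores point to signatures that are reversed with respect to the query.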
      The L1000 platform has also been used in the context of lead identification and optimization, with the underlying assumption that it is possible to compare and prioritise compounds with respect to their expression signature instead of their chemical structure. Janssen profiled 31K compounds of the primary screening deck using the L1000 platform and found that these expression profiles were able to detect active and chemically diverse compounds in a case study on activity against the glucocorticoid receptor and the Hsp90 chaperone. Consequently, expression profiling has been suggested as an approach to scaffold hopping [
      • De Wolf H
      • Cougnaud L
      • Van Hoorde K
      • et al.
      High-throughput gene expression profiles to define drug similarity and predict compound activity.
      ], a direction which was further explored by Méndez-Lucio et al. in generative models predicting hit-like molecules from gene expression [
      • Méndez-Lucio O
      • Baillif B
      • Clevert DA
      • Rouquié D
      • Wichard J.
      De novo generation of hit-like molecules from gene expression signatures using artificial intelligence.
      ]. Furthermore, it was explored as a readout for guiding lead optimisation, comparable to QSAR, and was found to be particularly useful in flagging off-target effects [
      • Verbist B
      • Klambauer G
      • Vervoort L
      • et al.
      Using transcriptomics to guide lead optimization in drug discovery projects: lessons learned from the QSTAR project.
      ]. While the transcriptome itself describes a broad biological response and hence not a specific phenotype, it is often found that only a small subset of genes is needed to predict endpoints [
      • Sutherland JJ
      • Webster YW
      • Willy JA
      • et al.
      Toxicogenomic module associations with pathogenesis: a network-based approach to understanding drug toxicity.
      ,
      • Kohonen P
      • Parkkinen JA
      • Willighagen EL
      • et al.
      A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury.
      ]. These predictive genes can not only give insights into changes in biological processes but can also, in practice, lead to the development of targeted gene panels which aim to predict concrete endpoints at even lower cost.
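      The idea of comparing compounds by expression signature rather than chemical structure can be illustrated with a minimal sketch. It assumes each compound is summarised as a vector of z-scored expression changes over the same genes and uses cosine similarity as the comparison measure; the compound names, gene count and values are hypothetical, and the cited studies may use different similarity measures.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length signature vectors."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero signature carries no direction
    return dot / (norm_a * norm_b)


def rank_by_signature(query, library):
    """Return (compound, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, sig)) for name, sig in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical z-scored expression changes over four genes (toy values).
query_signature = [2.0, -1.0, 0.0, 1.5]
library = {
    "cmpd_A": [1.8, -0.9, 0.1, 1.2],   # similar response to the query
    "cmpd_B": [-2.0, 1.0, 0.0, -1.5],  # opposite response
    "cmpd_C": [0.0, 0.0, 1.0, 0.0],    # unrelated response
}
ranking = rank_by_signature(query_signature, library)
```

      Compounds at the top of such a ranking share a biological response with the query and may still differ in scaffold, which is what makes the approach relevant for scaffold hopping.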
      High-throughput imaging data describing cell morphology can also be interpreted as a biological fingerprint that is characteristic for a compound, at least to the extent to which activity is visible in readout space. One of the first studies using morphological data [
      • Simm J
      • Klambauer G
      • Arany A
      • et al.
      Repurposing high-throughput image assays enables biological activity prediction for drug discovery.
      ,
      • Lapins M
      • Spjuth O.
      Evaluation of gene expression and phenotypic profiling data as quantitative descriptors for predicting drug targets and mechanisms of action.
      ] derived features from microscopy-based screens that were specifically built for glucocorticoid receptor nuclear translocation. Machine learning models were trained using cell morphological data to predict assay-specific biological activity. For two drug discovery projects, their findings showed a 50- to 250-fold improvement in hit rates over the initial high-throughput screens, indicating that cell morphology data can describe biological information not always contained in ligand structure. This has been further shown by Trapotsi et al. [
      • Trapotsi MA
      • Mervin LH
      • Afzal AM
      • et al.
      Comparison of chemical structure and cell morphology information for multitask bioactivity predictions.
      ] who directly compared ECFP and cell morphology (Cell Painting) information for bioactivity prediction. Out of 224 targets, they could predict around 45% of targets using ECFP and 40% using image data with high AUC-ROC (>0.80). Each descriptor performed better than the other for some targets, showing the partially complementary nature of cell morphology features to ligand structure. Furthermore, morphological features have also been shown to predict cell health phenotypes [
      • Way GP
      • Kost-Alimova M
      • Shibue T
      • et al.
      Predicting cell health phenotypes using image-based morphology profiling.
      ], cytotoxicity and cell proliferation [
      • Seal S
      • Yang H
      • Vollmers L
      • Bender A.
      Comparison of cellular morphological descriptors and molecular fingerprints for the prediction of cytotoxicity- and proliferation-related assays.
      ], bioactivity and mechanism of action [
      • Simm J
      • Klambauer G
      • Arany A
      • et al.
      Repurposing high-throughput image assays enables biological activity prediction for drug discovery.
      ,
      • Lapins M
      • Spjuth O.
      Evaluation of gene expression and phenotypic profiling data as quantitative descriptors for predicting drug targets and mechanisms of action.
      ,
      • Trapotsi MA
      • Mervin LH
      • Afzal AM
      • et al.
      Comparison of chemical structure and cell morphology information for multitask bioactivity predictions.
      ,
      • Moshkov N
      • Becker T
      • Yang K.
      • et al.
      Predicting compound activity from phenotypic profiles and chemical structures.
      ] as well as organ toxicity such as drug-induced liver injury [
      • Chavan S
      • Scherbak N
      • Engwall M
      • Repsilber D.
      Predicting chemical-induced liver toxicity using high-content imaging phenotypes and chemical descriptors: a random forest approach.
      ].
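      The kind of per-target comparison reported above can be sketched as follows. The example computes AUC-ROC for one target from predicted scores of two descriptor types and then counts the share of targets exceeding the 0.80 threshold. All labels and scores are hypothetical toy values, and the function names are illustrative rather than taken from any of the cited studies.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that an active outscores an inactive.

    Ties between an active and an inactive count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def fraction_well_predicted(aucs, threshold=0.80):
    """Share of targets whose AUC-ROC exceeds the threshold."""
    return sum(a > threshold for a in aucs) / len(aucs)


# Toy data for a single target: 1 = active, 0 = inactive.
labels = [1, 1, 0, 0, 0]
structure_scores = [0.9, 0.4, 0.5, 0.2, 0.1]   # hypothetical fingerprint model
morphology_scores = [0.8, 0.7, 0.3, 0.6, 0.1]  # hypothetical image-based model

auc_structure = auc_roc(labels, structure_scores)
auc_morphology = auc_roc(labels, morphology_scores)
```

      Repeating this per target and comparing which descriptor clears the threshold gives a simple view of where the two information sources agree and where only one of them is predictive.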
      As both transcriptomics and cell morphology describe a general biological response, the question arises whether they describe complementary information or whether there might be clear links between them. Previous studies on the relationship between changes in gene expression and cell morphology found that they could predict changes in cell morphology based on a transcriptomic query, confirming that transcriptomic changes are related to morphological changes [
      • Nassiri I
      • McCall MN.
      Systematic exploration of cell morphological phenotypes associated with a transcriptomic query.
      ]. Way et al. showed that for the analysis of mechanism of action (MOA) classes, Cell Painting was more reproducible across profiles and could weakly cluster together a greater number of MOA classes, while gene expression was better at predicting MOA and could cluster together a smaller number of MOA classes with a stronger signal [
      • Way GP
      • Natoli T
      • Adeboye A
      • et al.
      Morphology and gene expression profiling provide complementary information for mapping cell state.
      ]. Using hypothesis-free data, either gene expression or cell morphology was able to produce a predictive model for many MOAs with AUC exceeding 0.70, suggesting that they capture different aspects of the cell's response to chemical perturbations [
      • Lapins M
      • Spjuth O.
      Evaluation of gene expression and phenotypic profiling data as quantitative descriptors for predicting drug targets and mechanisms of action.
      ]. Overall, these studies show that, although transcriptomic changes are related to morphological changes, the two are not redundant in terms of contained information [
      • Wawer MJ
      • Li K
      • Gustafsdottir SM
      • et al.
      Toward performance-diverse small-molecule libraries for cell-based phenotypic screening using multiplexed high-dimensional profiling.
      ].
      Additional complementary information can be provided by ligand structure or chemical fingerprints, which is distinct from the biological information in cell morphology or transcriptomics data. Moshkov et al. compared chemical structures, cell morphology (Cell Painting), and gene expression profiles (L1000) for 270 assays from drug discovery projects where a combination of the feature spaces of morphology, gene expression, and chemical structure could predict 21% of all assays with high accuracy, compared to 6-10% when using single feature spaces [
      • Moshkov N
      • Becker T
      • Yang K.
      • et al.
      Predicting compound activity from phenotypic profiles and chemical structures.
      ]. They also showed that compound bioactivity predictions could be improved by combining predictions from phenotypic profiles and chemical fingerprints compared to using only one source of information. Overall, these investigations provide support for the use of gene expression and cell profiling data to advance areas such as target identification, mechanistic analysis, and toxicity prediction in conjunction with structural data.
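      Combining predictions from several data sources is often done by late fusion, where one model is trained per feature space and the predicted probabilities are merged afterwards. The sketch below uses plain averaging, which is an assumption for illustration and not necessarily the fusion rule used by Moshkov et al.; the modality names and probabilities are hypothetical.

```python
def late_fusion(predictions_by_modality):
    """Average per-compound activity probabilities across modalities.

    predictions_by_modality maps a modality name to a list of predicted
    probabilities, one per compound, in the same compound order.
    """
    modalities = list(predictions_by_modality.values())
    if not modalities:
        raise ValueError("at least one modality is required")
    n_compounds = len(modalities[0])
    if any(len(m) != n_compounds for m in modalities):
        raise ValueError("all modalities must score the same compounds")
    return [
        sum(m[i] for m in modalities) / len(modalities)
        for i in range(n_compounds)
    ]


# Hypothetical predicted probabilities for two compounds.
fused = late_fusion({
    "structure": [0.9, 0.2],
    "morphology": [0.5, 0.4],
    "expression": [0.7, 0.0],
})
```

      Late fusion keeps each model independent, so a compound lacking one data type can still be scored from the remaining modalities.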

      Mechanistic adverse outcome pathways

      Beyond prediction, data can also be used to derive mechanistically relevant events, e.g., on the molecular or cellular level, which are informative for adverse effects on the organism or population level. In toxicology, for instance, the Adverse Outcome Pathway (AOP) framework was created to describe event cascades leading from Molecular Initiating Events (MIEs) through Key Events (KEs) on different biological levels to the Adverse Event (AE) [
      Users’ handbook supplement to the guidance document for developing and assessing adverse outcome pathways.
      ,
      • Ankley GT
      • Bennett RS
      • Erickson RJ
      • et al.
      Adverse outcome pathways: a conceptual framework to support ecotoxicology research and risk assessment.
      ]. Mechanisms can be formalised as directed event cascades or networks, which describe how upstream perturbations result in downstream phenotypic changes [
      • Knapen D
      • Angrish MM
      • Fortin MC
      • et al.
      Adverse outcome pathway networks I: development and applications.
      ]. In this context, for instance, target binding assays can give insight into MIEs, while -omics data provides insight into KEs on the cellular level (Fig. 8). However, to estimate the risk of adverse effects in vivo, it is necessary to understand not only potential MIEs but also whether these are likely to take place at the given treatment dose, which strongly depends on the compound's ADME properties.
      Fig. 8. Adverse Outcome Pathways (AOP). AOPs describe the event cascades from the first interaction of a compound with the biological system, termed Molecular Initiating Event or MIE, to Key Events (KEs) on different biological levels to the Adverse Event (AE). Practically, AOPs can help to anticipate AEs by identifying useful intermediate events, which can be measured or estimated using suitable assays, and then in turn are informative for the likelihood of the subsequent events including the AE itself.
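      A directed event cascade of this kind can be represented as a simple graph in which each event points to the events it can trigger. The following minimal sketch enumerates all routes from a given MIE to the AE; the event names and edges are placeholders and do not correspond to a curated AOP.

```python
def paths_to_outcome(graph, start, outcome, _path=None):
    """Return every acyclic path from start to outcome in a directed graph."""
    path = (_path or []) + [start]
    if start == outcome:
        return [path]
    found = []
    for downstream in graph.get(start, []):
        if downstream not in path:  # guard against cycles in the network
            found.extend(paths_to_outcome(graph, downstream, outcome, path))
    return found


# Hypothetical AOP network: two MIEs converge on a shared cellular key event.
aop_network = {
    "MIE_1": ["KE_cellular"],
    "MIE_2": ["KE_cellular", "KE_alternative"],
    "KE_cellular": ["KE_organ"],
    "KE_alternative": ["KE_organ"],
    "KE_organ": ["AE"],
}

routes_from_mie_1 = paths_to_outcome(aop_network, "MIE_1", "AE")
routes_from_mie_2 = paths_to_outcome(aop_network, "MIE_2", "AE")
```

      Listing the routes makes explicit which intermediate events would be worth measuring for a compound known to trigger a particular MIE.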
      Mechanistically understood MIEs or KEs can in turn guide the development of assays, such as the hypothesis-based assays discussed above, which help to anticipate potential risks for adverse effects (Fig. 8). In this context, it should be noted that knowledge on relevant targets is largely incomplete, and relevant targets are often only identified after side effects of drugs targeting a particular protein are observed [
      • Bowes J
      • Brown AJ
      • Hamon J
      • et al.
      Reducing safety-related drug attrition: the use of in vitro pharmacological profiling.
      ]. For instance, COX2 is part of the secondary pharmacology panel of all four companies studied by Bowes et al. and was only linked to cardiovascular risks after rofecoxib, a selective inhibitor targeting COX2, was voluntarily withdrawn from the market due to potential risks of heart attack and stroke [
      • Bowes J
      • Brown AJ
      • Hamon J
      • et al.
      Reducing safety-related drug attrition: the use of in vitro pharmacological profiling.
      ].
      When studying mechanisms of pathogenesis, the goal is to identify not only statistical associations but causal relations which can help rationalize decision-making. Different strategies exist to derive evidence for causality. In the AOP development guidelines, for example, evidence for causality is defined based on the Bradford Hill criteria [
      • Fedak KM
      • Bernal A
      • Capshaw ZA
      • Gross S.
      Applying the Bradford Hill criteria in the 21st century: how data integration has changed causal inference in molecular epidemiology.
      ] and includes time concordance, dose concordance and incidence concordance as empirical criteria as well as biological plausibility and essentiality [
      Users’ handbook supplement to the guidance document for developing and assessing adverse outcome pathways.
      ]. While this is also true to some extent for the other criteria, biological plausibility is particularly suited to be evaluated based on the different available data types and the available prior biological knowledge [
      • Oki NO
      • Nelms MD
      • Bell SM
      • Mortensen HM
      • Edwards SW.
      Accelerating adverse outcome pathway development using publicly available data sources.
      ]. For example, Oki et al. integrate compound-disease associations from the Comparative Toxicogenomics Database (CTD) and compound-gene associations, either based on ToxCast or CTD, using frequent itemset mining to identify associations between genes and diseases [
      • Oki NO
      • Edwards SW.
      An integrative data mining approach to identifying adverse outcome pathway signatures.
      ], while Bell et al. derive compound-gene associations from ToxCast which are then mapped to compound-pathway associations and combined with pathway-phenotype interactions to a computationally predicted AOP network [
      • Bell SM
      • Angrish MM
      • Wood CE
      • Edwards SW.
      Integrating publicly available data to generate computationally predicted adverse outcome pathways for fatty liver.
      ]. Jointly, these kinds of knowledge graphs can depict the biological understanding at a given time and can thereby support hypothesis generation, e.g. in expert-driven AOP development. For mechanistic hypotheses, prior knowledge on causal interactions is particularly valuable, as it can be used to infer upstream mechanisms resulting in transcriptomic changes [
      • Bradley G
      • Barrett SJ.
      CausalR: extracting mechanistic sense from genome scale data.
      ,
      • Liu A
      • Trairatphisan P
      • Gjerga E
      • Didangelos A
      • Barratt J
      • Saez-Rodriguez J.
      From expression footprints to causal pathways: contextualizing large signaling networks with CARNIVAL.
      ] or to generate causal hypotheses linking cellular entities estimated from transcriptomics, phosphoproteomics and metabolomics [
      • Dugourd A
      • Kuppe C
      • Sciacovelli M
      • et al.
      Causal integration of multi-omics data with prior knowledge to generate mechanistic hypotheses.
      ]. Other criteria, such as time or dose concordance, can also be explored in a data-driven manner given the right dataset [
      • Liu A
      • Han N
      • Munoz-Muriedas J
      • Bender A.
      Deriving time-concordant event cascades from gene expression data: a case study for Drug-Induced Liver Injury (DILI).
      ].
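      The data integration idea behind linking genes to diseases through shared compounds can be sketched with support counting for gene-disease pairs, which is the simplest case of frequent itemset mining. This is an illustrative simplification and not the pipeline of Oki et al.; the compound, gene and disease identifiers below are hypothetical.

```python
from collections import Counter
from itertools import product


def frequent_gene_disease_pairs(compound_genes, compound_diseases, min_support):
    """Count how many compounds support each (gene, disease) pair.

    A compound supports a pair when it is associated with both the gene
    and the disease. Pairs below min_support are discarded.
    """
    counts = Counter()
    for compound, genes in compound_genes.items():
        diseases = compound_diseases.get(compound, set())
        counts.update(product(sorted(genes), sorted(diseases)))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical compound-gene and compound-disease associations.
compound_genes = {
    "c1": {"GENE_A", "GENE_B"},
    "c2": {"GENE_A"},
    "c3": {"GENE_B"},
}
compound_diseases = {
    "c1": {"disease_X"},
    "c2": {"disease_X"},
    "c3": {"disease_Y"},
}

frequent_pairs = frequent_gene_disease_pairs(
    compound_genes, compound_diseases, min_support=2
)
```

      Pairs that survive the support threshold are statistical associations only; judging their biological plausibility still requires the prior knowledge and causal reasoning discussed above.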
      To derive practically useful event relationships from computationally inferred mechanisms, these need to be further characterized with respect to their applicability domain, the strength of evidence and predictive performance, which are some of the aims of quantitative AOPs (qAOPs) [
      • Spinu N
      • Cronin MTD
      • Enoch SJ
      • Madden JC
      • Worth AP.
      Quantitative adverse outcome pathway (qAOP) models for toxicity prediction.
      ,
      • Perkins EJ
      • Gayen K
      • Shoemaker JE
      • et al.
      Chemical hazard prediction and hypothesis testing using quantitative adverse outcome pathways.
      ]. For instance, mechanistic qAOPs evaluate dose-response and time-course behaviour using experiments targeting individual event relationships. Once an AOP is confidently established, this information can be used to predict AEs from earlier MIEs or KEs. For instance, Burgoon et al. predicted a compound's probability to cause steatosis from in vitro targets representing known MIEs using an AOP-based Bayesian network [
      • Burgoon LD
      • Angrish M
      • Garcia-Reyero N
      • Pollesch N
      • Zupanic A
      • Perkins E.
      Predicting the probability that a chemical causes steatosis using Adverse Outcome Pathway Bayesian Networks (AOPBNs).
      ]. Overall, this shows how computational approaches can both support the identification of mechanistic knowledge and help to leverage this knowledge practically in the form of predictive models.
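      How probabilities propagate through an established AOP can be illustrated with a linear chain of binary events, a deliberately reduced form of a Bayesian network. The sketch is not the AOPBN of Burgoon et al.; the chain structure and all conditional probabilities are hypothetical values chosen for illustration.

```python
def propagate(p_parent, p_child_if_active, p_child_if_inactive):
    """Marginal probability of a child event given its single parent.

    Applies the law of total probability over the parent's two states.
    """
    return p_parent * p_child_if_active + (1.0 - p_parent) * p_child_if_inactive


def probability_of_outcome(p_mie, chain):
    """Propagate the MIE probability along a linear MIE -> KE -> ... -> AE chain.

    chain is a list of (P(child | parent active), P(child | parent inactive))
    tuples, ordered from the first key event to the adverse event.
    """
    p = p_mie
    for p_if_active, p_if_inactive in chain:
        p = propagate(p, p_if_active, p_if_inactive)
    return p


# Hypothetical values: probability that the compound triggers the MIE
# (e.g. estimated from an in vitro assay), then KE and AE conditionals.
p_mie = 0.8
chain = [
    (0.9, 0.1),   # key event given MIE active / inactive
    (0.7, 0.05),  # adverse event given key event active / inactive
]
p_adverse_event = probability_of_outcome(p_mie, chain)
```

      In a real AOP network, events can have several parents, so the conditional tables grow accordingly and dedicated Bayesian network tooling becomes preferable to this hand-rolled chain.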

      Conclusions and future directions

      While there are already in vitro model-based strategies to link compound structure to in vivo safety, it is often unclear which specific model, assay or readout should be included to detect toxic drug candidates as early as possible in drug discovery. However, in vitro assays are advancing on the experimental and technical side, resulting in more relevant model systems, such as organoids and microfluidic systems, as well as better readouts with higher throughput, e.g. achieved through assay multiplexing. Consequently, it can be anticipated that more in vitro data will become available to characterise the biological effects of compounds, resulting in better models that bridge chemical space and complex biological endpoints. This will not only help to better predict biological properties of new compounds but will also help to untangle how the underlying biology functions, and consequently how these effects can potentially be mitigated in drug discovery projects.

      Funding

      A.L. received funding from GlaxoSmithKline. S.S. acknowledges funding from the Cambridge Commonwealth, European and International Trust, Boak Student Support Fund (Clare Hall), Jawaharlal Nehru Memorial Fund, Allen, Meek and Read Fund, and Trinity Henry Barlow (Trinity College). S.S. acknowledges support with funding from the Cambridge Centre for Data Driven Discovery and Accelerate Programme for Scientific Discovery under the project title “Theoretical, Scientific, and Philosophical Perspectives on Biological Understanding in the Age of Artificial Intelligence”, made possible by a donation from Schmidt Futures. H.Y. acknowledges support from the Cambridge Alliance on Medicines Safety (CAMS).

      Declaration of Competing Interest

      The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:
      Anika Liu reports financial support was provided by GlaxoSmithKline. Srijit Seal reports financial support was provided by Cambridge Commonwealth, European and International Trust, Boak Student Support Fund (Clare Hall), Jawaharlal Nehru Memorial Fund, Allen, Meek and Read Fund, and Trinity Henry Barlow (Trinity College). Srijit Seal reports financial support was provided by Cambridge Centre for Data Driven Discovery and Accelerate Programme for Scientific Discovery made possible by a donation from Schmidt Futures. Hongbin Yang reports financial support was provided by Cambridge Alliance on Medicines Safety (CAMS). Anika Liu reports a relationship with Boehringer Ingelheim Pharma GmbH & Co. KG that includes: employment. Andreas Bender reports a relationship with Healx Ltd that includes: equity or stocks. Andreas Bender reports a relationship with PharmEnable Ltd that includes: equity or stocks. Andreas Bender reports a relationship with Pangea Botanica that includes: employment. Srijit Seal reports a relationship with AbsoluteAI Ltd that includes: employment.

      Acknowledgments

      The authors would like to thank members of the Bender Group (University of Cambridge) for their valuable input during the preparation of this work.

      References

        • Alves VM
        • Muratov EN
        • Zakharov A
        • Muratov NN
        • Andrade CH
        • Tropsha A.
        Chemical toxicity prediction for major classes of industrial chemicals: is it possible to develop universal models covering cosmetics, drugs, and pesticides?.
        Food Chem Toxicol. 2018; 112: 526-534 https://doi.org/10.1016/j.fct.2017.04.008
        • Labbe G
        • Pessayre D
        • Fromenty B.
        Drug-induced liver injury through mitochondrial dysfunction: mechanisms and detection during preclinical safety studies.
        Fundam Clin Pharmacol. 2008; 22: 335-353
        • Glen RC
        • Bender A
        • Arnby CH
        • Carlsson L
        • Boyer S
        • Smith J.
        Circular fingerprints: flexible molecular descriptors with applications from physical chemistry to ADME.
        IDrugs. 2006; 9: 199-204
        • Tropsha A.
        Predictive quantitative structure–activity relationship modeling.
        Comprehensive Medicinal Chemistry II. Elsevier, 2007: 149-165
        • Brown RD
        • Martin YC.
        The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding.
        J Chem Inf Comput Sci. 1997; 37: 1-9
        • Wu Z
        • Ramsundar B
        • Feinberg EN
        • et al.
        MoleculeNet: a benchmark for molecular machine learning.
        Chem Sci. 2018; 9: 513-530
        • Rogers D
        • Hahn M.
        Extended-connectivity fingerprints.
        J Chem Inf Model. 2010; 50: 742-754
        • Carhart RE
        • Smith DH
        • Venkataraghavan R.
        Atom pairs as molecular features in structure-activity studies: definition and applications.
        J Chem Inf Comput Sci. 1985; 25: 64-73
        • Durant JL
        • Leland BA
        • Henry DR
        • Nourse JG.
        Reoptimization of MDL keys for use in drug discovery.
        J Chem Inf Comput Sci. 2002; 42: 1273-1280
        • Axen SD
        • Huang XP
        • Cáceres EL
        • Gendelev L
        • Roth BL
        • Keiser MJ.
        A simple representation of three-dimensional molecular structure.
        J Med Chem. 2017; 60: 7393-7409
        • Todeschini R
        • Gramatica P
        • Provenzani R
        • Marengo E.
        Weighted holistic invariant molecular descriptors. Part 2. Theory development and applications on modeling physicochemical properties of polyaromatic hydrocarbons.
        Chemometrics Intell Lab Syst. 1995; 27: 221-229
        • Bender A
        • Glen RC.
        Molecular similarity: a key technique in molecular informatics.
        Org Biomol Chem. 2004; 2: 3204-3218
        • Maggiora GM.
        On outliers and activity cliffs–why QSAR often disappoints.
        J Chem Inf Model. 2006; 46: 1535
        • Kalgutkar AS.
        Designing around structural alerts in drug discovery.
        J Med Chem. 2020; 63: 6276-6302
        • Duran-Frigola M
        • Pauls E
        • Guitart-Pla O
        • et al.
        Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker.
        Nat Biotechnol. 2020; 38: 1087-1096
        • Pollard TD.
        A guide to simple and informative binding assays.
        Mol Biol Cell. 2010; 21: 4061-4067
        • Croston GE.
        The utility of target-based discovery.
        Expert Opin Drug Discov. 2017; 12: 427-429
        • Hartung T.
        Food for thought ... on cell culture.
        ALTEX, 2007: 143-147 https://doi.org/10.14573/altex.2007.3.143 (Published online)
        • Coecke S
        • Ahr H
        • Blaauboer BJ
        • et al.
        Metabolism: a bottleneck in in vitro toxicological test development. The report and recommendations of ECVAM workshop 54.
        Altern Lab Anim. 2006; 34: 49-84
        • Gerets HHJ
        • Tilmant K
        • Gerin B
        • et al.
        Characterization of primary human hepatocytes, HepG2 cells, and HepaRG cells at the mRNA level and CYP activity in response to inducers and their predictivity for the detection of human hepatotoxins.
        Cell Biol Toxicol. 2012; 28: 69-87 https://doi.org/10.1007/s10565-011-9208-4
        • Marx U
        • Akabane T
        • Andersson TB
        • et al.
        Biology-inspired microphysiological systems to advance patient benefit and animal welfare in drug development.
        2020
        • Skardal A
        • Aleman J
        • Forsythe S
        • et al.
        Drug compound screening in single and integrated multi-organoid body-on-a-chip systems.
        Biofabrication. 2020; 12: 025017
        • Phan N
        • Hong JJ
        • Tofig B
        • et al.
        A simple high-throughput approach identifies actionable drug sensitivities in patient-derived tumor organoids.
        Communications Biology. 2019; 2: 1-11
        • Cong Y
        • Han X
        • Wang Y
        • et al.
        Drug toxicity evaluation based on organ-on-a-chip technology: a review.
        Micromachines. 2020; 11 https://doi.org/10.3390/mi11040381
        • Ma C
        • Peng Y
        • Li H
        • Chen W.
        Organ-on-a-chip: a new paradigm for drug development.
        Trends Pharmacol Sci. 2021; 42: 119-133
        • Watanabe H
        • Honda Y
        • Deguchi J
        • Yamada T
        • Bando K.
        Usefulness of cardiotoxicity assessment using calcium transient in human induced pluripotent stem cell-derived cardiomyocytes.
        J Toxicol Sci. 2017; 42: 519-527
        • Armitage JM
        • Wania F
        • Arnot JA.
        Application of mass balance models and the chemical activity concept to facilitate the use of in vitro toxicity data for risk assessment.
        Environ Sci Technol. 2014; 48: 9770-9779
        • Fischer FC
        • Henneberger L
        • König M
        • et al.
        Modeling exposure in the Tox21 in vitro bioassays.
        Chem Res Toxicol. 2017; 30: 1197-1208
        • Proença S
        • Escher BI
        • Fischer FC
        • et al.
        Effective exposure of chemicals in in vitro cell systems: a review of chemical distribution models.
        Toxicol In Vitro. 2021; 73: 105133
        • Herland A
        • Maoz BM
        • Das D
        • et al.
        Quantitative prediction of human pharmacokinetic responses to drugs via fluidically coupled vascularized organ chips.
        Nat Biomed Eng. 2020; 4: 421-436
        • Gusenleitner D
        • Auerbach SS
        • Melia T
        • Gómez HF
        • Sherr DH
        • Monti S.
        Genomic models of short-term exposure accurately predict long-term chemical carcinogenicity and identify putative mechanisms of action.
        PLoS One. 2014; 9: e102579
        • Igarashi Y
        • Nakatsu N
        • Yamashita T
        • et al.
        Open TG-GATEs: a large-scale toxicogenomics database.
        Nucleic Acids Res. 2015; 43: D921-D927 https://doi.org/10.1093/nar/gku955
        • Alexander-Dann B
        • Pruteanu LL
        • Oerton E
        • et al.
        Developments in toxicogenomics: understanding and predicting compound-induced toxicity from gene expression data.
        Mol Omics. 2018; 14: 218-236
        • Chen M
        • Zhang M
        • Borlak J
        • Tong W.
        A decade of toxicogenomic research and its contribution to toxicological science.
        Toxicol Sci. 2012; 130: 217-228
        • Sneddon LU
        • Halsey LG
        • Bury NR.
        Considering aspects of the 3Rs principles within experimental animal biology.
        J Exp Biol. 2017; 220: 3007-3016
        • Nardini C.
        The ethics of clinical trials.
        Ecancermedicalscience. 2014; 8: 387
        • Watford S
        • Pham LL
        • Wignall J
        • Shin R
        • Martin MT
        • Friedman KP.
        ToxRefDB version 2.0: Improved utility for predictive and retrospective toxicology analyses.
        Reprod Toxicol. 2019; 89: 145-158 https://doi.org/10.1016/j.reprotox.2019.07.012
        • Sanz F
        • Pognan F
        • Steger-Hartmann T
        • et al.
        Legacy data sharing to improve drug safety assessment: the eTOX project.
        Nat Rev Drug Discov. 2017; 16: 811-812
        • Edgar R
        • Domrachev M
        • Lash AE.
        Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.
        Nucleic Acids Res. 2002; 30: 207-210
        • Athar A
        • Füllgrabe A
        • George N
        • et al.
        ArrayExpress update - from bulk to single-cell expression data.
        Nucleic Acids Res. 2019; 47: D711-D715
        • Banda JM
        • Evans L
        • Vanguri RS
        • Tatonetti NP
        • Ryan PB
        • Shah NH.
        A curated and standardized adverse drug event resource to accelerate drug safety research.
        Sci Data. 2016; 3: 160026
        • de Boissieu P
        • Kanagaratnam L
        • et al.
        Notoriety bias in a database of spontaneous reports: the example of osteonecrosis of the jaw under bisphosphonate therapy in the French national pharmacovigilance database.
        Pharmacoepidemiol Drug Saf. 2014; 23: 989-992 https://doi.org/10.1002/pds.3622
        • Moore N
        • Hall G
        • Sturkenboom M
        • Mann R
        • Lagnaoui R
        • Begaud B.
        Biases affecting the proportional reporting ratio (PRR) in spontaneous reports pharmacovigilance databases: the example of sertindole.
        Pharmacoepidemiol Drug Saf. 2003; 12: 271-281 https://doi.org/10.1002/pds.848
        • Benet LZ.
        Effect of route of administration and distribution on drug action.
        J Pharmacokinet Biopharm. 1978; 6: 559-585 https://doi.org/10.1007/bf01062110
        • Karlsson Lind L
        • von Euler M
        • Korkmaz S
        • Schenck-Gustafsson K.
        Sex differences in drugs: the development of a comprehensive knowledge base to improve gender awareness prescribing.
        Biol Sex Differ. 2017; 8: 32
        • van der Wouden CH
        • Cambon-Thomsen A
        • Cecchin E
        • et al.
        Implementing pharmacogenomics in Europe: design and implementation strategy of the ubiquitous pharmacogenomics consortium.
        Clin Pharmacol Ther. 2017; 101: 341-358
        • Abdelsalam NA
        • Ramadan AT
        • ElRakaiby MT
        • Aziz RK.
        Toxicomicrobiomics: the human microbiome vs. pharmaceutical, dietary, and environmental xenobiotics.
        Front Pharmacol. 2020; 11: 390
        • Thakkar S
        • Li T
        • Liu Z
        • Wu L
        • Roberts R
        • Tong W.
        Drug-induced liver injury severity and toxicity (DILIst): binary classification of 1279 drugs by human hepatotoxicity.
        Drug Discov Today. 2020; 25: 201-208
        • Chen M
        • Suzuki A
        • Thakkar S
        • Yu K
        • Hu C
        • Tong W.
        DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans.
        Drug Discov Today. 2016; 21: 648-653
        • Liu A
        • Walter M
        • Wright P
        • et al.
        Prediction and mechanistic analysis of Drug-Induced Liver Injury (DILI) based on chemical structure.
        2019 https://doi.org/10.21203/rs.3.rs-16599/v1 (Published online)
        • Vall A
        • Sabnis Y
        • Shi J
        • Class R
        • Hochreiter S
        • Klambauer G.
        The promise of AI for DILI prediction.
        Front Artif Intell. 2021; 4 https://doi.org/10.3389/frai.2021.638410
        • Sutherland JJ
        • Webster YW
        • Willy JA
        • et al.
        Toxicogenomic module associations with pathogenesis: a network-based approach to understanding drug toxicity.
        Pharmacogenomics J. 2018; 18: 377-390
        • Bender A
        • Cortes-Ciriano I.
        Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data.
        Drug Discov Today. 2021; 26: 1040-1052
        • Gry M
        • Rimini R
        • Strömberg S
        • et al.
        Correlations between RNA and protein expression profiles in 23 human cell lines.
        BMC Genomics. 2009; 10: 365
        • Pinches MD
        • Thomas R
        • Porter R
        • Camidge L
        • Briggs K.
        Curation and analysis of clinical pathology parameters and histopathologic findings from eTOXsys, a large database project (eTOX) for toxicologic studies.
        Regul Toxicol Pharmacol. 2019; 107: 104396
        • Lynch JJ
        • Van Vleet TR
        • Mittelstadt SW
        • Blomme EAG.
        Potential functional and pathological side effects related to off-target pharmacological activity.
        J Pharmacol Toxicol Methods. 2017; 87: 108-126 https://doi.org/10.1016/j.vascn.2017.02.020
        • Lounkine E
        • Keiser MJ
        • Whitebread S
        • et al.
        Large-scale prediction and testing of drug activity on side-effect targets.
        Nature. 2012; 486: 361-367
        • Deaton AM
        • Fan F
        • Zhang W
        • Nguyen PA
        • Ward LD
        • Nioi P.
        Rationalizing secondary pharmacology screening using human genetic and pharmacological evidence.
        Toxicol Sci. 2019; 167: 593-603
        • Bendels S
        • Bissantz C
        • Fasching B
        • et al.
        Safety screening in early drug discovery: an optimized assay panel.
        J Pharmacol Toxicol Methods. 2019; 99: 106609
        • Bowes J
        • Brown AJ
        • Hamon J
        • et al.
        Reducing safety-related drug attrition: the use of in vitro pharmacological profiling.
        Nat Rev Drug Discov. 2012; 11: 909-922
        • Hauser AS
        • Chavali S
        • Masuho I
        • et al.
        Pharmacogenomics of GPCR drug targets.
        Cell. 2018; 172: 41-54.e19
        • Finckh A
        • Aronson MD.
        Cardiovascular risks of cyclooxygenase-2 inhibitors: where we stand now.
        Ann Intern Med. 2005; 142: 212-214
        • Lin L
        • Yee SW
        • Kim RB
        • Giacomini KM.
        SLC transporters as therapeutic targets: emerging opportunities.
        Nat Rev Drug Discov. 2015; 14: 543-560
        • Kepp O
        • Galluzzi L
        • Lipinski M
        • Yuan J
        • Kroemer G.
        Cell death assays for drug discovery.
        Nat Rev Drug Discov. 2011; 10: 221-237
        • Tcheremenskaia O
        • Battistelli CL
        • Giuliani A
        • Benigni R
        • Bossa C.
        In silico approaches for prediction of genotoxic and carcinogenic potential of cosmetic ingredients.
        Comput Toxicol. 2019; 11: 91-100
      1. data.europa.eu. Accessed February 25, 2022. https://data.europa.eu/data/datasets/database-pesticide-genotoxicity-endpoints?locale=en

        • Simmons SO
        • Fan CY
        • Ramabhadran R.
        Cellular stress response pathway system as a sentinel ensemble in toxicological screening.
        Toxicol Sci. 2009; 111: 202-225
        • Herholt A
        • Galinski S
        • Geyer PE
        • Rossner MJ
        • Wehr MC.
        Multiparametric assays for accelerating early drug discovery.
        Trends Pharmacol Sci. 2020; 41: 318-335
        • Iorio F
        • Rittman T
        • Ge H
        • Menden M
        • Saez-Rodriguez J.
        Transcriptional data: a new gateway to drug repositioning?
        Drug Discov Today. 2013; 18: 350-357
        • Harrill J
        • Shah I
        • Setzer RW
        • et al.
        Considerations for strategic use of high-throughput transcriptomics chemical screening data in regulatory decisions.
        Curr Opin Toxicol. 2019; 15: 64-75
        • Liu Z
        • Zhu L
        • Thakkar S
        • Roberts R
        • Tong W.
        Can transcriptomic profiles from cancer cell lines be used for toxicity assessment?
        Chem Res Toxicol. 2020; 33: 271-280
        • Lowe R
        • Shirley N
        • Bleackley M
        • Dolan S
        • Shafee T.
        Transcriptomics technologies.
        PLoS Comput Biol. 2017; 13: e1005457
        • Subramanian A
        • Narayan R
        • Corsello SM
        • et al.
        A next generation connectivity map: L1000 platform and the first 1,000,000 profiles.
        Cell. 2017; 171: 1437-1452.e17
        • Ye C
        • Ho DJ
        • Neri M
        • et al.
        DRUG-seq for miniaturized high-throughput transcriptome profiling in drug discovery.
        Nat Commun. 2018; 9: 4307
        • Yeakley JM
        • Shepard PJ
        • Goyena DE
        • VanSteenhouse HC
        • McComb JD
        • Seligmann BE.
        A trichostatin A expression signature identified by TempO-Seq targeted whole transcriptome profiling.
        PLoS One. 2017; 12: e0178302
        • Srivatsan SR
        • McFaline-Figueroa JL
        • Ramani V
        • et al.
        Massively multiplex chemical transcriptomics at single-cell resolution.
        Science. 2020; 367: 45-51
        • Bray MA
        • Singh S
        • Han H
        • et al.
        Cell painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes.
        Nat Protoc. 2016; 11: 1757-1774
        • Chandrasekaran SN
        • Ceulemans H
        • Boyd JD
        • Carpenter AE.
        Image-based profiling for drug discovery: due for a machine-learning upgrade?
        Nat Rev Drug Discov. 2021; 20: 145-159
        • Seal S
        • Carreras-Puigvert J
        • Trapotsi MA
        • Yang H
        • Spjuth O
        • Bender A.
        Integrating cell morphology with gene expression and chemical structure to aid mitochondrial toxicity detection.
        Commun Biol. 2022; 5: 858
        • Lamb J
        • Crawford ED
        • Peck D
        • et al.
        The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease.
        Science. 2006; 313: 1929-1935
        • Bray MA
        • Gustafsdottir SM
        • Rohban MH
        • et al.
        A dataset of images and morphological profiles of 30 000 small-molecule treatments using the cell painting assay.
        Gigascience. 2017; 6: 1-5
      2. Cox MJ, Jaensch S, Van de Waeter J, et al. Tales of 1,008 small molecules: phenomic profiling through live-cell imaging in a panel of reporter cell lines. doi:10.1101/2020.03.13.990093

        • Richard AM
        • Judson RS
        • Houck KA
        • et al.
        ToxCast chemical landscape: paving the road to 21st century toxicology.
        Chem Res Toxicol. 2016; 29: 1225-1251
        • Klaeger S
        • Heinzlmeir S
        • Wilhelm M
        • et al.
        The target landscape of clinical kinase drugs.
        Science. 2017; 358 https://doi.org/10.1126/science.aan4368
        • Shoemaker RH.
        The NCI60 human tumour cell line anticancer drug screen.
        Nat Rev Cancer. 2006; 6: 813-823
        • Ashburner M
        • Ball CA
        • Blake JA
        • et al.
        Gene ontology: tool for the unification of biology. The gene ontology consortium.
        Nat Genet. 2000; 25: 25-29
        • Jassal B
        • Matthews L
        • Viteri G
        • et al.
        The reactome pathway knowledgebase.
        Nucleic Acids Res. 2020; 48: D498-D503
        • Martens M
        • Ammar A
        • Riutta A
        • et al.
        WikiPathways: connecting communities.
        Nucleic Acids Res. 2021; 49: D613-D621
        • Liberzon A
        • Subramanian A
        • Pinchback R
        • Thorvaldsdóttir H
        • Tamayo P
        • Mesirov JP.
        Molecular signatures database (MSigDB) 3.0.
        Bioinformatics. 2011; 27: 1739-1740
        • Kanehisa M
        • Furumichi M
        • Tanabe M
        • Sato Y
        • Morishima K.
        KEGG: new perspectives on genomes, pathways, diseases and drugs.
        Nucleic Acids Res. 2017; 45: D353-D361
        • Blum M
        • Chang HY
        • Chuguransky S
        • et al.
        The InterPro protein families and domains database: 20 years on.
        Nucleic Acids Res. 2021; 49: D344-D354
        • Türei D
        • Korcsmáros T
        • Saez-Rodriguez J.
        OmniPath: guidelines and gateway for literature-curated signaling pathway resources.
        Nat Methods. 2016; 13: 966-967 https://doi.org/10.1038/nmeth.4077
        • Langfelder P
        • Horvath S.
        WGCNA: an R package for weighted correlation network analysis.
        BMC Bioinf. 2008; 9: 559 https://doi.org/10.1186/1471-2105-9-559
        • Szklarczyk D
        • Gable AL
        • Nastou KC
        • et al.
        The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets.
        Nucleic Acids Res. 2021; 49: D605-D612 https://doi.org/10.1093/nar/gkaa1074
        • Ochoa D
        • Hercules A
        • Carmona M
        • et al.
        Open targets platform: supporting systematic drug–target identification and prioritisation.
        Nucleic Acids Res. 2021; 49: D1302-D1310 https://doi.org/10.1093/nar/gkaa1027
        • Mattingly CJ
        • Colby GT
        • Forrest JN
        • Boyer JL.
        The comparative toxicogenomics database (CTD).
        Environ Health Perspect. 2003; (Published online) https://doi.org/10.1289/txg.6028
        • Kuhn M
        • Letunic I
        • Jensen LJ
        • Bork P.
        The SIDER database of drugs and side effects.
        Nucleic Acids Res. 2016; 44: D1075-D1079 https://doi.org/10.1093/nar/gkv1075
        • Tatonetti NP
        • Ye PP
        • Daneshjou R
        • Altman RB.
        Data-driven prediction of drug effects and interactions.
        Sci Transl Med. 2012; 4: 125ra31
        • Piñero J
        • Bravo À
        • Queralt-Rosinach N
        • et al.
        DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants.
        Nucleic Acids Res. 2017; 45: D833-D839
        • Pastor M
        • Quintana J
        • Sanz F.
        Development of an infrastructure for the prediction of biological endpoints in industrial environments. Lessons learned at the eTOX project.
        Front Pharmacol. 2018; 9: 1147
      3. Smalheiser NR, Bonifield G. Two similarity metrics for medical subject headings (MeSH): an aid to biomedical text mining and author name disambiguation. doi:10.1101/039008

        • Vulliard L
        • Menche J.
        Complex networks in health and disease.
        Syst Med. 2021; (Published online): 26-33 https://doi.org/10.1016/b978-0-12-801238-3.11640-x
        • Schaffer LV
        • Ideker T.
        Mapping the multiscale structure of biological systems.
        Cell Syst. 2021; 12: 622-635
        • Bertoni M
        • Duran-Frigola M
        • Badia-i-Mompel P
        • et al.
        Bioactivity descriptors for uncharacterized chemical compounds.
        Nat Commun. 2021; 12: 3932
        • Wu Z
        • Li W
        • Liu G
        • Tang Y.
        Network-based methods for prediction of drug-target interactions.
        Front Pharmacol. 2018; 9: 1134
        • Liu J
        • Patlewicz G
        • Williams AJ
        • Thomas RS
        • Shah I.
        Predicting organ toxicity using in vitro bioactivity data and chemical structure.
        Chem Res Toxicol. 2017; 30: 2046-2059
        • Xiong GL
        • Zhao Y
        • Liu L
        • et al.
        Computational bioactivity fingerprint similarities to navigate the discovery of novel scaffolds.
        J Med Chem. 2021; 64: 7544-7554
        • Chen B
        • Greenside P
        • Paik H
        • Sirota M
        • Hadley D
        • Butte AJ.
        Relating chemical structure to cellular response: an integrative analysis of gene expression, bioactivity, and structural data across 11,000 compounds.
        CPT Pharmacometrics Syst Pharmacol. 2015; 4: 576-584
        • Musa A
        • Tripathi S
        • Dehmer M
        • Yli-Harja O
        • Kauffman SA
        • Emmert-Streib F.
        Systems pharmacogenomic landscape of drug similarities from LINCS data: drug association networks.
        Sci Rep. 2019; 9: 7849
        • Baillif B
        • Wichard J
        • Méndez-Lucio O
        • Rouquié D.
        Exploring the use of compound-induced transcriptomic data generated from cell lines to predict compound activity toward molecular targets.
        Front Chem. 2020; 8: 296 https://doi.org/10.3389/fchem.2020.00296
        • Wang Z
        • Clark NR
        • Ma'ayan A.
        Drug-induced adverse events prediction with the LINCS L1000 data.
        Bioinformatics. 2016; 32: 2338-2345
        • Gardiner LJ
        • Carrieri AP
        • Wilshaw J
        • Checkley S
        • Pyzer-Knapp EO
        • Krishna R.
        Using human in vitro transcriptome analysis to build trustworthy machine learning models for prediction of animal drug toxicity.
        Sci Rep. 2020; 10: 9522
        • De Wolf H
        • Cougnaud L
        • Van Hoorde K
        • et al.
        High-throughput gene expression profiles to define drug similarity and predict compound activity.
        Assay Drug Dev Technol. 2018; 16: 162-176
        • Méndez-Lucio O
        • Baillif B
        • Clevert DA
        • Rouquié D
        • Wichard J.
        De novo generation of hit-like molecules from gene expression signatures using artificial intelligence.
        Nat Commun. 2020; 11: 10
        • Verbist B
        • Klambauer G
        • Vervoort L
        • et al.
        Using transcriptomics to guide lead optimization in drug discovery projects: lessons learned from the QSTAR project.
        Drug Discov Today. 2015; 20: 505-513
        • Kohonen P
        • Parkkinen JA
        • Willighagen EL
        • et al.
        A transcriptomics data-driven gene space accurately predicts liver cytopathology and drug-induced liver injury.
        Nat Commun. 2017; 8: 15932 https://doi.org/10.1038/ncomms15932
        • Simm J
        • Klambauer G
        • Arany A
        • et al.
        Repurposing high-throughput image assays enables biological activity prediction for drug discovery.
        Cell Chem Biol. 2018; 25: 611-618.e3
        • Lapins M
        • Spjuth O.
        Evaluation of gene expression and phenotypic profiling data as quantitative descriptors for predicting drug targets and mechanisms of action.
        bioRxiv. 2019; 580654 (Published online July 3) https://doi.org/10.1101/580654
        • Trapotsi MA
        • Mervin LH
        • Afzal AM
        • et al.
        Comparison of chemical structure and cell morphology information for multitask bioactivity predictions.
        J Chem Inf Model. 2021; 61: 1444-1456
        • Way GP
        • Kost-Alimova M
        • Shibue T
        • et al.
        Predicting cell health phenotypes using image-based morphology profiling.
        Mol Biol Cell. 2021; 32: 995-1005
        • Seal S
        • Yang H
        • Vollmers L
        • Bender A.
        Comparison of cellular morphological descriptors and molecular fingerprints for the prediction of cytotoxicity- and proliferation-related assays.
        Chem Res Toxicol. 2021; 34: 422-437
        • Moshkov N
        • Becker T
        • Yang K
        • et al.
        Predicting compound activity from phenotypic profiles and chemical structures.
        bioRxiv. 2022; 2020.12.15.422887 (Published online April 10) https://doi.org/10.1101/2020.12.15.422887v4
        • Chavan S
        • Scherbak N
        • Engwall M
        • Repsilber D.
        Predicting chemical-induced liver toxicity using high-content imaging phenotypes and chemical descriptors: a random forest approach.
        Chem Res Toxicol. 2020; 33: 2261-2275
        • Nassiri I
        • McCall MN.
        Systematic exploration of cell morphological phenotypes associated with a transcriptomic query.
        Nucleic Acids Res. 2018; 46: e116
        • Way GP
        • Natoli T
        • Adeboye A
        • et al.
        Morphology and gene expression profiling provide complementary information for mapping cell state.
        bioRxiv. 2021; 2021.10.21.465335 (Published online October 22) https://doi.org/10.1101/2021.10.21.465335
        • Wawer MJ
        • Li K
        • Gustafsdottir SM
        • et al.
        Toward performance-diverse small-molecule libraries for cell-based phenotypic screening using multiplexed high-dimensional profiling.
        Proc Natl Acad Sci USA. 2014; 111: 10911-10916
      4. Users’ handbook supplement to the guidance document for developing and assessing adverse outcome pathways.
        OECD, 2018 https://doi.org/10.1787/5jlv1m9d1g32-en (Published online February 14)
        • Ankley GT
        • Bennett RS
        • Erickson RJ
        • et al.
        Adverse outcome pathways: a conceptual framework to support ecotoxicology research and risk assessment.
        Environ Toxicol Chem. 2010; 29: 730-741
        • Knapen D
        • Angrish MM
        • Fortin MC
        • et al.
        Adverse outcome pathway networks I: development and applications.
        Environ Toxicol Chem. 2018; 37: 1723-1733 https://doi.org/10.1002/etc.4125
        • Fedak KM
        • Bernal A
        • Capshaw ZA
        • Gross S.
        Applying the Bradford Hill criteria in the 21st century: how data integration has changed causal inference in molecular epidemiology.
        Emerg Themes Epidemiol. 2015; 12 https://doi.org/10.1186/s12982-015-0037-4
        • Oki NO
        • Nelms MD
        • Bell SM
        • Mortensen HM
        • Edwards SW.
        Accelerating adverse outcome pathway development using publicly available data sources.
        Curr Environ Health Rep. 2016; 3: 53-63 https://doi.org/10.1007/s40572-016-0079-y
        • Oki NO
        • Edwards SW.
        An integrative data mining approach to identifying adverse outcome pathway signatures.
        Toxicology. 2016; 350-352: 49-61
        • Bell SM
        • Angrish MM
        • Wood CE
        • Edwards SW.
        Integrating publicly available data to generate computationally predicted adverse outcome pathways for fatty liver.
        Toxicol Sci. 2016; 150: 510-520 https://doi.org/10.1093/toxsci/kfw017
        • Bradley G
        • Barrett SJ.
        CausalR: extracting mechanistic sense from genome scale data.
        Bioinformatics. 2017; 33: 3670-3672
        • Liu A
        • Trairatphisan P
        • Gjerga E
        • Didangelos A
        • Barratt J
        • Saez-Rodriguez J.
        From expression footprints to causal pathways: contextualizing large signaling networks with CARNIVAL.
        bioRxiv. 2019 https://doi.org/10.1101/541888 (Published online)
        • Dugourd A
        • Kuppe C
        • Sciacovelli M
        • et al.
        Causal integration of multi-omics data with prior knowledge to generate mechanistic hypotheses.
        Mol Syst Biol. 2021; 17: e9730
        • Liu A
        • Han N
        • Munoz-Muriedas J
        • Bender A.
        Deriving time-concordant event cascades from gene expression data: a case study for Drug-Induced Liver Injury (DILI).
        PLoS Comput Biol. 2022; 18: e1010148
        • Spinu N
        • Cronin MTD
        • Enoch SJ
        • Madden JC
        • Worth AP.
        Quantitative adverse outcome pathway (qAOP) models for toxicity prediction.
        Arch Toxicol. 2020; 94: 1497-1510
        • Perkins EJ
        • Gayen K
        • Shoemaker JE
        • et al.
        Chemical hazard prediction and hypothesis testing using quantitative adverse outcome pathways.
        ALTEX. 2019; 36: 91-102
        • Burgoon LD
        • Angrish M
        • Garcia-Reyero N
        • Pollesch N
        • Zupanic A
        • Perkins E.
        Predicting the probability that a chemical causes steatosis using Adverse Outcome Pathway Bayesian Networks (AOPBNs).
        Risk Anal. 2020; 40: 512-523
      5. Galeano D, Li S, Gerstein M, et al. Predicting the frequencies of drug side effects. Nat Commun. 2020; 11: 4575