Recent efforts to increase success in drug discovery focus on early, massive, and routine mechanistic and/or kinetic characterization of drug-target engagement as part of a design-make-test-analyze strategy. From an experimental perspective, many mechanistic assays can be translated into a scalable format on automation platforms, thereby enabling routine characterization of hundreds or thousands of compounds.
However, the limiting factor for achieving such in-depth characterization at high throughput is now the quality-driven data analysis, whose sheer scale outweighs the time available to the scientific staff of most labs. Automated analytical workflows are therefore needed to enable such experimental scale-up. We have implemented such a fully automated workflow in Genedata Screener for time-dependent ligand-target binding analysis to characterize non-equilibrium inhibitors. The workflow automates the Quality Control (QC), data modelling, and decision-making process in a staged analysis: (1) quality control of raw input data (fluorescence signal-based progress curves), featuring automated rejection of unsuitable measurements; (2) automated model selection (one-step versus two-step binding model) using statistical methods and biological validity rules; (3) result visualization in specific plots and annotated result tables, enabling the scientist to review large result sets efficiently and, at the same time, to rapidly identify and focus on interesting or unusual results; (4) an interactive user interface for immediate adjustment of automated decisions, where necessary.
Applying this workflow to first-pass, high-throughput kinetic studies on kinase projects has allowed us to surmount previously rate-limiting manual analysis steps and boost productivity; the workflow is now routinely embedded in a biopharma discovery research process.
Drug development programs are under pressure to increase the rate at which they translate into clinically validated therapeutics. The process involves target identification and validation, identification of hits in high-throughput screening, design-make-test-analyze cycles that optimize hits into leads and drive the final selection of drug candidates, preclinical safety studies, and clinical trials. To reduce the probability of failure at any one of these stages, several credible approaches have been proposed [
]. For example, in hit discovery, the insight now is that a "bigger is better" approach (brute-force screening and characterization of more and more compounds) does not lead to higher success rates [
]. To a large extent, improvements in success rate have come from better target validation, curation of screening libraries, adoption of assay platforms that are specific and robust in terms of accuracy and precision, and triaging of hits based on the experimental results produced by a growing set of in vitro and in vivo assays as well as in silico predictions [
]. However, an important cornerstone for increasing the success rate of a drug discovery program is the earliest possible inclusion of assays providing mechanistic and kinetic insights into drug-target engagement to complement equilibrium potency information [
One prime example of such predictive mechanistic information is drug residence time on a particular target of interest. This is the time a small molecule spends on the target of interest and is expressed as the reciprocal of the dissociation rate constant. Often, increasing the residence time increases the therapeutic window [
], allowing lower dosing, yielding a sustained mode of action, a lower probability of mechanism-based toxicity and reduced off-target interaction, in turn reducing undesirable toxicities [
]. The residence time is contextual to the half-life of the protein and the pharmacokinetic clearance of the compound from systemic circulation, and hence needs to be interpreted within those boundary conditions [
]. The optimization of the affinity or the binding kinetics of the drug for the target of interest is done in iterative cycles of structure-activity/kinetic relationship (SAR/SKR) based medicinal chemistry. In each of these optimization steps, lead-target engagement parameters must be both monitored and properly analyzed; this analysis becomes particularly challenging for the non-equilibrium modalities of inhibition discussed below [
During advanced stages of lead optimization, affinity gains beyond 0.1 μM are usually driven by reducing dissociation rates and increasing residence times of the molecule on the target of interest. This increase in residence time is achieved by an induced reorganization of the binding site (isomerization). This isomerization process ties the reduction in the dissociation rate to a reduction in the association rate due to the kinetic barrier that isomerization poses. Inhibitors falling within these optimization zones are usually known as slow-onset inhibitors [
], showing slow association rates on the time scale of the assay that measures catalytic turnover of the substrate by the enzyme. A few prominent examples of slow-onset inhibitors are methotrexate, captopril and DuP697. Thus, during later stages in the drug discovery cascade, when lead affinities are in the low-nanomolar to picomolar range, all lead molecules invariably show non-equilibrium slow-onset behavior towards their target of interest.
As the dissociation rate approaches zero, inhibition appears irreversible, another prominent example of non-equilibrium inhibition. Here, a covalent adduct is formed between the small molecule and the protein target of interest, often via a cysteine residue in the case of protein kinases. The electrophilic group on the small molecule (examples are acrylamide, allenamide and other α,β-unsaturated carbonyl compounds) reacts with a nucleophilic cysteine thiol to form an irreversible covalent bond. This increases the residence time of the small molecule on the target of interest to infinity, in turn decoupling the pharmacokinetics (PK) from the pharmacodynamics (PD) and leading to sustained target engagement limited only by the half-life of the protein target. A few examples of irreversible inhibitors include aspirin, penicillin, proton pump inhibitors (PPIs) such as omeprazole and lansoprazole, and EGFR inhibitors such as osimertinib (Tagrisso) [
]. Irreversible inhibitors can also be employed to disrupt protein-protein interactions (PPI) by targeting a specific cysteine. A few specific examples of these include Kelch-like ECH-associated protein 1 (Keap1) and nuclear factor erythroid 2 related factor 2 (Nrf2) [
]. Some slow-onset inhibitors with very low dissociation rates (on the time scale of the assay) may appear and behave like irreversible inhibitors and can only be distinguished from them by orthogonal detection methods, such as mass spectrometry, or by other analytical techniques such as NMR, HPLC, radiolabelled-ligand or spectroscopic methods [
]. Both non-equilibrium modalities result in sustained target engagement at low inhibitor concentration as well as non-dilution of the inhibitor-target interaction potency in the face of competition with endogenous ligands/substrates.
To shorten lead optimization projects by feeding more molecules through information-rich assays and chemical designs, different groups in drug discovery have been adopting high-throughput data analysis pipelines to accelerate the pace of their output [
]. However, such initiatives have been lacking especially in the study of non-equilibrium modalities of inhibition, where analysis and annotation remain labor-intensive and time-consuming. Such analysis becomes rate-limiting at the increased assay throughput supporting today's SAR/SKR-guided medicinal chemistry optimization in early drug discovery.
In the work presented here, we developed a set of analytical methods and criteria and embedded them in a data analysis automation workflow for slow-binding and irreversible inhibition modalities. The workflow, implemented in Genedata Screener, automates data QC, validation of the experimental condition, selection of the appropriate model, and parameter fitting. It has been tested and validated for the characterization of irreversible inhibition in several kinase projects.
2. Materials and methods
2.1 Theory of slow binding inhibition
Compounds are called slow-binding inhibitors when, on the time scale of the assay, the rate of association slows down together with the rate of dissociation. Additionally, slow onset of inhibition can occasionally be seen with a slow rate for k4 even in the absence of accompanying reductions in k3 or k2 (scheme for the two-step mechanism shown below).
A two-step reaction mechanism assumes a rapid formation of an EI complex that undergoes a slow isomerization to an EI* complex:

$$E + I \;\underset{k_2}{\overset{k_1}{\rightleftharpoons}}\; EI \;\underset{k_4}{\overset{k_3}{\rightleftharpoons}}\; EI^*$$
If the final EI* complex is bound irreversibly and covalently ("irreversible inhibition"), k4 equals zero and k3 is called kinact. The pseudo-equilibrium dissociation constant includes the isomerization step and is defined as KI* = KI (k4 / (k3 + k4)), where KI = (k2 + k3)/k1. This is distinct from the equilibrium dissociation constant, which is defined as Ki = k2/k1.
A one-step reaction mechanism is a special case where k3 >> k2 or k3 << k2 [
], such that the formation of the EI complex is assumed to happen in a single slow step:

$$E + I \;\underset{k_2}{\overset{k_1}{\rightleftharpoons}}\; EI$$
If the binding of the final EI complex is covalent, k2 equals zero and k1 is called kinact/KI.
When product formation is measured during the reaction, yielding a "progress curve", its time-dependent behaviour can be represented using Eq. (1):

$$S_t = S_0 + v_s t + \frac{v_i - v_s}{k_{obs}}\left(1 - e^{-k_{obs} t}\right) \tag{1}$$

where St is the signal measured at time t and S0 is the background signal at the beginning of the reaction. vi and vs are the initial and steady-state product formation velocities, and kobs is the rate constant for the transition from initial inhibition to steady-state inhibition.
The parameters vi, vs and kobs depend on the characteristics of the reaction mechanism described above. In a one-step process, vi is independent of inhibitor concentration, vs has a nonlinear and kobs a linear dependency on the inhibitor concentration. In a two-step process all parameters have a nonlinear dependency on inhibitor concentration. This dependency can be described for a simple competitive slow-binding inhibition one-step reaction with

$$v_i = \frac{V_M [S]}{K_M + [S]}, \qquad v_s = \frac{V_M [S]}{K_M \left(1 + \frac{[I]}{K_i}\right) + [S]}, \qquad k_{obs} = k_2 + \frac{k_1 [I]}{1 + \frac{[S]}{K_M}} \tag{2}$$

and for a two-step reaction with

$$v_i = \frac{V_M [S]}{K_M \left(1 + \frac{[I]}{K_i}\right) + [S]}, \qquad v_s = \frac{V_M [S]}{K_M \left(1 + \frac{[I]}{K_I^*}\right) + [S]}, \qquad k_{obs} = k_4 + \frac{k_3 [I]}{[I] + K_I \left(1 + \frac{[S]}{K_M}\right)} \tag{3}$$

where VM and KM are the maximum velocity and the Michaelis constant of the enzyme-substrate reaction in the absence of inhibitor, respectively.
The progress curves and their corresponding plots of kobs vs. inhibitor concentration are illustrated in Fig. 1 both for one-step and two-step binding mechanisms.
Fig. 1 Simulated data using KinTek Explorer. (A) Primary progress curves for an enzyme-catalyzed reaction in the presence of several concentrations of inhibitor. [E] = 1 nM, [S] = 20 µM, Km = 20 µM, kcat = 0.1 s−1, [I] = 0.012-12 µM (2-fold serial dilution), k1 = 10 µM−1s−1, k2 = 10 s−1, k3 = 0.01 s−1, k4 = 0 s−1. (B) Replot of kobs vs. [I] generated from the progress curves in panel A. (C) [E] = 1 nM, [S] = 1000 µM, Km = 20 µM, kcat = 0.1 s−1, [I] = 0.005-1.28 µM (2-fold serial dilution), k1 = 10 µM−1s−1, k2 = 0.01 s−1, k3 = 0.1 s−1, k4 = 0 s−1. (D) Replot of kobs vs. [I] generated from the progress curves in panel C.
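Curves of this shape can be regenerated directly from Eqs. (1)-(3) without a kinetic simulator. The following Python sketch is our illustration (not the authors' KinTek Explorer script); it simulates panel-A-like progress curves for an irreversible two-step inhibitor (k4 = 0, hence vs = 0), using the caption's parameter values, and all helper names are ours.

```python
import numpy as np
import matplotlib.pyplot as plt

def progress_curve(t, s0, vi, vs, kobs):
    """Eq. (1): signal vs. time for a slow-binding/irreversible inhibitor."""
    return s0 + vs * t + (vi - vs) * (1.0 - np.exp(-kobs * t)) / kobs

def kobs_two_step(i, k3, k4, ki_app):
    """kobs part of Eq. (3): hyperbolic dependence on inhibitor concentration [I]."""
    return k4 + k3 * i / (ki_app + i)

# Parameter values from the Fig. 1A caption (concentrations in uM, rates in 1/s)
S, Km, kcat, E = 20.0, 20.0, 0.1, 1e-3
vmax = kcat * E * S / (Km + S)          # uninhibited velocity, uM of product per s
Ki = 10.0 / 10.0                        # k2 / k1 = 1 uM (rapid EI equilibrium)
ki_app = Ki * (1.0 + S / Km)            # apparent Ki at this competing substrate load
t = np.arange(103) * 70.0               # 103 reads, 70 s apart (as in Section 2.3.2)
concs = 0.012 * 2.0 ** np.arange(11)    # 0.012-12 uM, 2-fold serial dilution

for c in concs:
    vi = vmax / (1.0 + c / ki_app)                       # initial velocity, Eq. (3)
    kobs = kobs_two_step(c, k3=0.01, k4=0.0, ki_app=ki_app)
    plt.plot(t, progress_curve(t, 0.0, vi, 0.0, kobs))   # vs = 0: irreversible

plt.xlabel("time (s)"); plt.ylabel("product (uM)")
plt.title("Simulated progress curves")
plt.show()
```

Replotting `kobs_two_step` against `concs` then reproduces the hyperbolic kobs vs. [I] relationship of panel B.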
The goal of the analysis is to use a set of recorded progress curves for different inhibitor concentrations to infer the underlying reaction mechanism and the dissociation constants Ki and Ki* in Eqs. (2) and (3). An overview of the analytical workflow is presented in Fig. 2. The workflow is initiated by an analysis of the progress curve(s) without inhibitor to determine the time window that must be used for the analysis. The last measured data point within this window is used as the normalization reference to determine relative inhibition per inhibitor concentration. This analysis serves (a) as a visual QC and (b) to instantiate automatic QC of progress curves by validating the experimental condition for a valid progress curve fit (Eq. (1)) and to exclude inhibitors that are either too strong or too weak, or curves that show no clear indication of time-dependent inhibition. Once these conditions are validated, the one- and two-step models are fitted in parallel, the statistical and biological validity of each result is assessed, and the best valid model is selected.
Fig. 2 Flow chart of the workflow. Analysis of non-equilibrium slow-onset or irreversible modalities of inhibition.
2.2.1 Determination of the analysis window
The reaction equations in Section 2.1 are only valid as long as the free substrate concentration shows no considerable decrease. Beyond this regime, even progress curves without inhibitor, i.e., neutral controls, show a decrease in the rate of product formation, which would confound the analysis.
Therefore, we restrict the analysis window to this linear regime of the reaction. This process is automated for each plate by analyzing the progress curves of the neutral controls. The algorithm is based on the take-off-point calculation in Tichopad et al. [
]. For each progress curve, the initial 25% of the data is used to perform a linear fit, and the standard errors of the intercept and slope are used to define a linearity band. Then, starting from the last data point used in this fit, if the next three points are outside of this band, that point is marked as the end of the linear regime (see Fig. 3). Otherwise, the linear fit is repeated including the next point, new linearity bands are calculated, and the process is repeated iteratively. Finally, the median over the time windows from all neutral controls on the plate is used in further analysis steps.
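A minimal sketch of this take-off-point style procedure is given below; it assumes a 2σ linearity band (the exact band width used in Screener is not stated in the text), and uses `np.polyfit` to supply the parameter standard errors.

```python
import numpy as np

def linear_window_end(t, y, init_frac=0.25, n_break=3, n_sigma=2.0):
    """Return the index of the last point of the linear regime of a neutral-control
    progress curve, by iteratively extending an initial linear fit (Section 2.2.1)."""
    n = max(int(len(t) * init_frac), 4)             # need >3 points for error estimates
    while n < len(t):
        (slope, icept), cov = np.polyfit(t[:n], y[:n], 1, cov=True)
        se_slope, se_icept = np.sqrt(np.diag(cov))
        band = n_sigma * (se_icept + se_slope * t)   # simple propagated linearity band
        outside = np.abs(y - (icept + slope * t)) > band
        ahead = outside[n:n + n_break]
        if len(ahead) == n_break and ahead.all():    # next three points leave the band...
            return n - 1                             # ...so the linear regime ends here
        n += 1                                       # otherwise refit with one more point
    return len(t) - 1                                # the whole curve is linear

# Per plate, the analysis window is the median end point over all neutral controls:
# end = int(np.median([linear_window_end(t, y) for y in neutral_control_curves]))
```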
2.2.2 Quality control
For determining the effect of an inhibitor on the reaction, concentration-response curves are constructed as follows: The signal of each progress curve at the last frame of the analysis window is normalized to “percent-of-control” by the median of the neutral control signals at the beginning and the end of the analysis window. These percent-of-control signals are subjected to the following thresholds (see Fig. 4A):
1. The reliable signal range: The range where the progress curve can be clearly distinguished from zero or full inhibition. In a well-designed experiment the signal for most progress curves should lie within this range, so that progress curves display a clear transition from initial to steady state velocity and allow a reliable estimation of the kinetic reaction parameters.
2. The noise threshold: The minimal signal that a progress curve must reach to be considered for analysis. Below this threshold a high fraction of signal comes from recording noise and is hard to distinguish from artifacts.
Fig. 4 Progress curve Quality Control. A) The signal plot is constructed by using the last frame within the analysis window. Green lines mark the reliable range of signal change, and the black line marks the noise threshold. B) Automatic outlier masking from progress curves not following the expected monotonic decrease as a function of concentration. Curves with reported signal below the noise threshold are also masked.
The threshold levels are defined in the user interface and have default values of 20-80% for the reliable signal range and 7.5% for the noise threshold.
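For concreteness, the percent-of-control normalization described above can be sketched as follows. This is our reading of the text (the neutral-control medians at the start and end of the analysis window serve as the two anchors), not Screener's exact formula.

```python
import numpy as np

def percent_of_control(curve_end, neutral_start, neutral_end):
    """Normalize a progress curve's endpoint (last frame of the analysis window)
    to percent-of-control, anchored on the neutral-control medians."""
    lo = np.median(neutral_start)   # control signal at window start (no product yet)
    hi = np.median(neutral_end)     # control signal at window end (full reaction)
    return 100.0 * (curve_end - lo) / (hi - lo)

# Example: an endpoint halfway between the control anchors maps to 50%:
# percent_of_control(55.0, neutral_start=[10, 11, 9], neutral_end=[100, 98, 102])
```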
2.2.3 Outlier detection
While one can visually spot outlier data points and remove those from further analysis, such manual outlier detection does not scale and is prone to bias. For automating outlier detection, we utilize the monotonically decreasing relation between percent-of-control signals derived from progress curves and inhibitor concentration. More specifically, parameters describing a progress curve in Eq. (1) have a direct dependency on inhibitor concentration as shown in Eqs. (2)-(3), leading to an inverse relationship between amount of product generated and inhibitor concentration.
We identify outlier progress curves in the following way (see Fig. 4B): For each point, the two points with the closest concentration in each direction are used to establish a local trend in the signal. The progress curve belonging to the central point is automatically masked if that point significantly deviates from this monotonically decreasing trend. A masked progress curve is excluded from any downstream calculations, but the curve remains visible as a dashed line for visual review.
In addition, progress curves showing a percent-of-control signal below the noise threshold are automatically masked. This removes progress curves where complete inhibition occurs, and which therefore provide little information about the kinetic parameters of the reaction but can bias the fit model by artificially lowering its total error.
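The masking logic can be sketched as below. Where Screener tests each point against the trend of its two nearest neighbours per direction, this simplified variant greedily drops the point whose removal best restores the expected monotonic decrease; the tolerance value is illustrative, and the 7.5% noise default is from Section 2.2.2.

```python
import numpy as np

def monotonicity_violation(y):
    """Total upward movement of a sequence that should be non-increasing."""
    return float(np.sum(np.maximum(np.diff(y), 0.0)))

def mask_outliers(conc, poc, noise_pct=7.5, tol_pct=10.0):
    """Mask percent-of-control endpoints that break the expected monotonic decrease
    with inhibitor concentration, plus curves below the noise threshold (Fig. 4B)."""
    order = np.argsort(conc)
    y = np.asarray(poc, float)[order]
    keep = np.ones(len(y), dtype=bool)
    while monotonicity_violation(y[keep]) > tol_pct:
        idx = np.flatnonzero(keep)
        # drop the point whose removal most reduces the remaining violation
        gains = [monotonicity_violation(np.delete(y[keep], k)) for k in range(len(idx))]
        keep[idx[int(np.argmin(gains))]] = False
    keep &= y >= noise_pct                   # noise-floor masking
    return (~keep)[np.argsort(order)]        # True = masked, in the caller's order

# The 100% point at 1 uM violates the decreasing trend and gets masked:
# mask_outliers([0.01, 0.1, 1.0, 10.0], [95, 70, 100, 8])  -> [False, False, True, False]
```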
2.2.4 Validation of the experimental condition
Each tested inhibitor is evaluated by the QC features described in Section 2.2.2 and placed into one of the following five categories, based on the reliable signal range defined above:
•
Too Strong Inhibitor: Less than N unmasked points above the lower threshold.
•
Weak Inhibitor: Less than N unmasked points below the upper threshold.
•
Too Few Reliable Traces: Less than N unmasked points within the reliable range (see Fig. 5A and B). In these cases, it is recommended to modify the concentration range of the inhibitor and repeat the experiment.
•
Equilibrium Reversible: Following Eq. (1), progress curves with endpoints within the reliable range should show a nonlinear behavior, which then allows estimation of the model parameters. All progress curves are therefore tested for linearity using the approach described in Section 2.2.1. If the last linear frame for all curves lies outside the analysis window, the inhibitor is assigned to this category and the model estimation is interrupted (see Fig. 5C).
•
Passed QC: At least N unmasked points lie within the reliable range and at least one curve shows nonlinear behavior within the analysis range. In this case the inhibitor has "Passed" the QC and the progress curves above the noise threshold are used for subsequent model fitting.
Fig. 5 Example curves illustrating the validation of the experimental condition. A-B) Based on the number of points within the reliable signal range, an inhibitor can be marked as either too strong or weak. C) If all progress curves within the analysis window are within a linear range, the inhibitor is marked as equilibrium reversible.
The number of required points N is defined in the user interface and has a default value of 5.
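In pseudo-Python, this categorization reduces to counting unmasked points against the reliable range; the function and argument names are ours, and the nonlinearity flag is assumed to come from the Section 2.2.1 linearity test.

```python
def classify_inhibitor(poc, masked, lo=20.0, hi=80.0, n_required=5, any_nonlinear=True):
    """Assign one of the five Section 2.2.4 categories from the percent-of-control
    endpoints of one inhibitor's titration (defaults: 20-80% range, N = 5)."""
    pts = [p for p, m in zip(poc, masked) if not m]       # unmasked points only
    if sum(p > lo for p in pts) < n_required:
        return "Too Strong Inhibitor"                     # nearly everything fully inhibited
    if sum(p < hi for p in pts) < n_required:
        return "Weak Inhibitor"                           # nearly everything like control
    if sum(lo <= p <= hi for p in pts) < n_required:
        return "Too Few Reliable Traces"                  # repeat with adjusted range
    if not any_nonlinear:
        return "Equilibrium Reversible"                   # all curves linear in the window
    return "Passed QC"

# classify_inhibitor([95, 85, 70, 60, 45, 30, 22, 12, 5], [False] * 9)  -> "Passed QC"
```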
2.2.5 Model fitting
A precise estimation of the kinetic rate constants is achieved by fitting a global kinetic model that simultaneously reconstructs the (unmasked) measured progress curves in the presence of an inhibitor. Each measured progress curve can be represented using Eq. (1) with the parameters vi, vs and kobs. For competitive inhibition, each parameter in turn depends on the characteristics of the enzymatic reaction, the substrate and inhibitor concentration, and the kinetic parameters and inhibition constants, as shown in Eqs. (2)-(3).
Using the notation

$$k_1^{app} = \frac{k_1}{1 + \frac{[S]}{K_M}}, \qquad K_i^{app} = K_i\left(1 + \frac{[S]}{K_M}\right), \qquad K_i^{*app} = K_I^*\left(1 + \frac{[S]}{K_M}\right), \qquad v_{max} = \frac{V_M [S]}{K_M + [S]},$$

the equations can be rewritten without loss of generality for a one-step process as

$$v_i = v_{max}, \qquad v_s = \frac{v_{max}}{1 + \frac{[I]}{K_i^{app}}}, \qquad k_{obs} = k_2 + k_1^{app} [I] \tag{4}$$

and for a two-step process as

$$v_i = \frac{v_{max}}{1 + \frac{[I]}{K_i^{app}}}, \qquad v_s = \frac{v_{max}}{1 + \frac{[I]}{K_i^{*app}}}, \qquad k_{obs} = k_4 + \frac{k_3 [I]}{K_i^{app} + [I]} \tag{5}$$
This rewrite reduces the description of the measured progress curves to a 5-parameter model for one-step inhibition and a 6-parameter model for the two-step inhibition. In case of irreversible inhibition, vs is fixed to zero, removing one free parameter from the one- and two-step models, KiApp and Ki*App, respectively. For situations where the instrumentation only has a small delay (e.g., a few seconds) between the start of the reaction and the beginning of the recording from each well, the variability in the initial signal can be subtracted from the progress curve and the parameter S0 can be fixed to zero. Finally, as vmax represents the speed of the reaction in the absence of inhibition, the median slope of the neutral controls in the plate is used to fix this parameter.
The remaining free parameters of the models are fitted using the signal of the unmasked progress curves within the temporal analysis window in an iterative process using a gradient descent method [
]. In brief, in each iteration a set of parameters is produced and the progress curves for the measured inhibitor concentrations are simulated using Eqs. (1) and (4)-(5). The total squared deviation between the generated and the measured curves is calculated, defining the current error of the fit, which is used as the objective value in the iterative optimization of the tested parameter set. The process stops when the current error is below a defined threshold or when a maximal number of iterations is reached.
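The global two-step fit can be sketched with SciPy's least-squares optimizer standing in for the (unnamed) gradient-descent implementation used in Screener. The parameterization follows Eqs. (1) and (5) with S0 fixed to zero and vmax fixed from the neutral controls; the start values and bounds are ad hoc.

```python
import numpy as np
from scipy.optimize import least_squares

def two_step_curves(params, t, concs, vmax):
    """Simulate all progress curves from the shared parameters of Eq. (5)."""
    ki_app, ki_star_app, k3, k4 = params
    out = []
    for c in concs:
        vi = vmax / (1.0 + c / ki_app)
        vs = vmax / (1.0 + c / ki_star_app)
        kobs = k4 + k3 * c / (ki_app + c)
        out.append(vs * t + (vi - vs) * (1.0 - np.exp(-kobs * t)) / kobs)  # Eq. (1), S0 = 0
    return np.concatenate(out)

def global_two_step_fit(t, concs, signals, vmax):
    """Fit all unmasked progress curves simultaneously; `signals` is (n_conc, n_time)."""
    residuals = lambda p: two_step_curves(p, t, concs, vmax) - signals.ravel()
    x0 = [np.median(concs), np.median(concs) / 10.0, 1e-3, 1e-4]   # crude start values
    return least_squares(residuals, x0, bounds=(1e-9, np.inf))      # rate constants > 0

# fit = global_two_step_fit(t, concs, signals, vmax)
# ki_app, ki_star_app, k3, k4 = fit.x; total_error = np.sum(fit.fun ** 2)
```

For irreversible inhibition, vs is dropped (fixed to zero) and k4 set to zero, mirroring the parameter removal described above.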
2.2.6 Model selection
The model selection provides an automated assessment of whether the reaction follows a one- or two-step process, using a combination of heuristics and statistics. The validity assessment of the result set per model is based on the quality of the fitted parameters (the coefficient of variation of both KiApp and Ki*App must be below 0.3) and the fit residuals (the square root of the total error divided by the degrees of freedom must be less than 1.5 times the intrinsic noise of the measurement). To invalidate overfitted models, the parameters KiApp and Ki*App are constrained to be above 1/100 times the lowest and below 100 times the largest tested inhibitor concentration. All numerical values for the validity thresholds can be adjusted in the user interface. In the event that only one of the models is valid, its parameters are selected for result review. Otherwise, model selection is performed by applying the Bayesian information criterion (BIC) [
], a statistical method for model selection where a lower BIC implies a better model fit to the data.
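A sketch of this staged decision follows; the dictionary field names are ours, and the thresholds mirror the defaults quoted above.

```python
import numpy as np

def bic(rss, n_points, n_params):
    """Bayesian information criterion for a Gaussian least-squares fit."""
    return n_points * np.log(rss / n_points) + n_params * np.log(n_points)

def select_model(fits, conc_min, conc_max, cv_max=0.3, noise_factor=1.5, noise_sd=1.0):
    """Return the name of the best valid model, or None if neither model is valid.
    Each value in `fits` holds rss, n_points, n_params, ki_values, ki_cvs."""
    valid = {}
    for name, f in fits.items():
        cv_ok = all(cv < cv_max for cv in f["ki_cvs"])                 # precise Ki estimates
        rmse = np.sqrt(f["rss"] / (f["n_points"] - f["n_params"]))
        noise_ok = rmse < noise_factor * noise_sd                      # residuals near noise
        scale_ok = all(conc_min / 100.0 < k < conc_max * 100.0
                       for k in f["ki_values"])                        # biological scale
        if cv_ok and noise_ok and scale_ok:
            valid[name] = bic(f["rss"], f["n_points"], f["n_params"])
    return min(valid, key=valid.get) if valid else None                # lower BIC wins

# best = select_model({"one-step": fit1, "two-step": fit2}, conc_min=0.012, conc_max=12.0)
```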
2.2.7 Result review
To augment the automated QC and data analysis workflow, a set of diagnostic plots is provided for the scientist's review (see Fig. 6). These include a goodness-of-fit plot, where simulated traces from the fitted parameters are overlaid on the measured progress curves; a concentration-response chart, where the percent-of-control signals of the progress curves are plotted against the inhibitor concentration; and, finally, plots of vi, vs and kobs against inhibitor concentration using the selected model, calculated to give an overview of the underlying process.
Fig. 6 Visual review - Plots. A) Overlay of fitted progress curves, B) Expected signal at last frame within the analysis window, C) The fitted initial velocity vi normalized by the velocity of the neutral controls v0 as a function of concentration, D) The fitted rate constant kobs as a function of concentration.
2.3.1 Compound preparation
Irreversible inhibitors of two kinase targets (Kinase 1, Kinase 2) were selected from the AZ compound collection. Assay-ready compound plates (ARPs) were prepared by the compound management unit. The compounds were dispensed acoustically in a range of different volumes into 384-well black, clear-bottomed microtiter plates (Corning 3544) to create a 16-point concentration-response curve. All wells were backfilled with the appropriate volume of DMSO to 1% v/v final concentration in a 10 µl final assay volume. All ARPs included neutral controls (no inhibition, 1% v/v DMSO) and inhibitor controls (100% inhibition, 20 µM of kinase-specific inhibitor).
2.3.2 Kinase assays and data acquisition
Chelation enhanced fluorescence (CHEF), as implemented by AssayQuant technology (AQT) (http://www.assayquant.com/), was used to quantify real-time phosphorylation of the sox peptide and to assess kinetic activity in a continuous and medium throughput manner [
] (Fig. 7). 5 µl/well of the substrate mix (AQT peptide and ATP) was dispensed into ARPs, followed by 5 µl/well of protein mix (Kinase 1 mutant or WT), using a Certus FLEX liquid dispenser. The final buffer composition for the assay run was 20 mM HEPES pH 7.5, 0.005% Brij 35, 0.5 mg/ml BSA, 5 mM MgCl2 and 5% glycerol for both the mutant and WT Kinase 1. The plates were briefly centrifuged at 300 ×g prior to measuring fluorescence intensity (FI) using a FlexStation® 3 Multi-Mode Microplate Reader at 360 nm ± 20 nm excitation / 545 nm ± 30 nm emission (gain: 40, exposure time: 30 seconds, excitation settings: 80%), with kinetic measurements taken every 70 seconds for 103 reads. Final assay concentrations of Kinase 1 WT/mutant, AQT peptide, ATP and DMSO were 50 nM/15 nM, 10 µM, 500 µM and 1%, respectively. Preliminary visualization of the progress curves was done using the SoftMax® Pro software associated with the FlexStation® 3. The progress curves were then analyzed using the Genedata Screener Mechanistic Analysis Package as described above. Neutral (100% activity) and inhibitor (0% activity) controls were used to assess signal window and assay linearity. Full automation of data import and analysis significantly reduced the time required for data analysis, contributing to the goal of delivering high-throughput data sets to the chemists in a timely fashion for SAR optimizations.
Fig. 7 Schematic illustration of the CHEF method implemented by AssayQuant Inc., using the sox-based fluorescence quantum-yield gain as a function of phosphorylation (of the hydroxyl side chain of a serine or threonine moiety) and magnesium chelation. The dotted lines indicate coordination bonds, and the sphere indicates the magnesium ion.
3. Results and discussion
Typically, during drug optimization, a reversible inhibitor is optimized based on its potency increase as assessed by the pharmacological parameter IC50 (a substrate concentration- and time-dependent parameter) or, in some instances, the Ki (an equilibrium, substrate concentration- and time-independent parameter). However, slow-onset and irreversible inhibition can often involve an isomerization step of the protein-inhibitor complex (kiso) or a step that involves irreversible chemical modification of the enzyme by the inhibitor (kinact), as discussed in the Methods. This confers a time-dependent aspect to the inhibition, with the reaction velocity transitioning from an initial rate to a (lower) steady-state rate as a function of time. This time dependence can be assessed by the second-order rate constant for the formation of the tight-binding or covalent complex, using either kiso/K* (where K* = (kiso + koff)/kon) for slow-onset or kinact/KI (where KI = (kinact + koff)/kon) for irreversible inhibition. For irreversible inhibitors, kinact/KI becomes kinact/Ki for inhibitors that react slowly (i.e., kinact ≪ koff) and becomes kon for inhibitors that react very fast (i.e., kinact ≫ koff). Despite these specific differences (as a rule of thumb, the steady-state velocity is non-zero for slow-onset inhibition while it is zero for irreversible inhibition), the mode of analysis for both slow-onset and irreversible inhibitors remains the same, so that an automated analysis pipeline can cover both.
The goal of the work presented here is to enable increased throughput for characterizing irreversible and slow-onset inhibitors and to provide high-quality mechanistic and kinetic information on drug-target engagement. We show how the analysis bottleneck was removed towards achieving this goal, translating a manual analytical workflow into an automated one for routine characterization of thousands of molecules in a timely fashion: a quality-driven data analysis at scale. The necessity for accuracy and precision in automated generation of result sets still requires scientific expertise to set the rules and to provide a final review of, and input on, the appropriateness of the output parameters. This implies that the expert can override the automated output if there is enough evidence to justify such an intervention.
Automation of data analytics for complex experimental data, like progress curves, must operate robustly, in terms of precision and accuracy, on a huge variety of experimental outcomes. As such data analysis has so far been highly dependent on expert input, we translated a best-practice, human-driven workflow into a computer-assisted workflow with checkpoints along the process (see Fig. 2). These implementations include (1) validation of the input (measured) data to assess whether experimental conditions are met; (2) automated outlier detection; (3) automatic model selection using statistical and biological validity rules; (4) automatic result consolidation (and review) through result visualization. The details are discussed below.
3.1 Validating the data to assess whether the experimental conditions are met
In the analysis of covalent inhibition, the progress curve at zero inhibitor concentration plays an important role. Unlike the ideal case, where the zero-inhibitor control is linear and non-linearity of a progress curve arises exclusively from the time-dependent inhibition brought about by slow-onset or irreversible inhibitors, some zero-inhibitor enzyme progress curves show non-linearity. This can have several causes, including, but not limited to, enzyme inactivation, substrate depletion, product inhibition, allosteric modulation, post-translational modification, and the assay window [
]. The initial velocity assumption in kinetic data analysis eliminates substrate depletion and, depending on the relative affinity of the product for the enzyme vis-à-vis the substrate, product inhibition as contributors to non-linearity. However, other factors have been shown to contribute to non-linearity. Thus, it becomes necessary to estimate the correct range of linearity to deconvolute the effects arising from the inhibitor from the inherent non-linearity of enzyme progress curves. We found this a delicate problem to solve: applying a stringent criterion and conservative estimates for determining the linearity of the zero-inhibitor control can lead to loss of precious information about inhibitor-induced non-linearity. On the other hand, relaxing the linearity criterion can intertwine, when estimating kobs, the information arising from inhibitor-mediated non-linearity with the inherent non-linearity of the curve.
To address this, we developed a method instituting an iterative linear fit approach to explicitly treat the zero-inhibitor control in estimating the correct range for data analysis (Fig. 3). The literature proposes several methods for measuring the extent of non-linearity [
], however these approaches fail to scale for practical reasons. For instance, defining a numeric geometric curvature threshold to determine the linear regime of a progress curve is not suitable for production data with hundreds or thousands of measured compounds showing high variability in signal scale and time granularity. Similarly, fitting a tool function with an explicit nonlinear dependency to approximate the linear regime requires assumptions about the underlying model, which can bias the results if those assumptions are not valid in a large-scale context. The statistical approach shown here relies solely on the zero-inhibitor control data on a plate-by-plate basis, allowing the linearity assumption to be tested at scale.
3.2 Automated outlier detection and experimental condition validation
To improve the quality of model fitting and to avoid hands-on outlier deletion, which is especially acute for very poor and very potent inhibition, we developed a protocol that removes data points based on their individual value, their value compared to their nearest neighbors, and the total number of values within a target range.
Our protocol works in the following way: The reported signal plot constructed from the last frame of each progress curve within the analysis window (see Fig. 4A) gives a concise overview of the underlying reaction process and makes potential outliers and non-optimal experimental conditions evident. It represents the product of the reaction generated as a function of inhibitor concentration for a given timepoint in the reaction. In a slow-binding reaction this function should decrease monotonically with concentration. Therefore, our workflow identifies reporting points that significantly violate monotonicity with respect to their neighborhood and automatically masks them, removing the corresponding progress curve from the analysis (see Fig. 4B). By normalizing the signal at this timepoint to the corresponding signal of the zero-inhibitor controls it is possible to define plausible signal ranges of progress curves in a successful experiment. However, at very high inhibitor concentrations the curves are flat and can hardly be separated from instrument noise. Curves in this regime contain little information but heavily bias the fit algorithm towards erroneous parameter estimates. Therefore, progress curves with reporting points below a user defined percentage of the signal from the zero-inhibitor controls are masked and removed from the analysis (see Fig. 4B).
For high-throughput assays with a high number of compounds it is common that a few of the tested inhibitor titrations are not ‘optimal’ to characterize the enzymatic reaction in the timespan of the experiment. This leads to measuring multiple linear progress curves which are superposed either close to full inhibition or to the zero-inhibition control. If most curves for an inhibitor are in this regime this leads, once again, to an erroneous estimation of the model parameters. Therefore, a reliable signal range based on percentages of the zero-inhibitor signal is defined, within which the progress curves are expected to have a clear transition between initial and final velocity and contribute to an accurate parameter estimation. The slow-binding models are fit only if at least the specified number of report points are within the reliable range. Otherwise, if most reporting points are below the reliable range the inhibitor is labeled as “Too strong” (see Fig. 5A) or if most are above it is labeled as “Weak” (see Fig. 5B), and the fit process is interrupted. Using this approach, the risk of propagating false parameters into the downstream drug discovery campaign is significantly decreased.
3.3 The selection of models and data fitting
During manual data analysis, model selection had been contingent upon the subjective selection of the right analysis range, subjective outlier detection and QC assessment, and an inspection of both local and global model fits before arriving at an inference on the appropriate model to use for analysis and reporting. To avoid the subjectivity of such an approach, which has the potential to introduce considerable variability, we introduced an automatic model selection approach based on statistical methods and biological validity rules.
The major step in parameter estimation for non-equilibrium inhibitors is the correct identification of a one-step versus a two-step model so that the appropriate analysis can be undertaken. This challenge is compounded by the choice between a local and a global fit for kobs estimation, which leads to unconstrained or constrained/simulated values, respectively. For the work shown here we prefer a global fit over a local fit. This is motivated by Monte Carlo studies, which show that parameters estimated using local fits are highly cross-correlated. This has a direct bearing on the estimation of confidence intervals (CIs), since a deviation in one parameter can be compensated by the calibration of other parameters while retaining the overall χ2 [
]. The incidence of these covariance valleys on the χ2 hypersurface can be reduced by global fitting, whereby multiple curves are analyzed simultaneously using a single mathematical model to yield shared parameters [
The kinetics following a one-step reaction can also be described by the two-step model with extreme parameters. It is therefore not sufficient to perform model selection based on goodness of fit alone. Therefore, we impose constraints on the parameters KiApp and Ki*App that force their fitted values to be within a biologically relevant scale, trading off fit quality if this is not supported by the data. The statistical validity of the fitted parameters is then evaluated by thresholding the standard errors of KiApp and Ki*App and the overall χ2 [
]. By limiting the selection to such valid models, we overcome the problem of overfitting the data with the two-step model. If both results have the same validity, the model with the lowest Bayesian information criterion (BIC) is selected (we use the BIC instead of the Akaike information criterion (AIC) because it more strongly penalizes the two-step model for having more parameters).
3.4 Qualifying result - very potent vs. very weak inhibitors
As discussed above, both very poor and very potent inhibitors within the tested concentrations pose specific problems, one being the difficulty of comparing parameters for such inhibitors across two different proteins to compute metrics such as selectivity ratios. Values of kinact/KI below 20 M−1 sec−1 indicate very poor inhibition and are at the limit of sensitivity for routine assays and experimental equipment. Therefore, the accuracy of estimates below 20 M−1 sec−1 is unreliable, and variation in their reporting can result in disparate selectivity assessments. For instance, though kinact/KI values for protein A of 10, 1, 0.1 or 10−2 (units of M−1 sec−1) across replicate measurements would essentially reflect inaccuracy in the estimates due to variation at the limit of sensitivity of the analysis, they can result in hugely different selectivity windows when compared against another protein (protein B) with a fixed kinact/KI value of 100. Likewise, with extremely potent compounds, the estimation of kinact/KI values becomes dependent on the very few data points that are not completely inhibited, leading to high levels of noise. This can result in estimates of kinact/KI almost at the diffusion limit (106-108 M−1 sec−1); these estimates are highly unreliable and have oftentimes resulted in skewed and unexpected outcomes when analyzing irreversible inhibitors across different project cascades in the past. Thus, it is necessary to institute an appropriate way of reporting the unreliability of the parameter for both poor and highly potent compounds. We therefore flag these values in the analysis workflow as either non-computable or unreliable, depending on the number of data points in the transition of the dose-response curve generated at the end of the signal range, as discussed in Section 2.2.4. This enables a clearer interpretation of results by the corresponding teams, as a "greater than" or "less than" sign indicates that above or below the pre-determined thresholds the estimated parameters cannot be reliably interpreted.
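A reporting rule of this kind is straightforward to encode. In the sketch below, the 20 M−1 sec−1 floor is taken from the text, while the upper cutoff near the diffusion limit is an assumed, project-specific value.

```python
def qualify_kinact_over_ki(value, lower=20.0, upper=1.0e6):
    """Render kinact/KI (M^-1 s^-1) with a qualifier outside the reliable window."""
    if value < lower:
        return f"< {lower:g} M-1 s-1"   # below assay sensitivity: not comparable
    if value > upper:
        return f"> {upper:g} M-1 s-1"   # too few uninhibited points: unreliable
    return f"{value:.3g} M-1 s-1"

# qualify_kinact_over_ki(5.0)   -> "< 20 M-1 s-1"
# qualify_kinact_over_ki(350.0) -> "350 M-1 s-1"
```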
3.5 Differentiating equilibrium reversible vs. non-equilibrium slow-onset and irreversible inhibitors
Equilibrium reversible inhibitors show a dose dependence of inhibition without revealing any time dependency, in contrast to non-equilibrium modalities, which show both dose- and time-dependent inhibition. For instance, during early stages of drug discovery in a project cascade aiming to design irreversible kinase inhibitors, the goal is to optimize the affinity of a small molecule whose electrophile has only weak or moderate intrinsic reactivity (which, for example, can be assessed by its reactivity with a free thiol such as glutathione or reduced DTNB) towards cysteine nucleophiles that are mostly quiescent (compared to surface cysteines). This is essential to build in binding-conferred specificity and to avoid optimizing for an affinity label. It can, however, result in a poorly determined kinact value: because koff is far greater in magnitude than kinact, the kinact/KI estimate is actually estimating kinact/Ki, since the kinact term in KI can safely be ignored (KI = (koff + kinact)/kon). Hence, in this regime, these small molecules may appear reversible, with no time dependence of inhibition revealing itself within the span of the assay. This needs to be appropriately reported to the chemists for informed SAR optimizations.
Additionally, for reasons not discussed in the current manuscript, a project might decide to screen and optimize a mix of both reversible and irreversible inhibitors. In such cases the ability to discriminate between equilibrium reversible inhibition (exclusive dose dependence) and non-equilibrium slow-onset or irreversible inhibition (both dose and time dependence) becomes necessary. The presented workflow addresses this necessity by testing the progress curves for linearity using the same approach as for the zero-inhibitor control, as discussed in Section 3.1. If for all progress curves the last timepoint satisfying the linearity condition is beyond the established analysis range, the curves are annotated as equilibrium reversible; otherwise, they are annotated as non-equilibrium slow-onset or irreversible. We note that these annotations are not a reflection of the actual mechanistic model to which a particular inhibitor-enzyme interaction conforms; they merely reflect the linearity (or otherwise) of the progress curves within the timespan of the analysis, informing chemistry about either reversibility or the magnitude of kinact.
3.6 Benefit of the new workflow compared to traditional analysis
A ‘traditional’ data analysis workflow involves two distinct steps: non-linear curve fitting of the progress curves and a manual inspection of the subsequent kobs versus inhibitor concentration plots [
]. Even when the data quality is exceptionally good, the data analysis itself takes on average 15-20 minutes for the extraction of parameters for a single compound. This estimate is a ‘conservative optimistic’ number, given that manifold factors influence the quality of experimental data, rendering it much noisier than textbook data. From a screening perspective, where testing many analytes is the goal, human labor becomes a limiting factor. For instance, data analysis of 100 compounds requires 1500-2000 minutes (25-33 hours). The new workflow introduced here allows the same number of compounds to be completed in at most 30 minutes (0.5 hours), including the time taken to import the data, perform regular QC checks, detect and eliminate outliers, and fit and select models: a dramatic gain of approximately 60-fold (see Fig. 8). This results in substantial cost savings in terms of hours per project, amounting to substantial full-time-equivalent (FTE) time savings.
Fig. 8 Efficiency gains of the new automated workflow vs. the legacy analysis workflow.
The total time required for producing validated result parameters splits into the time required to produce experimental data and time required for the QC and analysis. Producing data involves the generation of assay ready plates, preparation of substrate + protein mix, followed by reading the signal with plate readers. The data analysis reflects the QC, fitting of progress curves and the result consolidation (too strong inhibitor, weak inhibitor, equilibrium reversible, 1 or 2 step). The new workflow permits a 60-fold efficiency gain compared to the legacy workflow.
We have introduced and implemented in Genedata Screener an automated workflow to scale up the analysis of non-equilibrium modalities of inhibition (slow-onset and irreversible inhibitors) to keep pace with the rising demand for massive hit and lead characterization in early discovery. The cornerstones of this workflow are (1) automated data validation steps, (2) robust (in terms of precision and accuracy) automated model fitting and selection, (3) result consolidation and display for interactive review and adaptation by a scientist, and (4) publication of results to data warehouses with a mouse click.
The workflow provides a high gain in efficiency for the project scientist, reducing data analysis time by more than 60-fold, while producing a high number of mechanistic and kinetic quality endpoints guiding early drug discovery on non-equilibrium inhibitors. Further, this workflow is poised to increase consistency of results from these assays, reducing person-to-person or site-to-site variability.
In the future, with more drug discovery projects realigning their goals to exploit non-equilibrium mechanisms of inhibition from the outset, this workflow will enable the objective and high-throughput analysis of hit and lead molecules to inform SAR/SKR optimizations during initial drug discovery. Having applied this workflow to first-pass, high-throughput kinetic studies, AstraZeneca has seen the removal of previously rate-limiting data analysis steps, a boost in throughput, reduced variability, and increased robustness of results in terms of precision, according to common quality metrics.
Data and software availability
All data reported in this manuscript will be provided upon reasonable request. The data and analysis pipeline reported in this manuscript are considered proprietary by AstraZeneca PLC and Genedata AG. However, the pipeline has unique scientific value because it allows the analysis of small-molecule inhibitors with a non-equilibrium modality of inhibition in a high-throughput manner. This is the first demonstration of such an analysis pipeline for accelerating decision making in early drug discovery of non-equilibrium inhibitors. The high-throughput scientific method has been clearly explained in the manuscript, and access to the data is not a prerequisite for reproducing the results reported here.
Declaration of Competing Interest
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: All authors are current or past employees of AstraZeneca or Genedata AG.
Acknowledgements
We are thankful to Maria Flocco, Rachel Grimley, Liz Roberts, Geoff Holdgate and James Robinson for their constant support and enthusiasm for work on this Genedata Screener embedded workflow. We acknowledge the initial inputs provided by Fredrik Wågberg, Omar Alkhatib and Xiaoxiao Guo.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
References
Morgan P., Brown D.G., Lennard S., Anderton M.J., Barrett J.C., Eriksson U., Fidock M., Hamrén B., Johnson A., March R.E., Matcham J., Mettetal J., Nicholls D.J., Platz S., Rees S., Snowden M.A., Pangalos M.N. Impact of a five-dimensional framework on R&D productivity at AstraZeneca.
Insights into the slow-onset tight-binding inhibition of Escherichia coli dihydrofolate reductase: detailed mechanistic characterization of pyrrolo [3,2-f] quinazoline-1,3-diamine and its derivatives as novel tight-binding inhibitors.
A novel high-throughput FLIPR Tetra-based method for capturing highly confluent kinetic data for structure-kinetic relationship guided early drug discovery.