Rapid and Accurate Calculation of Ligand-Protein Binding Free Energies
Dieter Kranzlmüller (1) and Peter V. Coveney (2)
(1) Ludwig-Maximilians-Universität, München (Germany), (2) Centre for Computational Science, University College London (UK)
Local Project ID: pr87be
HPC Platform used: SuperMUC of LRZ
Rapid and accurate calculation of binding free energies is of major concern in drug discovery and personalized medicine. A pan-European research team leveraged the computing power of LRZ’s SuperMUC system to predict the binding free energies of ligands to proteins. An in-house, highly automated, molecular-simulation-based free energy calculation workflow tool enabled the team to carry out its modelling and calculations with optimal efficiency, resulting in rapid, reliable, accurate and precise predictions of binding free energies.
Most drugs work by binding to specific proteins and blocking their physiological functions. The binding affinity of a drug to its target protein is hence a central quantity in pharmaceutical drug discovery and clinical drug selection (Fig. 1). For successful uptake in drug design and discovery, reliable predictions of binding affinities need to be made on time scales that can influence experimental programs. For applications in personalized medicine, the selection of suitable drugs needs to be made within a few hours to influence clinical decision making. Speed is therefore of the essence if free-energy-based calculation methods are to be used in these areas. This work focuses on developing an automated workflow which ensures that binding affinity results are accurate and reproducible, and can be delivered rapidly.
Binding affinity calculations are very lengthy, tedious, and error-prone to perform manually. They consist of a large number of steps, including model building, production MD, and data analytics performed on the resulting trajectory files. To perform modelling and calculation with optimal efficiency, we have developed the Binding Affinity Calculator (BAC), a highly automated molecular-simulation-based free energy calculation workflow tool (Fig. 1). Automating the workflow makes its execution much faster and far less error-prone. A user-friendly version of BAC, namely uf-BAC, has been developed to extend its accessibility to nontechnical users.
Two approaches are included in BAC for calculating the binding free energies of ligands to proteins. One is ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent); the other is TIES (thermodynamic integration with enhanced sampling). The underlying computational method is classical molecular dynamics (MD). In MD simulations, macroscopic properties corresponding to experimental observables are defined in terms of ensemble averages. Free energy is one such property. ESMACS and TIES rely on ensemble averaging and on the recognition that properties computed from MD trajectories have the character of a Gaussian random process (GRP). On multicore machines such as SuperMUC, ensemble simulations play into our hands because all members of an ensemble can be computed concurrently, in the time it takes to perform a single such calculation. The method is therefore fast, with free energies being determined within around 8 hours.
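The role of ensemble averaging can be illustrated with a minimal Python sketch. It is not taken from BAC: the function name ensemble_estimate, the bootstrap resampling used for the error bar, and the replica values are all assumptions made for illustration. Each replica is taken to yield one free energy estimate; the prediction is the mean over replicas, and the uncertainty is estimated from the spread of resampled means.

```python
import random
import statistics


def ensemble_estimate(replica_values, n_boot=2000, seed=0):
    """Return (mean, bootstrap standard error) over per-replica free energies.

    Hypothetical helper for illustration; not part of BAC.
    """
    rng = random.Random(seed)
    n = len(replica_values)
    mean = statistics.fmean(replica_values)
    # Resample the replicas with replacement and record each resample's mean.
    boot_means = [
        statistics.fmean(rng.choices(replica_values, k=n))
        for _ in range(n_boot)
    ]
    # The spread of the resampled means estimates the error of the ensemble mean.
    return mean, statistics.stdev(boot_means)


# Made-up per-replica binding free energies in kcal/mol (illustrative only).
replicas = [-9.8, -10.2, -10.0, -9.9, -10.1]
dg, err = ensemble_estimate(replicas)
print(f"dG = {dg:.2f} +/- {err:.2f} kcal/mol")
```

Because the replicas are independent of one another, they can all run at the same time on separate groups of cores, which is why the wall-clock time of an ensemble is close to that of a single simulation.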
We have found that an ensemble of ca. 25 replicas for an ESMACS study, and an ensemble of at least 5 replicas for a TIES study, is required per free energy calculation in order to guarantee reproducibility of predictions. Our approaches have now been standardized; we have applied ESMACS and TIES to over 20 different sets of compounds and protein targets, many of them using the substantial allocation of cycles on the GCS supercomputer SuperMUC at the Leibniz Supercomputing Centre.
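As a rough illustration of how replicas enter a thermodynamic integration calculation, the following Python sketch averages the derivative of the energy with respect to the coupling parameter lambda over 5 replicas in each lambda window, then integrates the averages with the trapezoidal rule. The function name ti_free_energy, the choice of lambda windows, the trapezoidal rule, and the numbers are assumptions for illustration and are not taken from the TIES implementation.

```python
import statistics


def ti_free_energy(lambdas, dhdl_replicas):
    """Integrate the replica-averaged dH/dlambda over lambda (trapezoidal rule).

    lambdas: increasing lambda values, one per window.
    dhdl_replicas: one list of per-replica dH/dlambda averages per window.
    Hypothetical helper for illustration; not part of TIES.
    """
    # Ensemble average over the replicas in each lambda window.
    means = [statistics.fmean(window) for window in dhdl_replicas]
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * width * (means[i] + means[i + 1])
    return total


lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
# Five made-up replicas per window, scattered symmetrically around 2 * lambda.
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
dhdl = [[2.0 * lam + off for off in offsets] for lam in lambdas]
ddg = ti_free_energy(lambdas, dhdl)
print(f"ddG = {ddg:.3f} kcal/mol")
```

In this toy case the replica scatter cancels in each window and the integrand is linear in lambda, so the trapezoidal rule recovers the exact integral of 2 * lambda from 0 to 1.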
In an unprecedented project, executed in the context of an extreme scaling workshop at LRZ, we ran a giant workflow on Phases 1 and 2 of SuperMUC, in which more than 60 free energy calculations were performed in 37 hours. This was the first time that both phases of SuperMUC were jointly allocated exclusively to one project. The combined compute power of Phases 1 and 2 amounts to 5.71 PFlop/s Linpack. In contrast to a monolithic application, which would span the whole machine via MPI, the big challenge was to keep the system busy with the many concurrent tasks of the workflow. This required constant monitoring of the job’s progress and immediate fixing of problems during the run.
We not only attained all our planned objectives for the giant run but achieved even more than anticipated, thanks to the exceptional performance of the computer. The GCS issued a press release, and distinguished science writer and journalist Dr. Roger Highfield wrote a blog post about the experience.
With the resource allocation in the current project, we have applied ESMACS and TIES to study a wide range of proteins which have diverse functions in the human body and are important targets for pharmaceutical drug design and discovery, and for clinical therapies. Over the past year, we have been able to produce rapid, reliable, accurate and precise predictions of binding free energies using both ESMACS and TIES. Studies of many of the molecular systems have been completed and the results published [3, 5, 6]; others are either at the post-processing stage or at earlier stages where more simulations and calculations are required.
Our predictions from ensemble simulations, some of them performed blindly, are in good agreement with experimental findings, including those released to us by leading pharmaceutical companies worldwide after our computational predictions were made [3, 5, 6].
Our findings have demonstrated that this approach is able to deliver an accurate ranking of ligand binding affinities quickly and reproducibly. We have recently reported the performance of the TIES approach when applied to a diverse set of protein targets and ligands. The results (Fig. 2) are in very good agreement with experimental data (90% of calculations agree to within 1 kcal/mol), while the method is reproducible by construction. Statistical uncertainties of the order of 0.5 kcal/mol or less are achieved.
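An agreement figure of this kind can be computed as in the following Python sketch, which reports the fraction of predictions lying within a tolerance of the experimental value together with the mean absolute error. The function name agreement and the numbers are hypothetical; they are not the data behind Fig. 2.

```python
def agreement(predicted, experimental, tol=1.0):
    """Return (fraction within tol, mean absolute error), in kcal/mol.

    Hypothetical helper for illustration only.
    """
    errors = [abs(p - e) for p, e in zip(predicted, experimental)]
    within = sum(1 for err in errors if err <= tol)
    return within / len(errors), sum(errors) / len(errors)


# Made-up relative binding free energies in kcal/mol (illustrative only).
pred = [-1.2, 0.4, 2.1, -0.3, 0.9]
expt = [-0.9, 0.1, 0.8, -0.5, 1.0]
frac, mae = agreement(pred, expt)
print(f"{frac:.0%} within 1 kcal/mol, MAE = {mae:.2f} kcal/mol")
```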
In direct collaborations with two pharmaceutical companies, our approaches were tested in a realistic pharmaceutical setting [5, 6]. The calculations were performed, initially blind, to investigate the ability of our methods to reproduce the experimentally measured trends. Good correlations were obtained with both methods. The simulations also provide energetic and dynamic information at the atomistic level, which cannot be obtained from experiments. Such information not only explains the experimental observations but also sheds light on how to make modifications in the laboratory to improve ligand binding and/or ligand selectivity (Fig. 3).
Prof. Dieter Kranzlmüller/PI, Dr Nils Otto vor dem gentschen Felde (both LMU München, Germany), Prof. Peter Coveney/PI, Dr Shunzhou Wan, Agastya Bhati, Serge Jovanovic (all UCL, London)
Prof. Shantenu Jha, Department of Computer Engineering, Rutgers University, USA
We acknowledge the Leibniz Supercomputing Centre for providing access to SuperMUC and the very able assistance of its scientific support staff.
S. Sadiq, D. Wright, S. Watson, S. Zasada, I. Stoica, P. Coveney: J. Chem. Inf. Model., 2008, 48, 1909–1919
S. Wan, B. Knapp, D. W. Wright, C. M. Deane, P. V. Coveney: J. Chem. Theory Comput., 2015, 11, 3346–3356
A. Bhati, S. Wan, D. Wright, P. Coveney: J. Chem. Theory Comput., 2017, 13, 210–222
S. Wan, A. Bhati, S. Zasada, I. Wall, D. Green, P. Bamborough, P. Coveney: J. Chem. Theory Comput., 2017, 13, 784–795
S. Wan, A. Bhati, S. Skerratt, K. Omoto, V. Shanmugasundaram, S. Begal, P. Coveney: J. Chem. Inf. Model., 2017, 57, 897–909
This article first appeared in the magazine InSiDE, Vol. 15 No. 1 (Spring 2017)
Peter V. Coveney
Centre for Computational Science (CCS)
Department of Chemistry
University College London,
20 Gordon Street
London WC1H 0AJ, UK
email: p.v.coveney [at] ucl.ac.uk
Project ID: pr87be