Towards retrobiosynthesis: Predicting substrate-binding profiles for potential new esterases
Principal Investigator:
Prof. Dr. Holger Gohlke
Affiliation:
Heinrich-Heine-Universität Düsseldorf, Institut für Pharmazeutische und Medizinische Chemie, Germany
Local Project ID:
Lipases
HPC Platform used:
JUWELS BOOSTER at JSC
Date published:
Prof. Dr. Holger Gohlke, Pablo Cea Medina, and Alena Endres used the computing resources of the Jülich Supercomputing Centre to study the substrate specificity of esterases. Esterases are enzymes with multiple biotechnological applications, as they can degrade a wide variety of substrates. However, finding which esterase can degrade which substrates is hard, expensive, and time-consuming. Therefore, the team seeks to develop a rational understanding of how different esterases recognize their substrates, to effectively predict which esterase would be suitable for a given task.
The team used two approaches: one focused on integrating data for machine learning (ML) (Figure 1, Subproject I), and another focused on atomic-scale modeling (Figure 1, Subproject II). For subproject I, the team leveraged the large number of esterase sequences available in public databases and a curated dataset with well-described substrate preferences to generate a dataset of high-quality 3D models. Then, the properties of the active site of each enzyme were described as interaction potential grids, which can be condensed into simple 1D Zernike descriptors. These vectors were used to reveal trends and similarities among the enzymes and served as input for ML models. For subproject II, the team described how a set of newly described enzymes can degrade specific substrates. By combining molecular docking and molecular dynamics simulations, they could predict how different substrates bind to the active site of these enzymes, revealing the key interactions between them.

Figure 1. Summary of the workflows of each subproject. In subproject I, data mining and high-throughput modeling were used to generate a large collection of models of characterized and uncharacterized esterases. Then the essential information of the active site was expressed as 1D-Zernike descriptors, derived from affinity grids. This served as input for ML, and to study clustering patterns in latent-space representations. In subproject II, the binding mode of newly crystallized enzymes was studied by performing docking, MD simulations and structural clustering. Altogether, these studies provide insights into the molecular basis of how esterases function.
Esterases are hydrolytic enzymes that cleave ester bonds, releasing an alcohol and an acid. Their high chemo-, regio-, and stereoselectivity makes them attractive for the chemoenzymatic synthesis of fine chemicals. Moreover, some esterases can cleave synthetic polymers such as polyethylene terephthalate (PET), enabling green biodegradation of plastic waste.
Thanks to metagenomics, the amount of sequence data available is larger than ever. This opens up the possibility of identifying new esterases that catalyze industrially relevant reactions through sequence mining. However, the current approach, high-throughput screening, is time-consuming and often fails to identify enzymes with the desired specificity and selectivity. The abundance of sequence information, combined with the advent of extremely accurate tools for protein structure prediction and the resounding success of machine learning-based methods in solving biological problems, makes the field ripe for the development of predictive tools that can aid in the search for novel biotechnologically relevant esterases.
During this computing time grant, Prof. Dr. Holger Gohlke, Pablo Cea Medina, and Alena Endres investigated the use of structure-based features that describe the physicochemical and dynamic properties of the active site as a basis for building an effective substrate preference predictor. Our team developed so-called Zernike descriptors (ZDs) as the main feature to capture the essential characteristics of the active site. ZDs are large one-dimensional vectors, obtained by applying a Zernike polynomial expansion to 3D molecular interaction fields derived from docking grids. Furthermore, it has been extensively shown that conformational dynamics is a key aspect influencing the catalytic promiscuity of enzymes. Therefore, our calculation of ZDs was not performed on a single static structure but on conformational ensembles derived from molecular dynamics simulations.
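The ensemble-based descriptor calculation can be illustrated with a minimal sketch. It assumes that one descriptor vector has already been computed per MD snapshot and that the per-frame vectors are combined by a simple arithmetic mean. The report does not state how the ensemble was aggregated, so the averaging step, the function name `ensemble_descriptor`, and the toy numbers are all illustrative assumptions, not the team's implementation.

```python
from statistics import fmean


def ensemble_descriptor(frame_descriptors):
    """Average per-frame descriptor vectors into one ensemble vector.

    frame_descriptors: list of equal-length lists of floats, one per MD frame.
    Averaging is an assumption made for illustration only.
    """
    if not frame_descriptors:
        raise ValueError("need at least one frame")
    length = len(frame_descriptors[0])
    if any(len(d) != length for d in frame_descriptors):
        raise ValueError("all descriptors must have the same length")
    # zip(*...) groups the i-th component of every frame together
    return [fmean(component) for component in zip(*frame_descriptors)]


# Toy data: three MD frames, four descriptor components each (made-up values)
frames = [
    [1.0, 0.5, 0.0, 2.0],
    [1.2, 0.7, 0.2, 1.8],
    [0.8, 0.3, 0.1, 2.2],
]
ensemble = ensemble_descriptor(frames)
```

Other aggregation schemes, for example keeping every frame as a separate point or concatenating the mean with a per-component spread, would fit the same interface.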
By applying dimensionality reduction to project the experimentally characterized enzymes into a low-dimensional space according to their ZDs, our team found that ZDs alone cannot accurately reproduce the clustering patterns observed when enzymes are grouped by their experimental substrate preference profiles. Nonetheless, our results show that ZDs effectively discriminate between proteins according to their sequence classification. The team therefore saw an opportunity to use ZDs to map the uncharted esterase sequence space. To achieve this goal, our team constructed the esterome: a comprehensive library of high-quality structural models of a large and diverse pool of esterase sequences. Calculating ZDs of MD ensembles of representative structures of the esterome allowed us to test the completeness of the sequence space currently sampled in our experimental dataset and to point to regions that require further experimental characterization. These efforts allowed us to identify candidate enzymes whose active site properties deviate significantly from those previously characterized and thus provide an opportunity to expand the repertoire of hydrolyzed substrates.
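One simple way to flag candidates whose active-site descriptors deviate from the characterized set is to rank each uncharacterized enzyme by its distance to the nearest characterized one. The sketch below assumes Euclidean distance between descriptor vectors. The report does not specify the metric or selection criterion, and the enzyme names, vectors, and the function `novelty_scores` are hypothetical.

```python
import math


def novelty_scores(characterized, candidates):
    """Rank candidates by distance to their nearest characterized enzyme.

    Both arguments map an enzyme name to its descriptor vector.
    Returns (name, distance) pairs, most novel first.
    Euclidean distance is an assumption made for illustration.
    """
    scores = []
    for name, vec in candidates.items():
        nearest = min(math.dist(vec, ref) for ref in characterized.values())
        scores.append((name, nearest))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical descriptor vectors (made-up values, three components each)
characterized = {"EstA": [0.0, 0.0, 1.0], "EstB": [1.0, 0.0, 0.0]}
candidates = {
    "cand1": [0.1, 0.0, 1.0],
    "cand2": [3.0, 4.0, 0.0],
    "cand3": [1.0, 1.0, 0.0],
}
ranking = novelty_scores(characterized, candidates)
```

In this toy example, `cand2` ranks first because it lies farthest from every characterized enzyme, which is the kind of candidate the paragraph above describes as worth characterizing experimentally.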
Furthermore, to expand our understanding of how esterases can catalyze other diverse reactions, such as the depolymerization of large urethane-containing plastics or the degradation of lactone rings, our team generated putative binding modes for multiple substrates across different newly described enzymes by combining chemical knowledge with docking and molecular dynamics simulations.