3.2 Protease substrate specificity profiling using YESS.
Because all of its components are DNA-encoded, the YESS system offers a single platform for three high-throughput experiments: enzyme engineering, substrate specificity profiling, and mutational scanning. Performing these experiments would otherwise typically require three different technologies. For instance, one can engineer a protease and profile the substrate specificity of the evolved variants during the same engineering campaign. Furthermore, when combined with next-generation sequencing and deep learning, YESS can map the substrate specificity landscape of proteases (Figure 3B).
To optimize YESS for protease substrate specificity profiling, Qing and coworkers sought to identify and remove major endogenous proteolytic events in the yeast secretory pathway, which could confound analysis of the cleavage specificities of recombinantly expressed proteases (Li et al., 2017). Screening a DNA-encoded pentapeptide library revealed that a secretory pathway protease cleaved many arginine- and lysine-containing sequences. This protease was identified as the Golgi-resident Kex2 protease, with a major cleavage pattern of Ali/Leu-X-Lys/Arg-Arg. These results led to a kex2 knockout yeast strain, which is better suited to profiling the substrate specificity of proteases, particularly those with trypsin-like cleavage patterns.
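As an illustration of how such a background cleavage pattern might be used in practice, the following minimal sketch flags library members that contain a Kex2-like Ali/Leu-X-Lys/Arg-Arg site. It is not the analysis used by Li et al.; the function names are hypothetical, and the residue set chosen for "Ali" (Ala, Val, Ile, Leu) is an assumption made only for this example.

```python
import re

# ASSUMPTION: "Ali" is read here as a small set of aliphatic residues.
# The exact residue set is illustrative and is not taken from the source.
ALIPHATIC = "AVIL"  # includes Leu, so "Ali/Leu" collapses into one class

# Ali/Leu-X-Lys/Arg-Arg in one-letter code: [AVIL], any residue, [KR], R
KEX2_LIKE = re.compile(rf"[{ALIPHATIC}][A-Z][KR]R")


def has_kex2_like_site(peptide: str) -> bool:
    """Return True if the peptide contains the Kex2-like motif anywhere."""
    return KEX2_LIKE.search(peptide.upper()) is not None


def split_library(peptides):
    """Partition peptides into (flagged, clean) lists by motif presence."""
    flagged = [p for p in peptides if has_kex2_like_site(p)]
    clean = [p for p in peptides if not has_kex2_like_site(p)]
    return flagged, clean
```

In a strain that still expresses Kex2, sequences in the flagged list would be candidates for background cleavage; in a kex2 knockout strain this filter should, in principle, be unnecessary.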
Predicting PTM-enzyme substrate specificity is essential for designing specific activity probes and inhibitors, inferring physiological substrates, and guiding the engineering of PTM-enzyme substrate specificity. The main obstacle in enzyme-substrate specificity profiling is undersampling. Substrate specificity is relative, and for promiscuous enzymes it is better defined when more substrates are interrogated. Unfortunately, even the largest substrate libraries generated with yeast or phage display (>10^9 unique sequences) sample only a fraction of the possible amino acid combinations in a heptapeptide library. Machine learning can overcome this bottleneck, and the DNA-encoded substrate libraries in the YESS system provide the sequence-function datasets needed to build ML models for substrate specificity prediction.
Khare and coworkers showed that combining the YESS system, computational modeling, and machine learning makes it possible to map the entire P6-P2 substrate specificity and energetic landscape of HCVp (Pethe et al., 2019). They sorted a naïve pentapeptide library spanning the P6 to P2 sites of HCVp and selected three distinct populations by FACS: uncleaved, partially cleaved, and completely cleaved sequences. They showed that fully and partially cleaved sequences form separate clusters and that sequence preference trajectories can be mapped by tracking single substrate mutations within the data. To predict the cleavability of the entire pentapeptide library diversity (3.2 million sequences), they implemented a support vector machine trained on Rosetta-derived energetic features of the experimentally obtained sequences. This approach allowed them to reconstruct the complete pentapeptide substrate landscape. Most importantly, they discovered and characterized a novel cleavage pattern (PSTVF) in addition to the four previously known HCVp cleavage specificities.
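The library size quoted above follows directly from the 20 canonical amino acids at five randomized positions (20^5 = 3.2 million), and single-mutation tracking amounts to walking between sequences that differ at one position. The sketch below illustrates both ideas; it is not the pipeline of Pethe et al., the function names are hypothetical, and the population labels in the usage note are toy values rather than experimental data.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def sequence_space(length: int) -> int:
    """Number of distinct peptides of the given length (20 ** length)."""
    return len(AMINO_ACIDS) ** length


def single_substitution_neighbors(peptide: str):
    """All peptides that differ from `peptide` at exactly one position."""
    neighbors = []
    for i, original in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa != original:
                neighbors.append(peptide[:i] + aa + peptide[i + 1:])
    return neighbors


def observed_neighbors(peptide: str, labels: dict) -> dict:
    """Single-substitution neighbors that appear in a labeled dataset.

    `labels` maps a sequenced peptide to its sorted population,
    e.g. "cleaved", "partial", or "uncleaved" (hypothetical label names).
    """
    return {
        n: labels[n]
        for n in single_substitution_neighbors(peptide)
        if n in labels
    }
```

For example, `sequence_space(5)` evaluates to 3,200,000, and every pentapeptide has 5 × 19 = 95 single-substitution neighbors. Given a toy dictionary such as `{"AAAAC": "cleaved", "CCCCC": "partial"}`, `observed_neighbors("AAAAA", ...)` returns only the `AAAAC` entry, since `CCCCC` differs at all five positions.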
This in-depth analysis could be tailored to any PTM-enzyme and its variants (including drug-resistant mutants) to explore sequence and structure landscapes of enzyme-substrate interactions that are not accessible by experiments alone. One obvious next step would be to combine machine learning with substrate profiling to infer physiological substrates, complementing more expensive proteomics approaches such as SILAC.