2.2 | Protein-ligand complexes.
The ligand topic is not new to CASP: in CASP6 through CASP10, predicting
ligand binding sites was a sub-challenge in the function prediction
category 29-32. Given
the recent advances in the accuracy of protein modeling methods11,12,
the CASP organizers decided to include prediction of protein- and RNA-small
molecule ligand complexes in the scope of the CASP15 experiment, hoping to
boost development of methods in this area. Participants are provided
with the sequence and stoichiometry of protein (or RNA) receptors,
Simplified Molecular Input Line Entry System (SMILES) codes of bound
ligands, and are asked to predict structures of protein- (RNA-) ligand
complexes.
2.2.1 | Macromolecule-ligand complex prediction format (https://predictioncenter.org/casp15/index.cgi?page=format#LG).
One important requirement for the ligand
prediction format was the need to encode atom connectivity in a robust
and reliable manner, as the correct atom connectivity is required for
symmetry correction, a necessary step in accurate ligand assessment.
Unfortunately, the PDB format, which is commonly used in CASP, cannot
reliably encode connectivity for arbitrary ligands. The MDL
molfile format 33 is a
common ligand format that was used in earlier ligand docking
challenges such as D3R34-37. It is a
text-based, fixed-column format that encodes bonds in addition to atom
coordinates. Unlike in the PDB format, atoms are not named and are
identified only by their element and connectivity. The format allows
reporting additional properties such as charge, valence, or isotope, but
those were neither required nor used here. Bonds between atoms are encoded
explicitly, one per line, together with the bond type (single, double,
triple, or aromatic) and stereochemistry. The format also includes
header lines, a COUNTS line, which can help check the integrity of the
file, and an M END line, which indicates the end of the ligand data.
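The layout described above can be illustrated with a minimal reader. This is only a sketch, assuming the widely used V2000 flavor of the molfile (three header lines, a counts line with atom and bond counts in the first two 3-character fields, fixed-column atom and bond lines, and a terminator written as `M  END`). The names `Molfile`, `parse_molfile`, and `EXAMPLE` are hypothetical, and this is not the parser used by the CASP assessors.

```python
from dataclasses import dataclass


@dataclass
class Molfile:
    name: str
    atoms: list  # (element, x, y, z)
    bonds: list  # (atom_i, atom_j, bond_type, stereo), 1-based atom indices


def parse_molfile(text):
    """Parse a single V2000-style molfile block (assumed layout, see lead-in)."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("expected three header lines and a counts line")

    # Counts line: atom count in columns 1-3, bond count in columns 4-6.
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    tail = lines[4 + n_atoms + n_bonds:]

    # Integrity checks: declared counts must match, and the block must be closed.
    if len(atom_lines) != n_atoms or len(bond_lines) != n_bonds:
        raise ValueError("fewer atom/bond lines than the counts line declares")
    if not any(ln.split() == ["M", "END"] for ln in tail):
        raise ValueError("missing end-of-ligand line")

    atoms = []
    for ln in atom_lines:
        # Three 10-character coordinate fields, a space, then the element symbol.
        x, y, z = float(ln[0:10]), float(ln[10:20]), float(ln[20:30])
        atoms.append((ln[31:34].strip(), x, y, z))

    bonds = []
    for ln in bond_lines:
        # Bond type codes: 1 single, 2 double, 3 triple, 4 aromatic.
        i, j, btype = int(ln[0:3]), int(ln[3:6]), int(ln[6:9])
        stereo = int(ln[9:12]) if ln[9:12].strip() else 0
        if not (1 <= i <= n_atoms and 1 <= j <= n_atoms):
            raise ValueError("bond refers to an atom index outside the atom block")
        bonds.append((i, j, btype, stereo))

    return Molfile(lines[0].strip(), atoms, bonds)


# Made-up two-atom example (C=O, hydrogens omitted), not a CASP target ligand.
EXAMPLE = "\n".join([
    "formaldehyde",
    "  hypothetical-example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0",
    "    1.2000    0.0000    0.0000 O   0  0",
    "  1  2  2  0",
    "M  END",
])

mol = parse_molfile(EXAMPLE)
print(mol.name, len(mol.atoms), "atoms,", len(mol.bonds), "bond(s)")
```

Because atoms carry no names, everything downstream (for example, matching symmetry-equivalent atoms) has to rely on the element column and the bond list, which is why the sketch rejects files whose atom or bond blocks disagree with the counts line.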
For CASP15, we devised a hybrid submission format in which the receptor
model (protein or RNA) and the ligand model are submitted as separate files
in the same spatial frame of reference. The receptor is submitted in the
PDB format, and the ligand in the MDL molfile format (see below for details).
As with regular protein structure submissions, a CASP ligand submission
(LG format) starts with a CASP header including format specification
code, target identifier, author identifier, and description of the
modeling method. Two new keywords are introduced: the LIGAND keyword,
which defines ligand name and the beginning of the ligand data, and the
POSE keyword, which defines the pose number for the selected ligand.
Participants are allowed to submit up to 5 poses of a given ligand for a
selected receptor model.
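The five-pose limit lends itself to a simple pre-submission check. The sketch below is illustrative only: it assumes that `LIGAND` and `POSE` each appear as the first whitespace-delimited token on their own line and that the pose number is the token following `POSE`. The authoritative layout is the one given on the CASP15 format page, and the function names and sample lines here are hypothetical.

```python
MAX_POSES = 5  # limit stated in the text: up to 5 poses per ligand per receptor model


def poses_per_ligand(model_text):
    """Map each LIGAND label to the list of POSE numbers that follow it."""
    poses = {}
    current = None
    for line in model_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "LIGAND":
            # Everything after the keyword is kept as an opaque ligand label.
            current = " ".join(tokens[1:])
            poses.setdefault(current, [])
        elif tokens[0] == "POSE":
            if current is None or len(tokens) < 2:
                raise ValueError(f"unexpected POSE line: {line!r}")
            poses[current].append(int(tokens[1]))
    return poses


def ligands_over_limit(model_text, limit=MAX_POSES):
    """Return the labels of ligands that have more than `limit` poses."""
    return [name for name, nums in poses_per_ligand(model_text).items()
            if len(nums) > limit]


# Made-up fragment with the coordinate blocks omitted for brevity.
sample = "\n".join(["LIGAND 1 XYZ"] + [f"POSE {n}" for n in range(1, 7)])
print(ligands_over_limit(sample))
```

A real checker would also need to skip over the molfile header lines inside each pose, since the first header line is free text and could in principle begin with one of the keywords.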
An example of an LG prediction is provided in Example 6 on the CASP15
format page.