2.2.2 | Preparation of targets and model templates. 
A FASTA file of the receptor sequence is prepared by the CASP organizers. For known small molecules, SMILES are retrieved from the PDB component dictionary. For novel small molecules (not present in the PDB component dictionary), SMILES are provided by the experimentalists. In both cases, the SMILES are compared with those derived from the PDB coordinates and corrected where necessary. If needed, stereochemistry is assigned using the AssignStereochemistryFrom3D function from RDKit, and the protonation state is adjusted by manually editing the SMILES, guided by visual inspection of the protein-ligand interactions.
The relevance of each small molecule is decided case by case for each target. Only biologically relevant small molecules are retained. Common crystallographic reagents and ions are ignored unless they interact with the retained small molecules or form part of a structural motif (e.g., a zinc-binding motif).
A script to prepare prediction templates (MDL files) is provided by the CASP organizers. It is implemented in Python 3 using the RDKit Python bindings (http://www.rdkit.org/). The script first converts the input SMILES strings to RDKit Mol objects using the rdkit.Chem.MolFromSmiles method. At this stage, the Mol objects contain only the chemical description of the small molecule, such as atom types and bonds. A coordinate section is then added to the Mol objects using RDKit's ETKDG method38. Finally, the Mol objects are written to an MDL-formatted file33, which can be used as a ligand submission template.
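The template-generation steps described above can be sketched as follows. This is a minimal illustration, not the organizers' script: the function name `smiles_to_template`, the fixed random seed, the ligand name, and the decision to add hydrogens before embedding and remove them afterwards are all assumptions made for the example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_template(smiles, name="LIG", seed=42):
    """Return an MDL mol block with 3D coordinates for one SMILES string.

    Hypothetical helper for illustration only.
    """
    # Step 1: SMILES -> Mol (chemistry only, no coordinates yet).
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)

    # Assumption: explicit hydrogens are added to help the embedding.
    mol = Chem.AddHs(mol)

    # Step 2: generate 3D coordinates with ETKDG.
    params = AllChem.ETKDG()
    params.randomSeed = seed  # assumption: fixed seed for reproducibility
    if AllChem.EmbedMolecule(mol, params) != 0:
        raise RuntimeError("3D embedding failed for %r" % smiles)

    # Assumption: the template lists heavy atoms only.
    mol = Chem.RemoveHs(mol)
    mol.SetProp("_Name", name)

    # Step 3: write the Mol object as an MDL mol block.
    return Chem.MolToMolBlock(mol)


if __name__ == "__main__":
    print(smiles_to_template("CCO", name="ETH"))
```

The returned string can be written to a file and used as a starting template; whether the real templates keep or drop hydrogens is not stated in the text.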
2.2.3 | Setting up the acceptance system.
Validation of ligand predictions is performed with scripts written in Python 2.7 and RDKit. Initial checks verify the CASP header section (presence and correctness of the PFRMAT, TARGET, AUTHOR, and MODEL/END records). Once a submission has passed this phase, its ligand models are converted to RDKit Mol objects for comparison with the template and for downstream evaluation. Each molecule in the submitted file is validated against a reference Mol object generated from the corresponding SMILES string as described above. To validate the submissions, comparisons of the following parameters are undertaken:
Additionally, to account for atom connectivity and chirality in submitted models, the maximum common substructure between the submitted and reference ligands is calculated using the FindMCS function in RDKit. To pass validation, the maximum common substructure of a molecule must contain as many atoms as the reference model, i.e., it must cover the entire reference ligand.
Finally, a validation report summarizing the outcome of each check is created to aid in troubleshooting invalid submissions.
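The initial header checks described above can be illustrated with a short sketch. It uses only the Python standard library and is not the acceptance scripts themselves: the function name `check_casp_header`, the specific rules (exactly one PFRMAT, TARGET, and AUTHOR record, and matching numbers of MODEL and END records), and the example record values are assumptions.

```python
from typing import List

# Records assumed to appear exactly once, each followed by a value.
REQUIRED_SINGLE = ("PFRMAT", "TARGET", "AUTHOR")


def check_casp_header(text: str) -> List[str]:
    """Return a list of problems found in the records (empty if none)."""
    problems = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    keywords = [ln.split()[0] for ln in lines]

    for key in REQUIRED_SINGLE:
        matches = [ln for ln in lines if ln.split()[0] == key]
        if len(matches) != 1:
            problems.append(
                "expected exactly one %s record, found %d" % (key, len(matches))
            )
        elif len(matches[0].split()) < 2:
            problems.append("%s record has no value" % key)

    n_model = keywords.count("MODEL")
    n_end = keywords.count("END")
    if n_model == 0:
        problems.append("no MODEL record")
    if n_model != n_end:
        problems.append(
            "MODEL/END mismatch: %d MODEL vs %d END" % (n_model, n_end)
        )
    return problems


if __name__ == "__main__":
    # Hypothetical submission header; values are placeholders.
    example = "PFRMAT LG\nTARGET T0000\nAUTHOR 0000-0000-0000\nMODEL 1\nEND\n"
    print(check_casp_header(example) or "header OK")
```

Returning a list of messages rather than raising on the first error mirrors the idea of a validation report: every problem in a submission is collected and can be shown to the submitter at once.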
2.2.4 | Macromolecule-ligand complex evaluation measures.
Previous ligand docking challenges such as Teach Discover Treat (TDT)39, Continuous Evaluation of Ligand Prediction Performance (CELPP)40, and Drug Discovery Data Resource (D3R)34-37 have used two main types of metrics to assess how well participants can model receptor-ligand complexes. These evaluate how close a predicted ligand is to the target ligand within the binding site in absolute terms, using the RMSD metric, and how well the native receptor-ligand interactions are reproduced. The CASP experiment brings additional assessment challenges: (1) because the receptor structure is not given but modeled, the ligands in the model and reference complexes can be bound to binding sites in different conformations, so any superposition-based score first requires aligning the ligand-containing binding pockets of the two complexes, which is not a trivial task; (2) chain mapping needs to be established; (3) incomplete ligands in some targets require partial graph matching for the symmetry correction; and (4) multiple copies of ligands in the targets and models have to be mapped (assigned) uniquely, to avoid scoring target or predicted ligands multiple times.
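Challenge (4), the unique mapping of ligand copies, can be illustrated with a simple greedy assignment over a precomputed score matrix. This is only a sketch of one possible strategy, not necessarily the procedure used in the assessment; the function name `assign_greedy` and the example scores are hypothetical.

```python
from typing import Dict, List


def assign_greedy(scores: List[List[float]]) -> Dict[int, int]:
    """Map target ligands to model ligands one-to-one.

    scores[i][j] is a distance-like score (lower is better, e.g. an RMSD)
    between target ligand i and model ligand j. Pairs are accepted in order
    of increasing score, and each ligand is used at most once.
    """
    pairs = sorted(
        (score, i, j)
        for i, row in enumerate(scores)
        for j, score in enumerate(row)
    )
    used_targets, used_models = set(), set()
    mapping = {}
    for _score, i, j in pairs:
        if i in used_targets or j in used_models:
            continue  # one of the two ligands is already assigned
        mapping[i] = j
        used_targets.add(i)
        used_models.add(j)
    return mapping


if __name__ == "__main__":
    # Two target ligands, two model ligands (made-up scores).
    print(assign_greedy([[1.0, 5.0], [0.5, 2.0]]))
```

A greedy scheme favors the best individual pairs and does not guarantee the lowest total score; an optimal assignment algorithm could be substituted if that property matters.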
To address these challenges, we developed two scores, which are described in more detail in the CASP15 Ligand Assessment paper7. The Binding-Site Superposed, Symmetry-Corrected Pose Root Mean Square Deviation (BiSyRMSD) score defines the binding sites and the superpositions used to compute RMSDs between target and model ligands. The Local Distance Difference Test for Protein-Ligand Interactions (lDDT-PLI) measure assesses how well native contacts between the receptor and the ligand are reproduced in the model, using an lDDT-based metric with symmetry correction. Used in combination, the two scores give a more complete account of how well receptor-ligand complexes are modeled, covering both the ligand pose and its interactions with the receptor.
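The symmetry-correction idea behind an RMSD of this kind can be shown in isolation. The sketch below is not the BiSyRMSD implementation: it assumes that the model ligand has already been brought into the target frame by a binding-site superposition, and that the list of symmetry-equivalent atom orderings (automorphisms) has been computed elsewhere, for example by graph matching. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Plain RMSD between two equally long, equally ordered coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def symmetry_corrected_rmsd(
    ref: Sequence[Coord],
    model: Sequence[Coord],
    automorphisms: List[Sequence[int]],
) -> float:
    """Lowest RMSD over all symmetry-equivalent atom orderings of the model.

    Each automorphism is a permutation of atom indices that leaves the
    ligand's chemical graph unchanged (assumed to be given).
    """
    return min(
        rmsd(ref, [model[i] for i in perm]) for perm in automorphisms
    )


if __name__ == "__main__":
    # Toy ligand: a central atom with two equivalent substituents.
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    perms = [(0, 1, 2), (0, 2, 1)]
    print(rmsd(ref, model), symmetry_corrected_rmsd(ref, model, perms))
```

In the toy case the model is a perfect pose with its two equivalent atoms listed in swapped order; the plain RMSD penalizes the labeling, while the symmetry-corrected value does not.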