MATERIALS AND METHODS

PEZYFoldings in CASP15

Overall pipeline

A schematic representation of the pipeline is shown in Fig. 1A. The default AF2 pipeline can be broadly divided into the following steps: MSA construction, structure prediction, and relaxation using OpenMM18. The main differences between the default AF2 pipeline and the PEZYFoldings pipeline are a more extensive sequence similarity search in the MSA construction step and the introduction of refinement steps. Details of each step are described in the following sections.

Sequence similarity search and MSA construction

The MSAs constructed in the pipeline are summarized below. In addition, the URLs and data downloaded from the databases are listed in Table S1.
PZLAST-MSA : Query sequences were submitted to the PZLAST10,11 web API service with the option “max_out=10000.” Because hits from PZLAST are fragmented sequences directly translated from sequencer reads, I aligned them with jackhmmer19,20 and assembled them using a simple script: if the aligned regions of two sequences were longer than 20 aa and shared an identity > 95 %, the sequences were merged.
PSIBLAST-MSA : PZLAST-MSA was input to PSI-BLAST15 version 2.13.0 with the PSI-BLASTexB16 customization, using the -in_msa option and the options “-evalue 0.00001 -outfmt \”6 qseqid sallacc evalue pident nident qlen staxids sseq\” -max_target_seqs 100000 -num_threads 128.” nr14 and an in-house metagenomic database (described later) were searched simultaneously. In the early season, the number of iterations was set to two. In the later season (from T1173), the search was changed to iterate up to three times using Position-Specific Scoring Matrix (PSSM) checkpoint files; when the number of hit sequences exceeded 10,000, the iteration was terminated. If the final number of hit sequences was small (<10,000), PZLAST-MSA was merged in. Sequences were aligned using jackhmmer. The taxonomy IDs of the sequences were added to a TaxID tag, which was used for sequence pairing in a later step.
HHBLITS-UNIREF-MSA : PSIBLAST-MSA was input to hhblits21 (hhsuite22 v3.3.0) using the UniRef3023 database with the options “-all -n 3 -cpu 128.”
HHBLITS-BFD-MSA : PSIBLAST-MSA was input to hhblits using the BFD24 database with the options “-all -n 2 -cpu 6.” If the number of sequences in the MSA was larger than 10,000, the MSA was filtered using hhfilter with the options “-cov 30 -id 100 -diff 10000.”
JACKHMMER-UNIPROT-MSA : A query sequence was input to jackhmmer (hmmer19,20 suite 3.3.2) using the UniProt25 database with the options “--cpu 128 -E 0.00001 -N 3.”
JACKHMMER-MGNIFY-MSA : A query sequence was input to jackhmmer using the MGnify26 database with the options “--cpu 128 -E 0.00001 -N 3.” If the number of sequences in the MSA was larger than 10,000, the MSA was filtered using hhfilter with the options “-cov 30 -id 100 -diff 10000.”
Final input MSA : PSIBLAST-MSA, HHBLITS-UNIREF-MSA, HHBLITS-BFD-MSA, JACKHMMER-UNIPROT-MSA, and JACKHMMER-MGNIFY-MSA were concatenated and filtered using hhfilter with the options “-id 100 -cov 30 -maxseq 500000.”
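The fragment-merging criterion used when assembling PZLAST hits (merge when the mutually aligned region is long enough and nearly identical) can be sketched as follows. This is a minimal illustration: the function names and the gapped-string representation are assumptions, and the actual assembly script is more involved.

```python
def can_merge(a: str, b: str, min_overlap: int = 20, min_identity: float = 0.95) -> bool:
    """Merge criterion for two fragments aligned to the same query.

    `a` and `b` are equal-length gapped strings ('-' = gap). Fragments are
    merged when the region aligned in both spans at least `min_overlap`
    residues and its sequence identity exceeds `min_identity`.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if len(pairs) < min_overlap:
        return False
    return sum(x == y for x, y in pairs) / len(pairs) > min_identity


def merge(a: str, b: str) -> str:
    """Combine two mergeable fragments, preferring non-gap characters."""
    return "".join(x if x != "-" else y for x, y in zip(a, b))
```

Applying `merge` repeatedly over all mergeable pairs reconstructs longer sequences from read-level fragments.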

Construction procedure of the in-house metagenomic database

The metadata of the assembly entries was downloaded from the NCBI FTP site on 2022-03-28. The entries that had “metagenome” in their descriptions were extracted. Each entry was checked for translated_cds.faa, protein.faa.gz, cds_from_genomic.fna.gz, rna_from_genomic.fna.gz, or genomic.fna.gz, in this order of priority. If the sequence data were nucleotides, they were translated using prodigal27 with the default settings. If prodigal failed with the default settings, the “-p meta” option was used. A unique ID was generated for each entry and used as its taxonomy ID.
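The file-priority check described above can be sketched as follows. This is a hypothetical helper: the function names and the assumption of one directory per assembly entry are illustrative, not the actual implementation.

```python
from pathlib import Path

# Order of preference for an assembly entry's sequence files (from the text).
PRIORITY = [
    "translated_cds.faa",
    "protein.faa.gz",
    "cds_from_genomic.fna.gz",
    "rna_from_genomic.fna.gz",
    "genomic.fna.gz",
]


def pick_sequence_file(entry_dir: Path):
    """Return the highest-priority sequence file present in the entry
    directory, or None if no candidate exists."""
    for name in PRIORITY:
        candidate = entry_dir / name
        if candidate.exists():
            return candidate
    return None


def needs_translation(filename: str) -> bool:
    """Nucleotide files (.fna) must be translated with prodigal."""
    return ".fna" in filename
```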

MSA filtering and feature building

After constructing the MSAs, I filtered them using several criteria and created variations according to sequence identity: 1) clustered at 95 % sequence identity, 2) clustered at 90 % sequence identity, 3) filtered out if sequence identity with the query was less than 80 %, 4) filtered out if sequence identity with the query was less than 60 %, and 5) no identity filter applied. Filtering was performed using hhfilter. I used the “-cov 30” option; however, in the middle of the season, I noticed that all unpaired sequences of a subunit were filtered out if the subunit length was less than 30 % of the total length of the multimeric structure. Therefore, the coverage values changed arbitrarily during the season. The input features for the AF2 networks were created in this step. This step allows flexible manipulation of the input features for AF2; for example, one can deliberately pair or unpair sequences, as in AF2Complex28, and provide sparse residue indices to generate partial structures. I added extra gaps (in the residue index) between subunits to predict multimer structures with the monomer version of AF2 13,28. TaxID tags or OX tags in the headers of the FASTA entries were used to pair sequences in the MSAs. TaxID tags were added to the headers of the sequences extracted from nr and the in-house metagenomic database. When sufficient computational resources were available, features were also created while skipping the pairing step. For antibody-antigen complexes, the pairing step was always skipped (the sequences for H1140 were paired because of my error). In addition, I provided a3m files to the official feature-building pipeline and created input features for the network, considering the possibility that my scripts contained bugs.
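The TaxID-based pairing and the residue-index gap trick can be illustrated with a minimal sketch. The data structures, the one-representative-per-taxon simplification, and the gap size of 200 are assumptions for illustration, not the values used in CASP15.

```python
def pair_by_taxid(msa_a, msa_b):
    """Pair sequences from two subunit MSAs that share a taxonomy ID.

    Each MSA is a list of (taxid, sequence) tuples whose first entry is
    the query. One representative per taxid is kept (the first hit, a
    hypothetical simplification)."""
    by_tax_b = {}
    for taxid, seq in msa_b[1:]:
        by_tax_b.setdefault(taxid, seq)
    paired = [(msa_a[0][1], msa_b[0][1])]  # queries are always paired
    for taxid, seq in msa_a[1:]:
        if taxid in by_tax_b:
            paired.append((seq, by_tax_b.pop(taxid)))
    return paired


def residue_index_with_gap(chain_lengths, gap=200):
    """Residue indices for concatenated chains with an artificial gap,
    so the monomer model treats the chains as separate polypeptides."""
    idx, offset = [], 0
    for n in chain_lengths:
        idx.extend(range(offset, offset + n))
        offset += n + gap
    return idx
```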

Structure prediction by AlphaFold2 or AlphaFold-Multimer

Predictions were made with the standard AlphaFold2 parameters (model_1~5) and the AlphaFold-Multimer parameters (model_1~5_multimer_v2), downloaded from https://storage.googleapis.com/alphafold/alphafold_params_2022-03-02.tar on 2022-03-11. The number of recycling steps was typically set between 5 and 30, depending on time and computational resources. Intermediate structures were saved during recycling; therefore, the pipeline produced approximately 1000-2000 structures in standard cases.

Model ranking and selection

Models were ranked and selected using the self-confidence metrics generated by AF2. For monomer targets, the sum of per-residue pLDDT values greater than 70 was used, to account for the possibility of disordered regions. For multimer targets, the weighted sum of the predicted TM-score29 (iptm × 0.8 + ptm × 0.2)4 was used. When I predicted multimer structures with the monomer version, which does not produce multimer metrics, all unrelaxed structures were processed with the refinement model (see below). The top models were typically selected. For the rest of the submission, the TM-score software or MM-align30 was used to maintain variation among the structures (i.e., highly similar structures were not selected), considering ensembles, alternative forms, or mispredictions. Various human interventions were applied in this step because of numerous issues that needed to be addressed. For example, models in which subunits did not interact with other subunits often had low TM-scores with respect to other models and were therefore selected by the semi-automatic pipeline. However, such models were avoided, as it was evident that the prediction was incomplete.
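The two ranking criteria can be written compactly. This is a sketch: the multimer weighting follows AlphaFold-Multimer's published ranking score, while the monomer criterion is my reading of "sum of per-residue pLDDTs higher than 70."

```python
def multimer_ranking_score(iptm: float, ptm: float) -> float:
    """Weighted confidence used to rank multimer models."""
    return 0.8 * iptm + 0.2 * ptm


def monomer_ranking_score(plddts) -> float:
    """Sum the per-residue pLDDT values above 70, so that likely
    disordered regions do not contribute to the score (an interpretation
    of the criterion described in the text)."""
    return sum(p for p in plddts if p > 70)
```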

Refinement

I constructed a deep-learning model that refines predicted structures by fine-tuning the official AlphaFold-Multimer weights (model_1_multimer_v2). It takes a predicted structure and its amino acid sequence as input and outputs a refined structure. Further details on this model are provided in an independent paper17. The training conditions for the model used in CASP15 are listed in Table S2. The five structures selected as submission candidates were input into the refinement model. When sufficient time and resources were available, all predicted structures except the intermediate ones were fed into the refinement model.

Manual interventions

Domain parsing

Structures were usually predicted using the full-length sequences of all subunits. However, when the total number of amino acids was too large to be handled by my GPU, I performed domain parsing and MSA cropping, or predicted the entire structure in CPU mode. Domain parsing was divided into several steps. First, the sequences were split into fragments of 500 to 1000 aa, with boundaries chosen by random guess. In addition, I sometimes used the results of domain prediction with SMART31. Next, the structures were predicted using AF2, and the regions or subunits that interacted with one another were visually inspected. I then decided on new boundaries that avoided disrupting the interfaces. Subsequently, the structures were predicted again, and the resulting models were assessed. The boundary-decision and partial-structure-building steps were repeated until the quality of the partial structures was satisfactory. The partial structures were then concatenated with simple scripts that performed structural alignment using the overlapping regions.
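The initial fragment-splitting step can be sketched as follows. The overlap length and the random boundary choice are illustrative assumptions; the text only specifies fragment lengths of 500-1000 aa and that overlapping regions are later used for structural alignment.

```python
import random


def random_fragments(length, min_len=500, max_len=1000, overlap=100, seed=0):
    """Split a sequence of `length` residues into overlapping fragments.

    Fragment lengths are drawn at random between `min_len` and `max_len`;
    consecutive fragments share `overlap` residues so that the partial
    structures can later be stitched together by structural alignment on
    the shared region."""
    rng = random.Random(seed)
    frags, start = [], 0
    while start < length:
        end = min(start + rng.randint(min_len, max_len), length)
        frags.append((start, end))
        if end == length:
            break
        start = end - overlap
    return frags
```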

MSA depth arrangement

In cases where the targets contained markedly conserved domains, the resulting MSAs sometimes displayed considerable depth imbalances (Fig. 1B). If the depth of the MSA was highly skewed, sequences with amino acids in the shallow regions were retained, and the other sequences were randomly subsampled to flatten the depth (Fig. 1C). When the depth was insufficient, additional searches were performed to obtain more sequences covering the shallow regions.
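The depth-flattening idea can be sketched as follows. The shallow-column threshold, the target depth, and the subsampling policy are illustrative assumptions, not the values used in CASP15.

```python
import random


def flatten_depth(msa, max_depth=256, shallow_frac=0.25, seed=0):
    """Subsample an MSA so that deep regions do not dominate.

    Columns whose depth is below `shallow_frac` of the maximum column
    depth are treated as shallow; every sequence with a residue in a
    shallow column is kept, and the remaining sequences are randomly
    subsampled down to `max_depth`."""
    ncol = len(msa[0])
    depth = [sum(seq[c] != "-" for seq in msa) for c in range(ncol)]
    cutoff = shallow_frac * max(depth)
    shallow = {c for c in range(ncol) if depth[c] < cutoff}
    keep, rest = [msa[0]], []  # the query is always kept
    for seq in msa[1:]:
        (keep if any(seq[c] != "-" for c in shallow) else rest).append(seq)
    rng = random.Random(seed)
    if len(rest) > max_depth:
        rest = rng.sample(rest, max_depth)
    return keep + rest
```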

Visual inspections of the refined structures

Because the refinement model was trained with globular proteins17, it sometimes produced globular structures (Fig. 1D, 1E) or many atom clashes. Therefore, I visually inspected the models, and if I observed any problems in the refined models, I did not use them.

Comparison with other teams’ models

Because the ColabFold32, NBIS-AF2-standard, and NBIS-AF2-multimer teams provided publicly available prediction results, I compared their models with mine to assess whether my protocols worked well. If my model's quality appeared inferior to that of the other teams, I revised the protocol by conducting extra sequence similarity searches or increasing the number of recycling steps.

Docking or de novo-like structure prediction by the refinement model

When I could not build good structures using the basic pipeline, I performed docking or de novo-like structure prediction using the refinement model. The process was straightforward: when the predicted chains were randomly moved and then fed into the refinement model, the model assembled the chains into complexes. Similarly, feeding a structure with randomly placed atoms into the refinement model generated a reasonable structure.
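The random rigid-body perturbation applied before refinement can be illustrated minimally. A rotation about a single axis is used here for brevity, and the shift magnitude is an assumption; the actual perturbation procedure is not specified in detail in the text.

```python
import math
import random


def random_rigid_move(coords, max_shift=20.0, seed=0):
    """Apply a random rotation (about the z axis, for brevity) and a random
    translation to a chain's coordinates, producing the 'randomly moved'
    input that the refinement model then reassembles into a complex."""
    rng = random.Random(seed)
    theta = rng.uniform(0, 2 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    tx, ty, tz = (rng.uniform(-max_shift, max_shift) for _ in range(3))
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in coords]
```

Because the transformation is rigid, intra-chain geometry is preserved while the chain's pose relative to its partners is randomized.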

Target-specific process

Some other interventions, such as point mutations on T1109, were also conducted. A concise summary of target-specific processes is provided in Supplementary Text 1.

Assessment of the impact of individual elements

Impact of extended sequence similarity search

To investigate the impact of the MSA construction protocol without manual intervention, I compared my MSAs with baseline MSAs generated using the default settings of the AF2 pipeline. Only targets of 1,200 aa or less were considered, because longer sequences require manual intervention to avoid out-of-memory errors. Baseline MSAs provided by the NBIS-AF2-standard and NBIS-AF2-multimer teams were downloaded from http://duffman.it.liu.se/casp15 on 2022-12-27. The subunits of the assembly targets were predicted using AlphaFold-Multimer, as were the assembly entries. The number of sequences (Nseq) in an MSA was calculated as the number of clusters obtained using cd-hit33 with the options “-c 1.0 -G 0 -n 5 -aS 0.9 -M 64000 -T 8.” Feature building was performed without identity filtering. The structures were predicted using AF2 with the number of recycling steps set to 15. Z-M1-GDT values (Z-scores of MODEL 1 based on GDT-TS) were extracted from TSV files downloaded from the CASP15 website.

Impact of the refinement model

To evaluate the effect of the refinement model, the accuracy of the intermediate structures was compared before and after refinement. The intermediate structures of the submitted models were collected from backup files.