AI Co-folding: Boltz2 – CDD Support

To enable AI in your vault, please contact support@collaborativedrug.com. Once AI is enabled, the number of allowed jobs is set on a per-account basis. Please contact the same address if you have questions about the number of allowed jobs for your account.

Boltz-2 is a structural biology foundation model, developed by the MIT Jameel Clinic and Recursion, that exhibits strong performance for both structure and affinity prediction. Its authors report that Boltz-2 is the first AI model to approach the performance of free-energy perturbation (FEP) methods in estimating small molecule–protein binding affinity, achieving strong correlation with experimental readouts on many benchmarks while being at least 1000× more computationally efficient than FEP.

For more in-depth information about Boltz-2, please refer to the original paper cited below.

Passaro, S., Corso, G., Wohlwend, J., Reveiz, M., Thaler, S., Ram Somnath, V., Getz, N., Portnoi, T., Roy, J., Stark, H., Kwabi-Addo, D., Beaini, D., Jaakkola, T., & Barzilay, R. (2025). Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction [Preprint]. bioRxiv. https://www.biorxiv.org/content/10.1101/2025.06.14.659707v1

Implementing a Boltz-2 co-folding protocol in CDD Vault:

Creating a co-folding protocol in CDD Vault is as simple as creating a traditional protocol. For a refresher on creating a protocol, please refer to this article before proceeding.

In the following example, we will create a co-folding protocol for the TYK2 kinase with a known TYK2 inhibitor. This type of protocol requires two readout definitions: a co-folding readout definition and a co-folding trigger readout definition. The co-folding readout definition holds the target protein for the Boltz2 model. The co-folding trigger lets users selectively run one model (one readout definition) when a protocol contains many readouts, for example one per target, or when a single protocol holds all of their computational models.

Users will first create the co-folding trigger readout definition. In this example, we have chosen a “pick list” data type with an allowed value of “Yes”; however, this could also be a text or number field. Please note that the job is triggered whenever the value of this field changes, and that triggering a readout row more than once overwrites the previous data for that row.

Next, we will create the co-folding readout definition. Users choose “Boltz2” as the data type, name the readout definition, and specify the previously created trigger. Users then supply the target sequence, either as a FASTA string or as a .CIF file. As shown in the following example, the FASTA string can simply be pasted into the text box, while a CIF file (the newer standard for crystallographic data, which supersedes the older PDB format) can be selected from the user's file directory.

The FASTA string will be represented in the following format:

>6X8F_1|Chains A, B[auth C]|Non-receptor tyrosine-protein kinase TYK2|Homo sapiens (9606)

MAHHHHHHHHHHGALEVLFQGPGDPTVFHKRYLKKIRDLGEGHFGKVSLYCYDPTNDGTGEMVAVKALKADAGPQHRSGWKQEIDILRTLYHEHIIKYKGCCEDAGAASLQLVMEYVPLGSLRDYLPRHSIGLAQLLLFAQQICEGMAYLHSQHYIHRDLAARNVLLDNDRLVKIGDFGLAKAVPEGHEYYRVREDGDSPVFWYAPECLKEYKFYYASDVWSFGVTLYELLTHCDSSQSPPTKFLELIGIAQGQMTVLRLTELLERGERLPRPDKCPAEVYHLMKNCWETEASFRPTFENLIPILKTVHEKYQGQAPS

Please note that it may make sense to modify a sequence retrieved from PDB or Expasy to focus on the relevant part of the protein structure. Here, for example, the sequence contains “MAHHHHHHH” at the start.

The “HHHHHHH” stretch is a poly-histidine tag, a common tool in biochemistry for purifying proteins. It is not a native structural element such as an alpha-helix or beta-sheet, but a short artificial sequence of histidine residues attached to a protein to aid in its isolation and purification.
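As a rough illustration of the trimming step described above, the following Python sketch parses a single-record FASTA string and removes an N-terminal poly-histidine tag. The function names, the six-histidine minimum, and the allowance of up to five residues before the tag are assumptions made for this example; they are not part of CDD Vault or Boltz-2. Whether to also remove the linker that follows the tag is a judgment call that the sketch leaves to the user.

```python
import re


def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    header = None
    chunks = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        else:
            chunks.append(line)
    return header, "".join(chunks)


def strip_n_terminal_his_tag(sequence, min_his=6, max_prefix=5):
    """Remove a leading poly-His tag plus the few residues before it.

    min_his and max_prefix are illustrative defaults, not recommendations.
    A sequence without such a tag is returned unchanged.
    """
    pattern = re.compile(r"^[A-Z]{0,%d}?H{%d,}" % (max_prefix, min_his))
    return pattern.sub("", sequence, count=1)


# Shortened, made-up record used only to show the behaviour.
header, seq = parse_fasta(">example|demo\nMAHHHHHHHH\nGALEVLFQGP\n")
trimmed = strip_n_terminal_his_tag(seq)
print(header)
print(trimmed)
```

The trimmed sequence can then be pasted into the FASTA text box in place of the original.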

We have now created the minimum requirements to run the co-folding model within the safety and convenience of CDD Vault.

To run the co-folding model, users may run a single compound or specify a set of compounds to run in bulk. To run a co-folding simulation for one compound, create a run within the co-folding protocol and then select “add a readout” from the All Data tab of that run.

Next, fill out the relevant information including the molecule name, batch number, and the trigger prior to clicking “add this readout”.

The job has now been submitted. The job status updates periodically to show users which stage the model has reached, including when the job started, its current status, and when it completed.

Although the compute resources needed to run large data sets through co-folding models should be considered, users may also start co-folding jobs through the bulk importer found in the “Import Data” tab of CDD Vault. The typical data format to trigger and run co-folding simulations is quite simple and is exemplified below:
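For readers who prepare such a file programmatically, here is a minimal Python sketch that writes one row per compound with the same three pieces of information entered for a single readout: molecule name, batch, and trigger value. The column headers and the compound identifiers are hypothetical placeholders chosen for this sketch; use the names that match your own vault and protocol.

```python
import csv
import io

# Hypothetical column headers, assumed for illustration only.
COLUMNS = ["Molecule Name", "Batch", "Co-folding Trigger"]


def build_trigger_csv(rows, trigger_value="Yes"):
    """Return CSV text with one trigger row per (molecule, batch) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for molecule_name, batch in rows:
        writer.writerow([molecule_name, batch, trigger_value])
    return buffer.getvalue()


# Placeholder identifiers, not real compounds.
csv_text = build_trigger_csv([
    ("EXAMPLE-0000001", "001"),
    ("EXAMPLE-0000002", "001"),
])
print(csv_text)
```

Because a job is triggered whenever the trigger value changes, each row in such a file would be expected to start one co-folding job.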

Viewing and interpreting co-folding data in CDD Vault:

Co-folding results may be viewed and searched on the Explore Data tab, just like any other data type stored in Vault.

After a Boltz2 co-folding protocol has run, the following readouts are output automatically:

Job Id
Job Status
Job Errors
Started At
Finished At
Updated At
Affinity Prediction Value
Affinity Probability Binary
Affinity Prediction Value 1
Affinity Probability Binary 1
Affinity Prediction Value 2
Affinity Probability Binary 2
Confidence Score
PTM
iPTM
Ligand iPTM
Protein iPTM
Complex pLDDT
Complex ipLDDT
Complex PDE (Å)
Complex iPDE (Å)

How to Interpret the Boltz2 Outputs:

Readout	Meaning	Unit	Good Value
Affinity Prediction Value	Predicted affinity as pIC50 (from IC50 in M)	pIC50	Higher = stronger affinity
Affinity Probability Binary	Predicted probability that the ligand is a binder (0 to 1)	–	Closer to 1 = likely binder
Confidence Score	Weighted combination of complex pLDDT and iPTM	–	>0.7 good
PTM	Global topology accuracy	–	>0.7 good
iPTM	Interface accuracy	–	>0.7 good
Ligand iPTM	Protein–ligand interface accuracy	–	>0.7 good
Protein iPTM	Protein–protein interface accuracy	–	>0.7 good
Complex pLDDT	Local per-residue accuracy (whole complex)	–	>0.7 good
Complex ipLDDT	Local per-residue accuracy (interfaces)	–	>0.7 good
Complex PDE	Predicted distance error (all residues)	Å	Lower = better
Complex iPDE	Predicted distance error (interfaces)	Å	Lower = better

Table derived from the official GitHub repository for the Boltz biomolecular interaction models.
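The rule-of-thumb thresholds in the table can be applied to exported results with a few lines of Python. The sketch below flags every 0-1 score that does not exceed 0.7. The dictionary layout and the function name are assumptions for this example, and the numbers in the sample result are invented purely to show the behaviour.

```python
# Readouts for which the table lists ">0.7 good".
SCORE_READOUTS = [
    "Confidence Score",
    "PTM",
    "iPTM",
    "Ligand iPTM",
    "Protein iPTM",
    "Complex pLDDT",
    "Complex ipLDDT",
]

GOOD_THRESHOLD = 0.7


def flag_low_confidence(result, threshold=GOOD_THRESHOLD):
    """Return the names of score readouts that are present but not above threshold."""
    flagged = []
    for name in SCORE_READOUTS:
        value = result.get(name)
        if value is not None and value <= threshold:
            flagged.append(name)
    return flagged


# Invented values, for illustration only.
example_result = {
    "Confidence Score": 0.91,
    "PTM": 0.88,
    "iPTM": 0.65,
    "Complex pLDDT": 0.93,
}
print(flag_low_confidence(example_result))
```

Readouts missing from the result are skipped rather than flagged, so the same helper works whether or not every metric was exported.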

Users will notice that some readouts, such as the affinity prediction values and affinity probability binaries, have “replicate” readouts (e.g., Affinity Prediction Value 1 and 2). To improve robustness and overall performance, Boltz2 uses two affinity models with distinct hyperparameters. The models differ in binder-to-decoy loss weighting (λfocal = 0.8 vs. 0.6), the number of transformer layers (4 vs. 8), and training duration (one is trained longer while the other is early-stopped). This diversity not only enhances predictive accuracy through ensembling but also serves an important role in downstream molecule generation. It is recommended that users gauge the affinity prediction and the binding probability using the average of the two scores in each case (Passaro et al., 2025).

Target-Ligand Interaction Metrics

There are two main predictions in the affinity output: affinity prediction value and affinity probability binary. They are trained on largely different datasets, with different supervision, and should be used in different contexts. The affinity probability binary field should be used to distinguish binders from decoys, for example in a hit-discovery stage. Its value ranges from 0 to 1 and represents the predicted probability that the ligand is a binder. The affinity prediction value aims to measure the specific affinity of different binders and how it changes with small modifications of the molecule. It should be used in ligand optimization stages such as hit-to-lead and lead optimization. It reports a binding affinity value as pIC50, derived from an IC50 measured in M (Passaro et al., 2025).

The confidence score combines the predicted local distance difference test (pLDDT) and the interface predicted template modelling score (iPTM). This score provides insight into the overall confidence of the co-folded complex.

Affinity Prediction Value: Average predicted affinity, as pIC50 (from IC50 in M), across both replicates
Affinity Probability Binary: Average probability that a ligand is a binder to its protein target (0 = weak/non-binder, 1 = binder) across both replicates
Affinity Prediction Value 1: Predicted affinity, as pIC50 (from IC50 in M), from model 1
Affinity Probability Binary 1: Predicted binding probability from model 1
Affinity Prediction Value 2: Predicted affinity, as pIC50 (from IC50 in M), from model 2
Affinity Probability Binary 2: Predicted binding probability from model 2
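To make the relationship between the replicate and averaged readouts concrete, the sketch below averages two replicate values and converts a pIC50 into an IC50 in nM using the standard definition pIC50 = -log10(IC50 in M). The function names and the replicate values are invented for illustration. The averaged readouts reported in Vault should be treated as authoritative.

```python
import math


def mean_of_replicates(value_1, value_2):
    """Average the outputs of the two affinity models."""
    return (value_1 + value_2) / 2


def pic50_to_ic50_nm(pic50):
    """Convert pIC50 (IC50 expressed in M) to an IC50 in nM.

    IC50 [M] = 10 ** (-pIC50), and 1 M = 1e9 nM.
    """
    return 10 ** (9 - pic50)


# Invented replicate values, for illustration only.
average_pic50 = mean_of_replicates(6.5, 5.5)
print(average_pic50)
print(pic50_to_ic50_nm(average_pic50))  # a pIC50 of 6 corresponds to 1 µM, i.e. 1000 nM
```

Note that each one-unit increase in pIC50 corresponds to a tenfold lower IC50, which is why higher values indicate stronger predicted affinity.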

Target Folding Metrics

Given that Boltz2 is a co-folding model, the parameters describing the predicted structure of the target are as important as the target-ligand interaction parameters.

Confidence Score: A composite score used to rank predictions, balancing local structural confidence (complex pLDDT) and interface accuracy (iPTM), calculated as (0.8 * pLDDT) + (0.2 * iPTM).

pTM (0-1): Predicted Template Modeling score. A predicted measure of how well the overall predicted structure matches the “true” (unknown) global fold. Small prediction errors at the local scale tend not to affect this score, so it is used to assess the global topology of the predicted structure (whether the fold is right).
iPTM (0-1): Interface Predicted Template Modeling score. Here, interface refers to the interface between the protein and ligand. Like the pTM, this score measures how well the predicted structure matches the “true” (unknown) structure, but it is restricted to the interfaces between the protein and ligand. This score is crucial in complex prediction because the global fold might look good while the interfaces are wrong. Values closer to 1 represent high-confidence predictions.
Ligand iPTM (0-1): Computed only for protein–ligand contacts.
Protein iPTM (0-1): Computed only for protein–protein contacts.
Complex pLDDT (0-1): Complex Predicted Local Distance Difference Test. A metric used to estimate local confidence, at the level of individual amino acid residues, in a predicted protein structure. Higher scores indicate higher confidence.
Complex ipLDDT (0-1): Complex Interface Predicted Local Distance Difference Test. An interface-specific (protein-ligand) pLDDT covering the residues at the binding interface.
Complex PDE (Å): Complex predicted distance error. Derived from AlphaFold’s predicted aligned error (PAE), it reflects the expected alignment error between residue pairs, averaged across all residue pairs in the complex. Unlike pLDDT and pTM, where higher is better, a lower PDE indicates higher accuracy.
Complex iPDE (Å): Complex interface predicted distance error. The average PAE across all residues at the protein-ligand interface. Lower values are better.
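The composite confidence score given above, (0.8 * pLDDT) + (0.2 * iPTM), can be reproduced from the individual readouts as a sanity check. The sketch below is a direct transcription of that formula. The function name and the input values are assumptions for illustration, and the Confidence Score readout in Vault remains the value to rely on.

```python
import math


def confidence_score(complex_plddt, iptm):
    """Weighted combination of complex pLDDT and iPTM, both on a 0-1 scale."""
    for name, value in (("complex_plddt", complex_plddt), ("iptm", iptm)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return 0.8 * complex_plddt + 0.2 * iptm


# Invented inputs, for illustration only.
score = confidence_score(complex_plddt=0.5, iptm=1.0)
print(score)
```

Because pLDDT carries 80% of the weight, a complex with a well-predicted fold but a poorly predicted interface can still receive a fairly high confidence score, which is one reason to inspect iPTM and Ligand iPTM separately.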

CDD Vault’s PDB Viewer:

CDD Vault has a native PDB viewer that can be accessed for co-folding, molecular docking, protein folding, or viewing experimentally resolved crystal structures. The right-hand menu allows users to quickly toggle on the active-site specific view as well as turn on protein residue labels to quickly view where key interactions may be taking place.

Additionally, three different protein surfaces are available:

Solvent accessible surface (SAS): Created by rolling a sphere roughly the size of a water molecule across the protein surface. It shows where ligand atoms should be located to have good van der Waals interactions, which helps identify additional pockets that could be filled by the ligand.
Solvent excluded surface (SES)
Van der Waals surface

The option “one sided surface” only shows the surface from the outside. This sometimes helps to visualize deep pockets.

Users will also have the option to toggle on the van der Waals surface for the ligand.

Please note, whichever view selections you make in the PDB viewer will be retained when you exit back to the search results page as shown below: