To enable AI in your vault, please contact support@collaborativedrug.com. Once AI is enabled, please note that the number of allowed jobs is set on a per-account basis. Contact the same address if you have questions about the number of allowed jobs for your account.
Boltz-2 is a structural biology foundation model, developed by the MIT Jameel Clinic and Recursion, that exhibits strong performance for both structure and affinity prediction. Boltz-2 is the first AI model to approach the performance of free-energy perturbation (FEP) methods in estimating small molecule–protein binding affinity, achieving strong correlation with experimental readouts on many benchmarks while being at least 1000× more computationally efficient than FEP.
For more in-depth information about Boltz-2, please refer to the original paper cited below.
Passaro, S., Corso, G., Wohlwend, J., Reveiz, M., Thaler, S., Ram Somnath, V., Getz, N., Portnoi, T., Roy, J., Stark, H., Kwabi-Addo, D., Beaini, D., Jaakkola, T., & Barzilay, R. (2025). Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction [Preprint]. bioRxiv. https://www.biorxiv.org/content/10.1101/2025.06.14.659707v1
Implementing a Boltz-2 co-folding protocol in CDD Vault:
Creating a co-folding protocol in CDD Vault is as simple as creating a traditional protocol. For a refresher on creating a protocol, please refer to this article before proceeding.
In the following example, we will create a co-folding protocol against the TYK2 kinase with a known TYK2 inhibitor. This type of protocol requires two readout definitions: a co-folding readout definition and a co-folding trigger readout definition. The co-folding readout definition contains the target protein for the Boltz2 model. The co-folding trigger lets users selectively run one model (one readout definition) when a protocol holds many readouts, for example one readout per target or a single protocol that holds all of their computational models.
Users will first create the co-folding trigger readout definition. In this example, we have chosen a “pick list” data type with an allowed value of “Yes”; however, this could be a text or number field as well. Please note that the job is triggered whenever the value of this field changes, and that triggering a readout row more than once overwrites the previous data for that row.
Next, we will create the co-folding readout definition. Users choose “Boltz2” as the data type, name the readout definition, and specify the previously created trigger. Users then supply the target sequence, either as a FASTA string or as a .CIF file. As shown in the following example, the FASTA string can simply be pasted into the text box, while a CIF file can be selected from the user's file directory. CIF is the current standard format for crystallography and supersedes the older PDB format.
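The trigger behavior described above can be pictured with a small toy model in Python. This is only an illustration of the stated rule (a changed trigger value starts a job, and a new job overwrites the previous result for that row); it is not CDD Vault code. The `ReadoutRow` class and `set_trigger` function are hypothetical names, and the choice that re-entering an unchanged value does nothing is an assumption that the article does not confirm.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ReadoutRow:
    """Toy stand-in for one readout row; not a CDD Vault object."""
    trigger: Optional[str] = None
    result: dict = field(default_factory=dict)
    jobs_run: int = 0


def set_trigger(row: ReadoutRow, new_value: str, run_job: Callable[[], dict]) -> bool:
    """Run the job only when the trigger value changes (assumed behavior)."""
    if new_value == row.trigger:
        return False            # unchanged value: no job is started in this toy model
    row.trigger = new_value
    row.result = run_job()      # the new result replaces the previous one
    row.jobs_run += 1
    return True


row = ReadoutRow()
set_trigger(row, "Yes", lambda: {"Job Status": "first run"})
set_trigger(row, "Rerun", lambda: {"Job Status": "second run"})
print(row.result, row.jobs_run)
```

In this sketch the second call replaces the first result, which mirrors the note that earlier data for a row is overwritten when the row is triggered again.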
The FASTA string will be represented in the following format:
>6X8F_1|Chains A, B[auth C]|Non-receptor tyrosine-protein kinase TYK2|Homo sapiens (9606)
MAHHHHHHHHHHGALEVLFQGPGDPTVFHKRYLKKIRDLGEGHFGKVSLYCYDPTNDGTGEMVAVKALKADAGPQHRSGWKQEIDILRTLYHEHIIKYKGCCEDAGAASLQLVMEYVPLGSLRDYLPRHSIGLAQLLLFAQQICEGMAYLHSQHYIHRDLAARNVLLDNDRLVKIGDFGLAKAVPEGHEYYRVREDGDSPVFWYAPECLKEYKFYYASDVWSFGVTLYELLTHCDSSQSPPTKFLELIGIAQGQMTVLRLTELLERGERLPRPDKCPAEVYHLMKNCWETEASFRPTFENLIPILKTVHEKYQGQAPS
Please note that it may make sense to modify a sequence retrieved from PDB or Expasy to focus on the relevant part of the protein structure. Here, for example, the sequence contains “MAHHHHHHH” at the start.
This run of histidines is a poly-histidine tag, a common tool in biochemistry for purifying proteins. It is not a standard structural element like an alpha-helix or beta-sheet, but rather a short artificial sequence of histidine residues attached to a protein to aid in its isolation and purification.
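As one way to prepare such a sequence before pasting it into the readout definition, the following Python sketch splits a FASTA string into header and sequence and removes an N-terminal poly-histidine tag. The function names and the tag pattern (an optional starting methionine, up to three further residues, then six or more histidines) are assumptions chosen for illustration. The sketch leaves everything after the tag untouched, so deciding how much more of the construct to trim remains a scientific judgment.

```python
import re


def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    return lines[0][1:], "".join(lines[1:]).upper()


# Assumed tag pattern: optional start Met, up to 3 residues, then 6+ His.
HIS_TAG = re.compile(r"^M?[A-Z]{0,3}?H{6,}")


def strip_n_terminal_his_tag(sequence):
    """Remove a leading poly-His tag; return the sequence unchanged if none is found."""
    return HIS_TAG.sub("", sequence, count=1)


# Shortened demo record, not the full TYK2 sequence.
header, seq = parse_fasta(">demo construct\nMAHHHHHHHHGALEVLFQGP\n")
print(strip_n_terminal_his_tag(seq))  # GALEVLFQGP
```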
We have now created the minimum requirements to run the co-folding model within the safety and convenience of CDD Vault.
In order to run the co-folding model, users may choose to run a single compound or specify a set of compounds to run in bulk. To run a co-folding simulation for one compound, create a run within the co-folding protocol and then select “add a readout” from the All Data tab of that run.
Next, fill out the relevant information including the molecule name, batch number, and the trigger prior to clicking “add this readout”.
The job has now been submitted. The job status updates periodically to show users which stage the model has reached, including when the job started, its current status, and when it completed.
Users may also start co-folding jobs in bulk through the importer found in the “Import Data” tab of CDD Vault, although it is important to consider the compute resources needed to run large data sets through co-folding models. The typical data format to trigger and run co-folding simulations is quite simple and is exemplified below:
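For users who prepare import files programmatically, the following Python sketch writes a CSV with one row per compound, each carrying the trigger value. It is a rough illustration only: the column headers (`Molecule Name`, `Batch`, `Co-folding Trigger`) and the compound identifiers are hypothetical placeholders, not a format prescribed by CDD Vault, and should be mapped to the fields defined in your own vault and protocol during import.

```python
import csv
import io


def build_import_csv(compounds, trigger_value="Yes"):
    """Return CSV text with one row per (molecule name, batch) pair.

    Column headers are hypothetical; adapt them to your own protocol.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Molecule Name", "Batch", "Co-folding Trigger"])
    for name, batch in compounds:
        writer.writerow([name, batch, trigger_value])
    return buffer.getvalue()


# Invented identifiers, for illustration only.
print(build_import_csv([("CMPD-0001", "1"), ("CMPD-0002", "1")]))
```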
Viewing and interpreting co-folding data in CDD Vault:
Co-folding results may be viewed and searched on the Explore Data tab just like any other data type stored in Vault.
When a Boltz2 co-folding protocol has been executed, the following parameters will be automatically output:
- Job Id
- Job Status
- Job Errors
- Started At
- Finished At
- Updated At
- Affinity Prediction Value
- Affinity Probability Binary
- Affinity Prediction Value 1
- Affinity Probability Binary 1
- Affinity Prediction Value 2
- Affinity Probability Binary 2
- Confidence Score
- PTM
- iPTM
- Ligand iPTM
- Protein iPTM
- Complex pLDDT
- Complex ipLDDT
- Complex PDE (Å)
- Complex iPDE (Å)
How to Interpret the Boltz2 Outputs:
| Readout | Meaning | Unit | Good Value |
|---|---|---|---|
| Affinity Prediction Value | Predicted binding affinity, expressed as pIC50 | pIC50 (IC50 in M) | Higher = stronger affinity |
| Affinity Probability Binary | Binary prediction of binding threshold | – | 1 = likely binder |
| Confidence Score | Weighted global+interface confidence | – | >0.7 good |
| PTM | Global topology accuracy | – | >0.7 good |
| iPTM | Interface accuracy | – | >0.7 good |
| Ligand iPTM | Protein–ligand interface accuracy | – | >0.7 good |
| Protein iPTM | Protein–protein interface accuracy | – | >0.7 good |
| Complex pLDDT | Local per-residue accuracy (whole complex) | – | >0.7 good |
| Complex ipLDDT | Local per-residue accuracy (interfaces) | – | >0.7 good |
| Complex PDE | Predicted distance error (all residues) | Å | Lower = better |
| Complex iPDE | Predicted distance error (interfaces) | Å | Lower = better |
Table derived from the official GitHub repository for the Boltz biomolecular interaction models.
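The rules of thumb in the table can also be applied programmatically when reviewing exported results. The Python sketch below flags any 0 to 1 confidence metric that does not exceed the 0.7 guideline. It assumes one result is represented as a dictionary keyed by readout name; the function name and the example values are invented for illustration.

```python
# Readouts for which the table gives ">0.7 good" as the guideline.
HIGHER_IS_BETTER = [
    "Confidence Score",
    "PTM",
    "iPTM",
    "Ligand iPTM",
    "Protein iPTM",
    "Complex pLDDT",
    "Complex ipLDDT",
]


def low_confidence_metrics(readouts, threshold=0.7):
    """Return the names of reported 0-1 metrics that are not above the threshold."""
    return [
        name
        for name in HIGHER_IS_BETTER
        if name in readouts and readouts[name] <= threshold
    ]


# Invented values, for illustration only.
example = {"Confidence Score": 0.85, "iPTM": 0.62, "Complex pLDDT": 0.91}
print(low_confidence_metrics(example))  # ['iPTM']
```

A flagged metric does not by itself mean the prediction is wrong; it marks results that deserve a closer look in the structure viewer.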
Users will notice that some readouts, such as the affinity prediction values and affinity probability binaries, have “replicate” readouts (e.g., Affinity Prediction Value 1 and 2). To improve robustness and overall performance, Boltz-2 uses two affinity models with distinct hyperparameters. The models differ in binder-to-decoy loss weighting (λfocal = 0.8 vs. 0.6), the number of transformer layers (4 vs. 8), and training duration (one is trained longer while the other is early-stopped). This diversity not only enhances predictive accuracy through ensembling but also serves an important role in downstream molecule generation. It is recommended that users gauge the affinity prediction and the binding probability using the average of the two replicate scores for each (Passaro et al., 2025).
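The relationship between the replicate readouts and their averages can be sketched in Python. This minimal illustration assumes a simple arithmetic mean of the two replicates and represents one result row as a dictionary keyed by readout name; the function name and the numeric values are invented.

```python
from statistics import mean


def ensemble_average(row, base_name):
    """Average the two replicate readouts, e.g. '<base_name> 1' and '<base_name> 2'."""
    return mean([row[f"{base_name} 1"], row[f"{base_name} 2"]])


# Invented values, for illustration only.
row = {
    "Affinity Prediction Value 1": 6.0,
    "Affinity Prediction Value 2": 7.0,
    "Affinity Probability Binary 1": 0.5,
    "Affinity Probability Binary 2": 0.75,
}
print(ensemble_average(row, "Affinity Prediction Value"))    # 6.5
print(ensemble_average(row, "Affinity Probability Binary"))  # 0.625
```

A large gap between the two replicates for the same compound can be treated as a hint that the averaged value is less certain.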
Target-Ligand Interaction Metrics
There are two main predictions in the affinity output: affinity prediction value and affinity probability binary. They are trained on largely different datasets, with different supervision, and should be used in different contexts. The affinity probability binary field should be used to distinguish binders from decoys, for example in a hit-discovery stage. Its value ranges from 0 to 1 and represents the predicted probability that the ligand is a binder. The affinity prediction value aims to measure the specific affinity of different binders and how this changes with small modifications of the molecule. It should be used in ligand optimization stages such as hit-to-lead and lead optimization. It reports a binding affinity value as pIC50, derived from an IC50 measured in M (Passaro et al., 2025).
The confidence score combines the predicted local distance difference test (pLDDT) and the interface predicted template modelling score (iPTM). This score provides insight into the overall confidence of the co-folded complex.
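To relate a predicted value to a more familiar concentration, the pIC50 can be converted back to an IC50. The Python sketch below assumes the value is a standard pIC50, that is, the negative base-10 logarithm of the IC50 in mol/L, as described above; the function names are illustrative.

```python
import math


def pic50_to_ic50_nm(pic50):
    """Convert pIC50 (IC50 in mol/L) to an IC50 in nanomolar."""
    return 10 ** (9 - pic50)


def ic50_nm_to_pic50(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9 - math.log10(ic50_nm)


print(pic50_to_ic50_nm(7.0))  # 100.0 (nM)
```

On this scale, each increase of one pIC50 unit corresponds to a tenfold lower IC50, which is why higher values indicate stronger predicted affinity.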
- Affinity Prediction Value: Average predicted affinity as pIC50 in M across both replicates
- Affinity Probability Binary: Average probability that a ligand is a binder to its protein target (0 = weak/non-binder, 1 = binder) across both replicates
- Affinity Prediction Value 1: Predicted affinity as pIC50 in M from model 1
- Affinity Probability Binary 1: Predicted binding probability from model 1
- Affinity Prediction Value 2: Predicted affinity as pIC50 in M from model 2
- Affinity Probability Binary 2: Predicted binding probability from model 2
Target Folding Metrics
Given that Boltz-2 is a co-folding model, the parameters representing the predicted structure of the target are as important as the target-ligand interaction parameters.
- Confidence Score: A composite score used to rank predictions, balancing structural confidence (complex pLDDT) and interface accuracy (iPTM), calculated as (0.8 * pLDDT) + (0.2 * iPTM).
- pTM (0-1): Predicted Template Modelling score. A predicted measure of how well the overall predicted structure matches the “true” (unknown) global fold. Small prediction errors at the local scale tend not to affect this global score. It is used to assess the global topology of the predicted structure (whether the fold is right).
- iPTM (0-1): Interface Predicted Template Modelling score. Here, interface refers to the interface between the protein and ligand. Like the pTM, this score measures how well the predicted structure matches the “true” (unknown) structure, but it is restricted to the interfaces between the protein and ligand. This score is crucial in complex prediction, because the global fold might look good while the interfaces are wrong. Values closer to 1 represent high-confidence predictions.
- Ligand iPTM (0-1): iPTM computed only for protein–ligand contacts.
- Protein iPTM (0-1): iPTM computed only for protein–protein contacts.
- Complex pLDDT (0-1): Complex Predicted Local Distance Difference Test. A metric used to estimate local confidence, at the level of individual amino acid residues, in a predicted protein structure. Higher scores indicate higher confidence.
- Complex ipLDDT (0-1): Complex Interface Predicted Local Distance Difference Test. An interface-specific (protein-ligand) pLDDT covering the residues at the binding interface.
- Complex PDE (Å): Complex predicted distance error. Derived from AlphaFold’s predicted aligned error (PAE), it reflects the expected alignment error between residue pairs. Unlike pLDDT and pTM, where higher is better, a small PDE means high accuracy. This score is the average PAE across all residue pairs in the complex.
- Complex iPDE (Å): Complex interface predicted distance error. The average PAE across all residues at the protein-ligand interface.
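The confidence score formula given in the list above can be written as a short Python function. This is a minimal sketch of the stated weighting, 0.8 for complex pLDDT and 0.2 for iPTM; the function name and the input values are illustrative only.

```python
def confidence_score(complex_plddt, iptm):
    """Weighted combination described above: 0.8 * pLDDT + 0.2 * iPTM."""
    for name, value in (("complex_plddt", complex_plddt), ("iptm", iptm)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    return 0.8 * complex_plddt + 0.2 * iptm


# Invented inputs: a well-folded complex with a weaker interface.
print(round(confidence_score(0.9, 0.5), 2))  # 0.82
```

Because pLDDT carries four times the weight of iPTM, a complex can reach a confidence score above 0.7 even when its interface score is modest, so it is worth checking iPTM and Ligand iPTM separately.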
CDD Vault’s PDB Viewer:
CDD Vault has a native PDB viewer that can be accessed for co-folding, molecular docking, protein folding, or viewing experimentally resolved crystal structures. The right-hand menu allows users to quickly toggle on the active-site specific view as well as turn on protein residue labels to quickly view where key interactions may be taking place.
Additionally, three different protein surfaces are available:
- Solvent accessible surface (SAS): Created by rolling a sphere roughly the size of a water molecule across the protein surface. This surface shows where ligand atoms should be located to have good van der Waals interactions, which makes it useful for identifying additional pockets that could be filled by the ligand.
- Solvent excluded surface (SES)
- Van der Waals surface
The option “one sided surface” only shows the surface from the outside. This sometimes helps to visualize deep pockets.
Users will also have the option to toggle on the van der Waals surface for the ligand.
Please note, whichever view selections you make in the PDB viewer will be retained when you exit back to the search results page as shown below: