AI+ Folding: AlphaFold2 and ESMFold

To enable AI+ in your vault, please contact support@collaborativedrug.com. Once AI+ is enabled, the number of allowed Folding and Docking jobs is set on a per-account basis. Please contact the same address if you have questions regarding the number of allowed jobs for your account.

 

CDD Vault offers multiple options for predicting protein structures, including AlphaFold2 and ESMFold. AlphaFold2 is a deep learning model from DeepMind that predicts a protein’s 3D structure from its amino acid sequence with high accuracy, using alignments against related sequences and known structures as model input. ESMFold is a transformer-based model from Meta AI that predicts a protein’s 3D structure from a single sequence by leveraging a large protein language model. Because it does not need the alignment step, ESMFold offers a faster way to obtain protein structures from sequence data.

Implementing a folding protocol in CDD Vault:

Creating a folding protocol will be familiar to CDD Vault users because the AI+ models rely on the same infrastructure as traditional experimental protocols. For a refresher on creating a protocol, please refer to this article before proceeding.

In the following example, we will work through creating a folding protocol for the 105 amino acid Cytochrome C protein (P99999). This type of protocol requires two readout definitions: a folding readout definition and a folding trigger readout definition. The folding readout definition contains the structure prediction model, whereas the folding trigger is a safeguard against accidentally launching folding predictions for a large set of sequences, which would require significant computational power.

First, users must create the folding trigger readout definition. In this example, we have chosen a “pick list” data type with an allowed value of “Yes” for both the AlphaFold2 and ESMFold models. Please note that the folding protocol triggers whenever the value of this field changes. For example, if a folding trigger is a pick list with values of “Yes” and “No”, the protocol executes regardless of which value is picked, including when the value is changed from “Yes” to “No”. If users intend to retrigger the protocol after the initial run, we recommend a pick list with two values or a free text or numeric field.
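The trigger behaviour described above can be sketched in a few lines of Python. This is only an illustrative model of the "fires when the value changes" rule, not CDD Vault's implementation; the function name `trigger_fires` is hypothetical.

```python
from typing import Optional


def trigger_fires(previous: Optional[str], new: Optional[str]) -> bool:
    """Model of the rule: a folding job starts whenever the trigger value changes."""
    return previous != new


# Pick list with one value: the first entry fires, re-entering "Yes" does not.
first_entry = trigger_fires(None, "Yes")
same_value = trigger_fires("Yes", "Yes")

# Pick list with two values: any change fires, including "Yes" -> "No".
changed_value = trigger_fires("Yes", "No")

print(first_entry, same_value, changed_value)
```

Under this model, a pick list with only one value has no way to change after the first entry, which is why a second value or a free text or numeric field is suggested for retriggering.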

Next, users will create the folding readout definition, which contains the specified model. Users choose “Folding” as the data type, name the readout definition, select the specific model, and specify the previously created folding trigger. Please note that the folding readout definition cannot be created until the folding trigger exists.

We have now created the minimum requirements to run folding modelling within the safety and convenience of CDD Vault.

In order to run the folding model, users may choose to run a single registered peptide sequence or specify a series of registered peptide sequences to run in bulk. To run one folding simulation for one registered peptide sequence, create a run within the folding protocol and then select “add a readout” from the All Data tab of that run.

Next, fill out the relevant information, including the molecule/entity name, batch number, and the folding trigger(s), before clicking “add this readout”.

Once the readout is added, the job is submitted. The job status updates periodically to show which stage the job has reached, including when it started, its current status, and when it finished.

Users may also start folding jobs in bulk through the importer found on the “Import Data” tab of CDD Vault, although it is important to consider the compute resources required to run large data sets through folding models. The data format needed to trigger and run folding simulations is simple: each row identifies a registered sequence and supplies a value for the folding trigger.
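As a rough sketch, an import file of this kind could be assembled with Python's standard `csv` module. The column headers, molecule names, and batch identifiers below are hypothetical placeholders based on the fields used when adding a single readout (molecule/entity name, batch number, folding trigger); map them to your own vault's fields during import.

```python
import csv
import io

# Hypothetical column headers; these are not fixed CDD Vault names.
COLUMNS = ["Molecule Name", "Batch", "Folding Trigger"]


def build_import_csv(rows):
    """Serialize (molecule, batch, trigger) tuples into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder identifiers for two registered sequences.
rows = [
    ("PROT-0001", "001", "Yes"),
    ("PROT-0002", "001", "Yes"),
]

csv_text = build_import_csv(rows)
print(csv_text)
```

Keeping the row count small for a first import is a simple way to check the mapping before committing compute resources to a larger set.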

Viewing and interpreting folding data in CDD Vault:

Folding results may be viewed and searched on the Explore Data tab just like any other data type stored in CDD Vault.

When a folding protocol has been executed, the following parameters are output, most notably the Folding PDB File and the Folding Score:

AlphaFold2

  • AlphaFold 2 Score
  • AlphaFold 2 PDB File
  • AlphaFold 2 JobID
  • AlphaFold 2 Job Status
  • AlphaFold 2 Job Errors
  • AlphaFold 2 Started At
  • AlphaFold 2 Finished At
  • AlphaFold 2 Updated At

ESMFold

  • ESMFold Score
  • ESMFold PDB File
  • ESMFold JobID
  • ESMFold Job Status
  • ESMFold Job Errors
  • ESMFold Started At
  • ESMFold Finished At
  • ESMFold Updated At

CDD Vault’s PDB Viewer:

CDD Vault has a native PDB viewer that can be accessed for molecular docking, protein folding, or viewing experimentally resolved crystal structures. The right-hand menu allows users to quickly toggle on the active-site specific view as well as turn on protein residue labels to quickly view where key interactions may be taking place.

Additionally, five options for viewing the protein surface are available including:

  • Solvent accessible
  • Solvent excluded
  • Van der Waals
  • On sided surface

Users will also have the option to toggle on the van der Waals surface for the ligand.

Please note, whichever view selections you make in the PDB viewer will be retained when you exit back to the search results page.

How to Interpret the Folding Prediction Scores:

AlphaFold2

Predicted Template Modelling, Interface Predicted Template Modelling, and Predicted Local Distance Difference Test Scores:

  • Predicted template modelling score (pTM): an integrated measure of how well AlphaFold-Multimer has predicted the structure of a complex. It is the predicted template modelling (TM) score for a superposition between the predicted structure and the hypothetical true structure. pTM scores vary between 0 and 1: a score above 0.5 means the overall predicted fold for the complex will be similar to the true structure.
  • Interface predicted template modelling (ipTM): a measure of the accuracy of the predicted structure of a multimer, used by AlphaFold-Multimer. It measures the accuracy of the predicted interface between the subunits of the protein-protein complex.
  • Predicted local distance difference test (pLDDT): a metric used by AlphaFold to estimate local confidence, at the level of individual amino acid residues, in a predicted protein structure. It is scaled from 0 to 100, with higher scores indicating higher confidence.

The single score reported for AlphaFold2 in CDD Vault reflects the pLDDT when it is available; otherwise, it reflects a weighted sum of the ipTM and pTM scores.
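The selection rule above can be sketched as follows. The weights are an assumption: 0.8 for ipTM and 0.2 for pTM are the values AlphaFold-Multimer uses for its ranking confidence, and the weights applied in CDD Vault are not stated in this article. The function name `summary_score` is hypothetical.

```python
from typing import Optional

# Assumed weights, borrowed from AlphaFold-Multimer's ranking confidence.
IPTM_WEIGHT = 0.8
PTM_WEIGHT = 0.2


def summary_score(
    plddt: Optional[float],
    iptm: Optional[float] = None,
    ptm: Optional[float] = None,
) -> float:
    """Return pLDDT when present, otherwise a weighted sum of ipTM and pTM."""
    if plddt is not None:
        return plddt
    if iptm is None or ptm is None:
        raise ValueError("ipTM and pTM are both required when pLDDT is missing")
    return IPTM_WEIGHT * iptm + PTM_WEIGHT * ptm


monomer_score = summary_score(87.5)            # pLDDT present, on a 0 to 100 scale
complex_score = summary_score(None, 0.7, 0.6)  # falls back to the weighted sum
print(monomer_score, round(complex_score, 2))
```

Note that the two branches are on different scales: pLDDT runs from 0 to 100, while pTM and ipTM run from 0 to 1, so scores from the two branches should not be compared directly.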


 

ESMFold

  • Predicted local distance difference test (pLDDT): a metric used by ESMFold to estimate local confidence, at the level of individual amino acid residues, in a predicted protein structure. It is scaled from 0 to 100, with higher scores indicating higher confidence.
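For a per-residue view of pLDDT, the values can often be read from a downloaded PDB file. The sketch below assumes the file stores pLDDT in the B-factor column (columns 61 to 66 of each ATOM record) on a 0 to 100 scale, as AlphaFold2 output does; please confirm this against your own files, since some ESMFold versions write values on a 0 to 1 scale. The confidence bands follow the thresholds used by the AlphaFold Protein Structure Database, and the helper `atom_line` only builds a tiny synthetic sample for demonstration.

```python
def atom_line(serial, name, chain, resseq, bfactor):
    """Build a fixed-width PDB ATOM record for a synthetic glycine atom."""
    return (
        f"ATOM  {serial:>5} {name:<4} GLY {chain}{resseq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}"
    )


def residue_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} read from C-alpha atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            scores[key] = float(line[60:66])  # B-factor column
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to the AlphaFold Database confidence bands."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


# Synthetic three-residue sample; real files come from the PDB File readout.
sample = "\n".join([
    atom_line(1, "N", "A", 1, 91.25),
    atom_line(2, "CA", "A", 1, 92.5),
    atom_line(3, "CA", "A", 2, 64.0),
    atom_line(4, "CA", "A", 3, 45.5),
])

scores = residue_plddt(sample)
mean_plddt = sum(scores.values()) / len(scores)
for (chain, resseq), value in scores.items():
    print(chain, resseq, value, confidence_band(value))
print(round(mean_plddt, 2))
```

Reading only C-alpha atoms gives one value per residue, which is sufficient when every atom in a residue carries the same pLDDT.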

For further information regarding the theory and methodology behind the protein folding models described above, please see the original publications listed below.

AlphaFold2

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2

ESMFold

Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023). https://doi.org/10.1126/science.ade2574