This article covers the details of annotating and importing antibodies in CDD Vault. It assumes some familiarity with the general import process; detailed instructions for that process are available here: Importing the first compound library
CDD Vault enables you to register and annotate antibodies with their complete chemical structure, creating an exact record of the molecule, as well as a visual representation of its regions and disulfide bridges.
In this article we focus on how to import antibodies into CDD Vault and how their representation changes with different levels of annotation and imported information.
Step 1: formatting the antibody sequences
To register antibodies in your Vault, first you will need to prepare your sequences in a tabular format (e.g. .csv or .xlsx). Currently it is not possible to create a new antibody structure representation using the Macromolecule editor.
Depending on your desired level of detail (see Fig. 1) you will need to provide varying levels of annotation. We will cover what is needed for each level of representation below.
The minimum necessary level of information is to define one column for the heavy chain and one for the light chain, like in the example below:
| Name | Heavy | Light |
| Trastuzumab | EVKLQE…KSFSR | DIVL…NRNEC |
The 'Name' column, while not essential, serves as a synonym that makes it easier to search for the antibody within CDD Vault after upload.
This is sufficient for obtaining a proto-y shaped representation and it describes a monospecific antibody as depicted in Fig. 2.
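As a rough illustration, the minimal table above could be generated with a short script instead of being typed by hand. This is only a sketch: the function name `build_antibody_csv`, the antibody name, and the short sequences are placeholders invented for the example, and your own file would contain the full heavy and light chain sequences.

```python
import csv
import io


def build_antibody_csv(rows):
    """Return CSV text with one antibody per row and one column per chain."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Name", "Heavy", "Light"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder name and sequences, for illustration only.
rows = [{"Name": "Example-mAb", "Heavy": "EVQLVESGG", "Light": "DIQMTQSPS"}]
csv_text = build_antibody_csv(rows)
print(csv_text)
```

Writing `csv_text` to a `.csv` file gives a datasheet with the three columns described above.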
If you are working with a bispecific antibody, you will need to provide each heavy chain and each light chain in its own column, like in the example below:
| Name | L1 | H1 | H2 | L2 |
| Faricimab | QIV…KSFNRGEC | QIQ…KSLSLSPGK | QVQL…GGTKLTVL | QIV…KSFNRGEC |
Also in this case, the representation will be a proto-y shape and will describe the bispecific nature of the antibody as depicted in Fig. 3.
The next step is to annotate information about domains, hinge regions, and disulfide bridges to achieve a more detailed representation of the antibody.
Note: The colors shown in the antibody chains indicate the amino acids present and can be used for quick visual comparisons.
Metadata annotation
To annotate the antibody you can include information on the regions within the sequence, for each chain.
Domain region annotation
To annotate the antibody chains, specify the starting and ending amino acid positions of each domain within curly brackets after the sequence, as in the example below:
Heavy → …LSPGK{VH:1-120}{CH1:121-218}{CH2:234-343}{CH3:344-450}
Light → …RGEC{VL:1-107}{CL:108-214}
- The VH and CH tags stand for “Variable Heavy” chain and “Constant Heavy” chain. The numbers (CH1, CH2, CH3) differentiate the constant regions. Similarly, the tags for the light chain are VL and CL. You may use different tag annotations if desired.
- The numbers (e.g. 1-120) define the starting and ending amino acid position for each region.
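The tag format described above can be produced programmatically. The sketch below appends `{TAG:start-end}` annotations to a chain and rejects ranges that fall outside the sequence. The helper name `annotate_regions` is hypothetical, the sequence is a placeholder of the right length, and positions are assumed to be 1-based and inclusive, as the examples suggest.

```python
def annotate_regions(sequence, regions):
    """Append {TAG:start-end} annotations (1-based, inclusive) to a chain."""
    tags = []
    for tag, start, end in regions:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(
                f"{tag}: range {start}-{end} is outside 1-{len(sequence)}"
            )
        tags.append(f"{{{tag}:{start}-{end}}}")
    return sequence + "".join(tags)


# Placeholder light chain of 214 residues; replace with the real sequence.
light = "A" * 214
annotated_light = annotate_regions(light, [("VL", 1, 107), ("CL", 108, 214)])
print(annotated_light[-22:])  # {VL:1-107}{CL:108-214}
```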
Hinge region annotation
In order to annotate the hinge region a tag needs to be added to the heavy chain, as in the example below:
Heavy → …LSPGK{VH:1-120}{CH1:121-218}{CH2:234-343}{CH3:344-450}{H:219-233}
- The H tag identifies the hinge region and the numbers 219-233 define the boundaries of the hinge region in the Heavy Chain sequence.
Disulfide bridge annotation
To add disulfide bridges to an antibody, mark the cysteines involved in each bridge. This annotation, like the others, is enclosed in curly brackets and must be present in both the light and heavy chains, as in the example below:
Heavy → …{CH3:344-450}{H:219-233}{@223:1}{@229:2}{@232:3}
Light → …RGEC{VL:1-107}{CL:108-214}{@214:1}
- The @ symbol identifies the annotation for the disulfide bridge
- The first number defines the amino acid position involved in the disulfide bridge for the respective chain
- The second number defines the identifier of the disulfide bridge. Each identifier on one chain must correspond to an identical identifier on another chain. In this example, the disulfide bridge is between heavy chain residue 223 and light chain residue 214. For the remaining disulfide bridges:
  - If the antibody is monospecific, it is implied that the remaining bridges are between the identical heavy chains
  - If the antibody is bispecific, each remaining tag’s identifier on one heavy chain must correspond to an identical identifier on the other heavy chain
- Intra-chain disulfide bridges do not need to be explicitly defined, but if desired they can be depicted in the same way as inter-chain bridges
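To make the pairing rule concrete, the following sketch reads the `{@position:identifier}` tags from the example chains above and matches identifiers shared between the heavy and light chain. The function names are hypothetical and this is only a local check, not part of CDD Vault.

```python
import re
from collections import defaultdict

BRIDGE_TAG = re.compile(r"\{@(\d+):(\d+)\}")


def parse_bridges(annotated_chain):
    """Return {bridge identifier: [residue positions]} for one chain."""
    bridges = defaultdict(list)
    for position, bridge_id in BRIDGE_TAG.findall(annotated_chain):
        bridges[int(bridge_id)].append(int(position))
    return dict(bridges)


def heavy_light_pairs(heavy, light):
    """Pair heavy and light residues that share a bridge identifier."""
    heavy_bridges, light_bridges = parse_bridges(heavy), parse_bridges(light)
    shared = sorted(heavy_bridges.keys() & light_bridges.keys())
    return {i: (heavy_bridges[i][0], light_bridges[i][0]) for i in shared}


# Annotated chains copied from the example above (sequences elided).
heavy = "…{CH3:344-450}{H:219-233}{@223:1}{@229:2}{@232:3}"
light = "…RGEC{VL:1-107}{CL:108-214}{@214:1}"

pairs = heavy_light_pairs(heavy, light)
heavy_only = sorted(set(parse_bridges(heavy)) - set(parse_bridges(light)))
print(pairs)       # {1: (223, 214)}
print(heavy_only)  # [2, 3]
```

Identifiers 2 and 3 appear only on the heavy chain, which for a monospecific antibody corresponds to the implied heavy-to-heavy bridges.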
The resulting table with all the described annotations will then look like this:
| Name | Heavy | Light |
| Trastuzumab | EVQLVES…LSPGK{VH:1-120}{CH1:121-218}{CH2:234-343}{CH3:344-450}{H:219-233}{@223:1}{@229:2}{@232:3} | DIQM…FNRGEC{VL:1-107}{CL:108-214}{@214:1} |
Once all information about regions and bridges is annotated, the imported structure will be represented as shown in Fig. 4.
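Before importing, it can be useful to read the region tags back and confirm that they cover the chain without gaps. The sketch below does this for the heavy chain tags from the table above. The function names are hypothetical, and the pattern assumes tags made of letters and digits, so custom tag names with other characters would need a different pattern.

```python
import re

REGION_TAG = re.compile(r"\{([A-Za-z][A-Za-z0-9]*):(\d+)-(\d+)\}")


def parse_regions(annotated_chain):
    """Return (tag, start, end) tuples sorted by start position."""
    regions = [
        (tag, int(start), int(end))
        for tag, start, end in REGION_TAG.findall(annotated_chain)
    ]
    return sorted(regions, key=lambda region: region[1])


def is_contiguous(regions):
    """True when each region starts right after the previous one ends."""
    return all(nxt[1] == prev[2] + 1 for prev, nxt in zip(regions, regions[1:]))


# Tags copied from the heavy chain in the table above (sequence omitted).
heavy_tags = (
    "{VH:1-120}{CH1:121-218}{CH2:234-343}{CH3:344-450}"
    "{H:219-233}{@223:1}{@229:2}{@232:3}"
)
regions = parse_regions(heavy_tags)
print([tag for tag, _, _ in regions])  # ['VH', 'CH1', 'H', 'CH2', 'CH3']
print(is_contiguous(regions))          # True
```

The order of the tags in the cell does not affect this check, since the regions are sorted by position before being compared.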
Importing the antibody structure into CDD Vault is described in Step 2 below.
Note: The system will give you an error if your disulfide bond amino acid position is NOT a cysteine.
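To catch this error before uploading, the annotated positions can be checked locally. The sketch below is not the validation CDD Vault performs; it is a hypothetical pre-check that assumes 1-based positions and that the residues come before the first curly bracket. The toy chain is invented for the example.

```python
import re

BRIDGE_POSITION = re.compile(r"\{@(\d+):\d+\}")


def non_cysteine_positions(annotated_chain):
    """Return bridge positions (1-based) whose residue is not cysteine."""
    sequence = annotated_chain.split("{", 1)[0]  # residues precede the tags
    positions = [int(p) for p in BRIDGE_POSITION.findall(annotated_chain)]
    return [
        p for p in positions
        if p < 1 or p > len(sequence) or sequence[p - 1] != "C"
    ]


# Toy chain: position 4 is a cysteine, position 5 is an alanine.
chain = "MKTCAAC{@4:1}{@5:2}"
print(non_cysteine_positions(chain))  # [5]
```

An empty list means every annotated position points at a cysteine.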
Step 2: importing the antibody in CDD Vault
Once a data file has been created, the next step is to access the Import Data panel in your Vault and select your file. For a detailed guide on the import process, please refer to Importing the first compound library.
Proceed by selecting the “Compose macromolecules from columns” option from the drop-down menu, and click on “Update”. In the following step, select the “Antibody/ADC” radio button.
If you have a sequence with annotated regions, your screen will look something like this:
Proceed by mapping the Light and Heavy chain fields to the corresponding columns in your datasheet. If you are working with an asymmetric antibody, you can map 3 chains (either H1, H2, Light or Heavy, L1, L2) or 4 chains (H1, H2, L1, L2). In this case, your screen will look something like this:
Please note: at this step you do not need to map the Name column.
Once you have mapped all fields to the corresponding column, click on “Process File”.
Once the file is processed, a new column is added to the table in the Import flow, with the header “$MIXFILE_JSON”. This should be automatically mapped to the Antibody field of CDD Vault as in the screenshot below:
You can now map additional fields, such as “Name” to Synonym, or other columns that you may have added to your datasheet. Once all fields are mapped, you can commit the resulting entity to your Vault and use it like any other molecule within the Vault.
Please note: you do not need to import the sequence chain columns again, since their information is embedded in the “$MIXFILE_JSON” column. However, you can import them into user-defined fields at the Molecule level if you want to search for them later in CDD Vault.