This article covers the details of annotating and importing antibodies in CDD Vault. It assumes some familiarity with the general import process; for detailed instructions on that process, see: Importing the first compound library

CDD Vault enables you to register and annotate antibodies with their complete chemical structure, creating an exact record of the molecule as well as a visual representation of its regions and disulfide bridges.

In this article we focus on how to import antibodies into CDD Vault and how their representation changes with different levels of annotation and imported information.

**Figure 1**. Different representations of an antibody are available within CDD Vault. Without any annotation, the antibody is shown with an approximate Y shape (*left*), while annotating domains, hinge region, and disulfide bridges leads to an annotated representation for monospecific (*center*) and bispecific (*right*) antibodies.

Step 1: formatting the antibody sequences

To register antibodies in your Vault, first you will need to prepare your sequences in a tabular format (e.g. .csv or .xlsx). Currently it is not possible to create a new antibody structure representation using the Macromolecule editor.

Depending on your desired level of detail (see Fig. 1) you will need to provide varying levels of annotation. We will cover what is needed for each level of representation below.

The minimum necessary level of information is to define one column for the heavy chain and one for the light chain, like in the example below:

Name	Heavy	Light
Trastuzumab	EVKLQE…KSFSR	DIVL…NRNEC

The 'Name' column, while not essential, serves as a synonym that makes the antibody easier to find in CDD Vault after upload.
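If you prefer to generate the data file from a script, the layout above can be sketched as follows. This is a minimal illustration, not an official CDD tool: the function name `antibody_table_csv` is hypothetical, the output is assumed to be a comma-separated file, and the sequences are short placeholders rather than real antibody chains.

```python
import csv
import io


def antibody_table_csv(rows, columns=("Name", "Heavy", "Light")):
    """Return CSV text with one antibody per row and one column per chain."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder sequences for illustration only (NOT real antibody chains).
rows = [{"Name": "Example-mAb", "Heavy": "EVQLCSPGK", "Light": "DIQMCRGEC"}]
csv_text = antibody_table_csv(rows)

# To produce a file for upload, write the text to disk, for example:
# with open("antibodies.csv", "w", newline="") as handle:
#     handle.write(csv_text)
```

Passing a different `columns` tuple produces other layouts with the same one-column-per-chain structure.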

This is sufficient for obtaining a proto-y shaped representation and it describes a monospecific antibody as depicted in Fig. 2.

**Figure 2**. Antibody representation in absence of annotations for domains, hinge region, and disulfide bridges

If you are working with a bispecific antibody, you will need to provide each heavy chain and each light chain in its own column, as in the example below:

Name	L1	H1	H2	L2
Faricimab	QIV…KSFNRGEC	QIQ…KSLSLSPGK	QVQL…GGTKLTVL	QIV…KSFNRGEC

In this case too, the representation will be a proto-y shape, and it will reflect the bispecific nature of the antibody as depicted in Fig. 3.

**Figure 3**. Bispecific antibody representation in absence of annotations for domains, hinge region, and disulfide bridges

The next step is to annotate information about domains, hinge regions, and disulfide bridges to achieve a more detailed representation of the antibody.

Note: The colors shown in the antibody chains indicate the amino acids present and can be used for quick visual comparisons.

Metadata annotation

To annotate the antibody you can include information on the regions within the sequence, for each chain.

Domain region annotation

To annotate the antibody chains, specify the starting and ending amino acid positions of each domain within curly brackets, appended to the end of the sequence, as in the example below:

Heavy → …LSPGK{VH:1-120}{CH1:121-218}{CH2:234-343}{CH3:344-450}

Light → …RGEC{VL:1-107}{CL:108-214}

The VH and CH tags stand for “Variable Heavy” chain and “Constant Heavy” chain. The numbers in the tag names (CH1, CH2, CH3) differentiate the constant regions. Similarly, the tags for the light chain are VL and CL. You may use different tag names if desired.
The numbers (e.g. 1-120) define the starting and ending amino acid positions of each region.
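As a sketch of how such tags could be assembled programmatically, the hypothetical helper below (`region_tags` is not part of CDD Vault) formats tag names and 1-based inclusive ranges into the `{TAG:start-end}` syntax shown above. The range check is an assumption added for safety, not a documented CDD Vault rule.

```python
def region_tags(regions):
    """Format (tag, start, end) triples as {TAG:start-end} annotations."""
    parts = []
    for tag, start, end in regions:
        # Assumed sanity check: positions are 1-based and start <= end.
        if start < 1 or end < start:
            raise ValueError(f"invalid range for {tag}: {start}-{end}")
        parts.append(f"{{{tag}:{start}-{end}}}")
    return "".join(parts)


# Ranges taken from the heavy and light chain examples in this article.
heavy_tags = region_tags(
    [("VH", 1, 120), ("CH1", 121, 218), ("CH2", 234, 343), ("CH3", 344, 450)]
)
light_tags = region_tags([("VL", 1, 107), ("CL", 108, 214)])
```

The resulting strings would be appended directly to the end of the corresponding chain sequence in the data file.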

Hinge region annotation

In order to annotate the hinge region a tag needs to be added to the heavy chain, as in the example below:

Heavy → …LSPGK{VH:1-120}{CH1:121-218}{CH2:234-343}{CH3:344-450}{H:219-233}

The H tag identifies the hinge region and the numbers 219-233 define the boundaries of the hinge region in the Heavy Chain sequence.
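In the example above, the hinge range 219-233 fills exactly the gap left between CH1 (121-218) and CH2 (234-343). A quick way to see this is to list the positions not covered by any region. The sketch below is an optional sanity check, not something CDD Vault requires; the helper name `uncovered_ranges` is hypothetical, and the chain length of 450 is assumed from the end of the CH3 range.

```python
def uncovered_ranges(regions, length):
    """Return (start, end) ranges within 1..length not covered by any region."""
    gaps = []
    next_free = 1  # first position not yet covered
    for start, end in sorted(regions.values()):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= length:
        gaps.append((next_free, length))
    return gaps


domains = {"VH": (1, 120), "CH1": (121, 218), "CH2": (234, 343), "CH3": (344, 450)}

# Assumed chain length of 450, taken from the end of the CH3 range.
gaps_without_hinge = uncovered_ranges(domains, 450)
gaps_with_hinge = uncovered_ranges({**domains, "H": (219, 233)}, 450)
```

Here `gaps_without_hinge` contains the single range (219, 233), and `gaps_with_hinge` is empty once the H tag is added.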

Disulfide bridge annotation

To add disulfide bridges to an antibody, mark the cysteines involved in each bridge. This annotation, like the others, is enclosed in curly brackets and must be present in both the light and heavy chains, as in the example below:

Heavy → …{CH3:344-450}{H:219-233}{@223:1}{@229:2}{@232:3}

Light → …RGEC{VL:1-107}{CL:108-214}{@214:1}

The @ symbol identifies the annotation as a disulfide bridge.
The first number defines the position of the amino acid involved in the disulfide bridge on the respective chain.
The second number defines the identifier of the disulfide bridge. Each identifier on one chain must correspond to an identical identifier on another chain. In this example, the disulfide bridge is between heavy chain residue 223 and light chain residue 214. For the remaining disulfide bridges:
- If the antibody is monospecific, the remaining bridges are implied to be between the identical heavy chains
- If the antibody is bispecific, each remaining tag’s identifier on one heavy chain must correspond to an identical identifier on the other heavy chain
Intra-chain disulfide bridges do not need to be explicitly defined, but if desired they can be depicted in the same way as inter-chain bridges.
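To check how identifiers pair up before importing, the bridge tags can be extracted and grouped by identifier. The sketch below is illustrative only: the regular expression reflects the `{@position:identifier}` syntax described above, and the helper names are hypothetical rather than part of CDD Vault. The input strings contain only the tags, since the sequences themselves are not needed for this check.

```python
import re
from collections import defaultdict

# Matches tags such as {@223:1}: residue position, then bridge identifier.
BRIDGE_TAG = re.compile(r"\{@(\d+):(\d+)\}")


def bridge_tags(annotated_chain):
    """Return (position, identifier) pairs for every bridge tag in a chain."""
    return [(int(pos), int(ident)) for pos, ident in BRIDGE_TAG.findall(annotated_chain)]


def bridges_by_identifier(chains):
    """Group bridge endpoints as identifier -> [(chain name, position), ...]."""
    grouped = defaultdict(list)
    for name, annotated in chains.items():
        for pos, ident in bridge_tags(annotated):
            grouped[ident].append((name, pos))
    return dict(grouped)


# Annotation tags from the example above (sequences omitted).
chains = {
    "Heavy": "{CH3:344-450}{H:219-233}{@223:1}{@229:2}{@232:3}",
    "Light": "{VL:1-107}{CL:108-214}{@214:1}",
}
grouped = bridges_by_identifier(chains)
```

For this monospecific example, identifier 1 links heavy chain position 223 to light chain position 214, while identifiers 2 and 3 appear only on the heavy chain, consistent with bridges implied between the two identical heavy chains.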

The resulting table with all the described annotations will then look like this:

Name	Heavy	Light
Trastuzumab	EVQLVES…LSPGK{VH:1-120}{CH1:121-218}{CH2:234-343}{CH3:344-450}{H:219-233}{@223:1}{@229:2}{@232:3}	DIQM…FNRGEC{VL:1-107}{CL:108-214}{@214:1}

Once all information about regions and bridges is annotated, the imported structure will be represented as shown in Fig. 4.

**Figure 4**. Antibody representation with domains, hinge region, and disulfide bridges

The process of importing the antibody structure into CDD Vault is described in the next section.

Note: The system will return an error if an amino acid position annotated as part of a disulfide bridge is NOT a cysteine.
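You can catch this kind of error before uploading with a simple pre-check. The following is a minimal sketch under stated assumptions: positions are 1-based, cysteine is written as the one-letter code C, the helper name `non_cysteine_positions` is hypothetical, and the sequence is a short placeholder rather than a real chain.

```python
def non_cysteine_positions(sequence, positions):
    """Return the 1-based positions that are out of range or not a cysteine ('C')."""
    return [
        pos
        for pos in positions
        if not (1 <= pos <= len(sequence)) or sequence[pos - 1] != "C"
    ]


# Placeholder sequence for illustration only (NOT a real antibody chain).
toy_light = "DIQMTQSPC"

ok = non_cysteine_positions(toy_light, [9])          # position 9 is 'C'
bad = non_cysteine_positions(toy_light, [4, 9, 12])  # 4 is 'M', 12 is out of range
```

An empty result means every annotated position points at a cysteine; any position returned should be corrected in the data file before import.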

Step 2: importing the antibody in CDD Vault

Once a data file has been created, the next step is to access the Import Data panel in your Vault and select your file. For a detailed guide on the import process, please refer to Importing the first compound library.

Proceed by selecting the “Compose macromolecules from columns” option from the drop-down menu, and click on “Update”. In the following step, select the “Antibody/ADC” radio button.

If you have a sequence with annotated regions, your screen will look something like this:

Proceed by mapping the Light and Heavy chain fields to the corresponding columns in your datasheet. If you are working with an asymmetric antibody, you can map 3 chains (H1, H2, Light or Heavy, L1, L2) or 4 chains (H1, H2, L1, L2). In this case, your screen will look something like this:

Please note: at this step you do not need to map the Name column.

Once you have mapped all fields to the corresponding column, click on “Process File”.

Once the file is processed, a new column is added to the table in the Import flow, with the header “$MIXFILE_JSON”. This should be automatically mapped to the Antibody field of CDD Vault as in the screenshot below:

You can now map additional fields, such as “Name” to Synonym, or other columns that you may have added to your datasheet. Once all fields are mapped, you can commit the resulting entity to your Vault and use it like any other molecule in the Vault.

Please note: you do not need to import the chain sequence columns again, since their information is embedded in the “$MIXFILE_JSON” column. However, you can choose to import them into user-defined fields at the Molecule level so that you can search for them later in CDD Vault.