As the oligonucleotide space continues to expand with non-natural backbones, sugars, and base modifications, the ability to register custom oligos at scale is essential. CDD Vault supports bulk registration of modified nucleic acid sequences by allowing users to import custom monomer libraries via SDF files. This guide walks you through the complete process from building a monomer library to registering oligos with complex chemical modifications.
Overview
CDD Vault now enables users to register oligonucleotides such as DNA, RNA, and synthetic analogs (e.g., siRNA, ASO, aptamers) containing custom nucleotide monomers. While a built-in monomer library system is planned for future releases, the current method uses Structure Data Files (SDF) to resolve custom monomers or non-natural nucleotide codes during sequence import and registration. The importer automatically generates V3000 molfile representations and ensures accurate molecular weights by auto-assigning terminal groups.
1. Creating a Custom Monomer Library for Oligos
Before registering oligos, you must define any unnatural bases, sugars, or modifications as monomers in a valid SDF file.
Steps to Create Custom Monomers:
- Use a chemical drawing tool to define each custom nucleotide or modification (e.g., phosphorothioate linkages, 2'-O-methyl groups, locked nucleic acids (LNA), base analogs like 5-MeC, etc.).
- Use R1 and R2 attachment points to define how the monomer connects to the oligo chain:
  - R1: typically connects to the 5' end (phosphate backbone)
  - R2: connects to the 3' end
- In the molecule-level metadata, include a field named “Code”:
  - This is the shorthand identifier (e.g., [5MeC], [PS], [LNA-A]) used in the oligo sequence import file.
Tip: If a single structure is reused under different codes (e.g., for position-specific usage), duplicate the structure in the SDF with different Code values.
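To illustrate the tip above, here is a minimal Python sketch that writes the same structure into an SDF once per code. It relies only on the standard SDF layout (a molblock, then `> <FieldName>` data items, then a `$$$$` record delimiter). The helper names, the stub molblock, and the code `5MeC-3p` are hypothetical. Writing codes without square brackets inside the SDF is an assumption based on the `GalNAc` example later in this guide.

```python
def sdf_record(molblock: str, fields: dict[str, str]) -> str:
    """Return one SDF record: molblock, data items, then the $$$$ delimiter."""
    lines = [molblock.rstrip("\n")]
    for name, value in fields.items():
        lines.append(f"> <{name}>")  # data item header
        lines.append(str(value))     # data item value
        lines.append("")             # blank line terminates the data item
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


def duplicate_under_codes(molblock: str, codes: list[str]) -> str:
    """Emit the same structure once per code, each as its own SDF record."""
    return "".join(sdf_record(molblock, {"Code": code}) for code in codes)


if __name__ == "__main__":
    # Placeholder only: a real molblock comes from your drawing tool's export.
    stub = "placeholder-name\n\n\nM  END"
    print(duplicate_under_codes(stub, ["5MeC", "5MeC-3p"]))
```

In practice you would read the molblock text from the file exported by your drawing tool and write the result to a `.sdf` file.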
2. Exporting the Custom Monomer Library (SDF)
Once you’ve drawn and annotated your custom oligo monomers:
- Use the search bar in your Vault project to filter by the Code field.
- Click “Customize your report” and include:
  - The structure
  - Molfile
  - Metadata field “Code”
- Export the table as an SDF file.
Tip: Consider storing this monomer SDF permanently in a dedicated project (e.g., "Monomers") for easy reuse.
3. Preparing the Bulk Import File for Oligos
Create an Excel or CSV file containing the oligo sequences and any associated metadata.
Formatting Guidelines:
- Use single-letter IUPAC codes (A, T, G, C, U) for standard nucleotides.
- For custom monomers, wrap the code in square brackets (e.g., [5MeC], [LNA-A], [PS]).
Example Table:
Sequence | Name | Concentration | Units |
---|---|---|---|
A[5MeC]G[T-PS]AG | Oligo 1 | 100 | µM |
[2'-LNA]TGC[2'-MOE-G]T[T-PS]C | Oligo 2 | 50 | µM |
You may also include additional columns for batch-level or molecule-level metadata (e.g., vendor, synthesis method, storage location).
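Before uploading, it can help to confirm that each sequence splits into the tokens you expect. The sketch below follows the formatting guidelines above: single letters for standard nucleotides and square brackets for custom codes. It is an illustration, not the parser CDD Vault uses, and the function name `tokenize` is hypothetical.

```python
import re

# A bracketed custom code, or one standard single-letter nucleotide.
TOKEN = re.compile(r"\[([^\[\]]+)\]|([ACGTU])")


def tokenize(sequence: str) -> list[str]:
    """Split a sequence into monomer codes, raising on unexpected characters."""
    tokens = []
    pos = 0
    while pos < len(sequence):
        match = TOKEN.match(sequence, pos)
        if match is None:
            raise ValueError(
                f"Unexpected character {sequence[pos]!r} at index {pos}"
            )
        tokens.append(match.group(1) or match.group(2))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("A[5MeC]G[T-PS]AG"))
    # ['A', '5MeC', 'G', 'T-PS', 'A', 'G']
```

An unclosed bracket or a stray character raises a `ValueError` with its index, so formatting mistakes surface before the file reaches the importer.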
4. Uploading and Composing Oligos in CDD Vault
Once your import file is ready:
- Navigate to the Import Data section in your Vault.
- Upload the file and select "Compose macromolecules from columns."
- Choose your desired options:
  - Backbone Type: RNA or DNA
  - Strands: Single or Double
  - (Optional) Enable the checkbox to cleave the terminal 5′ phosphate from all oligos.
- If any custom monomer codes are not recognized, CDD Vault will show a token error:
  Token parsing error: Unknown monomer code [5MeC] at position 2
- To resolve:
  - Drag and drop your SDF file with defined monomers into the SDF drop area.
  - CDD Vault will match the codes and validate the full oligo sequences.
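You can catch unknown-code errors before uploading by comparing the bracketed codes in your sequences with the `Code` values in your monomer SDF. The following is a rough local pre-check, not a description of how CDD Vault matches codes. The function names are hypothetical. Brackets are stripped from both sides before comparison, which is an assumption that lets the check work whether or not the SDF stores codes with brackets.

```python
import re


def codes_in_sdf(sdf_text: str) -> set[str]:
    """Collect the value of every 'Code' data item in an SDF string."""
    codes = set()
    lines = sdf_text.splitlines()
    for i, line in enumerate(lines):
        # The value of a data item sits on the line after its header.
        if re.match(r"^>\s*<Code>", line) and i + 1 < len(lines):
            codes.add(lines[i + 1].strip().strip("[]"))
    return codes


def missing_codes(sequences: list[str], sdf_text: str) -> dict[str, list[str]]:
    """Map each sequence to the bracketed codes that the SDF does not define."""
    known = codes_in_sdf(sdf_text)
    missing = {}
    for seq in sequences:
        unknown = [
            code
            for code in re.findall(r"\[([^\[\]]+)\]", seq)
            if code not in known
        ]
        if unknown:
            missing[seq] = unknown
    return missing
```

An empty result means every custom code in the import file has a matching monomer in the SDF.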
5. Reviewing and Registering Custom Oligos
- Preview each oligonucleotide structure in V3000 molfile format.
- Click “Process File” to register the sequences into CDD Vault.
- The molfile column is mapped to the molecule structure field, and the metadata columns are mapped to their corresponding fields.
Nucleic Acid Special Considerations:
When registering custom oligonucleotides in CDD Vault, accurate handling of directionality, terminal modifications, and backbone attachments is essential for correct molecular representation and molecular weight calculation. Below are key considerations to ensure your custom monomer designs work seamlessly in the importer.
1. R1/R2 Phosphate Group Orientation
As mentioned above, R1 should correspond to the 5′ end of the nucleotide and R2 to the 3′ end. In special cases, the orientation of the R1 and R2 atoms in a custom monomer defines how sequences are connected and interpreted. This flexibility makes it possible to model monomers with phosphodiester linkages and phosphorothioate backbones with accurate end-group chemistry.
The phosphate group can be placed on either the R1 (5′) or R2 (3′) side of the monomer structure. In either case, the importer automatically adjusts the terminal atoms:
- Adds OH to the 5′ terminal monomer.
- Adds H to the 3′ terminal monomer.
Note: Directionality is determined solely by correct R1/R2 assignments, not the placement of the phosphate group.
2. Attaching Terminal Chemical Conjugates (e.g., GalNAc)
To include terminal modifications like GalNAc, cholesterol, fluorophores, or linkers:
a. Register the Conjugate as a Monomer
- Draw the conjugate as a structure with one R-group connection point.
- Typically, this is R1 (representing the attachment to the oligo’s terminal end).
- It’s acceptable to label the R-group as R1 or R2. The importer will handle this correctly based on the sequence's orientation.
b. Mark as Terminal in the SDF
To inform the importer that this monomer is not part of the chain, but instead a terminal cap, add a metadata field named "class" with the value "terminal" alongside the Code field in your SDF:
Code | class |
---|---|
GalNAc | terminal |
This ensures that:
- The conjugate is placed at either the 5′ or 3′ end (depending on sequence position).
- It is not treated as a standard internal monomer.
- The mass and structure are computed appropriately.
Example
Sequence: [GalNAc]CUGAGC
→ GalNAc attached at the 5′ end
Sequence: CUGAGC[Cholesterol]
→ Cholesterol attached at the 3′ end
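Because a terminal-class monomer is meant to sit at the first or last position, a quick local check can flag sequences where one appears mid-chain. This sketch assumes such a placement is a mistake. The guide does not say how the importer responds to it, and the function name is hypothetical.

```python
import re

# A bracketed custom code, or one standard single-letter nucleotide.
TOKEN = re.compile(r"\[([^\[\]]+)\]|([ACGTU])")


def misplaced_terminals(
    sequence: str, terminal_codes: set[str]
) -> list[tuple[int, str]]:
    """Return (1-based position, code) for terminal codes found mid-chain."""
    tokens = [m.group(1) or m.group(2) for m in TOKEN.finditer(sequence)]
    last = len(tokens)
    return [
        (position, code)
        for position, code in enumerate(tokens, start=1)
        if code in terminal_codes and 1 < position < last
    ]


if __name__ == "__main__":
    terminals = {"GalNAc", "Cholesterol"}
    print(misplaced_terminals("[GalNAc]CUGAGC", terminals))  # []
    print(misplaced_terminals("CUG[GalNAc]AGC", terminals))  # [(4, 'GalNAc')]
```

Note that `finditer` skips characters it cannot match, so this check does not validate the sequence format itself.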
Conclusion
With support for custom monomer definitions and bulk sequence registration, CDD Vault empowers researchers to manage even the most complex oligonucleotide libraries. By defining each monomer clearly and formatting sequences correctly, you can ensure smooth registration and comprehensive data tracking for your oligo-based projects.