How to Create a Custom Monomer Library and Bulk Register Peptides
As modern peptide research continues to expand into non-canonical and chemically modified amino acids, tools like CDD Vault now support the registration and management of these custom macromolecules. This guide provides a streamlined workflow for registering peptides that include unnatural amino acids by leveraging custom monomer libraries within CDD Vault.
Overview
CDD Vault now allows users to register macromolecules containing custom monomer codes, streamlining the handling of modified peptides. While future updates will integrate monomer libraries directly into the platform, the current method involves using Structure Data Files (SDFs) for custom monomer import during sequence registration.
1. Creating a Custom Monomer Library
Before registering your peptides, first define and structure your unnatural amino acids as monomers in a valid SDF format.
Steps:
- Use the chemical drawing tool to define and register each custom monomer (e.g., Acetyl, Amide, Cy5, TAMRA, DOTA, D-Cha, NMeG).
- Represent monomer attachment points:
  - Use an “R1” atom to replace the C-terminal OH.
  - Use an “R2” atom to replace one N-terminal hydrogen.
- A molecule-level metadata field named “Code” is required. It holds the exact monomer code (e.g., Cy5, TAMRA, DOTA, D-Cha, NMeG) used in the peptide sequences in the import file.
Note: If a single monomer corresponds to multiple codes, include separate entries with identical structures but different Code values.
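For readers who assemble the library outside the Vault, the sketch below shows what one SDF record with a “Code” data item looks like, and how a monomer with several codes becomes several records sharing one structure. It is a minimal illustration, not CDD Vault's own export: the function names, the empty placeholder molblock, and the alias code "dCha" are assumptions made for the example.

```python
# Placeholder molblock (an empty V2000 molfile). Replace it with the molfile
# exported from your drawing tool, including the R1/R2 attachment points.
PLACEHOLDER_MOLBLOCK = (
    "\n"
    "  placeholder\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END"
)


def sdf_record(molblock: str, code: str) -> str:
    """Return one SDF record: molblock, a Code data item, and the $$$$ delimiter."""
    body = molblock.rstrip("\n")
    if not body.endswith("M  END"):
        raise ValueError("molblock must end with 'M  END'")
    if not code or "\n" in code:
        raise ValueError("code must be a non-empty single line")
    # A data item is a header line, the value, and a blank line.
    return f"{body}\n>  <Code>\n{code}\n\n$$$$\n"


def monomer_library(molblock: str, codes: list[str]) -> str:
    """Build one record per code, all sharing the same structure."""
    return "".join(sdf_record(molblock, code) for code in codes)


# "dCha" is a hypothetical alias, used only to show the multiple-code case.
library_text = monomer_library(PLACEHOLDER_MOLBLOCK, ["D-Cha", "dCha"])
```

Writing `library_text` to a file with the `.sdf` extension gives two entries with identical structures and different Code values, as the note above describes.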
2. Exporting the Custom Monomer Library (SDF)
Once you’ve drawn and annotated your custom monomers:
- Search for the monomer codes in the keyword search area.
- Via “Customize your report”, include the following columns:
  - The structure
  - A molfile
  - The metadata field “Code”
- Export this table as an SDF file.
This SDF will be used during peptide sequence import or stored permanently in a dedicated Vault project (e.g., “Monomers”).
3. Preparing the Bulk Import File
Prepare an Excel or CSV file with your peptide sequences. For peptides containing unnatural monomers:
- Wrap custom codes in square brackets, e.g., [Cy5].
- Use the standard single-letter code (e.g., A, G, L) for natural amino acids, and use "X" to represent unnatural amino acids.
Example Format:
Sequence | Synonym |
---|---|
[Cy5]ERLRKKLQDVHNF[TAMRA]A | Sequence 1 |
[DOTA]SVSEIQLMHQDVHNFL | Sequence 2 |
Additional columns holding molecule-level or batch-level metadata can be included in this file.
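The import file can also be generated and sanity-checked with a short script. The sketch below splits a sequence into single-letter residues and bracketed custom codes, then writes the two columns from the example table as CSV. The tokenizer is an illustration written for this guide, not CDD Vault's parser, and it assumes that only uppercase single letters and bracketed codes are valid; the function names are hypothetical.

```python
import csv
import io
import re

# One token is either a bracketed custom code or a single uppercase letter.
TOKEN = re.compile(r"\[[^\[\]]+\]|[A-Z]")


def tokenize(sequence: str) -> list[str]:
    """Split a sequence into tokens, keeping the brackets on custom codes."""
    tokens = []
    pos = 0
    while pos < len(sequence):
        match = TOKEN.match(sequence, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {sequence[pos]!r} at index {pos}"
            )
        tokens.append(match.group(0))
        pos = match.end()
    return tokens


def rows_to_csv(rows: list[tuple[str, str]]) -> str:
    """Validate each sequence, then return CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["Sequence", "Synonym"])
    for sequence, synonym in rows:
        tokenize(sequence)  # raises ValueError on malformed input
        writer.writerow([sequence, synonym])
    return buffer.getvalue()


csv_text = rows_to_csv(
    [
        ("[Cy5]ERLRKKLQDVHNF[TAMRA]A", "Sequence 1"),
        ("[DOTA]SVSEIQLMHQDVHNFL", "Sequence 2"),
    ]
)
```

Extra metadata columns can be added by extending the header row and each data row in the same way.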
4. Uploading and Composing Macromolecules in CDD Vault
Once the Excel file is ready:
- Navigate to the “Import Data” section in your CDD Vault.
- Upload the Excel file and choose “Compose macromolecules from columns.”
- Select “Linear Peptides” or “Cyclic Peptides” depending on your molecules.
- Initially, unrecognized custom monomers will cause a token error message such as “Token parsing error: Unknown monomer code [Cy5] at position 1”.
- Now drag and drop the SDF file into the right side of the blue box.
- The system will extract and apply these monomers to your current import session.
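Unknown-monomer errors can often be caught before upload by comparing the bracketed codes in the sequences with the Code values in the SDF. The following is a rough pre-flight check under stated assumptions: codes are matched exactly (case-sensitive), the SDF stores them in a data item named “Code”, and the helper names are hypothetical. It does not reproduce CDD Vault's own validation.

```python
import re

# Value line that follows a "> <Code>" data header in an SDF file.
CODE_FIELD = re.compile(r"^>[ \t]*<Code>[^\n]*\n([^\n]*)", re.MULTILINE)
# Bracketed custom code inside a peptide sequence, e.g. [Cy5].
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def library_codes(sdf_text: str) -> set[str]:
    """Collect every Code value found in the SDF text."""
    return {m.group(1).strip() for m in CODE_FIELD.finditer(sdf_text)}


def missing_codes(sequences: list[str], sdf_text: str) -> dict[str, list[str]]:
    """Map each sequence to the custom codes that the SDF does not define."""
    known = library_codes(sdf_text)
    missing = {}
    for sequence in sequences:
        unknown = [c for c in BRACKETED.findall(sequence) if c not in known]
        if unknown:
            missing[sequence] = unknown
    return missing


# Structures are omitted here; only the data items matter for this check.
example_sdf = (
    "M  END\n>  <Code>\nCy5\n\n$$$$\n"
    "M  END\n>  <Code>\nTAMRA\n\n$$$$\n"
)
report = missing_codes(
    ["[Cy5]ERLRKKLQDVHNF[TAMRA]A", "[DOTA]SVSEIQLMHQDVHNFL"],
    example_sdf,
)
```

In this example `report` flags DOTA for the second sequence, because the sample SDF text defines only Cy5 and TAMRA.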
5. Reviewing and Registering Peptides
After uploading your file and resolving all monomers:
- Preview the rendered macromolecules, which will be displayed in V3000 molfile format.
- Adjust the chain display to show the desired number of monomers per row for a clear visualization of the full sequence.
- Ensure that the Molfile column (typically Column A) is correctly mapped to the molecule structure field, and that all other columns are appropriately mapped to their corresponding metadata fields.
- Once confirmed, click “Process File” to complete the registration.
Future Improvements
Soon, these monomer libraries will be stored natively within CDD Vault, eliminating the need to attach SDFs during each session. Until then, dragging and dropping the same SDF into each import session ensures reusability and consistency.
Conclusion
The ability to register peptides with custom amino acid monomers greatly enhances the flexibility of CDD Vault for peptide researchers. By carefully defining monomers, formatting sequences correctly, and leveraging the bulk import tools, you can manage even the most complex macromolecules and their data with confidence.