This article covers the various scenarios of molecule import and registration. It assumes that you are already familiar with the basic steps of file import, and therefore focuses on field mapping (import step 2) for the different types of molecule registration.
If you have never registered molecules via file import before, go through this tutorial first.
Once the import fields are mapped as described in this article, they go through structure, salt, and solvent validation (import step 3), which is mentioned here but described in detail at this link.
Overview of molecule registration
Salts and/or solvent of crystallization are included in the structure
Salt/solvent of crystallization data is in a separate field
Without Structure
Compounds containing multiple fragments (mixture)
Registering true Mixtures in bulk
Pure enantiomers of unknown absolute configuration
Overview of molecule registration
You will start by importing according to one of the methods described here - whichever one suits the majority of molecules in the file. If, during the next import step, any records in the file generate errors or suspicious events because they fall under a different registration method, don't panic. Download the error reports, and import separately after correcting the file and choosing another import method.
Import step 1: Choose your import file (CSV or SDF), and choose a project. This step is the same for all types of imports.
Import step 2: Import type and field mapping
- Choose the type of import you are performing. Since we are registering new molecules or batches, we will choose the "Register new molecules/batches" option. If you are registering molecules without structure, choose that option from the drop-down.
- Map the data fields (SDF file) or columns (CSV file) that you wish to import. Structure-related field mapping differs between the four import types, while all other mapping requirements are the same. For example, if your vault has a set of required batch fields, you will need to map all of them before proceeding with the import. Find out more about batch fields.
Import step 3: Validation and commit.
This is also a two-part step. First, you will have a chance to review an import validation report (QC report), where you can decide to accept or reject any import events based on their type. At this stage, no data has been inserted into your Vault yet, so careful analysis of the report makes it easy to prevent mistakes. Once you have decided to commit the entire file contents (minus the errors) to the Vault, the second part of this step starts, and the data are inserted into your Vault. This process may take from seconds to hours depending on the amount of data being added. Learning how to interpret the import validation report is crucial to avoiding mistakes in your Vault. Here, once more, is the link to the descriptions of common errors, suspicious events, and noteworthy events.
Salts and/or solvent of crystallization are included in the structure
This is the most common import scenario. In this case, the provided MOL or SMILES structure includes all the salt and solvent information, as well as the stoichiometry, so that the right number of core structure, salt molecules, and solvents of crystallization are represented.
Map the structure field to "Molecule Structure". CDD will automatically parse out the salt and solvent data, and preserve it in the batch-level Salt field. There can only be one type of salt and one type of solvent. If there is more than one of either, the structure will be rejected on import, and can be registered as a mixture instead.
CDD Field Mapping | Data Description |
---|---|
Molecule Fields - Structure | Salt information is part of the SMILES or MOL structure. Core.Salt.Solvent: CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O.Cl.O.O |
Batch Fields - Salt | do not import |
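To make the Core.Salt.Solvent layout concrete, here is a minimal Python sketch that counts the dot-separated fragments of the example structure. It is only an illustration and not how CDD Vault parses structures: the function names are hypothetical, and picking the longest fragment as the core is a simplifying assumption. Splitting on "." also ignores the rare SMILES case where ring-closure digits link atoms across a dot.

```python
from collections import Counter


def split_fragments(smiles: str) -> Counter:
    """Count the dot-separated fragments of a SMILES string."""
    return Counter(part for part in smiles.split(".") if part)


def describe(smiles: str) -> dict:
    """Summarize fragments, treating the longest one as the core.

    The "longest fragment" rule is an assumption for illustration only.
    """
    counts = split_fragments(smiles)
    core = max(counts, key=len)
    others = {frag: n for frag, n in counts.items() if frag != core}
    return {"core": core, "core_count": counts[core], "other_fragments": others}


example = "CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O.Cl.O.O"
summary = describe(example)
print(summary)
```

For the example structure, the summary reports one core, one "Cl" fragment, and two "O" fragments, which matches one hydrochloride and two waters per core.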
Salt/solvent data is in a separate field:
Salt and solvent of crystallization data are not part of the structure, but represented as some alphanumeric string in a separate field.
Some vendor library files are formatted this way. CDD Vault supports a large set of salt 'nicknames', as well as a set of two letter codes for a large set of salts, acids, and solvents. CDD Vault also supports a limited list of solvent of crystallization structures. Here's the full table of supported fragments.
Stoichiometry and solvents may be included in the textual representation, as long as they are formatted in the following way:
- The salt must come first, the solvent second, delimited by a forward slash " / "
- There can only be one type of salt and one type of solvent. If there is more than one of either, the structure will be rejected on import, and can be registered as a mixture instead.
- The salt vendor string (such as "HCl" in the examples below) is case-sensitive and is not normalized, which means it must be written exactly as shown, including letter case, white space, and numbers.
- Any stoichiometric numbers can be used, so long as they resolve to a simple stoichiometric ratio of cores, salts, and solvents (e.g., "0.07 HCl" does not).
- There must be at least one space between the number and the salt or solvent name; extra spaces that do not occur within the salt or solvent name are fine.
- Here are some examples to make all of this more clear:
- 2 HCl
- HCl/Acetone
- 3 HCl/2 n-Propanol
- 0.33 AQ/0.5 Methylene Chloride
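The formatting rules above can be checked before import with a small script. The following is a hedged Python sketch of such a pre-check: `parse_salt_field` is a hypothetical helper, not part of CDD Vault, and it only checks the layout (at most one slash, an optional number followed by at least one space, then a name). It does not verify that a name is in the supported fragment table, and it does not test whether the numbers resolve to a simple stoichiometric ratio.

```python
import re
from typing import List, Optional, Tuple

# Optional leading number followed by at least one space, then the name.
_PART = re.compile(r"^\s*(?:(\d+(?:\.\d+)?)\s+)?(\S.*?)\s*$")


def parse_salt_field(text: str) -> List[Tuple[Optional[str], str]]:
    """Split 'salt/solvent' text into (count, name) pairs.

    The count is kept as the original string, or None when no number is given.
    """
    parts = text.split("/")
    if len(parts) > 2:
        raise ValueError("expected at most one salt and one solvent")
    parsed = []
    for part in parts:
        match = _PART.match(part)
        if match is None:
            raise ValueError(f"empty or malformed entry: {part!r}")
        parsed.append((match.group(1), match.group(2)))
    return parsed


for value in ["2 HCl", "HCl/Acetone", "3 HCl/2 n-Propanol",
              "0.33 AQ/0.5 Methylene Chloride"]:
    print(value, "->", parse_salt_field(value))
```

Names are returned unchanged because the salt string is case-sensitive and not normalized, so any clean-up of case or internal spacing has to happen in the source file itself.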
Map the field/column containing the salt name to "Batch Fields - Salt". Map MOL or SMILES to "Molecule Structure".
CDD Field Mapping | Data Description |
---|---|
Molecule Fields - Structure | No salt information in the SMILES or MOL structure. Core: CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O |
Batch Fields - Salt | One hydrochloride and two molecules of water per one core structure: HCl/2 H2O |
Without Structure:
CDD allows molecule registration without structure and you may add a structure to your structureless record at a later time.
To register new structureless compounds, make sure to choose the "without structure" option in import step 2 above. Since there is no structure to validate whether the molecule already exists in the database, a new molecule record will be created any time a structureless registration is selected.
You may wish to update an existing structureless molecule by adding new batches or molecule/batch information. In this case, you will need to identify the existing molecule in your import file by providing the Molecule ID and mapping it as "Molecule Name or Synonym". This field should contain only your CDD-generated Molecule Name, or a Molecule Synonym. Make sure to check the validation report in the last import step to see that a new batch is added correctly - look for "Noteworthy events", "New Batch" events.
Note: If a molecule ID is provided, but it is not found amongst existing molecules, a new molecule will be created.
CDD Field Mapping | Data Description |
---|---|
Molecule Fields - Structure | do not import |
Molecule Fields - Molecule Name or Synonym | Optional. Map only when registering batches of existing structureless molecules. |
Batch Fields | Map any of your vault's required batch fields. This will vary between vaults. |
Compounds Containing Multiple Fragments (Mixture):
When a molecule is actually a mixture of fragments with none of the parts being salts or solvents that should be stripped, CDD may reject such records as "Invalid structure" errors. However, there is a way of registering these as a single Molecule by using the method described for salt/solvent data in a separate field. By inserting a text-based "Salt" field and mapping it to Batch Fields -> Salt, you bypass CDD's structure inspection, and register the entire structure as a mixture:
- Download and review the "invalid structure" error file to make sure that this is, in fact, a mixture.
- Insert a field/column called "Salt" into the error file.
- Enter the value AA into the salt field. AA is the CDD code for "No Salt, free base or acid".
- If you're not clear on the file formatting, download a template example here.
- Import using the same mapping as above.
CDD Field Mapping | Data Description |
---|---|
Molecule Fields - Structure | Mixture of two or more core structures. Core1.Core2: CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O.NCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O |
Batch Fields - Salt | AA (no salt, free base or acid), to indicate that this isn't a salt |
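If the error file has many rows, the "Salt" column can be added with a short script rather than by hand. The Python sketch below assumes the error file is a CSV; the helper name `add_salt_column` and the sample column headers ("Name", "Structure") are hypothetical and only serve the illustration. For an SDF error file, a different approach would be needed.

```python
import csv
import io


def add_salt_column(src, dst, column="Salt", code="AA"):
    """Copy CSV rows from src to dst, adding a salt column filled with `code`."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if column not in fieldnames:
        fieldnames.append(column)
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = code  # AA = "No Salt, free base or acid"
        writer.writerow(row)


# Hypothetical two-column error file, held in memory for illustration.
source = io.StringIO(
    "Name,Structure\n"
    "mix-1,CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O.NCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O\n"
)
result = io.StringIO()
add_salt_column(source, result)
print(result.getvalue())
```

With real files, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module documentation recommends.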
Registering True Mixtures in Bulk:
CDD Vault supports the registration of a true Mixture if your Vault Administrator has allowed Mixtures as an Entity Type for your Vault. To register multiple mixtures (in bulk), use the Import Data tab to import and render mixtures from CSV, XLSX, or SDF files. These files should contain separate columns that describe each component of the mixture, along with columns containing any Molecule-level, Batch-level, or Protocol readout data to be included with the mixture import.
An example import file might resemble this:
This example file will create 2 new mixtures. The first mixture will contain 2 components, ethanol (95%) and water. The 2nd mixture will contain sulfuric acid (64%) and water. Columns A, B and C will annotate the 1st component of each new mixture while columns D, E and F will annotate the 2nd component of each new mixture. The initial steps of importing a data file remain the same:
- Choose the import data file,
- Select the Project, and
- Click the Upload File button to transfer the data file to the CDD Vault server.
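For readers who want to prepare a file with the six-column layout described above (three columns per component), the following Python sketch writes one as CSV. The column headers, the use of "%" as a unit, and the blank amounts for water are all assumptions made for illustration; the article does not specify the actual headers or what each of the three columns per component holds. The SMILES strings are the standard ones for ethanol and sulfuric acid.

```python
import csv
import io

# Hypothetical headers: the article does not name the columns.
HEADERS = [
    "Component 1 SMILES", "Component 1 Amount", "Component 1 Units",
    "Component 2 Name", "Component 2 Amount", "Component 2 Units",
]

# Row 1: ethanol (95%) and water. Row 2: sulfuric acid (64%) and water.
# Amounts for water are left blank because none are stated.
ROWS = [
    ["CCO", "95", "%", "water", "", ""],
    ["OS(=O)(=O)O", "64", "%", "water", "", ""],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(HEADERS)
writer.writerows(ROWS)
mixture_csv = buffer.getvalue()
print(mixture_csv)
```

Here the first component is given as a structure and the second as a name, which matches the two interpretation options ("Chemical structure [SMILES]" and "Molecule lookup by name") used for the example file.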
To map the mixture-specific columns of the data file into your Vault, select the “Compose mixtures from columns” import option to have the Mixture Editor parse your new mixture(s) based on columns in your import file (in the above example, columns A through F).
In the above animation, two mixtures are created, each containing 2 components. Component 1 in each mixture was interpreted using the “Chemical structure [SMILES]” option. Component 2 in each mixture was interpreted using the “Molecule lookup by name” option, which will also insert the matching structure. The fully composed import report shows how the components will be mapped into each mixture once registration is complete.
Once you process the mixture composition, you have a chance to view how the mixtures were interpreted with the option to re-map your columns or override individual mixtures. Once each mixture has been reviewed, click to continue and a mixfile column is automatically mapped to the mixture field. You now have the option to map your remaining Molecule and Batch fields and Protocol readout definitions.
Pure enantiomers of unknown absolute configuration
You can register pairs of pure enantiomers with unknown absolute configuration by using the enhanced “OR” label. When you import a file containing compounds with the enhanced “OR” label, you will be presented with an Ambiguous Event whenever duplicate enantiomers are found. By default, new batches will be created for all ambiguous events. You can change this to either New molecules or Reject by clicking the View Details link.
If you have several ambiguous events, you can use the New molecule for unreviewed events or Reject unreviewed events links to set the remaining events to either new molecule or reject in bulk.
Note: Based on how the enhanced stereochemical OR label is used to represent your stereoisomers, CDD Vault will determine the total number of unique enantiomers that can be registered as separate Molecules.