
Bulk registration of molecules from file

 

This article covers various scenarios of molecule import and registration. We assume that you are already familiar with the basic steps of file import, and therefore focus on field mapping (import step 2) for the different types of molecule registration.

If you have never registered molecules via file import before, go through this tutorial first.

Once the import fields are mapped as described in this article, they go through structure, salt, and solvent validation (import step 3). That step is mentioned only briefly in this article; it is described in detail in a separate article.

 

Overview of molecule registration
Salts and/or solvent of crystallization are included in the structure
Without Structure
Salt/solvent of crystallization data is in a separate field
Compound (fragment) mixture

 

Overview of molecule registration

You will start by importing according to one of the methods described here, whichever one suits the majority of molecules in the file. If, during the next import step, any records in the file generate errors or suspicious events because they fall under a different registration method, don't panic. Download the error reports, correct the file, and import those records separately using another import method.

Import step 1: Choose your import file (CSV or SDF), and choose a project. This step is the same for all types of imports.

Import step 2: Import type and field mapping

  • Choose the type of import you are performing. Since we are registering new molecules or batches, we will choose the "Register new molecules/batches" option. If you are registering molecules without structure, choose that option from the drop-down.


  • Map the data fields (SDF file) or columns (CSV file) that you wish to import. Structure-related field mapping differs between the 4 import types, while all other mapping requirements are the same. For example, if your vault has a set of required batch fields, you will need to map all of them before proceeding with the import. Find out more about batch fields.

Import step 3: Validation and commit.

This is also a two-part step. First, you review an import validation report (QC report), where you can decide to accept or reject any import events based on their type. At this stage, no data has been inserted into your Vault yet, so careful analysis of the report makes it easy to prevent mistakes. Once you decide to commit the file contents (minus the errors) to the Vault, the second part of this step starts and the data are inserted into your Vault. This process may take from seconds to hours, depending on the amount of data being added. Learning how to interpret the import validation report is crucial for avoiding costly mistakes in your Vault, so we link once more to the descriptions of common errors, suspicious events, and noteworthy events.

 

Salts and/or solvent of crystallization are included in the structure:

This is the most common import scenario. In this case, the provided MOL or SMILES structure includes all the salt and solvent information, as well as the stoichiometry, so that the right number of core structures, salt molecules, and solvents of crystallization are represented.

Map the structure field to "Molecule Structure". CDD will automatically parse out the salt and solvent data, and preserve it in the batch-level Salt field. There can only be one type of salt and one type of solvent. If there is more than one type of either, the structure will be rejected on import, and can be registered as a mixture instead.

CDD Field Mapping | Data Description
Molecule Fields - Structure | Salt information is part of the SMILES or MOL structure. Core.Salt.Solvent: CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O.Cl.O.O
Batch Fields - Salt | Do not import
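To see how the Core.Salt.Solvent example encodes stoichiometry, the sketch below counts the dot-separated fragments of a SMILES string. It uses plain string splitting only (no cheminformatics toolkit, no canonicalization), and the function name `count_fragments` is hypothetical. It illustrates the notation and is not a description of how CDD Vault parses structures.

```python
from collections import Counter


def count_fragments(smiles: str) -> Counter:
    """Count the dot-separated fragments of a SMILES string.

    Assumption: identical fragments are written identically, since this
    sketch compares raw text and does not canonicalize structures.
    """
    return Counter(part for part in smiles.split(".") if part)


example = "CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O.Cl.O.O"
fragments = count_fragments(example)
for fragment, count in fragments.items():
    print(count, fragment)
```

For the example string this reports one core fragment, one `Cl` fragment, and two `O` fragments, which matches one core with one hydrochloride and two waters.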

 

Without Structure:

CDD permits molecule registration without a structure, and you may add a structure to your structureless record at a later time.
To register a new structureless compound, make sure to choose the "without structure" option in import step 2 above. Since there is no structure with which to check whether the molecule already exists in the database, a new molecule record is created every time a structureless registration is selected.

You may still wish to update an existing structureless molecule by adding new batches. In this case, you will need to identify the existing molecule in your import file by providing a molecule ID and mapping it as "Molecule Name or Synonym". This field should contain only your CDD-generated molecule name, or a molecule synonym. Make sure to check the validation report in the last import step to confirm that the new batch is added correctly: look under "Noteworthy events" for "New Batch" events.

Note: If a molecule ID is provided, but it is not found amongst existing molecules, a new molecule will be created.

CDD Field Mapping | Data Description
Molecule Fields - Structure | Do not import
Molecule Fields - Molecule Name or Synonym | Optional. Map only when registering batches of existing structureless molecules.
Batch Fields | Map any of your vault's required batch fields. This will vary between vaults.

 

Salt/solvent data is in a separate field:

Salt and solvent of crystallization data are not part of the structure, but represented as some alphanumeric string in a separate field.

Some vendor library files are formatted this way. CDD Vault supports a large set of salt 'nicknames', as well as a set of two letter codes for a large set of salts, acids, and solvents. CDD Vault also supports a limited list of solvent of crystallization structures. Here's the full table of supported fragments.
Stoichiometry and solvents may be included in the textual representation, as long as they're formatted in the following way:

    • The salt must come first and the solvent second, delimited by a forward slash "/".
    • There can only be one type of salt and one type of solvent. If there is more than one type of either, the structure will be rejected on import, and can be registered as a mixture instead.
    • The salt vendor string ("HCl" in the examples below) is case-sensitive and is not normalized, which means it must be written exactly as shown, including letter case, white space, and numbers.
    • Any stoichiometric numbers can be used, so long as they resolve to a simple stoichiometric ratio of cores, salts, and solvents (e.g. "0.07 HCl" does not).
    • There must be at least one space between the number and the salt or solvent name, but extra spaces that do not occur within the salt or solvent name are fine.
    • Here are some examples to make all of this clearer:
      • 2 HCl
      • HCl/Acetone
      • 3 HCl/2 n-Propanol
      • 0.33 AQ/0.5 Methelene Chloride
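The formatting rules above can be checked before import with a small script. The following is a minimal sketch, not CDD Vault's own validation: `parse_salt_field` and `is_simple` are hypothetical names, and the definition of a "simple" ratio used here (close to a fraction with a denominator of at most 6) is an assumption chosen only so that the article's examples behave as described. The sketch does not check names against the table of supported fragments.

```python
import re
from fractions import Fraction

# Optional leading number, at least one space, then the salt/solvent name.
_PART = re.compile(r"^\s*(?:(\d+(?:\.\d+)?)\s+)?(\S.*?)\s*$")


def parse_salt_field(text: str):
    """Split 'salt/solvent' text into a list of (count, name) pairs."""
    parts = text.split("/")
    if len(parts) > 2:
        raise ValueError("expected at most one salt and one solvent")
    parsed = []
    for part in parts:
        match = _PART.match(part)
        if match is None:
            raise ValueError(f"cannot parse {part!r}")
        number, name = match.groups()
        # A missing number means one equivalent.
        parsed.append((Fraction(number) if number else Fraction(1), name))
    return parsed


def is_simple(count: Fraction, max_denominator: int = 6) -> bool:
    """Assumed rule: the count lies within 0.01 of a small fraction."""
    approx = count.limit_denominator(max_denominator)
    return approx > 0 and abs(approx - count) <= Fraction(1, 100)


for text in ["2 HCl", "HCl/Acetone", "3 HCl/2 n-Propanol", "0.07 HCl"]:
    pairs = parse_salt_field(text)
    print(text, pairs, all(is_simple(count) for count, _ in pairs))
```

Names are kept exactly as written, since the vendor string is case-sensitive and not normalized.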

Map the field/column containing the salt name to "Batch Fields - Salt". Map the MOL or SMILES structure to "Molecule Structure".

CDD Field Mapping | Data Description
Molecule Fields - Structure | No salt information in the SMILES or MOL structure. Core: CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O
Batch Fields - Salt | One hydrochloride and two molecules of water per core structure: HCl/2 H2O
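As an illustration of this layout, the sketch below writes a one-row CSV with the core structure in one column and the salt text in another. The column names "Structure" and "Salt" are assumptions; any names work as long as they are mapped to "Molecule Structure" and "Batch Fields - Salt" in import step 2.

```python
import csv
import io

# Hypothetical column names; map them during import step 2.
rows = [
    {"Structure": "CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O", "Salt": "HCl/2 H2O"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Structure", "Salt"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```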

 

Compound Mixture:

Sometimes a record is actually a mixture of molecules, in which none of the parts are salts or solvents that should be stripped. CDD may reject such records with "invalid structure" errors, but you can register them as a mixture by using the method described for salt/solvent data in a separate field. By inserting a text-based "salt" field and mapping it to Batch Fields - Salt, you bypass CDD's structure inspection and register the entire structure as a mixture:

  • Download and review the "invalid structure" error file to make sure that this is, in fact, a mixture.
  • Insert a field/column called "Salt" into the error file.
  • Enter the value AA into the salt field. AA is the CDD code for "No Salt, free base or acid".
  • If you're not clear on the file formatting, download a template example here.
  • Import using the same mapping as for salt/solvent data in a separate field.
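For larger error files, the column can be added with a short script. This is a minimal sketch that assumes the error file is a CSV with a header row (an SDF file would need a different approach); the function name `add_salt_column` and the sample column name "Structure" are hypothetical.

```python
import csv
import io


def add_salt_column(csv_text: str, code: str = "AA") -> str:
    """Return the CSV text with a "Salt" column set to `code` on every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if "Salt" not in fieldnames:
        fieldnames.append("Salt")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["Salt"] = code  # AA = "No Salt, free base or acid"
        writer.writerow(row)
    return out.getvalue()


sample = (
    "Structure\n"
    "CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O.NCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O\n"
)
result = add_salt_column(sample)
print(result)
```

Review the records first, as in the first step above, so that AA is only applied to rows that really are mixtures.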

CDD Field Mapping | Data Description
Molecule Fields - Structure | Mixture of two or more core structures. Core1.Core2: CNCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O.NCCCC1=CC=C2NC(CC(=O)C2=C1)C(N)=O
Batch Fields - Salt | No salt, free base or acid salt form, to indicate that this isn't a salt: AA
