During the validation step of the data import (import step 3), CDD Vault performs a number of checks to ensure integrity and uniqueness of uploaded structures and molecule identifiers.
Checks are performed sequentially and a validation report is provided prior to approval of data for final import.
This article reviews the validation rules applied by CDD Vault. See the corresponding articles for some of the common errors, suspicious events, and noteworthy events displayed in the validation report.
Read this article for molecule registration mapping (import step 2).
Salt and solvent of crystallization handling
There is an optional setting to disable chemical registration with salt stripping when you register a molecule.
- Only supported salts and solvents are stripped and stored in an automatically created batch field.
- Only one salt form and one solvent of crystallization are supported.
- Stoichiometry that resolves to a simple stoichiometric ratio of cores, salts, and solvents is supported.
- If any of the above import validations fail, the structure is not imported and an unrecognized structure error is reported.
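The salt and solvent rules above can be pictured as a simple pre-check. The following is a minimal sketch, not CDD Vault's implementation: the function name, the two lookup sets, and the assumption that non-core fragments arrive as plain name strings are all hypothetical and used only for illustration.

```python
# Hypothetical lookup sets; the real lists of supported salts and solvents
# are maintained by CDD Vault and are not reproduced here.
SUPPORTED_SALTS = {"HCl", "Na", "TFA"}
SUPPORTED_SOLVENTS = {"H2O", "MeOH"}


def check_salt_and_solvent(fragments):
    """Return None if the non-core fragments pass, else an error string.

    `fragments` is a list of fragment names found alongside the core
    structure (an assumed, simplified input format).
    """
    supported = SUPPORTED_SALTS | SUPPORTED_SOLVENTS
    if any(name not in supported for name in fragments):
        return "unrecognized structure"  # unsupported salt or solvent

    salt_forms = {name for name in fragments if name in SUPPORTED_SALTS}
    solvents = {name for name in fragments if name in SUPPORTED_SOLVENTS}
    if len(salt_forms) > 1 or len(solvents) > 1:
        return "unrecognized structure"  # more than one salt form or solvent

    return None


print(check_salt_and_solvent(["HCl", "H2O"]))  # one salt form, one solvent
print(check_salt_and_solvent(["HCl", "Na"]))   # two different salt forms
```

Repeated equivalents of the same salt (for example a dihydrochloride) count as one salt form in this sketch; only distinct salt forms or distinct solvents trigger the error.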
Core structure handling
- Structures are neutralized (unless chemical registration is turned off in the vault).
- The structure is checked to be in a valid format. Invalid structures fail import validation with an unrecognized structure error.
  - MOL
  - canonical SMILES
  - CXSMILES
  - polymer notation is not supported
- The chiral flag will automatically be set to absolute stereochemistry, but enhanced stereochemistry features will be preserved.
- Original imported structure is saved as the original MOL or SMILES. This structure is displayed on the search page, on the molecule record, and in the exported structure image.
- A copy of the original structure is standardized as follows:
  - extra atom labels are removed
  - extra annotation is removed
  - atom numbers are removed
  - aromatization is standardized
  - explicit hydrogens are removed
  - chiral flag is set for absolute stereochemistry
  - converted to CXSMILES
- Uniqueness validation is performed using the standardized structure CXSMILES.
  - Enhanced stereochemistry is supported.
Stoichiometry
- Salt, solvent of crystallization and core structure counts are recorded as batch information and used to calculate formula weight.
- Stoichiometric ratios are confirmed to be whole numbers.
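As a rough illustration of the two points above, the sketch below scales a set of counts to whole numbers and sums a formula weight from them. It is a sketch under stated assumptions rather than CDD Vault's actual calculation: the function names are hypothetical, the molecular weights in the usage lines are placeholder values, and it assumes the formula weight is the sum over the whole-number formula unit.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def to_whole_numbers(counts):
    """Scale (core, salt, solvent) counts so that all are whole numbers."""
    fracs = [Fraction(str(c)) for c in counts]
    if fracs[0] <= 0 or any(f < 0 for f in fracs):
        raise ValueError("counts must be non-negative with at least one core")
    # The least common multiple of the denominators clears every fraction.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (f.denominator for f in fracs), 1)
    return [int(f * scale) for f in fracs]


def formula_weight(counts, weights):
    """Sum count * molecular weight over core, salt, and solvent.

    Assumption: the weight is reported for the whole-number formula unit.
    """
    whole = to_whole_numbers(counts)
    return sum(n * w for n, w in zip(whole, weights))


# Hemi-salt: 1 core to 0.5 salt becomes 2 cores to 1 salt.
print(to_whole_numbers((1, 0.5, 0)))

# Placeholder weights: core 300.0, salt 36.46, solvent 18.02.
print(formula_weight((1, 2, 1), (300.0, 36.46, 18.02)))
```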
Duplicate detection
- Duplicates are detected across the vault (all projects) and within the uploaded file.
- Enantiomers are detected and registered as distinct molecules:
  - Pure isomers (R, S)
  - Unknown stereochemistry (wavy bond)
  - Unspecified stereochemistry
- Possible tautomers are detected, and the user is given the choice to register the new structure either as a new Batch of an existing Molecule or as a new Molecule.
  - The tautomer comparison applies the RDKit tautomerization rules to the structure itself (not just SMILES or any other text representation), together with CDD Vault's own set of rules that restrict some of the things RDKit would otherwise try to do.
  - Each molecule in the registration system is associated with a unique signature that corresponds to a standardized tautomer (not always the most chemically stable one). If a molecule you are trying to register matches that signature, the registration workflow displays a warning and lets you choose whether to register the structure as a new Batch (the legacy behavior) or as a new, separate Molecule.
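The order of these checks can be sketched as a lookup against two sets of keys: one for standardized structures and one for tautomer signatures. This is an illustrative sketch only. The function name and outcome labels are hypothetical, and the keys are opaque placeholder strings; real standardized CXSMILES and tautomer signatures would be computed by the registration system.

```python
def classify_records(records, vault_structures, vault_signatures):
    """Classify each (structure_key, tautomer_signature) pair in file order.

    vault_structures and vault_signatures are sets of keys already
    registered in the vault (across all projects).
    """
    file_structures = set()
    file_signatures = set()
    outcomes = []
    for structure_key, signature in records:
        if structure_key in vault_structures:
            outcome = "duplicate in vault"
        elif structure_key in file_structures:
            outcome = "duplicate in file"
        elif signature in vault_signatures or signature in file_signatures:
            # Same standardized tautomer, different structure: the user
            # decides between a new Batch and a new Molecule.
            outcome = "possible tautomer: ask user"
        else:
            outcome = "new molecule"
        file_structures.add(structure_key)
        file_signatures.add(signature)
        outcomes.append(outcome)
    return outcomes


records = [
    ("struct-A", "sig-1"),
    ("struct-B", "sig-1"),
    ("struct-C", "sig-3"),
    ("struct-C", "sig-3"),
]
for record, outcome in zip(records, classify_records(records, {"struct-A"}, {"sig-1"})):
    print(record, outcome)
```

Checking the exact structure key before the tautomer signature keeps true duplicates from being reported as tautomer warnings, since a duplicate necessarily shares its signature with the existing molecule.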
Synonym and structure conflict detection
- Synonyms that already exist in the Vault for another structure will result in an error.
Well conflict detection
- A well that has already been assigned another molecule or another batch of the same molecule will result in an error.
Batch identification and error detection
- Registration of a molecule that already exists in the Vault will result in a new Batch.
- Any unique batch identifier that has previously been assigned to another structure or batch will result in an error.
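The two batch rules above can be sketched as follows. This is a simplified illustration, not the Vault's registration logic: the function name and return strings are hypothetical, structures are represented by placeholder keys, and the sketch assumes that any batch identifier already in use counts as a conflict (it does not model updates to an existing batch).

```python
def register_batch(structure_key, batch_id, molecules, used_batch_ids):
    """Decide the outcome for one incoming record.

    molecules: set of structure keys already registered in the vault.
    used_batch_ids: set of batch identifiers already assigned.
    Both sets are updated in place when registration succeeds.
    """
    if batch_id in used_batch_ids:
        return "error: batch identifier already assigned"

    # An existing molecule gets a new Batch; otherwise a new Molecule is created.
    outcome = "new batch" if structure_key in molecules else "new molecule"
    molecules.add(structure_key)
    used_batch_ids.add(batch_id)
    return outcome


molecules = {"struct-A"}
used_batch_ids = {"B-001"}
print(register_batch("struct-A", "B-002", molecules, used_batch_ids))
print(register_batch("struct-B", "B-003", molecules, used_batch_ids))
print(register_batch("struct-B", "B-001", molecules, used_batch_ids))
```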