During the validation step of the data import (import step 3), CDD Vault performs a number of checks to ensure integrity and uniqueness of uploaded structures and molecule identifiers.
Checks are performed sequentially, and a validation report is provided before the data is approved for final import.
For molecule registration mapping (import step 2), see the separate article on that topic.
Salt and solvent of crystallization handling
There is an optional vault-level setting to disable chemical registration with salt stripping.
- Only supported salts and solvents are stripped and stored in an automatically created batch field.
- Only one salt form and one solvent of crystallization are supported.
- Stoichiometry that resolves to a simple stoichiometric ratio of cores, salts, and solvents is supported.
- If any of the above import validations fails, the structure is not imported and an unrecognized structure error is reported.
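The stoichiometry rule above can be illustrated with a minimal sketch that reduces core, salt, and solvent counts to the smallest whole-number ratio. The function name `whole_number_ratio` and its inputs are hypothetical and are not part of CDD Vault. How the Vault decides that a ratio is "simple" is not described here, so this sketch only performs the reduction.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def whole_number_ratio(core, salt, solvent):
    """Reduce component counts to the smallest whole-number ratio.

    Hypothetical helper: counts may be ints or strings such as "0.5" or "1/2".
    """
    parts = [Fraction(core), Fraction(salt), Fraction(solvent)]
    if parts[0] <= 0 or any(p < 0 for p in parts):
        raise ValueError("need at least one core and no negative counts")
    # Scale by the least common multiple of the denominators.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (p.denominator for p in parts), 1)
    whole = [int(p * scale) for p in parts]
    # Divide out any common factor so the ratio is in lowest terms.
    divisor = reduce(gcd, whole)
    return tuple(w // divisor for w in whole)


# A hemi-salt (half a salt per core) becomes 2 cores : 1 salt : 0 solvents.
print(whole_number_ratio(1, "0.5", 0))  # (2, 1, 0)
```

Passing counts as strings avoids binary floating-point rounding when the ratio is fractional.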
Core structure handling
- Structures are neutralized (unless chemical registration is turned off in the vault).
- Each structure is checked for a valid format. Invalid structures fail import validation with an unrecognized structure error.
- canonical SMILES
- polymer notation is not supported
- The chiral flag will automatically be set to absolute stereochemistry, but enhanced stereochemistry features will be preserved.
- Original imported structure is saved as the original MOL or SMILES. This structure is displayed on the search page, on the molecule record, and in the exported structure image.
- A copy of the original structure is standardized as follows:
- extra atom labels are removed
- extra annotation is removed
- atom numbers are removed
- aromatization is standardized using the ChemAxon basic method
- explicit hydrogens are removed
- chiral flag is set for absolute stereochemistry
- converted to CXSMILES
- Uniqueness validation is performed using standardized structure CXSMILES.
- enhanced stereochemistry is supported.
- Salt, solvent of crystallization and core structure counts are recorded as batch information and used to calculate formula weight.
- Stoichiometric ratios are confirmed to be whole numbers.
- Duplicates are detected across the vault (all projects) and within the uploaded file.
- Enantiomers are detected and registered as distinct molecules:
- Pure Isomers (R, S)
- Unknown stereochemistry (wavy bond)
- Unspecified stereochemistry
- Possible tautomers are detected based on InChIKey and a warning is provided. Such duplicates will be registered as distinct molecules if the file is committed.
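As a rough illustration of duplicate detection across the vault and within the uploaded file, the sketch below compares records by a standardized structure key. The name `find_duplicates`, the record layout, and the example keys are all assumptions made for illustration. The keys stand in for the standardized CXSMILES, and the standardization itself is not shown.

```python
def find_duplicates(file_records, vault_keys):
    """Report rows whose standardized key was already seen.

    file_records: list of (name, key) pairs in file order (hypothetical layout).
    vault_keys: set of keys already registered in the vault.
    Returns a list of (row_number, name, where) with where in {"vault", "file"}.
    """
    first_row_for_key = {}
    findings = []
    for row, (name, key) in enumerate(file_records, start=1):
        if key in vault_keys:
            findings.append((row, name, "vault"))
        elif key in first_row_for_key:
            findings.append((row, name, "file"))
        else:
            first_row_for_key[key] = row
    return findings


records = [("A", "CCO"), ("B", "c1ccccc1"), ("C", "CCO")]
print(find_duplicates(records, vault_keys={"c1ccccc1"}))
# [(2, 'B', 'vault'), (3, 'C', 'file')]
```

The sketch only reports where each duplicate was found. What happens to a detected duplicate is left to the registration rules described in this article.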
Synonyms and Structure conflict detection
- Synonyms that already exist in the Vault for another structure will result in an error.
Well conflict detection
- A well that has already been assigned another molecule or another batch of the same molecule will result in an error.
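The well rule can be sketched as a lookup keyed by plate and well position. The function `assign_well` and the dictionary layout are hypothetical. The sketch also assumes that re-submitting the same molecule and batch for an occupied well is not a conflict, which the rule above implies but does not state outright.

```python
def assign_well(assignments, plate, well, molecule, batch):
    """Record a well assignment, or raise if the well holds something else.

    assignments: dict mapping (plate, well) -> (molecule, batch).
    """
    occupant = assignments.get((plate, well))
    if occupant is not None and occupant != (molecule, batch):
        raise ValueError(
            f"well {plate}/{well} already holds {occupant[0]} batch {occupant[1]}"
        )
    assignments[(plate, well)] = (molecule, batch)


wells = {}
assign_well(wells, "P1", "A01", "MOL-1", "001")
try:
    # Another batch of the same molecule in the same well is a conflict.
    assign_well(wells, "P1", "A01", "MOL-1", "002")
except ValueError as err:
    print(err)  # well P1/A01 already holds MOL-1 batch 001
```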
Batch identification and error detection
- Registration of a molecule that already exists in the Vault will result in a new Batch.
- Any unique batch identifier that has been previously assigned to another structure/batch will result in an error.
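The two batch rules can be combined into one small decision function, sketched below. The names `classify_row`, `vault_structures`, and `batch_owner` are hypothetical. Treating a reused identifier on the same structure as a reference to the existing batch is an assumption of this sketch, not documented Vault behavior.

```python
def classify_row(vault_structures, batch_owner, structure_key, batch_id):
    """Decide what a row would register as, or raise on an identifier conflict.

    vault_structures: set of structure keys already in the vault.
    batch_owner: dict mapping batch identifier -> structure key that owns it.
    """
    owner = batch_owner.get(batch_id)
    if owner is not None and owner != structure_key:
        raise ValueError(f"batch identifier {batch_id!r} belongs to another structure")
    if owner == structure_key:
        return "existing batch"  # assumption: same identifier, same structure
    if structure_key in vault_structures:
        return "new batch"       # molecule already exists in the vault
    return "new molecule"


structures = {"CCO"}
owners = {"LOT-1": "CCO"}
print(classify_row(structures, owners, "CCO", "LOT-2"))       # new batch
print(classify_row(structures, owners, "c1ccccc1", "LOT-3"))  # new molecule
```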