You are on step 3 of the import of a compound library. You have completed the initial import and are now reviewing the validation (QC) import report, which shows noteworthy events, suspicious events, and errors. What do these events mean, how do you correct them, and should you now commit the import? In this article we focus on the main structure of the report and on suspicious events.
Read these articles for noteworthy events, and errors.
The third and last step of the import process provides you with a detailed report of the events that will happen if you choose to commit (complete) the import. Once the data are committed to the database, the import is more difficult to undo, so the report gives you an opportunity to cancel the entire import, or to cancel the import of individual event types within an otherwise OK file.
- In the top left corner of the report, you see the file name with a link to review the mapping from the previous import step, and to save a mapping template.
- In the top right corner of the report, you see the project name where this file is imported, and the file owner's name.
- The yellow panel shows the import progress and the status of the file: after the initial import is complete, it shows the total number of records (rows of a CSV file, or individual records of an SDF file) that will be imported and/or rejected when the import is finalized.
The following report sections contain a breakdown of the imported and rejected records by category. In this article, we focus on the most commonly encountered events found in the "Suspicious" category during molecule registration. The other categories are noteworthy events, and errors.
"Usually unexpected. Associated records will not be imported except if you choose otherwise."
The report highlights these events to draw your attention to potential structure inconsistencies. CDD labels events as "Suspicious" when interpretation of the structure is ambiguous and requires human judgement. To avoid accidental errors, events are set to be Rejected by default; however, each event type may be imported successfully if you choose to Accept it.
- Review the counts on each event type: they should add up to the total number of molecules you expect.
- Click on the arrow next to the event type to expand the section.
- Read the short description, and click on the "learn more" link for additional details about the meaning of this event.
- Scroll to the right of the file preview section to see each record's description of the event - it contains specific details. The preview section only shows 10 representative rows/records.
- Download a complete file containing all rows/records under the event type. The last column/field of each record will contain a specific description.
- By default, each event is set to be rejected during the final import, but you may override this, and choose to accept all records in this event type, in which case all of the enumerated records will be imported in the final step.
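The review steps above (checking that the counts add up, and reading the per-record descriptions in a downloaded file) can also be done outside the browser. The following is a minimal sketch only: it assumes you typed the event counts in by hand from the report and that the downloaded file is a CSV with a header row and the description in its last column. The function names and the sample data are hypothetical and are not part of CDD Vault.

```python
import csv
import io
from collections import Counter


def reconcile_counts(event_counts, expected_total):
    """Sum the per-event counts and return (total, total - expected_total)."""
    total = sum(event_counts.values())
    return total, total - expected_total


def tally_descriptions(csv_text):
    """Count how often each description occurs in the last column.

    Assumes a header row followed by one record per row (an assumption
    about the downloaded file, not a documented format).
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return Counter(row[-1] for row in reader if row)


# Hypothetical counts copied by hand from a report.
counts = {"Unintentional New Molecules": 26, "Mixture of Salts": 1, "Other": 73}
total, difference = reconcile_counts(counts, expected_total=100)

# Hypothetical downloaded rows; the last column holds the description.
sample = "ID,Structure,Description\n1,CCO,tautomer of M-1\n2,CCN,tautomer of M-1\n"
descriptions = tally_descriptions(sample)
```

A non-zero `difference` suggests that some records are missing from the file or that a category was overlooked. The tally shows at a glance whether one underlying cause explains most of the flagged records.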
Unintentional New Molecules
"New batch for existing molecule created. Structure is equivalent to an existing molecule, meaning this is probably an unintentional duplicate or Tautomer."
The screenshot above shows that 26 unintentional new molecules were found in the file. The database compares the existing structures in your vault with the incoming structures, and when equivalent structures are found, CDD reports "unintentional duplicates". While these structures are distinct enough (by SMILES or MOL format) to be registered as two separate molecules in your CDD Vault, they are typically chemically equivalent, as is the case with tautomeric structures. You, as a chemist, need to determine whether or not the structures warrant registration as separate molecules under unique IDs. The default behavior is to register tautomeric structures as batches under the same core molecule ID.
ACCEPT this event type to register the tautomeric structures as batches of an existing molecule. While chemists may disagree whether distinct tautomeric isomers should be recorded as unique molecules, we believe that it is a best practice to merge such structures under a single molecule with one identifier.
- Each unintentional duplicate will become a new batch of a previously registered molecule, and will receive a new unique Batch ID.
REJECT this event type if you wish to register these structures as new molecules with unique Molecule IDs. This will have to be done manually:
- Reject the suspicious event.
- Expand the event section and "Download all" records as a file, then save the file.
- Manually register each structure using "Create a new Molecule" links.
Mixture of Salts
"All structures in this mixture are CDD recognized salts. The new molecule was created using the rarest salt as the core structure. The more common salt was stripped and is stored in the molecule's batch salt field."
"All components of this mixture are CDD recognized salts. Therefore, to be consistent, we always choose the rarest salt as the core structure and strip the remaining component.
If you prefer to register the other component as the core structure or if you'd like to register the components as a mixture, you must split the mixture into two fields, one for the core structure and another for the salt. Map the core structure to Molecule Structure and map the salt to Molecule Salt. You must use CDD's Salt codes and please be sure to include any hydrates in the Molecule Salt field. For assistance, please contact email@example.com."
The screenshot above indicates that just one record in the file contains a mixture of salts. This event means that the compound structure contains two components, both of which are present in the CDD salt table. Amantadine Hydrochloride is a good example, since both Amantadine and Hydrochloride can be treated as salts. Since the structure is ambiguous, the database alerts you, presenting an opportunity to alter the default registration:
ACCEPT this event type if you want the more common salt to be stripped and the less common salt to be the core structure. In the example of Amantadine Hydrochloride, Amantadine would be the core, and HCl would be stripped.
REJECT this event type if you want to register the entire mixture without salt stripping, or if you want to reverse the salt stripping (in our example, keep HCl as the core and strip Amantadine). You can find the steps to register a mixture here.
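If you reject the event and want to choose the core structure yourself, the import file needs one field for the core structure and one for the salt, as described in the quoted message above. The following is a minimal sketch of preparing those two fields from a two-component SMILES string. It assumes the components are separated by "." and that you supply your own lookup from salt structure to CDD Salt code. The function name and the code value shown are hypothetical placeholders, so look up the real codes in CDD's salt table.

```python
def split_mixture(smiles, core_index, salt_codes):
    """Split a two-component SMILES into (core_smiles, salt_code).

    core_index selects which component (0 or 1) becomes the core structure.
    salt_codes maps a salt SMILES string to its salt code; the lookup is by
    exact string, which is a simplification.
    """
    components = smiles.split(".")  # naive split on the SMILES dot separator
    if len(components) != 2:
        raise ValueError("expected exactly two components")
    if core_index not in (0, 1):
        raise ValueError("core_index must be 0 or 1")
    core = components[core_index]
    salt = components[1 - core_index]
    if salt not in salt_codes:
        raise KeyError(f"no salt code known for {salt!r}")
    return core, salt_codes[salt]


# Placeholder mapping: replace the value with the real code from the salt table.
codes = {"Cl": "HYPOTHETICAL_HCL_CODE"}

# SMILES assumed here to represent amantadine hydrochloride.
core, salt_code = split_mixture("NC12CC3CC(CC(C3)C1)C2.Cl", 0, codes)
```

The two returned values would go into separate columns of the import file, mapped to Molecule Structure and Molecule Salt respectively. The sketch does not handle hydrates or mixtures of more than two components.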
Existing molecules associated with new project
"The molecule will be associated with the target project."
"The existing molecule referenced by this record does not belong to the target project. Reject this data import if you do not want to associate it." This means that a set of molecules and batches referenced in your import file exists in a different project, and this set will be shared with your import project if you click accept.
This suspicious event is meant to prevent unintentional sharing of chemical resources between projects when importing data. Many of you work with multiple collaborators, where accidental sharing of intellectual property has grave consequences, so we have turned this on by default. The import validation report will show sharing as a suspicious event: the user must explicitly accept the event.
If you prefer to share molecules (batches and plates) by default, without explicit acceptance, vault administrators can update the setting in their "Settings" tab.
When this box is not checked, sharing is automatic, and a noteworthy event serves as an alert that molecules/batches or plates are being shared between projects. If the sharing event is unintentional, it may be overridden using the “Reject” option.
ACCEPT this event type if you want to share the molecules, batches, and plates with the destination project.
REJECT this event type if you do not want to share the molecules and batches. In this case, the new molecules/batches and plates will not be registered at all.
Commit/Reject Data Import
Regardless of the Accept/Reject settings you may have updated above, you will need to finalize the import by pressing "Commit Data Import", or cancel the entire import by pressing "Reject Data Import".
Of course, it is up to you how you choose to address the issues that the import report brings up. The general rule of thumb is that if less than 20% of the rows/records in your file are problematic, reject only the problematic events and commit the file. If more than 20% are problematic, reject the entire file and start over! Remember that until you have pressed the final "Commit" button, no data have actually been inserted into the database, and the import is very easy to cancel. After the import is committed, molecules, batches, and plates can be edited or deleted manually, but not in bulk mode. If you do need to edit a large number of records, please contact our Support.
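The 20% rule of thumb above can be written down as a small helper. This is only a sketch of the guideline, not a CDD feature: the function name and return strings are hypothetical, and because the rule does not say what to do at exactly 20%, the sketch makes the assumption of treating that case as "start over".

```python
def recommend_action(problem_rows, total_rows, threshold=0.20):
    """Apply the rule of thumb: commit if fewer than 20% of rows are problematic."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    if not 0 <= problem_rows <= total_rows:
        raise ValueError("problem_rows must be between 0 and total_rows")
    fraction = problem_rows / total_rows
    if fraction < threshold:
        return "reject problematic events and commit"
    # Assumption: exactly 20% is treated the same as more than 20%.
    return "reject entire file and start over"


# Example: 27 flagged records in a hypothetical file of 500 records (5.4%).
action = recommend_action(27, 500)
```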
Learn about Noteworthy events
Learn about Errors