You are on step 3 of the import of a compound library. You have completed the initial import and are now reviewing the validation (QC) import report, which shows noteworthy events, suspicious events, and errors. What do these events mean, how do you correct them, and should you now commit the import? In this article we focus on the main structure of the report and the noteworthy events.
Read the separate articles for suspicious events and errors.
The third and last step of the import process provides you with a detailed report of the events that will happen if you choose to commit (complete) the import. Once the data are committed to the database, the import is more difficult to undo, so the report gives you an opportunity to cancel the entire import, or to cancel the import of individual event types within an otherwise acceptable file.
- In the top left corner of the report, you see the file name with a link to review the mapping from the previous import step, and to save a mapping template.
- In the top right corner of the report, you see the project name where this file is imported, and the file owner's name.
- The yellow panel shows the import progress and status of the file: after the initial import is complete, it shows the total number of records (rows of a CSV file, or individual records of an SDF file) that will be imported and/or rejected when the import is finalized.
The following report sections contain a breakdown of the imported and rejected records by category. In this article, we focus on the most commonly encountered events found in the "Noteworthy" category during molecule registration. The other categories are suspicious events, and errors.
"Usually fine. Associated records will be imported except if you choose otherwise."
Even though this category of events is usually fine and accepted for final import by default, it is important to review this section to make sure that all of the counts make sense to you and that the event types meet your expectations.
This section lists all of the events, grouped by type, that the database considers valid for import: there are no structure conflicts, and all formatting is correct. This does not mean that these valid events correspond to your expectations for the import. Therefore, while the events are "usually fine", there may be cases when you want to reject a particular event type in this category.
- Review the count on each event type; it should match your expectations.
- Click on the arrow next to the event type to expand the section
- Read the short description, and click on the "learn more" link for additional details about the meaning of this event.
- Scroll to the right of the file preview section to see each record's description of the event, which often contains the most specific details. The preview section only shows 10 representative rows/records.
- Download a complete file containing all rows/records under the event type. The last column/field of each record will contain a specific description.
- By default, each event is set to be accepted during the final import, but you may override this and choose to reject all records in this event type, in which case none of the enumerated records will be imported in the final step.
"New molecule created."
The screenshot above shows that 355 new molecules are being created, and 24 additional molecules are being rejected indirectly, as a result of some related error or suspicious event. The database compares existing structures in your vault with incoming structures, and if no match is found, a new molecule record is added, with a new registry number and a first batch. There is nothing wrong with creating new molecules, unless you didn't think there were any new structures in the file.
Let's say that you were registering a re-purchased compound library, which was registered in your vault when it was purchased initially. Instead of 355 new molecules you would expect to see 355 new batches of existing molecules. In this case, you will need to reject the entire event type to avoid creating these erroneous records. You will need to double-check the structures, and make sure that those of re-purchased compounds match the previously registered compounds.
"A new batch was created for an existing molecule."
The screenshot above shows that 328 new batches are being created, and 2 batches are being rejected indirectly, as a result of some related error or suspicious event. The database compares the existing structures in your vault with incoming structures, and if a match is found, a new batch record is added to the existing molecule. This is a valid event, unless you thought all of the incoming structures were brand new.
Let's say that you were registering a newly purchased compound library, but instead of 328 new molecules you see 328 new batches of existing molecules in the report. In this case, you may want to reject the entire event type to avoid creating these erroneous batches. You may also want to double-check the structures: it may be that you unknowingly purchased compounds already in your collection, but it may also be that the structures in the file were wrong.
Expert tip: since the QC import report lets you know if you already have a molecule in your library, you can save yourself time and money by importing a compound library file prior to purchasing it; when you get to the import report, look at how many new molecules vs. new batches you are creating as an indicator of value, then simply reject the import, if you do not plan to actually purchase the library.
"The number of unique plates is reported above and individual wells are displayed below."
The screenshot above shows that 10 new plates are being created. When you map to Plate Name and Well Position during step 2 of the import process, CDD automatically registers the named plates, assigning whatever molecule and batch was on the same row or in the same record. The import report shows the total number of unique plates in the collapsed view, and the total number of unique wells in the expanded view and the downloadable report. If the number of plates matches your expectations, this is a valid event.
Let's say, however, that you see 960 plates instead of 10. Chances are you have only one compound on each plate, instead of 96 compounds on each of 10 plates. This frequently happens if you use Excel to generate the plate map file, and Excel automatically fills a number series instead of copying down the exact plate number. You will notice that if you reject the new plate event to avoid creating so many bad plates, many other noteworthy events will become indirectly rejected. In this case, consider rejecting the entire import at the bottom of the report and double-checking the plate and well assignments.
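One way to catch the auto-fill problem before you upload the file is to count how many rows reference each plate name. The following is a minimal sketch, not part of CDD: it assumes a CSV file whose plate column header is "Plate Name" (match this to your own file), and the sample data and the helper name `wells_per_plate` are hypothetical.

```python
import csv
import io
from collections import Counter


def wells_per_plate(csv_text, plate_column="Plate Name"):
    """Count how many rows (wells) reference each plate name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[plate_column].strip() for row in reader)


# Hypothetical file in which Excel incremented the plate name on every row.
sample = """Plate Name,Well Position,Molecule Name
PLATE-001,A01,CMPD-1
PLATE-002,A02,CMPD-2
PLATE-003,A03,CMPD-3
"""

counts = wells_per_plate(sample)
single_well_plates = [name for name, n in counts.items() if n == 1]
print(f"{len(counts)} unique plates, {len(single_well_plates)} with only one well")
```

If nearly every plate has a single well, the plate column was most likely filled as a series, and the file should be corrected before import.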
Existing Molecules Associated with New Project
"The molecule will be associated with the target project."
"The existing molecule referenced by this record does not belong to the target project. Reject this data import if you do not want to associate it."
The screenshot above shows that 8 existing molecules are being associated with a new project. Recall that in step 1 of the import, you selected the file and a project into which to import. In step 3 of the import, the report now shows that some of the molecules you are registering are already found in other projects. By accepting this event type, you will create new batches of these molecules and share the entire molecule records with the project you selected. The shared details include the structure, molecule ID and synonyms, and any user-defined fields; however, existing batches of these molecules will not be shared with the new project.
This may or may not be the intended behavior. If you reject this event type to avoid sharing of molecule records, the new batch events related to these molecules will become indirectly rejected.
New Batches for New Molecules
"Another batch was added for a molecule created by this file import."
"The same molecule appears in multiple lines of this file. For every instance, a new batch will be created. If you only wish to register a single batch, you must remove all replicates."
The screenshot above shows that there are 4 new batches for new molecules. The database compares all incoming structures in your file to each other and to existing structures in the vault. If a match is found within the file, but not within the vault, a new molecule record is added, with multiple batches: one batch for each time the structure is found in the file. For example, "4 new batches for new molecules" may mean that 1 new structure appears 5 times in the file: the first time it is encountered, a "new molecule" event is added to the report. The next 4 times the same structure is encountered, 4 "new batch for new molecule" events are added. Or it may be that 1 new structure was encountered 3 times in the file, and another new structure was also encountered 3 times (2 new molecules + 4 new batches for new molecules). Or 4 new structures were each encountered twice (4 new molecules + 4 new batches for new molecules).
This may be the intended behavior if you have replicates within the library, which happens frequently. However, this may also be an unintentional consequence, for example, of importing a file with structures assigned to dose-response plates, where each structure appears in 7 or more wells. In this case, the intention is to register only one batch for a new molecule, and then assign that batch to 7 wells; you will need to reject the entire event, which will prevent replicate batch creation, but will also prevent the plate/wells from being created. The plate map will have to be imported separately once the molecule and batch are registered.
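The counting rule described in this section can be sketched in a few lines. This is only an illustration of the arithmetic, not how the database works internally: real structure matching is done by the database, whereas here plain string equality on hypothetical structure labels stands in for it, and the function name `count_events` is made up for this example.

```python
from collections import Counter


def count_events(file_structures, vault_structures):
    """Tally report events for a list of incoming structure labels.

    Assumption: two records are "the same molecule" when their labels
    are equal strings. The real comparison is chemical, not textual.
    """
    events = Counter()
    seen_in_file = set()
    for structure in file_structures:
        if structure in vault_structures:
            events["new batch for existing molecule"] += 1
        elif structure in seen_in_file:
            events["new batch for new molecule"] += 1
        else:
            seen_in_file.add(structure)
            events["new molecule"] += 1
    return events


# One new structure appearing 5 times: 1 new molecule + 4 new batches.
five_copies = count_events(["A"] * 5, vault_structures=set())

# Two new structures appearing 3 times each: 2 new molecules + 4 new batches.
two_triplets = count_events(["A", "B"] * 3, vault_structures=set())

print(dict(five_copies))
print(dict(two_triplets))
```

Running a check like this on your own file before import tells you roughly how many replicate batches to expect in the report.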
Commit/Reject Data Import
Regardless of the accept/reject settings you may have updated above, you will need to finalize the import by pressing "commit data import", or cancel the entire import by pressing "reject data import".
Of course, it's up to you how you choose to address the issues that the import report brings up. The general rule of thumb is that if fewer than 20% of the rows/records in your file are problematic, reject only the problematic events and commit the import. If more than 20% are problematic, reject the entire file and start over! Remember that until you have pressed the final "commit" button, no data have actually been inserted into the database, and the import is very easy to cancel. After the import is committed, molecules, batches and plates can be edited or deleted manually, but not in bulk mode. If you do need to edit a large number of records, please contact our support.
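The 20% rule of thumb is simple arithmetic, sketched below. The function name `suggest_action` and the example counts are hypothetical, and the handling of exactly 20% (treated here as "reject the entire import") is an assumption, since the rule does not say which side it falls on.

```python
def suggest_action(problem_rows, total_rows, threshold=0.20):
    """Apply the rule of thumb to a count of problematic rows/records."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    fraction = problem_rows / total_rows
    if fraction < threshold:
        return "reject only the problematic events, then commit"
    return "reject the entire import and start over"


# Hypothetical counts: 24 problematic records out of 379 is about 6%.
print(suggest_action(24, 379))
# Hypothetical counts: 100 problematic records out of 379 is about 26%.
print(suggest_action(100, 379))
```

Treat the result as a starting point only; a small number of problematic records can still justify rejecting the whole file if they point to a systematic error in the source data.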
Learn about Suspicious events
Learn about Errors