This workflow takes a set of compound structures exported from CDD Vault in SDF format, selects a specified number of diverse compounds, and writes them to a new SDF file. Please note that files and programs in the "Downloads" section are provided by CDD "AS IS".
Background from RDKit:
Picks diverse rows from an input table based on Tanimoto distance between fingerprints. The picking is done using the MaxMin algorithm (Ashton, M. et al., Quant. Struct.-Act. Relat., 21 (2002), 598-604). The algorithm is quite fast, even for large datasets, but note that runtime increases rapidly with the number of rows to be picked.
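To make the background above concrete, here is a minimal pure-Python sketch of MaxMin picking over Tanimoto distance. It is not the code the RDKit node runs; it only illustrates the idea. The function names (tanimoto_distance, maxmin_pick), the seed parameter, and the representation of each fingerprint as a set of "on" bit positions are assumptions made for this example.

```python
import random


def tanimoto_distance(a, b):
    """Tanimoto distance (1 - similarity) between two sets of 'on' bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def maxmin_pick(fingerprints, n_pick, seed=0):
    """Return the indices of n_pick mutually distant fingerprints."""
    n = len(fingerprints)
    if n_pick <= 0:
        return []
    if n_pick >= n:
        return list(range(n))

    # Start from a randomly chosen row (seeded so runs are repeatable).
    first = random.Random(seed).randrange(n)
    picked = [first]
    remaining = set(range(n)) - {first}

    # min_dist[i] is the distance from row i to its nearest picked row.
    min_dist = [tanimoto_distance(fp, fingerprints[first]) for fp in fingerprints]

    while len(picked) < n_pick:
        # Take the row whose nearest picked neighbour is farthest away.
        nxt = max(sorted(remaining), key=lambda i: min_dist[i])
        picked.append(nxt)
        remaining.discard(nxt)
        # Only the newly picked row can shrink the nearest-pick distances.
        for i in remaining:
            d = tanimoto_distance(fingerprints[i], fingerprints[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


# Hypothetical fingerprints, each a set of "on" bit positions.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {20, 21}]
picks = maxmin_pick(fps, 3, seed=0)
print(picks)
```

Each additional pick needs another pass over the unpicked rows, so in this sketch the work grows with both the table size and the number of rows to pick.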
KNIME prerequisites:
- RDKit - all community nodes from this repository added
- Chemistry - all nodes from this repository added
Input file:
SDF file exported from CDD that contains the required MOL structure. It may contain other data in addition to the structure.
Output file:
The new SDF file is defined and created in the last step of the workflow.
Node: SDF Reader
Configure: right-click the node, and choose "configure"
On the file selection tab, browse for your SDF input file
- Select "Extract SDF blocks"
- Select "Extract MOL blocks"
Click "OK"
Execute the node: Right-click the node, and choose "execute"
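For readers who want to see what the "SDF blocks" and "MOL blocks" options refer to, the following is a rough standard-library sketch of how an SDF file breaks down. It is not how the KNIME node is implemented. The function names (parse_sdf, _parse_record), the sample record, and the field name "Molecule Name" with its value are all hypothetical.

```python
def _parse_record(lines):
    """Split one SDF record into its MOL block and its data fields."""
    # The MOL block runs from the title line through the "M  END" line.
    end = next(
        (i for i, line in enumerate(lines) if line.startswith("M  END")),
        len(lines) - 1,
    )
    fields, name = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # A data header looks like ">  <Field Name>".
            lt, gt = line.find("<"), line.rfind(">")
            name = line[lt + 1:gt] if lt != -1 and gt > lt else None
            if name is not None:
                fields[name] = []
        elif name is not None:
            if line.strip():
                fields[name].append(line)
            else:
                name = None  # a blank line ends the current data item
    return {
        "sdf": "\n".join(lines) + "\n$$$$",   # the whole record (SDF block)
        "mol": "\n".join(lines[:end + 1]),    # structure only (MOL block)
        "fields": {key: "\n".join(value) for key, value in fields.items()},
    }


def parse_sdf(text):
    """Return one dict per record; records are delimited by '$$$$' lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if any(item.strip() for item in current):
                records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(item.strip() for item in current):  # tolerate a missing final $$$$
        records.append(_parse_record(current))
    return records


# Hypothetical one-record SDF text, for illustration only.
sample = """\
ethanol
  example

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2000    1.2000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
M  END
>  <Molecule Name>
CDD-0000001

$$$$
"""

records = parse_sdf(sample)
print(len(records), records[0]["fields"])
```

The MOL block is the part the diversity picker needs; the remaining data fields travel along as extra columns.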
Node: RDKit Diversity Picker
Configure: right-click the node, and choose "configure"
On the "Options" tab, set
- Molecule or fingerprint column (table 1) to "SDF Molecule"
- Number to pick: the number of diverse structures to select. These are the cluster centers.
Click "OK"
Execute the node: Right-click the node, and choose "execute". Look carefully in the KNIME console for error messages.
Node: Interactive Table
Does not need configuration.
Execute the node: Right-click the node, and choose "execute". Look carefully in the KNIME console for error messages.
To view the table in KNIME: right-click the node, and choose "View, Table view"
Node: SDF Writer (creates an output SDF file)
Configure: right-click the node, and choose "configure"
On the "Default Settings" tab, set
- Filename: set the path and give a filename for the output file
- Structure column: "Molecule"
- Include/exclude columns: choose the columns you want to include or exclude from the output.
Click "OK"
Execute the node: Right-click the node, and choose "execute". Look for the output file in the specified location.
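As a rough illustration of what the writer step produces, the sketch below assembles SDF records from a MOL block plus a chosen subset of data columns. It is not the KNIME node's implementation. The names format_sdf_record and write_sdf, the include parameter, and the example structure and field names are assumptions made for this example.

```python
def format_sdf_record(mol_block, fields, include=None):
    """Return one SDF record: MOL block, selected data fields, then '$$$$'."""
    lines = [mol_block.rstrip("\n")]
    for name, value in fields.items():
        # include=None keeps every column; otherwise keep only the listed ones.
        if include is not None and name not in include:
            continue
        lines.append(f">  <{name}>")
        lines.append(str(value))
        lines.append("")  # a blank line terminates each data item
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


def write_sdf(path, records, include=None):
    """Write (mol_block, fields) pairs to an SDF file at the given path."""
    with open(path, "w", encoding="utf-8") as handle:
        for mol_block, fields in records:
            handle.write(format_sdf_record(mol_block, fields, include))


# Hypothetical single-atom structure and data columns, for illustration only.
mol = "\n".join([
    "water",
    "  example",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])
text = format_sdf_record(mol, {"ID": "X1", "Notes": "internal"}, include={"ID"})
print(text)
```

Calling write_sdf("picked.sdf", [(mol, {"ID": "X1"})]) would write the same record layout to disk; the filename here is only a placeholder.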