Diversity Picker – CDD Support

This workflow takes a set of compound structures exported from CDD Vault in SDF format, selects a specified number of diverse compounds, and writes them to a new SDF file.
Please note that files and programs in the "Downloads" section are provided by CDD "AS IS".

Download workflow

Background from RDKit:

Picks diverse rows from an input table based on the Tanimoto distance between fingerprints. The picking is done using the MaxMin algorithm (Ashton, M. et al., Quant. Struct.-Act. Relat., 21 (2002), 598-604). The algorithm is quite fast, even for large datasets, but note that runtime increases rapidly with the number of rows to be picked.
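The MaxMin idea described above can be sketched in plain Python. This is an illustration only, not RDKit's implementation: fingerprints are represented as sets of on-bit indices, the first pick is fixed by the caller (a real picker may seed the selection differently), and the names `tanimoto_distance` and `maxmin_pick` are made up for this example.

```python
from typing import List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - |a & b| / |a | b| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / union


def maxmin_pick(fps: List[Set[int]], n_pick: int, first: int = 0) -> List[int]:
    """Pick n_pick indices so that each new pick is as far as possible
    from its nearest already-picked neighbour (the MaxMin criterion)."""
    if not 0 < n_pick <= len(fps):
        raise ValueError("n_pick must be between 1 and len(fps)")
    picked = [first]
    picked_set = {first}
    # min_dist[i] is the distance from item i to its nearest picked item.
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    while len(picked) < n_pick:
        remaining = [i for i in range(len(fps)) if i not in picked_set]
        # Choose the candidate whose nearest picked neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min_dist[i])
        picked.append(nxt)
        picked_set.add(nxt)
        # Update nearest-picked distances with the new pick.
        for i, fp in enumerate(fps):
            d = tanimoto_distance(fp, fps[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


# Toy fingerprints (invented data): two loose groups of similar bit sets.
toy_fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}, {7, 8, 10}]
picks = maxmin_pick(toy_fps, 3)
print(picks)  # [0, 2, 1]
```

In this sketch every pick scans all candidates once, so the work grows with both the size of the table and the number of rows picked.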

KNIME prerequisites:

RDKit - all community nodes from this repository added
Chemistry - all nodes from this repository added

Input file:

An SDF file exported from CDD Vault. Each record must contain a MOL structure and may contain other data in addition to the structure.

Output file:

A new SDF file. Its name and location are set in the last step of the workflow (the SDF Writer node), which also creates the file.

Node: SDF Reader

Configure: right-click the node, and choose "configure"

On the file selection tab, browse for your SDF input file

Select "Extract SDF blocks"
Select "Extract MOL blocks"

Click "OK"

Execute the node: Right-click the node, and choose "execute"
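For orientation, an SDF file is a sequence of records, each terminated by a `$$$$` line. Within a record, the structure (MOL block) comes first and ends with `M  END`, followed by optional data fields. The sketch below is independent of KNIME and is not how the SDF Reader node is implemented. It only shows how a file breaks down into those parts. The function names and the toy record (a single carbon atom with a made-up `ID` field) are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def split_sdf(text: str) -> List[str]:
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def parse_record(record: str) -> Tuple[str, Dict[str, str]]:
    """Return (mol_block, data_fields) for one SDF record."""
    lines = record.split("\n")
    ends = [i for i, line in enumerate(lines) if line.startswith("M  END")]
    if not ends:
        raise ValueError("record has no 'M  END' line")
    end = ends[0]
    mol_block = "\n".join(lines[: end + 1])
    fields: Dict[str, str] = {}
    name = None
    values: List[str] = []
    for line in lines[end + 1:]:
        if line.startswith(">") and "<" in line:
            # Field header such as "> <ID>": the name sits between < and >.
            name = line[line.index("<") + 1: line.rindex(">")]
            values = []
        elif line.strip() == "":
            # A blank line closes the current field.
            if name is not None:
                fields[name] = "\n".join(values)
                name = None
        elif name is not None:
            values.append(line)
    if name is not None:
        fields[name] = "\n".join(values)
    return mol_block, fields


# Toy single-atom record (invented data, not a CDD export).
SAMPLE = "\n".join([
    "toy-1",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <ID>",
    "CMPD-1",
    "",
    "$$$$",
    "",
])

records = split_sdf(SAMPLE)
mol_block, fields = parse_record(records[0])
```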

Node: RDKit Diversity Picker

Configure: right-click the node, and choose "configure"

On the "Options" tab, set

Molecule or fingerprint column (table 1) to "SDF Molecule"
Number to pick: the number of diverse structures to select. The picked structures act as the cluster centers.

Click "OK"

Execute the node: right-click the node and choose "execute". Check the KNIME console carefully for error messages.

Node: Interactive Table

Does not need configuration.

Execute the node: right-click the node and choose "execute". Check the KNIME console carefully for error messages.

To view the table in KNIME, right-click the executed node and choose "View, Table view".

Node: SDF Writer (creates an output SDF file)

Configure: right-click the node, and choose "configure"

On the "Default Settings" tab, set

Filename: the path and filename for the output file
Structure column: "Molecule"
Include/exclude columns: choose the columns you want to include in or exclude from the output.

Click "OK"

Execute the node: right-click the node and choose "execute". Look for the output file in the specified location.
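The include/exclude choice amounts to deciding which data fields are written after each structure in the output file. The following is a minimal sketch of that idea outside KNIME. It is not the SDF Writer node's implementation, and `format_sdf_record`, the toy structure, and the field names `ID` and `Notes` are hypothetical.

```python
from typing import Dict, Iterable


def format_sdf_record(mol_block: str, fields: Dict[str, str],
                      include: Iterable[str]) -> str:
    """Build one SDF record: MOL block, the chosen data fields, then '$$$$'."""
    lines = [mol_block.rstrip("\n")]
    for name in include:
        if name in fields:
            lines.append(f"> <{name}>")  # field header
            lines.append(fields[name])   # field value
            lines.append("")             # blank line closes the field
    lines.append("$$$$")                 # record terminator
    return "\n".join(lines) + "\n"


# Toy single-atom structure (invented data).
TOY_MOL = "\n".join([
    "toy-1",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])

# Only the "ID" column is included; "Notes" is left out of the output.
record = format_sdf_record(
    TOY_MOL, {"ID": "CMPD-1", "Notes": "internal only"}, include=["ID"]
)
```

Writing several picked compounds is then a matter of concatenating one such record per compound into the output file.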