
Diversity Picker

This workflow takes a set of compound structures exported from CDD Vault in SDF format, selects a specified number of diverse compounds, and writes them to a new SDF file.

Please note that files and programs in the "Downloads" section are provided by CDD "AS IS".

Download workflow

Background from RDKit:

Picks diverse rows from an input table based on the Tanimoto distance between fingerprints. The picking is done using the MaxMin algorithm (Ashton, M. et al., Quant. Struct.-Act. Relat., 21 (2002), 598-604). The algorithm is quite fast, even for large datasets, but note that runtime increases rapidly with the number of rows to be picked.
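The MaxMin idea described above can be sketched in a few lines of plain Python. This is only an illustration, not the code the RDKit node runs: the function names `tanimoto_distance` and `maxmin_pick` are hypothetical, fingerprints are represented as sets of on-bit indices, and the first pick is fixed at index 0 for reproducibility (the node may choose its starting point differently).

```python
from typing import FrozenSet, List, Sequence


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Return 1 - Tanimoto similarity for two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def maxmin_pick(fps: Sequence[FrozenSet[int]], n_pick: int, first: int = 0) -> List[int]:
    """Greedy MaxMin: repeatedly add the item farthest from everything picked so far."""
    if not 0 < n_pick <= len(fps):
        raise ValueError("n_pick must be between 1 and the number of fingerprints")
    picked = [first]
    # min_dist[i] is the distance from item i to its nearest already-picked item
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    while len(picked) < n_pick:
        candidates = [i for i in range(len(fps)) if i not in picked]
        # the next pick is the candidate whose nearest picked neighbour is farthest away
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[nxt]))
    return picked


# Toy fingerprints: items 0, 1 and 3 overlap heavily, item 2 shares no bits with them
fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9}), frozenset({1, 2, 3, 4})]
picks = maxmin_pick(fps, 3)
```

Each additional pick requires another pass over all rows to update the nearest-pick distances, which is why the runtime grows with the number of rows to be picked.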

KNIME prerequisites:

  • RDKit - all community nodes from this repository added
  • Chemistry - all nodes from this repository added

 

Input file:

An SDF file exported from CDD Vault. Each record must contain a MOL structure and may contain other data in addition to the structure.
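To make the layout of the input file concrete, here is a minimal sketch of how an SDF file breaks down into records, each with a MOL block (ending at "M  END") and optional data fields, separated by "$$$$" lines. It assumes the common V2000-style layout; the helper names `split_sdf` and `_parse_record` and the sample data are hypothetical and are not part of KNIME or CDD Vault.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, Dict[str, str]]


def _parse_record(lines: List[str]) -> Record:
    """Split one SDF record into its MOL block and a dict of data fields."""
    mol_lines: List[str] = []
    i = 0
    # The MOL block runs from the first line through the "M  END" line
    while i < len(lines):
        mol_lines.append(lines[i])
        i += 1
        if lines[i - 1].startswith("M  END"):
            break
    fields: Dict[str, str] = {}
    # Data items look like "> <NAME>" followed by value lines and a blank line
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            name = line[line.index("<") + 1 : line.rindex(">")]
            i += 1
            value: List[str] = []
            while i < len(lines) and lines[i].strip():
                value.append(lines[i])
                i += 1
            fields[name] = "\n".join(value)
        else:
            i += 1
    return "\n".join(mol_lines), fields


def split_sdf(text: str) -> List[Record]:
    """Return one (mol_block, fields) pair per record in the SDF text."""
    records: List[Record] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # tolerate a missing final "$$$$"
        records.append(_parse_record(current))
    return records


# Hypothetical two-record sample with empty atom tables, for illustration only
sample = (
    "cmpd-1\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    "> <Name>\ncmpd-1\n\n> <MW>\n100.1\n\n$$$$\n"
    "cmpd-2\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    "> <Name>\ncmpd-2\n\n$$$$\n"
)
records = split_sdf(sample)
```

In the workflow itself the SDF Reader node does this parsing for you; the sketch is only meant to show what "structure plus other data" looks like inside the file.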

 

Output file:

A new SDF file, whose name and location are defined in the last step of the workflow (the SDF Writer node).

 

Node: SDF Reader

Configure: right-click the node, and choose "configure"

On the file selection tab, browse for your SDF input file

  • Select "Extract SDF blocks"
  • Select "Extract MOL blocks"

Click "OK"

Execute the node: Right-click the node, and choose "execute"

 

Node: RDKit Diversity Picker

Configure: right-click the node, and choose "configure"

On the "Options" tab, set

  • Molecule or fingerprint column (table 1) to "SDF Molecule"
  • Number to pick: the number of diverse structures to select. The picked structures act as the cluster centers of the set.

Click "OK"

Execute the node: right-click the node and choose "execute". Look carefully in the KNIME console for error messages.

 

Node: Interactive Table

Does not need configuration.

Execute the node: right-click the node and choose "execute". Look carefully in the KNIME console for error messages.

To view the table in KNIME, right-click the node and choose "View, Table view".

 

Node: SDF Writer (creates an output SDF file)

Configure: right-click the node, and choose "configure"

On the "Default Settings" tab set

  • Filename: set the path and give a filename for the output file
  • Structure column: "Molecule"
  • Include/exclude columns: choose the columns you want to include in or exclude from the output

Click "OK"

Execute the node: right-click the node and choose "execute". Look for the output file in the specified location.
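As a rough illustration of what the include/exclude setting does to the output, the sketch below serialises records back to SDF text and keeps only the chosen data fields. It is not how the SDF Writer node is implemented; the function name `write_sdf`, its `include` parameter, and the sample record are assumptions made for this example.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple


def write_sdf(
    records: Iterable[Tuple[str, Dict[str, str]]],
    include: Optional[Sequence[str]] = None,
) -> str:
    """Serialise (mol_block, fields) pairs to SDF text.

    If `include` is given, only the named data fields are written;
    otherwise every field is kept.
    """
    parts = []
    for mol_block, fields in records:
        parts.append(mol_block.rstrip("\n"))
        for name, value in fields.items():
            if include is not None and name not in include:
                continue  # field excluded from the output
            parts.append(f"> <{name}>")
            parts.append(str(value))
            parts.append("")  # blank line terminates the data item
        parts.append("$$$$")  # record delimiter
    return "\n".join(parts) + "\n"


# Hypothetical picked record: a placeholder MOL block plus two data fields
picked = [("cmpd-1\n\n\nM  END", {"Name": "cmpd-1", "MW": "100.1"})]
sdf_text = write_sdf(picked, include=["Name"])
```

Here only the "Name" field is carried into the output, which mirrors leaving a single column on the include side of the node's column selector.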