This article covers the details of biological protocol set-up from raw data. We'll assume here that you've created protocols before, so if you have not, you can start with this introductory tutorial instead. Dose-response protocols are described separately.
This type of protocol is suitable for any end-point data entry, such as cytotoxicity, DMPK, or single-point screening data. Each protocol definition consists of the steps described below.
Before runs can be created and assay results can be imported, we will need to set up a protocol. Protocol architecture in CDD is very flexible, and many kinds of data can be accommodated, including data from enzymatic and cell-based assays, in vitro and in vivo ADME/TOX screens, as well as in vivo pharmacodynamic and efficacy data.
For any one assay there are several ways to design a protocol, depending on how you plan to aggregate the data (calculate averages and such), and how you plan to search/mine it in the future. If you're unsure of the best design for your specific protocol, you can always contact your account manager, or support directly.
With this in mind, here is a list of things to consider while planning your protocols:
- What is your raw data? (that you collect directly from your instruments)
- What is your primary result? (e.g. % Inhibition, Ki, IC50, etc)
- What are the conditions that you need to capture to give enough context to the results?
- What are the calculations you need to perform on your data?
- Do you need to aggregate/average your data?
With a good plan in hand, it's time to build.
Create a new protocol
If you have just logged into CDD, you are on the Dashboard tab. If you have been following the lessons, most likely you are on the Explore Data tab. You can create a protocol from either page.
On the Dashboard tab, click the link to Create a new protocol.
On the Explore Data tab, click Create New at the top of the side-bar, and choose Protocol from the drop-down.
Fill in the protocol name, the category, the description, and the project affiliation. The description is the only optional field here, but it's worth filling in if you plan to share the data, or simply to remember what was done a year from now.
Don't forget to click "Create protocol" at the bottom of the form.
You are now taken to the "protocol details" page where you will continue to build out the protocol by creating readout definitions.
Readout definitions represent all of the result types that are captured for any specific assay. The result types may include all of the following: conditions, experimental data, calculated results, and meta-data such as experiment # and assay order. While "readout definition" is the default name of this record type, it may have alternate nomenclature within your specific vault. For example it may be called "assay parameter" or "result type".
A basic readout definition has a name and a data type; this is the required minimum. A basic readout is imported directly from the file, with no calculations applied to it, and appears in exactly the same format as in the import file. You may choose to embellish it with a display format and/or units of measure.
A normalized readout definition is the result of a calculation based on an imported basic readout. You will see the available normalization options when you choose the "number" data type for the basic readout definition. When left on "do not normalize", the readout is simply the end-point imported from the file, i.e. the basic readout. Predictably, calculations cannot be performed on files or text.
Here are the available fields and options for readout definitions:
Name- required. This is the name of the raw data readout. In our example we're collecting counts per minute at two time-points, so the first readout definition is called CPM-48.
Data type- required. This sets the type of data that is permitted in this readout definition.
- Number- the most common data type for HTS data. Only numeric values, optionally prefixed with modifiers (>, <, >=, <=), are permitted. This means that things like N/A or * or "precipitated" aren't allowed.
- Text- alphanumeric values are permitted. Qualitative results and hyperlinks should also be entered as this data type.
- PickList- a pre-defined list of alphanumeric values that may be entered for this readout. Gives the protocol owner ability to control the values that are imported into the vault. Data such as phenotypes, descriptions, cell lines etc. should be defined as pick lists. Learn more about pick lists.
- File- file attachments of any file type and size. Image previews will be generated for JPG, GIF, BMP, PNG, TIFF and PDF formats. All other files will be available for download, to be viewed with their native software.
Display Format- optional. This will determine the number of digits that appear throughout the vault. Much like Excel, this format is only a style, while all calculations are performed on the underlying full number.
- Decimal places- choose the number of decimal places following the decimal separator.
- Significant figures- choose the number of significant figures. If you remember your high school math, significant figures include all digits except leading zeros (trailing zeros after the decimal point do count).
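To illustrate the difference between the two display styles, here is how decimal places and significant figures round the same underlying number (a Python sketch using an arbitrary example value; CDD applies the equivalent formatting internally):

```python
value = 0.012345

# Decimal places: a fixed count of digits after the decimal separator.
decimal_2 = f"{value:.2f}"   # two decimal places -> "0.01"

# Significant figures: a fixed count of meaningful digits;
# leading zeros are not counted.
sigfig_3 = f"{value:.3g}"    # three significant figures -> "0.0123"
```

Note that either style is purely cosmetic: the full underlying value is still stored and used in calculations.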
Unit- optional. While the unit of measurement is an optional field, if you remember your high school science, you should include a unit if you are taking a measurement. This is a free-text field, so any unit can go in here. The units should stay consistent throughout the protocol; this is another thing they teach in Chemistry/Biology 101, so we won't harp on it.
Description- optional. Typically the description will be necessary if you did the calculation outside of CDD and are importing a final value, or if you have a scoring system that needs explanation.
Normalization- optional. Common data normalization options for HTS data are supported.
Two different normalization scopes are available, plus the default of not normalizing at all:
- Do not normalize this readout- the default option, when you're importing the final calculated value.
- Normalize within each plate- both positive and negative controls are run on each screening plate. Controls are averaged per plate, and test data normalization is performed per plate. This helps remove plate-to-plate variation.
- Normalize within each run- positive and negative controls are present on only one or some of the plates. All controls are averaged together across plates before the test data are normalized.
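The difference between the two scopes comes down to where the control means are computed. Here is a minimal Python sketch, using hypothetical raw signals and a generic % inhibition formula (the exact calculation CDD applies depends on the normalization function you select):

```python
from statistics import mean

def percent_inhibition(sample, neg_mean, pos_mean):
    """Normalize a raw signal against negative/positive control means."""
    return 100 * (sample - neg_mean) / (pos_mean - neg_mean)

# Hypothetical raw signals from two plates in one run.
plates = [
    {"neg": [100.0, 102.0], "pos": [10.0, 12.0], "samples": [55.0, 80.0]},
    {"neg": [120.0, 118.0], "pos": [20.0, 22.0], "samples": [70.0, 90.0]},
]

# Normalize within each plate: control means computed per plate.
per_plate = [
    [percent_inhibition(s, mean(p["neg"]), mean(p["pos"])) for s in p["samples"]]
    for p in plates
]

# Normalize within each run: control means pooled across all plates first.
neg_run = mean(v for p in plates for v in p["neg"])
pos_run = mean(v for p in plates for v in p["pos"])
per_run = [
    [percent_inhibition(s, neg_run, pos_run) for s in p["samples"]]
    for p in plates
]
```

With plate-to-plate drift in the controls (as in the made-up numbers above), the two scopes give different normalized values for the same well, which is exactly why per-plate normalization helps when every plate carries its own controls.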
Data normalization functions:
Note that all of these normalizations can also be inverted by subtracting the result from 100%. This is the flag you see right above the list of available calculations.
- % of control - similar to '% inhibition or activation', except it does not remove the background response. If you have positive and negative controls, then '% inhibition or activation' is preferred.
- z-score - the number of standard deviations from the population mean; it can be used regardless of whether your protocol has controls. This normalization assumes that very few of your samples are hits; if you expect a large percentage of hits, the z-score calculation can instead be based on the negative control mean and standard deviation. Choose a z-score calculation from the drop-down.
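A z-score is simply each value's distance from the reference mean, measured in units of the reference standard deviation. A minimal Python sketch of the sample-based version (for the control-based variant, substitute the negative-control mean and standard deviation; the readings below are made up):

```python
from statistics import mean, stdev

def z_scores(values):
    """Sample-based z-score: distance from the sample mean in SD units."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical raw readings; the last one is a potential hit.
readings = [98.0, 101.0, 99.0, 102.0, 55.0]
scores = z_scores(readings)
```

Because the hit itself drags the sample mean and inflates the standard deviation, a screen with many hits will understate their z-scores, which is why the control-based variant exists.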
Calculated Readout Definition- optional. CDD Vault includes calculation of the arithmetic and geometric mean. CDD Vision includes expanded capability for calculations, including variables and calculated chemical properties.
Here are the available fields and options for calculated readout definitions in CDD Vault:
Average type- required. The choices are arithmetic, selected by default, or geometric. Make sure to choose geometric for IC50, and read this blog about using pIC50 instead!
Readout Definition- required. This drop-down will contain a list of basic numeric readouts in your protocol that can be averaged.
Aggregation- required. This drop-down contains choices of aggregation scope.
Aggregate by batch and run - A single aggregate value is calculated per batch of molecule, for each individual run of the current protocol. This is the best option for replicate data.
Aggregate by molecule and protocol - A single aggregate value is calculated across all batches of a molecule and across all runs of the current protocol. Only one average value per molecule will ever be reported for the current protocol. This is the best summary-level result.
Aggregate by batch and protocol - An individual aggregate value is calculated for each batch of a molecule, across all runs of the current protocol. There will be as many average values as there were tested batches of a molecule in the protocol. This is the best option for keeping track of compound performance batch by batch.
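The three scopes differ only in how results are grouped before averaging. A rough Python sketch with hypothetical (molecule, batch, run, IC50) records, using the geometric mean as recommended for IC50 data:

```python
from collections import defaultdict
from math import prod

def geomean(values):
    """Geometric mean, the recommended average for IC50-type data."""
    return prod(values) ** (1 / len(values))

# Hypothetical IC50 results: (molecule, batch, run, value in uM)
results = [
    ("MOL-1", "B1", "run1", 2.0),
    ("MOL-1", "B1", "run1", 8.0),   # replicate within the same run
    ("MOL-1", "B1", "run2", 4.0),
    ("MOL-1", "B2", "run2", 9.0),
]

def aggregate(records, key):
    """Group records by the given key, then average each group."""
    groups = defaultdict(list)
    for mol, batch, run, value in records:
        groups[key(mol, batch, run)].append(value)
    return {k: geomean(v) for k, v in groups.items()}

by_batch_and_run = aggregate(results, lambda m, b, r: (b, r))  # replicate data
by_molecule      = aggregate(results, lambda m, b, r: m)       # summary level
by_batch         = aggregate(results, lambda m, b, r: b)       # batch tracking
```

In this made-up data set, batch B1 in run1 averages its two replicates, while the molecule-level scope collapses everything for MOL-1 into a single summary value.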
Choose a control plate layout if you have defined any normalized readouts, with the exception of sample-based z-score. The control layouts are used by the normalization functions that need to calculate the negative and positive control means and standard deviations. Here's a complete article that addresses control layouts.
Control layouts defined on the protocol details tab will be the default layouts applied to all plates imported into your protocol, but these defaults can be overridden on individual runs, or even on individual plates.