This article covers the details of setting up a biological protocol from raw data. We'll assume here that you've created protocols before; if you have not, you can start with this introductory tutorial instead. IC50 dose-response protocols are described separately.
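This type of protocol is suitable for any end-point data, such as cytotoxicity, DMPK, or single-point screening data. Each protocol definition consists of the steps described below: creating the protocol, defining its readouts, and, if any readouts are normalized, choosing a control plate layout.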
Before runs can be created and assay results can be imported, we will need to set up a protocol. Protocol architecture in CDD is very flexible, and many kinds of data can be accommodated, including data from enzymatic and cell-based assays, in vitro and in vivo ADME/TOX screens, as well as in vivo pharmacodynamic and efficacy data.
For any one assay there are several ways to design a protocol, depending on how you plan to aggregate the data (calculate averages and such) and how you plan to search and mine it in the future. If you're unsure of the best design for your specific protocol, you can always contact your account manager or CDD support directly.
With this in mind, here is a list of things to consider while planning your protocols:
- What is your raw data (the data you collect directly from your instruments)?
- What is your primary result (e.g., % inhibition, Ki, IC50)?
- What are the conditions that you need to capture to give enough context to the results?
- What are the calculations you need to perform on your data?
- Do you need to aggregate/average your data?
With a good plan in hand, it's time to build it.
Create a new protocol
On the Explore Data tab, click Create New at the top of the sidebar and choose Protocol from the drop-down menu.
The fields in the new protocol dialog are straightforward, but they are also important for the future of the protocol.
Name - The protocol's name should be short and describe the assay. It's a good idea to agree on protocol-naming guidelines. Later, your vault may hold dozens or even hundreds of protocols, created by different users and listed alphabetically, and consistent names will still let you pick the one you need from the list.
Category - This field is optional, but it can be very helpful later for grouping protocols and for narrowing down search results in your vault.
Description - Another optional field that will be very helpful to other scientists who want to understand your protocol or to you in a year when you want to remember what was done.
Additional Protocol Fields - Other optional fields may be available to populate if your Vault Administrator has created custom Protocol Fields.
Project - The project field is required because it determines who has access to the data in the new protocol. When creating a new protocol, you can select only one project from the drop-down menu, but projects can be added or removed later.
Don't forget to click "Create protocol" at the bottom of the form.
You are now taken to the "protocol details" page where you will continue to build out the protocol by creating readout definitions.
Readout definitions represent all of the result types captured for a specific assay. These result types may include any of the following: conditions, experimental data, calculated results, and metadata such as experiment # and assay order. "Readout definition" is the default name of this record type, but your vault may use different nomenclature; for example, it may be called "assay parameter" or "result type".
A basic readout definition has a name and a data type, which is the required minimum. This readout is imported directly from file with no calculations applied, and it appears in exactly the same format as in the import file. You may choose to embellish it with a display format and/or units of measure.
A normalized readout definition is the result of a calculation based on an imported basic readout. The normalization options become available when you choose the "Number" data type for the basic readout definition. When left on "do not normalize", the readout is simply the end-point value imported from file, that is, the basic readout. Predictably, calculations cannot be performed on files or text.
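The rules for basic and normalized readouts can be sketched as a small Python record. This is a hypothetical model, not CDD Vault's schema or API. `ReadoutDefinition`, `DataType`, and the `normalization` strings are illustrative names only. The record encodes the two rules above: a name and a data type are the required minimum, and only Number readouts can be normalized.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    NUMBER = "Number"
    TEXT = "Text"
    PICK_LIST = "Pick List"
    BATCH_LINK = "Batch Link"
    FILE = "File"


@dataclass
class ReadoutDefinition:
    """Hypothetical readout definition; not CDD Vault's actual data model."""

    name: str
    data_type: DataType
    unit: Optional[str] = None
    display_format: Optional[str] = None
    # None plays the role of "do not normalize": values are kept exactly as imported.
    normalization: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("a readout definition needs a name")
        # Normalization is a calculation, so it is only allowed on numbers.
        if self.normalization is not None and self.data_type is not DataType.NUMBER:
            raise ValueError("only Number readouts can be normalized")

    @property
    def is_basic(self) -> bool:
        return self.normalization is None


# A basic readout like the CPM-48 example, and a hypothetical normalized readout.
cpm_48 = ReadoutDefinition("CPM-48", DataType.NUMBER, unit="cpm")
cpm_48_norm = ReadoutDefinition("CPM-48 % inhibition", DataType.NUMBER,
                                unit="%", normalization="within each plate")
```

Creating a Text readout with a normalization raises `ValueError`. This mirrors the rule that calculations can't be done on text or files.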
Here are the available fields and options for readout definitions:
Name- required. This is the name of the raw data readout. In our example, we're collecting counts per minute at two time-points, so the first readout definition is called CPM-48.
Required- Check this box if the readout you are creating must be populated for every row of readout data registered/imported.
Data type- required. This sets the type of data that is permitted in this readout definition.
- Number- the most common data type for HTS data. Only numeric values, optionally preceded by a modifier (>, <, >=, <=), are permitted. This means that entries like N/A, *, or "precipitated" aren't allowed.
- Text- alphanumeric values are permitted. Qualitative results and hyperlinks should also be entered as this data type.
- Pick List- a predefined list of alphanumeric values that may be entered for this readout. This lets the protocol owner control the values that are imported into the vault. Data such as phenotypes, descriptions, cell lines, etc. should be defined as pick lists. Note that you cannot calculate across pick-list values! Learn more about pick lists.
- Batch Link- allows linking Batch records to other entities stored in the same CDD Vault or across separate CDD Vaults (if the Link Across Vaults feature is enabled for your CDD Account).
- File- file attachments of any file type and size. Image previews are generated for JPG, GIF, BMP, PNG, TIFF, and PDF formats. All other files are available for download and can be viewed with their native software.
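To make the Number and Pick List rules concrete, here is a minimal Python sketch of that kind of check. It assumes values arrive as strings from an import file. `parse_number` and `check_pick_list` are hypothetical helpers, not part of CDD Vault. The vault's own parser may accept or reject other forms; this sketch, for example, also accepts scientific notation.

```python
import re
from typing import Optional, Set, Tuple

# An optional modifier, then a plain decimal number (sign, decimals, exponent allowed).
# The two-character modifiers are listed first so ">=" is not read as ">" plus "=".
_NUMBER_RE = re.compile(
    r"^\s*(>=|<=|>|<)?\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)\s*$"
)


def parse_number(raw: str) -> Tuple[Optional[str], float]:
    """Split a Number value into (modifier, value); reject anything non-numeric."""
    match = _NUMBER_RE.match(raw)
    if match is None:
        raise ValueError(f"not a valid Number value: {raw!r}")
    modifier, number = match.groups()
    return modifier, float(number)


def check_pick_list(raw: str, allowed: Set[str]) -> str:
    """Accept a value only if it is one of the pick list's predefined entries."""
    value = raw.strip()
    if value not in allowed:
        raise ValueError(f"{value!r} is not in the pick list")
    return value
```

For example, `parse_number(">=10")` yields `(">=", 10.0)`. `"N/A"`, `"*"`, and `"precipitated"` are all rejected. A pick list check fails for any cell line or phenotype the protocol owner has not listed.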
Protocol Condition - Check this box if the readout you are creating should be used by CDD Vault to aggregate data. Details on Protocol Conditions are documented here, in the Knowledgebase.
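Here is a minimal sketch of why a condition matters for aggregation. It assumes that marking a readout such as cell line as a Protocol Condition makes results average separately for each condition value. The row format, the molecule names, and the `aggregate_by_condition` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical imported rows: (molecule, condition value, readout value).
Row = Tuple[str, str, float]


def aggregate_by_condition(rows: List[Row]) -> Dict[Tuple[str, str], float]:
    """Average each molecule's results separately for every condition value."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for molecule, condition, value in rows:
        groups[(molecule, condition)].append(value)
    return {key: mean(values) for key, values in groups.items()}


rows = [
    ("MOL-1", "HeLa", 40.0),
    ("MOL-1", "HeLa", 50.0),
    ("MOL-1", "HEK293", 10.0),
]
result = aggregate_by_condition(rows)
```

Grouped by cell line, MOL-1 averages 45.0 in HeLa and 10.0 in HEK293. Averaging all three rows together would blend the two cell lines into a single, less meaningful number.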
Display Format- optional. This determines the number of digits displayed throughout the vault. Much like in Excel, the format is only a style: all calculations are performed on the underlying full-precision number.
- Decimal places- choose the number of decimal places following the decimal separator.
- Significant figures- choose the number of significant figures. If you remember your high school math, significant figures are all digits except leading zeros (trailing zeros count when they follow a decimal point). For example, 0.004560 has four significant figures.
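The difference between the two display options can be sketched in Python. These helpers are hypothetical and only approximate what the vault displays; its rounding and tie-breaking rules may differ. As with the vault's display format, they produce only the string that is shown and leave the stored value untouched.

```python
from math import floor, log10


def format_decimal_places(x: float, places: int) -> str:
    """Round to a fixed number of digits after the decimal separator."""
    return f"{x:.{places}f}"


def format_significant_figures(x: float, figures: int) -> str:
    """Round to a fixed number of significant figures, without scientific notation."""
    if x == 0:
        return "0"
    # Position of the leading digit: 3 for 1234.567, -3 for 0.0045678.
    exponent = floor(log10(abs(x)))
    ndigits = figures - 1 - exponent
    return f"{round(x, ndigits):.{max(ndigits, 0)}f}"


value = 1234.567  # the full number used in calculations is never modified
shown_dp = format_decimal_places(value, 2)
shown_sf = format_significant_figures(value, 3)
```

With two decimal places, 1234.567 displays as 1234.57. With three significant figures it displays as 1230, and 0.0045678 displays as 0.00457.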
Unit- optional. Although the unit of measurement is optional, if you remember your high school science, you should always include a unit when you are taking a measurement. This is a free-text field, so any unit can go here. Units should stay consistent throughout the protocol; that's another thing they teach in Chemistry/Biology 101, so we won't harp on it.
Description- optional. A description is typically needed if you performed a calculation outside of CDD and are importing the final value, or if you use a scoring system that needs explanation.
Normalization- optional. Common data normalization options for HTS data are supported.
The normalization drop-down includes the following options. Your choice influences fit validation as well as plot scales, so choose the one that best describes your data:
- Normalize within each plate - both positive and negative controls are run on every screening plate. Controls are averaged per plate, and test data are normalized per plate. This helps remove plate-to-plate variation.
- Normalize within each run - positive and negative controls are present on only one or some of the plates. All controls are averaged together across plates before the test data are normalized.
- Already normalized - if you normalized your data with a method that CDD does not support, this is the best option to use.
- No controls (do not normalize) - choose this option if you're using raw data that will not have a consistent scale.
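The practical difference between the first two options is which controls get averaged. The Python sketch below illustrates it with percent inhibition, calculated as 100 × (negative mean − value) / (negative mean − positive mean). That is one common convention, assumed here; the exact normalization functions CDD Vault offers are not shown. The plate dictionary layout and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical plate record: {"neg": [...], "pos": [...], "samples": [...]}.
Plate = Dict[str, List[float]]


def percent_inhibition(value: float, neg_mean: float, pos_mean: float) -> float:
    """0% at the negative-control mean, 100% at the positive-control mean."""
    return 100.0 * (neg_mean - value) / (neg_mean - pos_mean)


def normalize_within_each_plate(plates: Dict[str, Plate]) -> Dict[str, List[float]]:
    """Average each plate's own controls and use them for that plate's samples."""
    result = {}
    for plate_id, plate in plates.items():
        # Every plate must carry both control types; a missing key raises KeyError.
        neg, pos = mean(plate["neg"]), mean(plate["pos"])
        result[plate_id] = [percent_inhibition(v, neg, pos) for v in plate["samples"]]
    return result


def normalize_within_each_run(plates: Dict[str, Plate]) -> Dict[str, List[float]]:
    """Pool the controls from all plates in the run, then normalize every plate."""
    neg = mean([v for plate in plates.values() for v in plate.get("neg", [])])
    pos = mean([v for plate in plates.values() for v in plate.get("pos", [])])
    return {
        plate_id: [percent_inhibition(v, neg, pos) for v in plate["samples"]]
        for plate_id, plate in plates.items()
    }
```

Take two plates whose overall signal differs twofold, for example controls of 1000/100 versus 2000/200, with samples of 550 and 1100. Per-plate normalization scores both samples as 50%, which removes the plate-to-plate drift. Per-run normalization works even when only one plate carries controls, because the controls are pooled before any sample is normalized.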
Data normalization functions
Fit parameters: Min, Max, and Hill Slope
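The curve fit is performed using the standard Hill equation, also known as the four-parameter logistic curve:
$$\text{Response} = \text{Baseline response} + \frac{\text{Maximum response} - \text{Baseline response}}{1 + \left(\dfrac{\text{EC}_{50}}{\text{Concentration}}\right)^{\text{Hill Slope}}}$$
Response is the measured response on the Y axis.
Baseline response is the minimum response, at the bottom plateau of the curve.
Maximum response is the maximum response, at the top plateau of the curve.
EC50 is the concentration that gives a response halfway between the baseline and maximum responses.
Concentration is the tested drug concentration on the X axis.
Hill Slope is the Hill coefficient, which describes the steepness of the curve.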
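The Hill equation can be written as a small Python function. This sketch assumes the parameterization Response = Baseline + (Maximum - Baseline) / (1 + (EC50 / Concentration)^HillSlope). It only evaluates the curve for given parameters. It does not perform the fitting step, in which the vault estimates Min, Max, EC50, and Hill Slope from data. `hill_response` is a hypothetical name.

```python
def hill_response(concentration: float, baseline: float, maximum: float,
                  ec50: float, hill_slope: float) -> float:
    """Four-parameter logistic response at one (positive) concentration."""
    if concentration <= 0 or ec50 <= 0:
        raise ValueError("concentration and EC50 must be positive")
    return baseline + (maximum - baseline) / (1.0 + (ec50 / concentration) ** hill_slope)


# At the EC50 the response is exactly halfway between baseline and maximum.
midpoint = hill_response(1.0, baseline=0.0, maximum=100.0, ec50=1.0, hill_slope=1.0)
```

With a positive Hill slope, the response approaches the maximum as concentration grows and approaches the baseline as concentration shrinks. A larger Hill slope makes that transition steeper around the EC50.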
Calculated Readout Definition- optional. CDD Vault includes calculation of the arithmetic and geometric mean. CDD Vision includes expanded capability for calculations, including variables and calculated chemical properties.
For details, please refer to the related knowledge base article.
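To see how the two built-in means can differ, here is a short Python comparison. It uses the standard library's `statistics.fmean` and `statistics.geometric_mean`, which require Python 3.8 or later. The replicate values are made up for illustration. The geometric mean is the antilog of the mean of the logarithms. That makes it a common choice for values spanning orders of magnitude, such as potencies, but it requires all values to be positive.

```python
from statistics import fmean, geometric_mean

# Made-up replicate IC50 values (nM) spanning two orders of magnitude.
replicates = [1.0, 10.0, 100.0]

arithmetic = fmean(replicates)          # pulled toward the largest value
geometric = geometric_mean(replicates)  # mean of the logs, transformed back

print(f"arithmetic mean: {arithmetic:.1f} nM")
print(f"geometric mean:  {geometric:.1f} nM")
```

Here the arithmetic mean is 37.0 nM, while the geometric mean is 10.0 nM, which sits in the middle of the range on a log scale.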
Choose a control plate layout if you have defined any normalized readouts, with the exception of sample-based z-score. The control layouts are used by the normalization functions that need to calculate the negative and positive control means and standard deviations. Here's a complete article that addresses control layouts.
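The role of the layout can be sketched in Python. A layout tells the normalization which wells hold controls, so their means and standard deviations can be computed. A sample-based z-score scores each sample against the samples themselves, which is why it needs no layout. The well-to-role dictionary is a hypothetical representation. The plain mean/SD z-score is an assumption, since some tools use robust variants such as median/MAD instead.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Hypothetical layout: well ID -> control role; wells not listed are samples.
Layout = Dict[str, str]


def control_stats(readings: Dict[str, float], layout: Layout) -> Dict[str, Tuple[float, float]]:
    """Mean and standard deviation of each control role (needs >= 2 wells per role)."""
    by_role: Dict[str, List[float]] = {}
    for well, value in readings.items():
        role = layout.get(well, "sample")
        if role != "sample":
            by_role.setdefault(role, []).append(value)
    return {role: (mean(vals), stdev(vals)) for role, vals in by_role.items()}


def sample_based_z_scores(readings: Dict[str, float], layout: Layout) -> Dict[str, float]:
    """Score each sample against the mean and SD of the samples; no controls are used."""
    samples = {w: v for w, v in readings.items() if layout.get(w, "sample") == "sample"}
    values = list(samples.values())
    mu, sigma = mean(values), stdev(values)
    return {w: (v - mu) / sigma for w, v in samples.items()}
```

`control_stats` cannot do anything without a layout. `sample_based_z_scores` consults the layout only to exclude control wells, and would work unchanged on a plate with no controls at all.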
Control layouts defined on the protocol details tab are the default layouts applied to all plates imported into your protocol. These defaults can be overridden on individual runs, or even on individual plates.
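The override order can be sketched as a most-specific-wins lookup. This assumes a plate-level layout takes precedence over a run-level one, which in turn takes precedence over the protocol default. `RunLayouts` and `effective_layout` are hypothetical names, and layouts are plain strings here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RunLayouts:
    """Hypothetical per-run layout overrides."""

    run_layout: Optional[str] = None
    plate_layouts: Dict[str, str] = field(default_factory=dict)  # plate ID -> layout


def effective_layout(protocol_default: str, run: RunLayouts, plate_id: str) -> str:
    """Most specific setting wins: plate override, then run override, then protocol default."""
    if plate_id in run.plate_layouts:
        return run.plate_layouts[plate_id]
    if run.run_layout is not None:
        return run.run_layout
    return protocol_default
```

A run with no overrides falls back to the protocol default on every plate. Setting a run layout changes all of that run's plates except those that have their own plate-level layout.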