Spreadsheets are utilized throughout the Pharmaceutical industry, which is why spreadsheet validation is so important.
Well designed spreadsheets increase both efficiency and compliance. Poor spreadsheet validation and design can lead to frustration and in some cases to reporting incorrect results. The process by which spreadsheets are implemented is also critical for compliant laboratories and laboratory data validation.
Weak controls around spreadsheet security and management have been the source of many regulatory findings.
This article focuses on aspects of spreadsheet design that deliver solutions which increase efficiency and compliance.
Design criteria which support compliant spreadsheet implementation are also addressed. The process or IT solution by which spreadsheets are managed is not in scope of this article.
The design principles in this article assume a templated deployment pattern, where copies of a controlled template are created each time the process is run. The term “template” will be used to identify the master spreadsheet and “document” will describe template copies.
Before embarking on spreadsheet creation, key aspects of the environment surrounding the eventual use of the spreadsheet should be identified. The key aspects are compliance, target audience, and specificity of functionality.
Obtaining answers to the following questions drives the applicability of specific design components which are described in detail below.
Determining if compliance is a factor is the most critical decision point in spreadsheet design.
Compliant templates must adhere to regulatory requirements with careful design to facilitate document review and potentially template validation.
Templates not utilized for compliant work can opt out of many design strategies specifically targeted at compliance.
There are two strategies to assure accuracy of the information produced by a compliant template/document.
This design option relies on qualified reviewers to verify the outcome of each formula in the spreadsheet.
In addition, reviewers need to verify the results of automated outcomes (code which generates content), cell format of numeric results, and conditional formats which can flag outcomes (green/red for pass/fail results). This option is feasible for relatively simple spreadsheets or in situations where use is sporadic.
In these cases, the overhead of individual document review can be less burdensome than the template validation strategy defined below. Spreadsheets should be designed to:
Validation of the template is a variation on 100% review in which the spreadsheet functionality is specified, tested, and documented once.
The major advantage of template validation is avoiding the need to review templated results and reports.
For compliant organization, template is all but mandatory for complex spreadsheets and/or high volume usage.
Significant effort is required to validate and maintain validated spreadsheet compliance.
Validated spreadsheets (including copies) must be maintained within an automated system or set of manual processes which enforce version control and maintain custody.
Spreadsheet design must secure formulas or automated template functionality which generates results. The design should:
A second determinate of spreadsheet design is the target user community.
The design criteria are very different as the number of users in the target community grows.
The design for a personal spreadsheet can make assumptions regarding security, expertise, training, work process, and physical environment.
For instance, a personal template can be designed to work with a specific instrument, target a specific sequence of events, assume knowledge of spreadsheet work arounds,
and of course a personal template is a known commodity.
As the expected template audience increases, design must compensate for the different factors described above to deliver a robust solution generating quality results.
A widespread issue is the personal or small group template that becomes wildly successful beyond the initial group.
In this situation, the template is often deployed to a wider audience without addressing new design requirements.
Lack of a suitable wide audience design can result in a less successful roll out.
An example of this decision point in analytical laboratories is designing a template for “method” specific work or design for a generic “analysis” supporting many products.
A manufacture decision point may be whether to create a template for a specific manufacture line or to attempt to design generically for different lines.
A formulation template may require a decision as to the types of products supported – tablets, capsules, creams.
In many situations, the designer is faced with a choice between a single generic template or multiple specific templates for each use case.
The trade-off between specific and generic should be carefully considered to determine if the upfront effort for a generic template offsets the added maintenance required for multiple specific templates.
Below are a list of some of the Pros and Cons to help in determining how specific You should be in your Template Design:
|Specific to Manufacture Line – Designer can account for all needs of Single Group of Users||Multitude – The need to create many templates|
|Time – Quicker to get Template into Production||Maintenance – the need to maintain a multitude of Templates|
|Creation of fewer Templates||Upfront Effort and Time – Designer needs to accommodate needs of various groups|
|Maintenance is done on fewer templates||Takes longer to get into production|
|Future Requirements are hard to predict|
Tables are used extensively in most spreadsheets.
The tables may record samples, preparation data, resources, results, etc.
Tables in most spreadsheets are predefined meaning that the maximum number of table rows/columns is ascertained during spreadsheet development.
For instance, a sample preparation table may be defined to support up to 20 preparations.
Each preparation may require cells for weights, volumes, and formulas to calculate concentrations.
If each preparation is displayed on a row, there will be a total of 20 rows complete with predefined cells for weights, volumes, and concentrations.
A good design practice is to display only relevant table rows to simplify both recording data and subsequent review.
This design pattern requires implementation of macro code.
Table rows are initially hidden and are then displayed based on user interaction with the spreadsheet.
In the sample preparation example, rows may be displayed based on addition of samples or “Add Prep” button click event.
Consistent use of cell color in spreadsheets is an important design component.
A unified color scheme can:
Color schemes are optional for personal spreadsheets, but spreadsheets deployed to a wider audience should implement a consistent color scheme.
Be sure the selected color scheme is printable as some fill/font combinations do not translate to grey scale printers (many office printers).
Define Colors for cells containing:
The most utilized spreadsheets cell types are: “General”, “Text”, “Number”, and “Date”.
Each cell type has both advantages and disadvantages.
Choice of the best cell type is based on the specific situation.
“General” is the recommended cell type for most situations and should be the default for “White Space” cells.
The advantage of “General” is support for all types of data entry or formula display.
Reasons not to use general:
“Numeric” is recommended for numerical data entry situations and for display of most calculated results.
The disadvantage is the necessity to define in advance the number decimals.
The data entry cell for a balance reading in grams from a 4 place balance is typically defined as a 4 decimal numeric cell.
Unfortunately, the same balance may record to 5 decimals or possibly in mg to 2 decimals.
In these situations, the numeric cell display will not match the actual data recorded.
An important consideration is whether authors can modify the number of decimals if needed.
“Text” has the advantage of displaying an entry exactly as the author/instrument
intended avoiding issues with trailing zeros, precision, and automatic translations.
The principle disadvantage of text cell type is when a value is recorded in a “text” cell and is then referenced in a subsequent calculation.
The “text” cell value must be translated into a number using the VALUE function which creates considerable overhead for the template developer.
In addition, data validation limiting numeric inputs cannot be implemented against “text” cells.
“Date” is of course recommended for any situation requiring dates.
The actual format of date cells should be consistent and adhere to organization SOP (DD-MMM-YYYY).
The major problem with dates is when a cell is not predefined as a date and the date is displayed in whatever default format is active.
The best path to creating a robust spreadsheet is to assure the initial data entry is recorded correctly.
Data validation of numbers allows spreadsheet developers to force data entry values
to be within acceptable parameters which also assists with spreadsheet validation
since endpoints are defined by the data validation max and mins. This also means that you need to use the right data validation tools.
One caution with this technique is that limits may not be defined broadly enough and legitimate values cannot be recorded.
This potential drawback can be avoided by using a data validation “Warning” rather than “Stop” when an error condition is encountered.
A great way to highlight alert situations or pass/fail results is through conditional formats.
The format of target ranges can be manipulated based on any input.
Conditional formats are however difficult to maintain since the underlying conditional formula
is not easily displayed or obvious.
To overcome this issue, place all formulas which evaluate a condition in hidden cells
on the spreadsheet and include only simple references in the conditional formula.
A simplistic example is a result to be colored red if greater than 7.
A hidden cell would contain the evaluation in a formula myResult > 7″.
The conditional formula would reference this hidden cell =myCondition=TRUE.
Spreadsheets with multiple worksheets often reproduce information across several worksheets.
For instance, samples may be identified on a “Samples” worksheet and prepared on a “Prep” worksheet.
In this instance, the developer will wish to reproduce at least the sample description from the “Samples” sheet to the “Prep” sheet.
If the sample description is in cell ‘B2’ the obvious formula to obtain the sample description on the “Prep” sheet is “=Samples!B2”.
This works as long as a sample description is defined however, a blank value in cell ‘B2’ will cause a “0” to be displayed on the “Prep” sheet.
Several solutions are possible to avoid this behavior:
The first option tests for a blank sample description and records a value only if the sample description is not blank.
The second option forces a textual interpretation concatenating a blank with the sample description.
The third option also forces a textual interpretation of the sample description.