OU Bioinformatics Core Facility | University of Oklahoma

Protocol for Analysis of E. coli Membrane Array Data

Statistical Considerations:

We routinely obtain at least 2 replicates of each experimental condition. The membrane arrays have duplicate spots for each gene and each spot is considered to be a separate determination. We normalize individual hybridization replicates by expressing the signal intensity for each spot as a percentage of the sum of all spot intensities on the array (global normalization). The uncertainty that lies behind individual gene measurements can be variously calculated as the standard deviation of the determinations or the coefficient of variation can be used. Since we are usually interested in the statistical significance of differences between an experimental condition and a control, we prefer the student t-test as a means for calculating the probability, based on the uncertainty of the measurements in both conditions, that gene expression differs significantly between the conditions. A conservative approach is to consider significance at a level of P<0.0002, where false positives might be expected for 1 in 5000 genes. We use the standard deviation of the mean of the log ratios, within the context of the P value, to indicate significant up- or down-regulation of gene expression. The standard deviation for the log ratios is calculated and usually those genes that differ by more than 3 standard deviations (99.9% confidence in each tail) from the mean of the log ratio (usually zero, or no change) are considered. This approach is meaningful where the expression level of the majority of genes does not change significantly between conditions and where the researcher is interested in genes that show substantially different expression.

Downloads:

The following downloads provide analysis tools (and examples) for processing membrane array data in a Windows-PC environment using Microsoft Office tools. The macros embedded in these Microsoft Excel workbooks were designed to work with Sigma-GenoSys Panorama E. coli gene arrays. The macros are written in Visual Basic and can be modified in VB Editor for use with other array designs. 6. If the array is stripped, proceed to a new hybridization experiment. Store unused arrays in Saran Wrap at 4oC.

Image Data Cruncher (Microsoft Excel format - 1.5MB)
Sample Image Data Cruncher (Microsoft Excel format - 2.5MB)
2-Replicate-Stats (Microsoft Excel format - 214KB)
Ready2-replicate-stats (Microsoft Excel format - 2.1MB)
Data Analysis (Microsoft Excel format - 1.0MB)
Sample Data Analysis (Microsoft Excel format - 2.8MB)
SampleDB (Microsoft Access format - 2.4MB)
Biotechniques, 2002, 32:110 (PDF format; Requires Acrobat Reader - 325KB)

Array Data Analysis in Excel:

We have developed a series of manually executed macros, written in Visual Basic for Microsoft Excel, that allow for rapid, semi-automated gene expression data analysis. The first VB macro, (embedded in Image Data Cruncher.xls) assigns a spot number to each spot by its coordinate (location) on the array, assigns gene names and associated annotation information to spot numbers, and normalizes individual hybridization replicates by expressing the signal intensity for each spot as a percentage of the sum of all spot intensities. The second macro (embedded in 2-Replicate-Stats.xls) streamlines statistical analysis of experimental replicates by application of the student t-test to calculate p-values that indicate confidence levels for the significance of the ratio of experimentals to controls, based on the uncertainty of individual gene measurements. The third macro (embedded in Data Analysis.xls) introduces a threshold value, calculates expression ratios between experimental and control conditions, and calculates an additional statistical metric, the standard deviation of the mean of the log ratios. A fourth macro (embedded in Data Analysis.xls) is used to create a processed dataset suitable for import into a Microsoft Access database. Step-by-step instructions follow:

Open the workbook called "Image Data Cruncher", being sure to enable macros and open as read-only. [This workbook has three worksheets, named: arvdata, allfields, and Data Analysis. The "arvdata" worksheet is a blank worksheet, "allfields" has information from Sigma-Genosys used to assign spot numbers to spot-array coordinates, and the "data-analysis" worksheet contains annotation information to assign the gene name to each spot on the image.] Open the .XLS file that contains the exported raw data from Array Vision. Select the entire data set by clicking on the upper left had box, and copy/paste this information into the blank "arvdata" worksheet beginning at cell A1. To run the macro, choose Tools on the toolbar, select Macros, highlight "AllDataCrunched7" and click Run. (These 6 subroutines can be run manually, in order, to learn how the macro works.) Save the crunched data as "condition-replicate.crunched" following your standardized file-naming scheme.

The second workbook (2-Replicate-Stats), is used to execute the student t-test on two replicate data sets for two conditions. [This workbook has four worksheets named, "Enter Data (2 replicates)" "P-Raw", "P-Ln", and "All Values".] Open the four crunched data files to be analyzed and the "2-Replicate-Stats" workbook. The four data sets are sequentially copied and pasted beginning with the first replicate of the control condition followed by the second replicate of the control, then the first replicate of the experimental, and the second replicate of the experimental. Select columns A-F of the crunched data set in the "Data Analysis" worksheet of the "condition replicate crunched file" and copy/paste into the corresponding six columns of the "Enter Data (2 Replicates)" worksheet in the "2-Replicate-Stats" workbook. The process is repeated for the remaining three replicates. To run the macro, choose Tools, select Macros, "AllAnalysis8", and click Run. Save the file as "condition-replicate stats", following your file-naming scheme.

The third workbook (Data Analysis) is used to execute two macros. The first calculates the log ratio and a second statistical metric, the standard deviation of the mean of the log ratios; the second macro creates an additional spreadsheet containing selected data columns for import into Microsoft Access. [The Data Analysis workbook has two worksheets named: "crunched data" and "DataAnalysis".] Open the "condition-replicate stats" file and the "Data Analysis" workbook. Copy columns A-F of the "All Values" worksheet in the "condition-replicate stats" workbook and paste into the "crunched data" worksheet of the "Data Analysis" workbook. To run the macro, choose Tools, select Macros, "AllAnalysis6", and click Run. Next, the P values calculated in the 2-Replicate-Stats workbook are copied to the Data Analysis workbook: copy columns H-I of the "All Values" worksheet in the "condition-replicate stats" workbook and paste into columns I-J of the "Data Analysis" worksheet of the "Data Analysis" workbook. To run the second macro, select Tools, select Macros, highlight "MakeDBsheet", and click Run. Save the file as "condition-replicate analysis", following your file-naming scheme. The threshold value can be adjusted by editing the “ThresholdRatios3” subroutine in the “AllAnalysis6” macro to use a cell either higher or lower than F501 [2^nd line: Range("F501").Select].

Combining multiple experiments in Microsoft Access: When multiple time points or multiple conditions need to be compared, it is useful to build a database. A sample database has been included here called "sampleDB". There are 3 tables in the sample database. This first one is a populated annotation table. In this table is important reference information, such as b#, array coordinate, gene, gene product, functional groupings, and accession numbers. Both the control and the test tables are populated with sample information. This information is identical to that contained in the "Data Analysis" worksheet. To see how the data tables are designed, highlight a table and click the “Design” button. Additional tables can be created and populated by copying/pasting the table (structure only), editing the design as necessary, and pasting the appropriate data from the worksheet created by MakeDB. Queries are designed by linking tables by b#, as shown in the "sampleDB". Simple queries can be run by manipulation of the parameters in query design. It is best that users obtain a third-party software manual. Data can be copy/pasted from Access into presentation software such as Spotfire (www.spotfire.com).

**Workbooks and Macros**
Workbook	Macro	Subroutine	Purpose
Image Data Cruncher	AllDataCrunched6	ArvAllSort1	associates array coordinate with spot number
Image Data Cruncher	AllDataCrunched6	Nameall2	associates spot number with unique identifier for gene
Image Data Cruncher	AllDataCrunched6	CalcPct3	normalizes data by expressing each spot as percentage of sum of all spot intensities
Image Data Cruncher	AllDataCrunched6	Cleanup4	reorganizes data and calculates avg values for duplicate spots
Image Data Cruncher	AllDataCrunched6	Statistics5	calculates averages of genomic DNA controls and blank spots
Workbook	Macro	Subroutine	Purpose
2-replicate-stats	AllAnalysis8	OrganizebySpotNo1	sorts each of the 4 data sets individually by spot number
2-replicate-stats	AllAnalysis8	CalculateAverages2	calculates averages of volumes and pct values for the control and test replicates
2-replicate-stats	AllAnalysis8	CopyValuesIntoPRaw3	copies and pastes pct values into a separate spreadsheet for calculation of p values
2-replicate-stats	AllAnalysis8	CalculateLn4	copies pct values into a separate spreadsheet and natural log transforms data
2-replicate-stats	AllAnalysis8	CalculatePRaw5	calculates the Pvalue for the raw data by application of the student t-test
2-replicate-stats	AllAnalysis8	CalculatePLn6	calculates the Pvalue for the log transformed data by application of the student t-test
2-replicate-stats	AllAnalysis8	CopyAllValues7	copies and pastes data used for ratio calculations into separate spreadsheet&
Workbook	Macro	Subroutine	Purpose
Data Analysis	AllAnalysis6	SpotSort1	sorts control and test data sets by spot number
Data Analysis	AllAnalysis6	DataSort2	copies data set into spread sheet used for ratio calculations, sorts by total pct value
Data Analysis	AllAnalysis6	ThresholdRatios3	calculates ratio of Test/Control using threshold of tot pct value for 500th lowest gene
Data Analysis	AllAnalysis6	Cleanup4	reorganizes data and calculates log (10) of ratio
Data Analysis	AllAnalysis6	Stats5	calculates standard deviation of log ratio values and correlation between Test and Control
Data Analysis	Manual Step		copy and paste special values Praw and PLn values from 2-replicate-stats to Data Analysis
Data Analysis	MakeDBsheet		reorganizes data and copies into separate spreadsheet for entry into Access database

Directory Structure:

Array Analysis
User Name
      Project Name
          Images (.gel TIFF image files)
          Export (.xls raw data files from ArrayVision)
          Crunched (.xls crunched image files created in Image Data Cruncher)
          Stats (.xls files with student t-test results created in 2-Replicate-Stats)
          Analysis (.xls files with ratio calculations created in Data Analysis)

Workbooks and Macros