
Protocol
for Analysis of E. coli Membrane Array Data
Statistical Considerations:
We routinely obtain at least 2 replicates of each experimental condition.
The membrane arrays have duplicate spots for each gene and each spot is
considered to be a separate determination. We normalize individual hybridization
replicates by expressing the signal intensity for each spot as a percentage
of the sum of all spot intensities on the array (global normalization).
The uncertainty that lies behind individual gene measurements can be variously
calculated as the standard deviation of the determinations or the coefficient
of variation can be used. Since we are usually interested in the statistical
significance of differences between an experimental condition and a control,
we prefer the student ttest as a means for calculating the probability,
based on the uncertainty of the measurements in both conditions, that
gene expression differs significantly between the conditions. A conservative
approach is to consider significance at a level of P<0.0002, where
false positives might be expected for 1 in 5000 genes. We use the standard
deviation of the mean of the log ratios, within the context of the P value,
to indicate significant up or downregulation of gene expression. The
standard deviation for the log ratios is calculated and usually those
genes that differ by more than 3 standard deviations (99.9% confidence
in each tail) from the mean of the log ratio (usually zero, or no change)
are considered. This approach is meaningful where the expression level
of the majority of genes does not change significantly between conditions
and where the researcher is interested in genes that show substantially
different expression.
Downloads:
The following downloads provide analysis tools (and examples) for
processing membrane array data in a WindowsPC environment using
Microsoft Office tools. The macros embedded in these Microsoft Excel
workbooks were designed to work with SigmaGenoSys
Panorama E. coli gene arrays. The macros are written in Visual Basic
and can be modified in VB Editor for use with other array designs.
6. If the array is stripped, proceed to a new hybridization experiment.
Store unused arrays in Saran Wrap at 4oC.
Array Data Analysis in Excel:
We have developed a series of manually executed macros, written
in Visual Basic for Microsoft Excel, that allow for rapid, semiautomated
gene expression data analysis. The first VB macro, (embedded in
Image Data Cruncher.xls)
assigns a spot number to each spot by its coordinate (location)
on the array, assigns gene names and associated annotation information
to spot numbers, and normalizes individual hybridization replicates
by expressing the signal intensity for each spot as a percentage
of the sum of all spot intensities. The second macro (embedded in
2ReplicateStats.xls)
streamlines statistical analysis of experimental replicates by application
of the student ttest to calculate pvalues that indicate confidence
levels for the significance of the ratio of experimentals to controls,
based on the uncertainty of individual gene measurements. The third
macro (embedded in Data Analysis.xls)
introduces a threshold value, calculates expression ratios between
experimental and control conditions, and calculates an additional
statistical metric, the standard deviation of the mean of the log
ratios. A fourth macro (embedded in Data
Analysis.xls) is used to create a processed dataset suitable
for import into a Microsoft Access database. Stepbystep instructions
follow:
 Open the workbook called "Image
Data Cruncher", being sure to enable macros and open as readonly.
[This workbook has three worksheets, named: arvdata, allfields,
and Data Analysis. The "arvdata" worksheet is a blank worksheet,
"allfields" has information from SigmaGenosys used to assign
spot numbers to spotarray coordinates, and the "dataanalysis"
worksheet contains annotation information to assign the gene name
to each spot on the image.] Open the .XLS file that contains the
exported raw data from Array Vision. Select the entire data set
by clicking on the upper left had box, and copy/paste this information
into the blank "arvdata" worksheet beginning at cell A1. To run
the macro, choose Tools on the toolbar, select Macros, highlight
"AllDataCrunched7" and click Run. (These 6 subroutines can be
run manually, in order, to learn how the macro works.) Save the
crunched data as "conditionreplicate.crunched" following your
standardized filenaming scheme.
 The second workbook (2ReplicateStats),
is used to execute the student ttest on two replicate data sets
for two conditions. [This workbook has four worksheets named,
"Enter Data (2 replicates)" "PRaw", "PLn", and "All Values".]
Open the four crunched data files to be analyzed and the "2ReplicateStats"
workbook. The four data sets are sequentially copied and pasted
beginning with the first replicate of the control condition followed
by the second replicate of the control, then the first replicate
of the experimental, and the second replicate of the experimental.
Select columns AF of the crunched data set in the "Data Analysis"
worksheet of the "condition replicate crunched file" and copy/paste
into the corresponding six columns of the "Enter Data (2 Replicates)"
worksheet in the "2ReplicateStats" workbook. The process is
repeated for the remaining three replicates. To run the macro,
choose Tools, select Macros, "AllAnalysis8", and click Run. Save
the file as "conditionreplicate stats", following your filenaming
scheme.
 The third workbook (Data Analysis)
is used to execute two macros. The first calculates the log ratio
and a second statistical metric, the standard deviation of the
mean of the log ratios; the second macro creates an additional
spreadsheet containing selected data columns for import into Microsoft
Access. [The Data Analysis workbook has two worksheets named:
"crunched data" and "DataAnalysis".]
Open the "conditionreplicate stats" file and the "Data
Analysis" workbook. Copy columns AF of the "All Values"
worksheet in the "conditionreplicate stats" workbook
and paste into the "crunched data" worksheet of the
"Data Analysis" workbook. To run the macro, choose Tools,
select Macros, "AllAnalysis6", and click Run. Next,
the P values calculated in the 2ReplicateStats workbook are
copied to the Data Analysis workbook: copy columns HI of the
"All Values" worksheet in the "conditionreplicate
stats" workbook and paste into columns IJ of the "Data
Analysis" worksheet of the "Data Analysis" workbook.
To run the second macro, select Tools, select Macros, highlight
"MakeDBsheet", and click Run. Save the file as "conditionreplicate
analysis", following your filenaming scheme. The threshold
value can be adjusted by editing the “ThresholdRatios3”
subroutine in the “AllAnalysis6” macro to use a cell
either higher or lower than F501 [2^{nd} line: Range("F501").Select].
 Combining multiple experiments in Microsoft Access: When multiple
time points or multiple conditions need to be compared, it is
useful to build a database. A sample database has been included
here called "sampleDB". There
are 3 tables in the sample database. This first one is a populated
annotation table. In this table is important reference information,
such as b#, array coordinate, gene, gene product, functional groupings,
and accession numbers. Both the control and the test tables are
populated with sample information. This information is identical
to that contained in the "Data Analysis" worksheet. To see how
the data tables are designed, highlight a table and click the
“Design” button. Additional tables can be created and populated
by copying/pasting the table (structure only), editing the design
as necessary, and pasting the appropriate data from the worksheet
created by MakeDB. Queries are designed by linking tables by b#,
as shown in the "sampleDB". Simple
queries can be run by manipulation of the parameters in query
design. It is best that users obtain a thirdparty software manual.
Data can be copy/pasted from Access into presentation software
such as Spotfire (www.spotfire.com).
Workbooks and Macros
Workbook 
Macro 
Subroutine 
Purpose 
Image Data
Cruncher 
AllDataCrunched6 
ArvAllSort1 
associates
array coordinate with spot number 
Image Data
Cruncher 
AllDataCrunched6 
Nameall2 
associates
spot number with unique identifier for gene 
Image Data
Cruncher 
AllDataCrunched6 
CalcPct3 
normalizes
data by expressing each spot as percentage of sum of all spot
intensities 
Image Data
Cruncher 
AllDataCrunched6 
Cleanup4 
reorganizes
data and calculates avg values for duplicate spots 
Image Data
Cruncher 
AllDataCrunched6 
Statistics5 
calculates
averages of genomic DNA controls and blank spots 
Workbook 
Macro 
Subroutine 
Purpose 
2replicatestats 
AllAnalysis8 
OrganizebySpotNo1 
sorts each
of the 4 data sets individually by spot number 
2replicatestats 
AllAnalysis8 
CalculateAverages2 
calculates
averages of volumes and pct values for the control and test
replicates 
2replicatestats 
AllAnalysis8 
CopyValuesIntoPRaw3 
copies and
pastes pct values into a separate spreadsheet for calculation
of p values 
2replicatestats 
AllAnalysis8 
CalculateLn4 
copies pct
values into a separate spreadsheet and natural log transforms
data 
2replicatestats 
AllAnalysis8 
CalculatePRaw5 
calculates
the Pvalue for the raw data by application of the student ttest 
2replicatestats 
AllAnalysis8 
CalculatePLn6 
calculates
the Pvalue for the log transformed data by application of the
student ttest 
2replicatestats 
AllAnalysis8 
CopyAllValues7 
copies and
pastes data used for ratio calculations into separate spreadsheet& 
Workbook 
Macro 
Subroutine 
Purpose 
Data
Analysis 
AllAnalysis6 
SpotSort1 
sorts
control and test data sets by spot number 
Data
Analysis 
AllAnalysis6 
DataSort2 
copies
data set into spread sheet used for ratio calculations, sorts
by total pct value 
Data
Analysis 
AllAnalysis6 
ThresholdRatios3 
calculates
ratio of Test/Control using threshold of tot pct value for 500th
lowest gene 
Data
Analysis 
AllAnalysis6 
Cleanup4 
reorganizes
data and calculates log (10) of ratio 
Data
Analysis 
AllAnalysis6 
Stats5 
calculates
standard deviation of log ratio values and correlation between
Test and Control 
Data
Analysis 
Manual
Step 

copy
and paste special values Praw and PLn values from 2replicatestats
to Data Analysis 
Data
Analysis 
MakeDBsheet 

reorganizes
data and copies into separate spreadsheet for entry into Access
database 
Directory Structure:
Array Analysis
User Name
Project Name
Images (.gel TIFF image files)
Export (.xls raw data files from ArrayVision)
Crunched (.xls crunched image files created in Image
Data Cruncher)
Stats (.xls files with student ttest results created in 2ReplicateStats)
Analysis (.xls files with ratio calculations created in Data
Analysis)
