Web supplement to
"A unique and universal molecular barcode array"

Sarah E. Pierce, Eula L. Fung, Daniel F. Jaramillo, Angela M. Chu, Ronald W. Davis, Corey E. Nislow, Guri N. Giaever

Nature Methods 3, 601 - 603 (2006)
doi:10.1038/nmeth905

Analysis of microarray results:

Outlier masking:

Notes: The masking algorithm is also provided in the accompanying MATLAB scripts. The masking script also generates a heatmap of the array showing how each probe compares to its replicates, which makes array defects clearly visible.
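
The replicate comparison behind such a heatmap can be sketched in Python. This is not the MATLAB script itself: the function names, the log2 input, and the cutoff of 1 log2 unit are assumptions made for illustration only.

```python
import numpy as np

def mask_replicate_outliers(log_intensity, tag_ids, max_log2_dev=1.0):
    """Compare each probe with the median of its replicates.

    log_intensity: log2 probe intensities (assumed input scale).
    tag_ids: the tag each probe reports on; replicates share an id.
    max_log2_dev: hypothetical cutoff, not a value from the supplement.
    Returns (mask, deviation); mask is True where a probe is an outlier.
    """
    log_intensity = np.asarray(log_intensity, dtype=float)
    tag_ids = np.asarray(tag_ids)
    deviation = np.zeros_like(log_intensity)
    for tag in np.unique(tag_ids):
        idx = tag_ids == tag
        # Deviation of each replicate from the replicate median.
        deviation[idx] = log_intensity[idx] - np.median(log_intensity[idx])
    return np.abs(deviation) > max_log2_dev, deviation

def deviation_grid(deviation, rows, cols, shape):
    """Place deviations at their array coordinates; empty cells stay NaN."""
    grid = np.full(shape, np.nan)
    grid[rows, cols] = deviation
    return grid
```

Plotting the grid returned by deviation_grid as an image (for example with matplotlib's imshow) gives a view in which scratches or smears show up as clusters of large deviations.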

Array normalization:

Notes: This normalization method depends on a standard curve to which the arrays are normalized (the standard curve used above is the median of the control arrays). To keep this curve from changing over time, it is best to calculate one standard curve from a fixed set of arrays and reuse it to normalize all future arrays.

Only experiments with a similar distribution of tag intensities can be normalized together. For example, heterozygous (het) and homozygous (hom) experiments must be normalized separately, and experiments with different generation times should not be normalized together either.
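
As an illustration of normalizing to a fixed standard curve, the following Python sketch assumes a quantile-style normalization in the spirit of ref. 2; the supplement's own procedure may differ in detail. The function names are hypothetical, and every array is assumed to contain the same number of probes.

```python
import numpy as np

def standard_curve(control_arrays):
    """Median of the sorted intensities across control arrays.

    control_arrays: 2-D array-like, one row per control array.
    """
    control_arrays = np.asarray(control_arrays, dtype=float)
    return np.median(np.sort(control_arrays, axis=1), axis=0)

def normalize_to_curve(array, curve):
    """Replace each intensity with the curve value of the same rank.

    Tied intensities receive neighboring curve values in this simple
    version; a fuller implementation might average them.
    """
    array = np.asarray(array, dtype=float)
    ranks = np.argsort(np.argsort(array, kind="stable"), kind="stable")
    return curve[ranks]

# Compute the curve once, store it, and reuse it for later arrays.
curve = standard_curve([[5, 1, 3], [6, 2, 4], [4, 0, 2]])
normalized = normalize_to_curve([100, 10, 50], curve)
```

Because the curve is calculated once and stored, arrays normalized months apart end up on the same intensity distribution. Separate curves would be kept for experiment types with different tag intensity distributions.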

Non-parametric analysis (best for large-scale studies):

Notes: This method works best for large-scale studies where it is possible to generate a large set of control arrays (> 10) to use against many treatment sets. Although a large number of control arrays is required, one set of controls can be used to analyze many experimental arrays. One major benefit is that the control arrays do not need to be processed on the same day as the drug arrays to obtain a good result.

One caveat is that the control arrays must represent as diverse a set of samples as the treatment arrays (cells grown on different days, tag PCRs done in different runs, etc.). Otherwise the standard deviation for tags in the control set will be deceptively small, making strains appear more sensitive or resistant than they actually are.
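
The effect of an insufficiently diverse control set can be illustrated with a plain per-tag z-score. This statistic is used here only to show the caveat and is an assumption; it is not necessarily the statistic used in the analysis above. Intensities are assumed to be log2 and already normalized.

```python
import numpy as np

def sensitivity_z(control, treatment):
    """Per-tag z-score of a treatment array against a control set.

    control: shape (n_control_arrays, n_tags), log2 intensities.
    treatment: shape (n_tags,), log2 intensities.
    Positive scores mean the tag is depleted relative to controls.
    """
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    mean = control.mean(axis=0)
    sd = control.std(axis=0, ddof=1)  # sample standard deviation per tag
    return (mean - treatment) / sd

# One tag measured on three control arrays (made-up numbers).
diverse_controls = [[9.0], [10.0], [11.0]]   # different days and PCR runs
narrow_controls = [[9.9], [10.0], [10.1]]    # all processed together
z_diverse = sensitivity_z(diverse_controls, [8.0])
z_narrow = sensitivity_z(narrow_controls, [8.0])
```

The same treatment value scores ten times higher against the narrow control set, purely because that set underestimates the day-to-day variation of the tag.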

CelCompare (small scale studies):

Notes: This method works best for small-scale studies where it would be inconvenient to generate the large number of control arrays required for non-parametric analysis. Because only a small number of control arrays are used, it is best to use control and drug arrays that were processed together (cells grown on the same day, PCR amplified together, etc.) to minimize any variation between the control and drug samples that is not related to the treatment.

Any strains for which the treatment value is indistinguishable from background have reached their maximum sensitivity score, so they may actually be more sensitive than they appear in your data. To resolve the sensitivity of these strains, sample earlier time points or examine the growth of the strain individually as described below.
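
The saturation effect can be seen in a minimal Python sketch of a control/treatment log ratio. This is not the CelCompare implementation: the function name, the single background value, and the simple "at or below background" test are assumptions for illustration.

```python
import numpy as np

def log2_ratio_with_floor(control, treatment, background):
    """log2(control / treatment) with treatment floored at background.

    Returns (ratio, saturated). saturated marks strains whose treatment
    signal is at or below background, so their score is only a lower
    bound on the true sensitivity.
    """
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    saturated = treatment <= background
    floored = np.maximum(treatment, background)
    return np.log2(control / floored), saturated

# Made-up intensities: the last two strains are at or below background.
ratio, saturated = log2_ratio_with_floor(
    control=[800, 800, 800], treatment=[400, 100, 50], background=100
)
```

The second and third strains receive the same score even though the third has less signal, which is why flagged strains are better resolved at an earlier time point or by individual growth assays.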

Data for strains with low representation in the pool are prone to noise due to increased sampling error. Slow-growing strains (ref. 1) are especially prone to this problem, so data from these strains should be carefully confirmed.
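
As a rough guide to the size of this effect, the sketch below assumes that the number of cells of a strain in a sample follows a Poisson distribution. That is a modeling assumption, not something stated in the paper. Under it, the relative sampling error is 1/sqrt(expected count).

```python
import math

def relative_sampling_error(expected_cells):
    """Coefficient of variation of a Poisson count with this mean.

    expected_cells must be positive.
    """
    return 1.0 / math.sqrt(expected_cells)

# Hypothetical pool compositions, for illustration only.
well_represented = relative_sampling_error(10_000)
poorly_represented = relative_sampling_error(100)
```

Under this assumption, a strain present at one hundredth of the abundance of another carries ten times the relative sampling noise before any measurement error is added.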

References

  1. Deutschbauer, A.M. et al. Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast. Genetics (2005).
  2. Bolstad, B.M., Irizarry, R.A., Astrand, M. & Speed, T.P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185-93 (2003).

Inquiries can be addressed to sepierce@stanford.edu.