Web supplement to
"Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action"

Maureen E. Hillenmeyer, Elke Ericson, Ronald W. Davis, Corey Nislow, Daphne Koller, Guri Giaever

Data download

The analyses for this study were performed using data from Hillenmeyer et al., "The Chemical Genomic Portrait of Yeast: Uncovering a Phenotype for all Genes", Science 2008. Original fitness data is available for download from that paper's supplement.

Co-fitness of gene pairs

Co-inhibition of chemical pairs

Drug target prediction

We used two separate training sets:
  1. Yeast: compound_target_training_set_yeast.txt

    Training set of compound-target interactions drawn from an expert-curated set of known interactions.

  2. DrugBank: compound_target_training_set_DrugBank.txt

    Training set of compound-target interactions extrapolated to yeast proteins from their homologs in DrugBank.

Each training set was used to learn a Random Forest model of compound-target interactions, as described in the main text. The model was then applied to a test set comprising all possible interactions (all compounds with all heterozygous yeast strains). The following two files list the predicted confidence values (when the algorithm was able to make a prediction) for those test set interactions.
  1. Yeast: compound_target_predictions_RandomForest_yeast.txt

  2. DrugBank: compound_target_predictions_RandomForest_DrugBank.txt

The format of these files is a tab-delimited list of protein-compound interactions. Each row is an interaction, with columns: The input features are described more fully in the Materials and Methods in the main text.
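As a minimal sketch of reading one of these tab-delimited prediction files, the following Python uses only the standard library. It assumes the first line of the file is a header row naming the columns; that is an assumption rather than something stated on this page, so inspect the file before relying on it. The function name `read_interactions` and the column names in the in-memory example are hypothetical.

```python
import csv
import io


def read_interactions(handle):
    """Parse a tab-delimited table into a list of dicts, one per interaction.

    Assumes (not confirmed by this supplement) that the first line is a
    header row giving the column names.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row) for row in reader]


# Tiny in-memory stand-in for a prediction file. The column names and the
# values are made up for illustration only.
example = io.StringIO(
    "protein\tcompound\tconfidence\n"
    "YFG1\tcompoundA\t0.82\n"
)
rows = read_interactions(example)
print(rows[0]["compound"])
```

For a file on disk, the same function could be called as `read_interactions(open("compound_target_predictions_RandomForest_yeast.txt", newline=""))`. All values come back as strings, so numeric columns would need converting with `float` before any comparison.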

Note that these prediction files include all possible interactions, not only the high-confidence ones. To filter the list to the highest-confidence interactions, we applied filters using the following criteria (described in the Materials and Methods in the main text):
(1) the gene was essential or showed a fitness defect as a homozygous deletion strain in the absence of compound, (2) the confidence value of the predicted interaction (from the Random Forest algorithm) was >= 0.7 out of 1, with a high fitness defect (log ratio >= 5), (3) the compound was not a frequently predicted interactor (i.e., it appeared in fewer than 1000 total predictions), and (4) the protein and compound were reciprocal top-10 sensitivity hits of each other, as determined by examination of the protein and compound in FitDB. This yielded 12 pairs (Supplementary Table 1).
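The four criteria above could be applied programmatically along the following lines. This is a sketch under stated assumptions, not the procedure used for the paper: the dictionary keys are hypothetical, criteria (1) and (4) are assumed to have been looked up beforehand and stored as booleans, and "total predictions" for criterion (3) is assumed to mean the number of rows for that compound in the list supplied.

```python
from collections import Counter


def filter_high_confidence(rows, min_confidence=0.7, min_log_ratio=5.0,
                           max_compound_predictions=1000):
    """Return the rows that pass all four filtering criteria.

    Each row is a dict with hypothetical keys:
      protein, compound          identifiers
      confidence                 Random Forest confidence, 0 to 1
      log_ratio                  fitness defect log ratio
      essential_or_hom_defect    bool, criterion (1)
      reciprocal_top10           bool, criterion (4), e.g. checked in FitDB
    """
    # Criterion (3) needs per-compound totals, so count before filtering.
    # Counting over all supplied rows is an assumption of this sketch.
    counts = Counter(row["compound"] for row in rows)
    kept = []
    for row in rows:
        if not row["essential_or_hom_defect"]:                    # (1)
            continue
        if row["confidence"] < min_confidence:                    # (2)
            continue
        if row["log_ratio"] < min_log_ratio:                      # (2)
            continue
        if counts[row["compound"]] >= max_compound_predictions:   # (3)
            continue
        if not row["reciprocal_top10"]:                           # (4)
            continue
        kept.append(row)
    return kept


# Made-up rows for illustration only.
demo = [
    {"protein": "P1", "compound": "C1", "confidence": 0.9, "log_ratio": 6.0,
     "essential_or_hom_defect": True, "reciprocal_top10": True},
    {"protein": "P2", "compound": "C1", "confidence": 0.6, "log_ratio": 6.0,
     "essential_or_hom_defect": True, "reciprocal_top10": True},
    {"protein": "P3", "compound": "C2", "confidence": 0.8, "log_ratio": 6.0,
     "essential_or_hom_defect": True, "reciprocal_top10": False},
]
selected = filter_high_confidence(demo)
print([row["protein"] for row in selected])
```

The thresholds are exposed as keyword arguments so that the effect of loosening or tightening any one criterion can be explored without editing the function.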

Inquiries can be addressed to guri.giaever@utoronto.ca.