The analyses for this study were performed using data from Hillenmeyer et al., "The Chemical Genomic Portrait of Yeast: Uncovering a Phenotype for all Genes", Science 2008. Original fitness data is available for download from that paper's supplement.
Co-fitness of gene pairs
- Heterozygous deletion experiments: het.ratio_result_nm.goodbatch.cofitness.txt
Pearson correlation values for pairs of gene deletion strains across all heterozygous experiments in Hillenmeyer et al., Science 2008. The term "goodbatch" in the filename refers to the exclusion of problematic batches, as described in Hillenmeyer et al., Science 2008.
- Homozygous deletion experiments: hom.ratio_result_nm.cofitness.txt
Pearson correlation values for pairs of gene deletion strains across all homozygous experiments in Hillenmeyer et al., Science 2008.
Co-inhibition of chemical pairs
- Heterozygous deletion experiments: het.ratio_result_nm.goodbatch.coinhibition.txt
Pearson correlation values for pairs of heterozygous experiments across all gene deletion strains in Hillenmeyer et al., Science 2008. The term "goodbatch" in the filename refers to the exclusion of problematic batches, as described in Hillenmeyer et al., Science 2008.
- Homozygous deletion experiments: hom.ratio_result_nm.coinhibition.txt
Pearson correlation values for pairs of homozygous experiments across all gene deletion strains in Hillenmeyer et al., Science 2008.
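As an illustration of how the co-fitness and co-inhibition values relate, the following minimal sketch computes both from one small made-up fitness matrix. The gene names, experiment names, and numbers are invented for the example, and the helper functions (pearson, pairwise_correlations, transpose) are hypothetical; this is not the code used to generate the files above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(profiles):
    """Map each unordered pair of profile names to its Pearson correlation."""
    names = sorted(profiles)
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def transpose(profiles, column_names):
    """Turn gene -> values-per-experiment into experiment -> values-per-gene."""
    rows = sorted(profiles)
    return {
        col: [profiles[row][j] for row in rows]
        for j, col in enumerate(column_names)
    }


# Toy data (assumption): rows are deletion strains, columns are experiments.
experiments = ["exp1", "exp2", "exp3"]
fitness = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [2.0, 4.0, 6.0],
    "geneC": [3.0, 2.0, 1.0],
}

# Co-fitness: correlate pairs of strains across all experiments.
cofitness = pairwise_correlations(fitness)

# Co-inhibition: correlate pairs of experiments across all strains.
coinhibition = pairwise_correlations(transpose(fitness, experiments))
```

In this sketch the same matrix yields both outputs: co-fitness correlates its rows (strains) and co-inhibition correlates its columns (experiments).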
Drug target prediction
We used two separate training sets:
- Yeast: compound_target_training_set_yeast.txt
Training set of compound-target interactions drawn from an expert-curated set of known interactions.
- DrugBank: compound_target_training_set_DrugBank.txt
Training set of compound-target interactions extrapolated from homologs of the yeast proteins that are listed in DrugBank.
Each training set was used to learn a Random Forest model of compound-target interactions, as described in the main text. The model was then applied to a test set comprising all possible interactions (all compounds with all heterozygous yeast strains). The following two files list the predicted confidence values (when the algorithm was able to make a prediction) for those test set interactions.
- Yeast: compound_target_predictions_RandomForest_yeast.txt
- DrugBank: compound_target_predictions_RandomForest_DrugBank.txt
The format of these files is a tab-delimited list of protein-compound interactions. Each row is an interaction, with columns:
- prediction score
- input features (fitness_defect_pvalue, fitness_defect_ratio, gene_sens_freq, drug_inhib_freq, hompheno, struct_enrichment, struct_count, struct_similarity, fitness_defect_pvalue_0, fitness_defect_pvalue_1, fitness_defect_pvalue_2, fitness_defect_pvalue_3, fitness_defect_pvalue_4, fitness_defect_pvalue_5, fitness_defect_pvalue_6, fitness_defect_pvalue_7, fitness_defect_pvalue_8, fitness_defect_pvalue_9, fitness_defect_pvalue_secondary_mean, fitness_defect_pvalue_secondary_median).
- compound prediction frequency (total number of predictions in which this compound appeared)
Note that these prediction files include all possible interactions, not only the high-confidence ones. To filter the list to the highest-confidence interactions, we applied filters using the following criteria (described in the Materials and Methods in the main text):
(1) the gene was essential or showed a fitness defect as a homozygous deletion strain in the absence of compound, (2) the confidence value of the predicted interaction (from the Random Forest algorithm) was >= 0.7 out of 1 and the fitness defect was high (log ratio >= 5), (3) the compound was not a frequently predicted interactor (i.e., it appeared in fewer than 1000 total predictions), and (4) the protein and compound were reciprocal top-10 sensitivity hits of each other, as determined by examination of the protein and compound in FitDB. This yielded 12 pairs (Supplementary Table 1).
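The criteria above can be sketched as a simple predicate. This is a hedged illustration only: the dictionary keys (essential, hom_fitness_defect, score, log_ratio, compound_prediction_frequency, reciprocal_top10) are hypothetical names rather than column names confirmed for the prediction files, the essentiality and FitDB annotations are assumed to have been added to each record beforehand, and the example records are invented.

```python
def is_high_confidence(pred,
                       min_score=0.7,
                       min_log_ratio=5.0,
                       max_compound_predictions=1000):
    """Return True if a predicted interaction passes all four criteria.

    `pred` is a dict with hypothetical keys (assumptions, not file columns):
      essential, hom_fitness_defect   -> bool, annotated beforehand
      score                           -> Random Forest confidence in [0, 1]
      log_ratio                       -> fitness defect log ratio
      compound_prediction_frequency   -> total predictions for the compound
      reciprocal_top10                -> bool, from inspection of FitDB
    """
    # (1) essential gene, or fitness defect as a homozygous deletion strain
    gene_ok = pred["essential"] or pred["hom_fitness_defect"]
    # (2) confident prediction with a high fitness defect
    score_ok = pred["score"] >= min_score and pred["log_ratio"] >= min_log_ratio
    # (3) compound is not a frequently predicted interactor
    compound_ok = pred["compound_prediction_frequency"] < max_compound_predictions
    # (4) protein and compound are reciprocal top-10 sensitivity hits
    return bool(gene_ok and score_ok and compound_ok and pred["reciprocal_top10"])


# Invented example records, for illustration only.
candidates = [
    {"pair": "P1-C1", "essential": True, "hom_fitness_defect": False,
     "score": 0.85, "log_ratio": 6.2,
     "compound_prediction_frequency": 120, "reciprocal_top10": True},
    {"pair": "P2-C2", "essential": False, "hom_fitness_defect": False,
     "score": 0.95, "log_ratio": 7.0,
     "compound_prediction_frequency": 50, "reciprocal_top10": True},
    {"pair": "P3-C3", "essential": True, "hom_fitness_defect": True,
     "score": 0.90, "log_ratio": 5.5,
     "compound_prediction_frequency": 1000, "reciprocal_top10": True},
]

high_confidence = [c["pair"] for c in candidates if is_high_confidence(c)]
```

The thresholds are exposed as keyword arguments so that the cutoffs stated above (0.7, 5, and 1000) can be varied without editing the function body.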
Inquiries can be addressed to firstname.lastname@example.org.