Predicting Protein-Protein Interactions from Protein Domains Using a Set Cover Approach
One goal of contemporary proteome research is the elucidation of cellular protein interactions. Based on currently available protein-protein interaction and domain data, we introduce a novel method, Maximum Specificity Set Cover (MSSC), for the prediction of protein-protein interactions. In our approach, we map the relationship between interactions of proteins and their corresponding domain architectures to a generalized weighted set cover problem. The application of a greedy algorithm provides sets of domain interactions which explain the presence of protein interactions to the largest degree of specificity. Utilizing domain and protein interaction data of S. cerevisiae, MSSC enables prediction of previously unknown protein interactions, links that are well supported by a high tendency of coexpression and functional homogeneity of the corresponding proteins. Focusing on concrete examples, we show that MSSC reliably predicts protein interactions in well-studied molecular systems, such as the 26S proteasome and RNA polymerase II of S. cerevisiae. We also show that the quality of the predictions is comparable to the Maximum Likelihood Estimation while MSSC is faster. This new algorithm and all data sets used are accessible through a Web portal at http://ppi.cse.nd.edu.