Revealing Missing Parts of the Interactome via Link Prediction
Protein interaction networks (PINs) are often used to “learn” new biological function from their topology. Since current PINs are noisy, their computational de-noising via link prediction (LP) could improve the learning accuracy. LP uses the existing PIN topology to predict missing and spurious links. Many of existing LP methods rely on shared immediate neighborhoods of the nodes to be linked. As such, they have limitations. Thus, in order to comprehensively study what are the topological properties of nodes in PINs that dictate whether the nodes should be linked, we introduce novel sensitive LP measures that are expected to overcome the limitations of the existing methods.
We systematically evaluate the new and existing LP measures by introducing “synthetic” noise into PINs and measuring how accurate the measures are in reconstructing the original PINs. Also, we use the LP measures to de-noise the original PINs, and we measure biological correctness of the de-noised PINs with respect to functional enrichment of the predicted interactions. Our main findings are: 1) LP measures that favor nodes which are both “topologically similar” and have large shared extended neighborhoods are superior; 2) using more network topology often though not always improves LP accuracy; and 3) LP improves biological correctness of the PINs, plus we validate a significant portion of the predicted interactions in independent, external PIN data sources.
Ultimately, we are less focused on identifying a superior method but more on showing that LP improves biological correctness of PINs, which is its ultimate goal in computational biology. But we note that our new methods outperform each of the existing ones with respect to at least one evaluation criterion. Alarmingly, we find that the different criteria often disagree in identifying the best method(s), which has important implications for LP communities in any domain, including social networks.