An Ensemble Topic Model for Sharing Healthcare Data and Predicting Disease Risk

Andrew Rider and Nitesh V. Chawla
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics. ACM, 2013.
Publication Date: September 2013

With the recent signing of the Affordable Care Act into law, the use of electronic medical data is set to become ubiquitous in the United States. This presents an unprecedented opportunity to use population health data for the benefit of patient-centered outcomes. However, there are two major hurdles to utilizing this wealth of data. First, medical data is rarely centrally located; it is often divided across hospital systems, health exchanges, and physician practices. Second, sharing specific or identifiable information may not be allowed. Moreover, organizations may have a vested interest in keeping their data sets private, since those data sets may have been gathered and curated at great cost. We develop an approach that allows beneficial information to be shared while staying within the bounds of data privacy. We show that a probabilistic graphical model can facilitate effective transfer learning between distinct healthcare data sets through parameter sharing, while also allowing us to construct a network that domain experts can interpret and use to discover disease relationships. Our method uses aggregate information from distinct populations to improve the estimation of patient disease risk.
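The abstract does not spell out the model, so what follows is only a minimal sketch of the general idea of transferring knowledge through shared parameters rather than patient records. It is not the authors' ensemble topic model, and it rests on several assumptions. Each site fits its own topics, each a distribution over diagnosis codes, and shares only that topic-disease matrix. The pooled topic set is formed by simple concatenation. A patient's topic weights are then inferred by EM with the topics held fixed. The function names (`pool_topics`, `infer_topic_weights`, `disease_risk`), the four-code vocabulary, and all probabilities are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows: topics, columns: diagnosis codes


def pool_topics(*site_topics: Matrix) -> Matrix:
    """Concatenate the topic-disease matrices shared by each site.

    Hypothetical pooling rule: topics are kept side by side, so no
    cross-site topic alignment is needed. Only these aggregate
    parameters leave a site; patient records do not.
    """
    pooled: Matrix = []
    for topics in site_topics:
        pooled.extend(list(row) for row in topics)
    return pooled


def infer_topic_weights(codes: Sequence[int], topics: Matrix, iters: int = 200) -> List[float]:
    """Estimate one patient's topic weights by EM, holding the topics fixed."""
    k = len(topics)
    theta = [1.0 / k] * k
    for _ in range(iters):
        expected = [0.0] * k
        for d in codes:
            # E-step: responsibility of each topic for the observed code d
            weights = [theta[j] * topics[j][d] for j in range(k)]
            z = sum(weights)
            if z == 0.0:
                continue  # no pooled topic can explain this code
            for j in range(k):
                expected[j] += weights[j] / z
        total = sum(expected)
        if total == 0.0:
            break  # nothing observed could be explained; keep current weights
        # M-step: weights proportional to expected code counts per topic
        theta = [e / total for e in expected]
    return theta


def disease_risk(theta: Sequence[float], topics: Matrix) -> List[float]:
    """Predicted probability of each code: sum over k of theta_k * phi_{k,d}."""
    n_codes = len(topics[0])
    return [sum(t * row[d] for t, row in zip(theta, topics)) for d in range(n_codes)]


# Hypothetical codes: 0 hypertension, 1 diabetes, 2 asthma, 3 COPD.
site_a = [[0.45, 0.45, 0.05, 0.05]]  # cardiometabolic topic learned at site A
site_b = [[0.05, 0.05, 0.50, 0.40]]  # respiratory topic learned at site B
topics = pool_topics(site_a, site_b)

patient_codes = [0, 1, 2]  # hypertension, diabetes, asthma
theta = infer_topic_weights(patient_codes, topics)
risk = disease_risk(theta, topics)
print("topic weights:", [round(t, 3) for t in theta])
print("risk per code:", [round(r, 3) for r in risk])
```

Concatenating topics, rather than averaging them, is a deliberate choice in this sketch. Topics learned independently at different sites are not guaranteed to line up index by index, so averaging would first require matching them. Here, a patient seen mostly for cardiometabolic codes still receives a nonzero COPD risk that comes from the respiratory topic learned at the other site.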