There is a growing need to streamline the management and accessibility of large, disparate databases. This project uses a technology infrastructure that leverages ontologies and controlled vocabularies to integrate cardiovascular data elements from three large, multi-center epidemiology studies. It provides a platform to facilitate the integration, comparison and joint analysis of CVD data from the three studies, offering an informatics tool intended to benefit the wider scientific community.
Introduction It is increasingly recognized that evaluating genetic variants for common complex diseases requires massive sample sizes, which can only be obtained by combining data across multiple epidemiological studies. Semantic integration of data across studies using ontologies can maximize statistical power while preserving the information that distinguishes one study from another. While standard ontologies for molecular data are currently in use (e.g. Gene Ontology), corresponding ontologies for trait domains such as cardiovascular disease (CVD) risk factors are lagging in development. Toward this end, we are developing ontologies for CVD risk factors that describe the clinical measurements, conditions and measurement methods, using common tools such as OBO (Open Biomedical Ontologies). These CVD ontologies are being applied to three large epidemiological family studies, which are semantically annotated to create common data elements. A clinical database management tool was adopted to integrate the data ontologies across the studies using a technology infrastructure that leverages caBIG™ and can be fully integrated with other federated grid-based informatics initiatives such as the Cardiovascular Research Grid (CVRG) (http://cvrgrid.org/) and the Biomedical Informatics Research Network (BIRN) (http://www.nbirn.net/index.shtm).
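As an illustration of what semantic annotation into common data elements can look like, the following is a minimal sketch, not the project's actual implementation. The study names, variable names and the `CVDO:` term identifiers are all hypothetical placeholders chosen for this example; they do not come from any published ontology or from the three studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Links one study-specific variable to a shared ontology term."""
    term_id: str  # hypothetical ontology term identifier
    label: str    # human-readable name of the common data element


# Hypothetical annotation table: (study, local variable name) -> ontology term.
ANNOTATIONS = {
    ("study_a", "SBP_MMHG"): Annotation("CVDO:0000001", "systolic blood pressure"),
    ("study_b", "sysbp"): Annotation("CVDO:0000001", "systolic blood pressure"),
    ("study_c", "bp_sys_1"): Annotation("CVDO:0000001", "systolic blood pressure"),
    ("study_a", "CURSMK"): Annotation("CVDO:0000002", "current smoker"),
    ("study_b", "smoke_now"): Annotation("CVDO:0000002", "current smoker"),
}


def to_common_elements(study, record):
    """Re-key a study record by ontology term, keeping the study of origin.

    Variables without an annotation are dropped rather than guessed at.
    """
    common = {"study": study}
    for variable, value in record.items():
        annotation = ANNOTATIONS.get((study, variable))
        if annotation is not None:
            common[annotation.term_id] = value
    return common


# Records from two studies end up keyed by the same term identifiers.
row_a = to_common_elements("study_a", {"SBP_MMHG": 131.0, "CURSMK": 1})
row_b = to_common_elements("study_b", {"sysbp": 128.0, "visit": 2})
```

Keeping the `study` field on every harmonized record is one simple way to retain the information that discriminates between studies while still allowing pooled analysis on the shared term identifiers.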
Data Dissemination An interactive website has been created, and a clinical database management system is used, to facilitate data sharing and promote data dissemination among research communities across the clinical, translational and epidemiological domains. This effort includes data sharing plans and strategic approaches for receiving and incorporating input from relevant stakeholders. At the website (cover.wustl.edu/Cover), the COVER tool brings together general project information, study-specific overviews with links to detailed study information, and searchable count data. The site is specifically designed so that users can query the data to determine the number of participants with the desired attributes in each study. Hence, an individual may use the look-up table to determine the feasibility of using data from one or more of the studies. Raw data can then be requested and accessed through ClinPortal, a clinical database management system, or through the coordinating centers of the original studies.
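The kind of feasibility count query described above can be sketched as follows. This is an illustrative example only and does not reflect how COVER or ClinPortal is implemented; the function name `count_participants`, the field names and the sample rows are all made up for the sketch.

```python
def count_participants(rows, **criteria):
    """Count, per study, the rows whose fields equal every given criterion."""
    counts = {}
    for row in rows:
        if all(row.get(field) == wanted for field, wanted in criteria.items()):
            counts[row["study"]] = counts.get(row["study"], 0) + 1
    return counts


# Made-up harmonized rows, for illustration only.
rows = [
    {"study": "study_a", "sex": "F", "current_smoker": True},
    {"study": "study_a", "sex": "M", "current_smoker": True},
    {"study": "study_b", "sex": "F", "current_smoker": True},
    {"study": "study_b", "sex": "F", "current_smoker": False},
    {"study": "study_c", "sex": "F", "current_smoker": True},
]

# How many female current smokers does each study hold?
feasibility = count_participants(rows, sex="F", current_smoker=True)
```

Returning only counts per study, rather than the matching rows, mirrors the idea that a user can judge feasibility first and request access to raw data afterwards.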
Conclusion Such ontologies and web-based data-mining tools serve a critical need in the field by providing a platform to facilitate the integration, comparison and joint analyses of CVD data from across studies, thus providing a powerful informatics tool that will benefit the entire scientific community. Additionally, this study serves as a prototype for the integration of additional cardiovascular studies.