2013 CTSI Annual Pilot Awards to Improve the Conduct of Research

To facilitate the development, conduct or analysis of clinical & translational research

Search Engine to Directly Access EHRs without Data Warehouses or Loss of Unstructured Data

Research hospitals, including UCSF, spend millions of dollars a year moving data from electronic health records (EHRs) into separate data warehouses for analytics, and hire consultants to map data to and from those warehouses. Because this process is so time- and labor-intensive, a single search query can take up to two months to complete. Even then, the query does not return the complete set of patients matching the search criteria, because data stored in hospitals’ legacy systems, or entered as physician notes in EHRs, lack clinical context and/or code associations. Consequently, cohort identification and statistical correlation are severely limited in clinical and translational research. As EHR adoption grows, healthcare organizations need analytic tools that can aggregate data from disparate sources to give a more complete view of individual patients and patient populations.
Massive Minable Medical Data (M3D) is being developed as a search engine that lets clinical researchers at UCSF search and analyze information directly in EHRs, without the laborious, time-consuming, and expensive mapping of data into new data warehouses. It offers fast and accurate data retrieval from natural language queries posed by the researcher. Natural language processing allows M3D users to ask questions that uncover correlations, e.g. "how many patients with heart disease were taking Vioxx?" Direct integration with EHRs will spare UCSF the expense of moving data, potentially reducing cost by an order of magnitude. The specific difficulty with systems from large EHR vendors, such as Epic, is handling a database of many thousands of tables. To address this, M3D has developed an algorithm that uses machine learning to identify which tables contain relevant data, even when the researcher doesn’t know where the data are stored.
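To make the table-identification step concrete, the sketch below ranks database tables by how well their names and column names match the keywords of a query. It is an illustration only, not M3D's algorithm: it swaps in a plain TF-IDF cosine similarity where the proposal describes machine learning, and the function names, table names, and column names are all hypothetical (they are not real Epic tables).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def rank_tables(query, schemas):
    """Return (table, score) pairs sorted by TF-IDF cosine similarity to the query.

    `schemas` maps a table name to its list of column names.
    All schema contents used below are hypothetical.
    """
    # Treat each table's name plus its column names as one small document.
    docs = {name: Counter(tokenize(name + " " + " ".join(cols)))
            for name, cols in schemas.items()}
    n = len(docs)
    # Document frequency: in how many tables does each token appear?
    df = Counter(tok for counts in docs.values() for tok in counts)
    # Smoothed inverse document frequency; rarer tokens weigh more.
    idf = {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}

    def weigh(counts):
        # Tokens never seen in any schema get weight 0.
        return {tok: c * idf.get(tok, 0.0) for tok, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = weigh(Counter(tokenize(query)))
    scored = [(name, cosine(q, weigh(counts))) for name, counts in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-table schema standing in for a database of thousands.
SCHEMAS = {
    "patient_diagnosis": ["patient_id", "diagnosis_code", "diagnosis_name"],
    "medication_order": ["patient_id", "medication_name", "start_date"],
    "billing_account": ["account_id", "balance", "payer_name"],
}

ranking = rank_tables("patient medication name", SCHEMAS)
for table, score in ranking:
    print(f"{table}: {score:.3f}")
```

With this toy schema, the medication table ranks first for a medication query and the billing table last. A purely lexical match like this would miss a free-text question such as the Vioxx example, whose wording does not appear in any table or column name, which is presumably where natural language processing and a learned model would be needed.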
Metrics for Success
1. Identify high priority query types
2. Secure access to representative datasets for validation
3. Pilot study/adoption by clinical researchers
4. Expansion of capacities for physicians
5. Pilot study/adoption by physicians

Success is the widespread adoption of M3D for accessing the wealth of information contained in EHRs at UCSF. We are currently seeking feedback from the medical community, specifically clinical researchers, regarding the highest-priority types of data and queries. Validating the search engine on these query types, and securing access to representative datasets for development and validation, are both prerequisites to a pilot study with active researchers. Once the design has been refined to the satisfaction of clinical researchers, the process will be repeated with a broader set of physicians at UCSF.
Initial costs will be focused on securing the proper legal standing to work with patient data on Epic systems, through either professional legal counsel or Epic certification. Important but technically simple aspects of the search engine, such as the user interface, will be contracted out as needed.
Michael Sachs- UCSF BMS graduate student

Jingwei Zhang- UCSF/UCB BioEng graduate student

Sabrina Atienza- UCB CompSci

George Ramonov- UCB CompSci 2013

Anil Sethi- Founded Sequoia Software, current CEO of Gliimpse, 20+ years of experience in healthcare IT.