The Big Tent

CTSI 2016 NIH Renewal Proposal Launchpad

Administrative Data Concierge Service

Challenge: Many UCSF researchers are interested in questions about human health and the delivery of health care services that could be studied using large administrative datasets, such as those generated by the Centers for Medicare and Medicaid Services (CMS), the California Office of Statewide Health Planning and Development (OSHPD), and agencies that collect vital statistics data (e.g., birth certificates, death certificates). While some UCSF researchers have conducted important research with these datasets, expanding the pool of researchers who work with them is challenging for several reasons. Organizations that produce these datasets often have difficulty responding to requests promptly due to limited resources and competing priorities. In addition, extensive programming is often required to transform the raw data into usable information. For some research questions, researchers also need to link vital statistics records with administrative data on the delivery of health care services. Because many of the publicly available versions of these datasets are de-identified, linking them typically relies upon probabilistic matching algorithms. The complexity and error-prone nature of probabilistic matching represent a barrier to the full exploitation of administrative and vital statistics data by researchers who are not experts in these techniques.

Solution - Data Concierge Service: Building upon resources for analysis of large, public datasets that are already available to UCSF researchers through the Comparative Effectiveness Large Dataset Analysis Core (CELDAC), this project would establish a concierge service that would assist UCSF researchers in accessing large administrative datasets. A ‘special access’ data manager who would simultaneously be employed by UCSF and agencies that collect administrative data (OSHPD, CA Department of Public Health, CMS) would link records across datasets. The data manager would have expertise in the use of deterministic and probabilistic matching algorithms to merge datasets using unique identifiers (where available) and other variables such as date, age, gender, and zip code. The data manager would generate customized datasets that are tailored to researchers’ specifications, create de-identified versions of them, and deliver them to requestors (probably through MyResearch). The data manager and CELDAC’s principal investigator would work with investigators to ensure they secure the approvals needed to analyze and report upon the data and serve as a liaison with data-providing agencies. This service could be funded in a manner similar to CTSI’s existing consultation services (CTSI subsidy for initial hour of consultation, recharge for subsequent hours of service). It could also be made available to researchers at other CTSAs to broaden the potential user base.
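The two linkage approaches named above can be illustrated with a minimal sketch. It is not the method CELDAC or any agency uses. It assumes a Fellegi-Sunter-style scoring scheme for the probabilistic pass, and the field names (uid, birth_date, sex, zip), the m/u probabilities, and the threshold are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative, not estimated from data):
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "birth_date": (0.95, 0.003),
    "sex": (0.98, 0.5),
    "zip": (0.90, 0.01),
}


def match_weight(rec_a, rec_b):
    """Sum log2 agreement/disagreement weights over the comparison fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if rec_a.get(field) is None or rec_b.get(field) is None:
            continue  # a missing value contributes no evidence either way
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(records_a, records_b, threshold=8.0):
    """Link on a shared unique ID where available, else on the best score."""
    links = []
    for a in records_a:
        # Deterministic pass: exact match on the unique identifier.
        exact = [b for b in records_b
                 if a.get("uid") and a.get("uid") == b.get("uid")]
        if exact:
            links.append((a["row"], exact[0]["row"], "deterministic"))
            continue
        # Probabilistic pass: keep the best candidate if it clears the threshold.
        best = max(records_b, key=lambda b: match_weight(a, b), default=None)
        if best is not None and match_weight(a, best) >= threshold:
            links.append((a["row"], best["row"], "probabilistic"))
    return links


# Made-up example records; no real data.
births = [
    {"row": "b1", "uid": "A1", "birth_date": "2010-01-02", "sex": "F", "zip": "94110"},
    {"row": "b2", "uid": None, "birth_date": "2011-05-06", "sex": "M", "zip": "94122"},
    {"row": "b3", "uid": None, "birth_date": "2012-07-08", "sex": "F", "zip": "94103"},
]
discharges = [
    {"row": "d1", "uid": "A1", "birth_date": "2010-01-02", "sex": "F", "zip": "94999"},
    {"row": "d2", "uid": None, "birth_date": "2011-05-06", "sex": "M", "zip": "94122"},
    {"row": "d3", "uid": None, "birth_date": "1999-01-01", "sex": "F", "zip": "94103"},
]
print(link(births, discharges))
```

With these illustrative values, the first pair links on the shared identifier, the second links because all three fields agree, and the third stays unlinked because its score falls below the threshold. A production service would also need blocking to limit comparisons, one-to-one assignment, and clerical review of borderline scores, none of which is shown here.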

Potential Partners: Members of Stanford’s CTSA have expressed interest in collaborating with UCSF CTSI to enhance capacity to conduct research using secondary datasets. CTSAs at other UC campuses may be interested as may faculty and trainees in the School of Public Health at UC-Berkeley. The UC Research Exchange would be a valuable partner in this effort due to its experience in bringing UC campuses together to improve access to administrative data for health research. In addition, CELDAC’s principal investigator has good contacts with staff of OSHPD’s Healthcare Information Division who are interested in enhancing their ability to serve researchers and other customers.

Innovation: This proposal builds upon UCSF CTSI’s existing resources for conducting research with large administrative datasets by creating a data concierge service that would help UCSF researchers to more quickly obtain secondary datasets tailored to their specific research interests. If successful, CELDAC could be transformed from a conduit of information about datasets that other organizations generate to a concierge that works proactively with these organizations to help researchers at UCSF and potentially other CTSAs obtain the data they need for their research. Making requisite public data more accessible is expected to significantly expand the use of secondary data to address important hypotheses in public health and comparative effectiveness research. This innovation may be particularly valuable during the current contraction in national research funding.

Projected Impact: This project could strengthen UCSF researchers’ ability to conduct timely research using large administrative datasets, improving our understanding of factors that affect human health. If successful, the project could serve as a model for other CTSAs.

Comments

This is an excellent idea: have someone with the expertise and security privileges to create an honest broker system for many projects.

Administrative Data Concierge
Submitted by Janet Coffman

  1. Summarize the problem being addressed.  Please make sure this is NOT disease-specific.
    • Researchers would benefit from improved access to large administrative data sets, including administrative data sets that are merged with each other (e.g., OSHPD, death records) and/or with investigators' primary data
    • Linking such datasets often requires probabilistic matching algorithms that are difficult to implement—imposing a barrier to research progress.
  2. Summarize the solution being proposed. Please make sure this is NOT disease-specific, although you can provide examples of specific test cases.
    • Establish a concierge service with a 'special access' data manager who would:
        • Access identified administrative data to allow for deterministic data merging
        • Use probabilistic matching algorithms to merge de-identified datasets
        • Create de-identified versions of datasets customized to researchers’ needs
  3. What partners are involved in the solution?
    • UCSF CTSI Comparative Effectiveness Large Dataset Analysis Core
    • CTSAs at other UCs (including UC Research Exchange) and Stanford CTSA
    • State and federal agencies that collect vital statistics and administrative data
  4. What is the potential impact?
    • Enhance UCSF researchers’ ability to efficiently leverage multiple existing data sources to conduct relatively low cost/high impact research
    • Improve understanding of factors that affect human health
    • Serve as a model for other CTSAs
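
The de-identification step in item 2 could look roughly like the sketch below. It is an illustration only and does not by itself satisfy any regulatory standard. The field names, the list of direct identifiers, and the choice to coarsen birth date to year and ZIP code to three digits are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical names of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "ssn", "street_address"}


def deidentify(record, secret_key):
    """Return a copy of the record with identifiers removed or coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the record ID with a keyed hash so the same person receives
    # the same pseudonym in every extract made with the same key.
    if "uid" in out:
        digest = hmac.new(secret_key, str(out.pop("uid")).encode(),
                          hashlib.sha256).hexdigest()
        out["study_id"] = digest[:16]
    # Coarsen quasi-identifiers: keep only birth year and 3-digit ZIP.
    if out.get("birth_date"):
        out["birth_year"] = out.pop("birth_date")[:4]
    if out.get("zip"):
        out["zip3"] = out.pop("zip")[:3]
    return out


# Made-up example record; no real data.
example = {
    "uid": "A1",
    "name": "Jane Example",
    "birth_date": "2010-01-02",
    "zip": "94110",
    "diagnosis_code": "X00",
}
print(deidentify(example, b"replace-with-a-secret-key"))
```

The keyed hash is used rather than a plain hash so that someone without the key cannot regenerate pseudonyms from known identifiers. In this sketch the data manager would hold the key.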

Group 9-1

1.     How do we maximize impact and broad applicability of the proposal? 

Develop an open-source catalog of a wide variety of large datasets that is cheap, widely applicable, and easily accessible.  The catalog must be a curated, searchable data dictionary. Identify experts and a way to network people with the data. 
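
One possible shape for such a searchable data dictionary is sketched below. The entry structure, the dataset names, the variable names, and the contact address are all hypothetical placeholders. A real catalog would be populated from each data provider's documentation.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    steward: str
    variables: dict  # variable name -> plain-language description
    contacts: list = field(default_factory=list)  # people who know the data


def search(catalog, term):
    """Return (dataset name, matching variables) for a keyword."""
    term = term.lower()
    hits = []
    for entry in catalog:
        matched = sorted(
            var for var, desc in entry.variables.items()
            if term in var.lower() or term in desc.lower()
        )
        if matched or term in entry.name.lower():
            hits.append((entry.name, matched))
    return hits


# Placeholder entries for illustration only.
catalog = [
    DatasetEntry(
        "Hospital discharge (example)",
        "State agency (example)",
        {"admit_date": "Date of admission",
         "disposition": "Discharge status, including in-hospital death"},
        ["data.manager@example.org"],
    ),
    DatasetEntry(
        "Death records (example)",
        "Vital statistics office (example)",
        {"death_date": "Date of death",
         "cause_code": "Underlying cause of death code"},
    ),
]
print(search(catalog, "death"))
```

Keeping a contacts list on each entry is one way to address the second half of the suggestion: connecting investigators with people who know the data.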

 

 2.     What foundation exists on campus already that will ensure success of the initiative?

 

UCSF already has an inventory of datasets that can be leveraged.  Use the UCSF Profiles website to connect investigators to datasets.  Expand UCSF Profiles to include data managers and other applicable individuals, to connect investigators with expertise on how to access and link to data management tools.  Build a network.  Identify databases, expand availability of data, and match databases to PIs.  Use the CTSI design consultant service and leverage individual subject experts.

 

 3.     What creative and/or innovative partnerships could be leveraged to ensure success?

Leverage UC BRAID and other CTSAs.  Partner with NIH and Congress, and partner with dataset providers to develop a standard for how the data are entered.  Link with experts from industry, such as Google or Amazon Web Services, to create a searchable database to match data with investigators.  Connect to health industry exchanges.  Create a template agreement with dataset owners.


Notes submitted on behalf of Group 9-2



How do we maximize impact and broad applicability of the proposal?

What types of data?  Vital stats, death records, hospital discharge records, ED dataset, Medicare claims.  Curate data.  Make sure the data can answer the question before starting.  Improve access to data. 

 

Develop an open-source catalog of a wide variety of large datasets that is cheap, widely applicable, and easily accessible.  The catalog must be a curated, searchable data dictionary. Identify experts and a way to network people with the data. 

 

What foundation exists on campus already that will ensure success of the initiative?

UCSF already has an inventory of datasets that can be leveraged.  Use the UCSF Profiles website to connect investigators to datasets.  Expand UCSF Profiles to include data managers and other applicable individuals, to connect investigators with expertise on how to access and link to data management tools.  Build a network.  Identify databases, expand availability of data, and match databases to PIs.  Use the CTSI design consultant service and leverage individual subject experts.

 

What creative and/or innovative partnerships could be leveraged to ensure success?

Leverage UC BRAID and other CTSAs.  Partner with NIH and Congress, and partner with dataset providers to develop a standard for how the data are entered.  Link with experts from industry, such as Google or Amazon Web Services, to create a searchable database to match data with investigators.  Connect to health industry exchanges.  Create a template agreement with dataset owners.

Some thoughts from CTSI Retreat / Leadership panel with UCSF Deans:

 

Data management and analytics needs are great and silos are difficult to overcome.  It is critical to enhance the institutional commitment to data analytics even though there are regulatory and security issues that present challenges.  A concierge service should focus on how any particular dataset is relevant to a PI or research area. We need an internal component to give faculty skills to navigate these databases.
