Artificial Intelligence / Machine Learning Demonstration Projects 2025

Crowdsourcing ideas to bring advances in data science, machine learning, and artificial intelligence into real-world clinical practice.

Enhancing Efficiency and Impact: AI-Powered Eligibility Assessment for Adult Complex Care Management

Proposal Status: 
  1. The UCSF Health problem.    

The UCSF Office of Population Health (OPH) Complex Care Management (CCM) team provides advanced care management services to adult patients with high medical and/or psychosocial complexity who are high utilizers of inpatient or emergency department (ED) services. The CCM program provides essential high-touch support such as assessing individual patients’ healthcare challenges, developing targeted care plans, providing health education and coaching, coordinating linkage to care, and connecting patients to other community resources. Prior analysis of this program’s outcomes showed a statistically significant decrease in ED and observation encounters for patients enrolled in the program. The program therefore has significant potential to help address UCSF Health’s current ED crowding and bed capacity challenges, reduce readmission rates, and help meet quality metrics associated with specific patient populations. 

Currently, one of the most time-consuming challenges faced by the CCM team is determining individual patient eligibility for the CCM program. Despite using a reporting workbench that identifies patients meeting initial objective criteria, the team must still perform manual chart review to determine eligibility, which can consume up to 30 minutes per patient. This manual process can also be inconsistent: different reviewers have been noted to assess medical and social complexity differently based on their training and clinical background. 

Ideally, much of the current time spent by the OPH CCM team on manual chart review could instead be spent on higher-yield patient care activities, expanding the team’s ability to manage a higher volume of patients without requiring additional staff, as well as allowing its team members to practice at the top of their licenses. Additionally, standardizing definitions across patient enrollment criteria and workflows around the patient identification process would reduce variability between team members in applying inclusion and exclusion criteria.  

We believe that AI can be used to optimize these areas of need and have done preliminary work to show how this can apply. Through the AI demonstration project, we hope to validate our initial prototype; build an automated pipeline at scale; design, implement, and study appropriate clinical workflows; and work with appropriate governance committees to ensure safe and effective use. While this proposal initially focuses on the needs of the CCM program in OPH, we believe this model can be scaled to multiple other OPH programs that would benefit from more efficient patient identification processes.  

This work is particularly important given the current environment for UCSF Health. As our health system grows, we may not be able to hire more staff, and so will need to be more efficient. If successful, this initiative will allow CCM team members to re-allocate approximately one day per week to seeing additional patients instead of performing chart reviews. 

  2. How might AI help?   

To address these challenges, we built an initial prototype using prompt engineering in Versa to optimize the CCM patient eligibility determination process. This AI system mimics the current workflow of CCM staff, reviewing relevant notes for complex patients with high inpatient and ED utilization within a set time period for specific inclusion and exclusion criteria. Specifically, patients must meet at least one of six indicators of medical or psychosocial complexity. These indicators include:

  1. Uncontrolled chronic medical conditions
  2. Use of high-risk medications
  3. Evidence of barriers to medical adherence or limited health literacy
  4. Lack of social support
  5. Older age with frailty or poor functional status or cognitive impairment
  6. Social determinants of health (SDoH) challenges
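The eligibility rule the prompt encodes can be sketched as follows. This is a minimal illustration, assuming the per-criterion answers come back from the LLM as boolean flags; the indicator names are illustrative shorthand, not a finalized schema.

```python
# Hypothetical sketch of the CCM complexity rule described above: a patient
# must meet at least one of the six indicators of medical or psychosocial
# complexity. In practice the flag values would come from the LLM's
# per-criterion answers, not from structured data.

COMPLEXITY_INDICATORS = [
    "uncontrolled_chronic_conditions",
    "high_risk_medications",
    "adherence_or_health_literacy_barriers",
    "lack_of_social_support",
    "older_age_with_frailty_or_cognitive_impairment",
    "sdoh_challenges",
]

def meets_complexity_criteria(indicator_flags: dict) -> bool:
    """Return True if at least one of the six indicators is present."""
    return any(indicator_flags.get(name, False) for name in COMPLEXITY_INDICATORS)
```

For example, a patient flagged only for SDoH challenges would still qualify, since a single indicator suffices.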

We completed an initial assessment comparing LLM-generated eligibility determinations with those from manual reviews using 27 distinct patient charts. Our findings revealed 100% sensitivity and 77% specificity in eligibility determination. The lower specificity was attributed to over-inclusion, as the initial prompt lacked explicit instructions on defining medical and social complexity. Patients with lower complexity levels were incorrectly classified as eligible. We expect that further refinements will lead to better specificity. During this initial assessment, we also observed the AI system’s ability to standardize the review process, as it identified 3 patients who were initially misclassified for eligibility upon re-review by the CCM team. 
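The sensitivity and specificity figures above follow the standard definitions. A minimal sketch, with illustrative counts only (the actual eligible/ineligible split among the 27 charts is not broken out here):

```python
# Standard definitions used in the assessment above.

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly eligible patients the system flagged (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of truly ineligible patients the system correctly excluded."""
    return tn / (tn + fp)

# Illustrative counts, not the actual chart-level split: correctly excluding
# 10 of 13 ineligible patients would give a specificity of about 0.77.
```

Over-inclusion (ineligible patients flagged as eligible) raises the false positive count and so lowers specificity, which matches the failure mode described above.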

The initial results from this AI system were received positively by the CCM team and OPH leadership. Together, we believe it can be used as an adjunct tool to significantly reduce the time spent on eligible patient review and improve its overall quality. 

  3. How would an end-user find and use it? 

Twice a month, the CCM team reviews a list of patients who have met certain utilization thresholds and other simple inclusion criteria, and subsequently performs chart review of the preceding 6 months of data to determine whether these patients are eligible for enrollment in the CCM program. Approximately 73 patients are reviewed every month. For each patient, the AI system described above would review and synthesize relevant patient notes to summarize findings for the CCM team. It would generate the following outputs: the eligibility determination, the specific inclusions and exclusions that led to that determination, and the notes that informed those inclusion and exclusion decisions (see question 4). This setup increases transparency in the AI system, as it can show how it reached its conclusions as well as where to find relevant information for further review. This output should ideally be integrated into the existing Epic Reporting Workbench report used by the CCM team, as that report includes patient data and patient outreach tools. CCM team members would then use this information to assist in their review.  
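The per-patient output described above could take a shape like the following. This is a hypothetical sketch; the field names, the example patient, and the note citation are all illustrative, not a finalized schema or real data.

```python
# Hypothetical shape of the per-patient output: an eligibility determination,
# the criteria that drove it, and the source note that informed each decision.
# All values below are placeholders for illustration only.

example_output = {
    "patient_id": "EXAMPLE-001",  # placeholder identifier
    "eligible": True,
    "inclusions_met": [
        {
            "criterion": "Uncontrolled chronic medical conditions",
            "evidence": "Illustrative excerpt from a clinical note",
            "source_note": "Progress note, illustrative date",
        }
    ],
    "exclusions_met": [],
}
```

Citing the source note for each decision is what gives the reviewer a direct path back to the chart for verification, supporting the transparency goal described above.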

  4. Embed a picture of what the AI tool might look like 

Please see above for an example of what the output from the AI system might look like. This data would ideally be integrated into an Epic Reporting Workbench report. Here, we show only one exclusion criterion for illustrative purposes; in reality, the output would include many more criteria currently used by the CCM team.  

  5. What are the risks of AI errors? 

False negatives, false positives, and hallucinations are critical errors we will monitor for in this AI system. These errors can arise at multiple levels, such as when the LLM outputs answers to individual inclusion and exclusion criteria and when it assigns a final eligibility determination for each patient. As we envision this system augmenting the review process rather than automating it, it is particularly important to monitor the false negative, false positive, and hallucination rates in its responses to individual inclusion and exclusion criteria, as we anticipate that the CCM team will rely primarily on these responses to optimize their review. Large error rates in any of these categories would limit the usefulness of the AI system.  

Including notes from CareEverywhere in the AI system will also be fundamental to ensuring accuracy of the AI output, as these notes are also included in the CCM staff’s current review process. 

The only way to measure whether these errors are occurring is to retain some number of complete manual chart reviews per month (e.g., 10-20) and compare the AI system’s performance against this gold standard. There are a few possible mitigation strategies: (1) for criteria that involve math, logic, or structured data, leverage traditional reporting rather than generative AI; (2) prioritize low false negative and hallucination rates over false positive rates, if able; (3) when new patterns of error are discovered, use iterative cycles of prompt engineering to attempt to resolve the error. 
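The monthly audit against retained manual reviews could be sketched as follows. This is a minimal illustration under the assumption that each audited chart yields a paired (AI, manual) eligibility determination; the function name and return fields are assumptions, not an existing tool.

```python
# Minimal sketch of the monthly audit described above: compare AI eligibility
# determinations against a retained set of manual chart reviews (the gold
# standard) and report error rates on the audited sample.

def monthly_error_rates(paired_reviews):
    """paired_reviews: list of (ai_eligible, manual_eligible) boolean pairs,
    one per audited chart. Returns error rates relative to manual review."""
    fp = sum(1 for ai, gold in paired_reviews if ai and not gold)
    fn = sum(1 for ai, gold in paired_reviews if not ai and gold)
    n = len(paired_reviews)
    return {
        "false_positive_rate": fp / n,
        "false_negative_rate": fn / n,
        "audited_charts": n,
    }
```

Hallucination checks would need a separate, criterion-level comparison of cited evidence against the chart, since they cannot be read off the final eligibility label alone.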

  6. How will we measure success? 

Please see below for lists of measures being tracked already, as well as measures that we want to track as part of the AI deployment. For this project to be successful, the AI system should be accurate, increase efficiency/satisfaction, increase the capacity of the CCM team, and be low cost to deploy and maintain relative to its benefits.  

  1. Measurements using data that are already being collected 
  • Number of enrolled patients per CCM staff 
  • Number of San Francisco Health Plan patient referrals and acceptances per month 
  • Outcomes of patients deemed eligible (e.g. completed, declined, dropped engagement, etc) 
  2. Other measurements to evaluate success of the AI 

  • Rates of sensitivity, false positives, false negatives, and hallucinations of the AI system per month, compared to the manual review gold standard 
  • Average and total time required per patient to review the chart and determine enrollment eligibility (pre/post) 
  • Adoption rate of Versa API per clinical staff member per month 
  • Versa API cost per month 
  • Staff satisfaction (pre/post)  
  • Number of inappropriate eligibility determinations requiring secondary review per month  
  7. Describe your qualifications and commitment:  

Executive Sponsor: Tim Judson, MD, Interim Chief Population Health Officer 

Co-project leads: Esther Hsiang, MD, MBA, Interim Medical Director of Care Delivery Transformation, UCSF Office of Population Health and Leo Liu, MD, MS, FAMIA, Associate Program Director, Clinical Informatics Fellowship.  

Esther Hsiang leads the Innovations team in OPH in designing, implementing, and evaluating strategic initiatives related to new models of care delivery at scale. She is involved in partnering with OPH clinical programs to assess program outcomes and assessing how technology can improve clinical workflows and expand patient capacity. Her clinical experience as a hospital medicine physician with internal medicine training allows for a deep understanding of challenges in navigating inpatient and outpatient systems for medically complex patients. 

Leo Liu is an Applied ML Scientist who has developed many AI tools at UCSF, including a sepsis prediction model that is now in silent study in APeX. Leo has helped mentor CI fellows Sara Faghihi Kashani and Abi Fadairo-Azinge in the development of the pilot AI system above. In addition, Leo has published on evaluating ML models for clinical practice[1], as well as created the concept of Sabercaremetrics[2] – novel metrics to better measure clinical performance.  

Esther and Leo are both committed to dedicate effort to this project during the upcoming year, including participating in regular work-in-progress sessions and collaborating with the Health AI and AER teams for development and implementation of the AI system.  

Additional Team Members 
Robin Andersen, MSN, PHN, NP, Manager, Complex Care Management, OPH 
Joshua Munday, MSN, MPH, RN, FNP, Complex Care Management, OPH 
Kristin Gagliardi, Implementation Specialist, Innovations Team, OPH 
Sara Faghihi Kashani, MD, MPH, Clinical Informatics Fellow 
Abi Fadairo-Azinge, MD, MPH, Clinical Informatics Fellow 

Open edit period edit(s)
Added list of inclusion/exclusion criteria that the pilot prompt evaluates for

References
[1] Liu X, Anstey J, Li R, Sarabu C, Sono R, Butte AJ. Rethinking PICO in the Machine Learning Era: ML-PICO. Appl Clin Inform. 2021 Mar;12(2):407-416.
[2] Liu X, Auerbach AD. A new ballgame: Sabercaremetrics and the future of clinical performance measurement. J Hosp Med. 2025 Apr;20(4):411-413.

Supporting Documents: 

Comments

The development of AI/ML prompts to streamline the process by which clinical teams identify patients in need of and appropriate for interventions would allow clinicians to focus their time on actual patient care. Leveraging AI/ML to provide care to more patients with limited staffing resources is really exciting and important.

This kind of project is critical to the way that we deliver healthcare- fully support!

Augmenting the patient identification process with AI/ML could really help our clinicians spend more time with patients and expand our reach. Love it! 

I'm hopeful this important work will be leveraged in the future to identify patients for other care management programs to help deliver better care and an improved experience for UCSF patients.

A great proposal, as many of our discharge summaries are long enough to be unreadable due to time constraints. 

I like your proposal, including the plan for continuing to require SOME manual chart reviews over time to monitor accuracy/concordance.  How many criteria do you currently use?  Can you provide a full list?  Does the team have capacity to provide CCM to all patients who meet inclusion criteria at this point, or do we still have capacity limitations?  If limited, how does the team currently decide between eligible patients which will receive CCM?

Thank you, Mark, for your thoughtful engagement! To address each of your questions:

Enrollment criteria: The CCM program has enrolled adult patients with high health system utilization, defined historically as having at least two ED visits, observation stays, or unplanned inpatient admissions within the past 12 months. In addition, patients must meet at least one of six indicators of medical or psychosocial complexity. These indicators include:

  1. Uncontrolled chronic medical conditions
  2. Use of high-risk medications
  3. Evidence of barriers to medical adherence or limited health literacy
  4. Lack of social support
  5. Older age with frailty or poor functional status or cognitive impairment
  6. Social determinants of health (SDoH) challenges

While specific definitions of health system utilization and medical/psychosocial complexity may evolve over time in alignment with shifting health system goals and priority populations, the overall enrollment focus will remain on patients with high utilization and significant complexity.

 

Team capacity: Historically, the CCM team has generally been able to provide care to all identified and enrolled patients, using manual chart review to determine eligibility and assess patient interest in participation. However, we recognize that the labor-intensive nature of manual reviews may contribute to underidentification of eligible patients. The goal of integrating an AI/ML-driven prompt is to streamline, accelerate, and improve the objectivity of the identification process, potentially surfacing more eligible patients, which may ultimately impact team capacity. Should this occur in the future, we anticipate that prioritization will be guided by a combination of factors, including alignment with priority populations and the degree of healthcare utilization. That said, we are optimistic that time saved through the AI/ML-driven prompt assisting in patient identification could instead be reinvested in caring for more enrolled patients.

 

Love this well-planned, thoughtful, and important proposal by this well-rounded team of experts - fully support!  As a primary clinician, I know of the critical work our OPH colleagues do - anything to improve their efficiency, standardize processes (which also optimizes equity), and better support them to focus on actual clinical care is a patient-centered and institutional win.  I appreciate the preliminary work demonstrating proof-of-concept and its inclusion of CareEverywhere notes (those can be incredibly challenging to sift through!) - this is a terrific application of AI.