Artificial Intelligence / Machine Learning Demonstration Projects 2025

Crowdsourcing ideas to bring advances in data science, machine learning, and artificial intelligence into real-world clinical practice.

GIVersa-Endoscopy: A Large Language Model (LLM) based AI Assistant for Endoscopy Sedation Triage

Proposal Status: 

The UCSF Health Problem:

Sedation planning prior to endoscopic procedures is an important quality metric and an essential step in the procedural workflow. Triaging which patients require higher levels of anesthesia support is critical to maximizing patient safety and allocating limited anesthesia resources. At most institutions, patients are assessed for sedation risk from pre-existing medical conditions via manual chart review by clinical staff and endoscopists, a time-consuming and labor-intensive task.

At UCSF Health, patients are triaged for endoscopy procedures across five locations based on their anesthesia risk and required sedation type. As our health system expands across hospital systems and ambulatory care centers throughout San Francisco and the greater Bay Area, the decision tree for appropriate triage becomes increasingly complex. Currently, the gastroenterology office reviews up to 150 direct endoscopy referrals per week. This workflow of manual review by staff and faculty diverts valuable time away from direct patient care and adds to the administrative burden that contributes to physician burnout.

Although this proposal is initially tailored to gastroenterology-specific procedures, the administrative challenges of peri-operative sedation triage are widespread across many divisions of the health system, highlighting the broader potential of an AI-based sedation triage assistant.

How might AI help?

We propose the development of an LLM-based assistant (UCSF Versa) customized via Retrieval-Augmented Generation (RAG). This approach adds the UCSF anesthesia endoscopy guidelines and the American Society for Gastrointestinal Endoscopy (ASGE) clinical guidelines to the assistant’s search database, thereby reducing hallucinations by controlling the data sources used for response generation. This customization, named “GIVersa-Endoscopy,” would serve as a chat interface that triages sedation levels (moderate sedation versus anesthesia) and recommends the appropriate procedure location stratified by anesthesia risk level (Parnassus operating room, Parnassus endoscopy unit, Mission Bay endoscopy unit, Mount Zion endoscopy unit, or ambulatory care centers) based on clinical patient data extracted from the electronic medical record. This design has been successfully tested by our team in a pilot retrospective cohort study using a custom Epic SmartPhrase to extract relevant clinical data for the assistant.
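To illustrate the grounding step of this RAG design, the minimal Python sketch below chunks guideline text, retrieves the chunks most relevant to a patient summary, and builds a prompt restricted to that retrieved text. The excerpts, function names, and TF-IDF retriever are hypothetical stand-ins, not UCSF Versa’s actual embedding and retrieval stack.

# Minimal, illustrative RAG sketch: ground the triage prompt in retrieved
# guideline text. Chunks and names are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

guideline_chunks = [  # placeholder excerpts from UCSF/ASGE guideline documents
    "Patients with ASA class III or higher merit consideration for anesthesia support.",
    "Moderate sedation is appropriate for low-risk patients undergoing routine endoscopy.",
    "High-risk airway features warrant anesthesia evaluation before the procedure.",
]

vectorizer = TfidfVectorizer()
chunk_vectors = vectorizer.fit_transform(guideline_chunks)

def retrieve_guidelines(patient_summary: str, k: int = 2) -> list[str]:
    # Rank guideline chunks by similarity to the patient summary; keep the top k.
    scores = cosine_similarity(vectorizer.transform([patient_summary]), chunk_vectors).ravel()
    return [guideline_chunks[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(patient_summary: str) -> str:
    # Restrict the LLM to the retrieved excerpts to limit hallucination.
    context = "\n".join(retrieve_guidelines(patient_summary))
    return ("Using ONLY the guideline excerpts below, recommend a sedation level "
            "(moderate sedation vs. anesthesia) and a procedure location.\n\n"
            f"Guidelines:\n{context}\n\nPatient summary:\n{patient_summary}")

In production, retrieval would run against the full UCSF and ASGE guideline corpus, but the principle is the same: the model answers only from the controlled sources supplied in the prompt.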

How would an end-user find and use it?

The GIVersa-Endoscopy LLM assistant will be integrated into the existing Epic referrals and pre-procedural planning dashboards. When processing a direct procedure referral, the end users (the administrative staff and the faculty member responsible for reviewing and scheduling new referrals) will receive an alert with an option to generate the AI-augmented triage recommendation. The interface will allow the user to “Approve” or “Decline” the AI’s recommendation. The final decision on sedation level and procedure location, along with the assistant’s underlying reasoning, will be recorded in the patient’s chart. This integration ensures that the tool is actionable, easily discoverable, and fits seamlessly into the existing workflow.
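As a rough sketch of what the recorded decision could look like, the structure below bundles the assistant’s recommendation and reasoning with the reviewer’s Approve/Decline action. All field names are hypothetical placeholders, not an Epic or APeX schema.

# Illustrative record of a triage recommendation and its human review.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class TriageRecommendation:
    referral_id: str
    sedation_level: str            # "moderate sedation" or "anesthesia"
    location: str                  # e.g., "Mission Bay endoscopy unit"
    reasoning: str                 # guideline-grounded rationale from the assistant
    decision: str = "pending"      # becomes "approved" or "declined" after review
    reviewer: Optional[str] = None
    reviewed_at: Optional[datetime] = None

    def review(self, reviewer: str, approved: bool) -> None:
        # Human verification is required before anything is scheduled.
        self.reviewer = reviewer
        self.decision = "approved" if approved else "declined"
        self.reviewed_at = datetime.now(timezone.utc)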

Example AI tool output

See attached figure. 

What are the risks of AI errors?

The risks of an AI-augmented pre-procedural sedation triage assistant include:

1) Patient safety: A false negative would occur if the assistant recommends a lower level of sedation support for a patient who should have been triaged to a higher level of support.

2) Overdependence on the AI: Excess reliance on the AI system may compromise decision-making in complex clinical cases.

To mitigate these risks, we will conduct rigorous validation tests and continuously monitor performance metrics such as sensitivity, specificity, and error rates. The system is designed to require human verification of the AI’s recommendations before the patient is scheduled. All AI assistant outputs will be logged in the patient’s chart for audit and review, and licensed professionals will maintain primary accountability for patient care decisions.
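As a minimal sketch of that monitoring, the function below computes sensitivity, specificity, and error rate from logged recommendation/audit pairs, treating “needed anesthesia support” as the positive class; the data and names are illustrative only.

def triage_metrics(records: list[tuple[bool, bool]]) -> dict[str, float]:
    # Each record pairs the assistant's call with the audited ground truth:
    # (ai_recommended_anesthesia, patient_actually_needed_anesthesia).
    tp = sum(pred and truth for pred, truth in records)
    tn = sum(not pred and not truth for pred, truth in records)
    fp = sum(pred and not truth for pred, truth in records)
    fn = sum(not pred and truth for pred, truth in records)  # the safety-critical false negative
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "error_rate": (fp + fn) / len(records) if records else float("nan"),
    }

print(triage_metrics([(True, True), (False, False), (False, True), (True, False)]))
# -> {'sensitivity': 0.5, 'specificity': 0.5, 'error_rate': 0.5}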

How will we measure success? 

We plan to evaluate the AI solution through a prospective cohort study of patients referred for direct endoscopy at UCSF Health. Success metrics will include: 

Measurements Using Existing APeX Data:

1)    The volume of referrals processed by the AI tool.

2)    The percentage of referrals where AI recommendations are accepted after clinician review.

3)    Reduction in manual review time per direct procedure referral, as recorded within current workflow data.

4)    Correlation between AI-augmented sedation risk stratification and patient outcomes, measured via the post-procedural audits already performed at our health system to ensure the safe and successful completion of endoscopic procedures.

Additional Measurements:

1)    Review of AI recommendation logs versus human overrides to identify error rates and opportunities for improvement in assistant performance (a computation sketch follows this list).

2)    End-user satisfaction surveys to assess overall usability, reduction in administrative load, and time efficiency. 

3)    Impact on patient wait times to schedule procedures (time from referral placement to receipt of procedure appointment confirmation). 

4)    Impact on administrative clinic staff and ease of integration within existing workflows.
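To show how several of the metrics above could be computed, here is a small illustrative sketch; the record fields are hypothetical placeholders for data already captured in APeX.

from datetime import date

# Hypothetical referral records standing in for existing APeX data.
referrals = [
    {"ai_used": True,  "accepted": True,  "referred": date(2025, 3, 3), "scheduled": date(2025, 3, 7)},
    {"ai_used": True,  "accepted": False, "referred": date(2025, 3, 4), "scheduled": date(2025, 3, 12)},
    {"ai_used": False, "accepted": None,  "referred": date(2025, 3, 5), "scheduled": date(2025, 3, 14)},
]

reviewed = [r for r in referrals if r["ai_used"]]
acceptance_rate = sum(r["accepted"] for r in reviewed) / len(reviewed)  # AI recommendations accepted
override_rate = 1 - acceptance_rate                                     # human overrides
mean_wait = sum((r["scheduled"] - r["referred"]).days for r in referrals) / len(referrals)

print(f"AI recommendation acceptance rate: {acceptance_rate:.0%}")
print(f"Human override rate: {override_rate:.0%}")
print(f"Mean referral-to-appointment wait: {mean_wait:.1f} days")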

Describe your qualifications and commitment 

Dr. Lakshmi Subbaraj, MD, a second-year gastroenterology fellow, aspires to a career integrating AI applications into clinical practice to enhance patient care and improve clinical operations. Her technical background in computer science, her completion of the “AI in Healthcare: From Strategies to Implementation” course, and her recognition as an ASGE AI scholar underscore her early-career commitment to this goal. The blend of her clinical acumen, technical expertise, and communication skills has been pivotal in spearheading this project.

Dr. Jin Ge, MD, MBA, an NIH-funded clinical researcher and transplant hepatologist, also serves as Director of Clinical AI for the Division of Gastroenterology and Hepatology. He is the co-lead for this project and has a proven track record of building and implementing AI initiatives. His professional background in healthcare administration, data science, and artificial intelligence, along with his previous collaborations with the UCSF AER Team and AI Tiger Team, makes him exceptionally qualified to lead this project through the design, testing, and implementation stages.

Comments

The way you've designed the output of the tool, the logic behind the recommendation seems relatively clear and easy for the reviewer to verify, since the relevant conditions/patient characteristics are provided. But what if the LLM doesn't correctly extract the relevant characteristics, and/or misses an important clinical detail? To catch this, wouldn't the clinician still need to review the referral/records in detail? If you have a human in the loop, will this still save time, or do you envision eventually being able to fully trust the extracted details?