Artificial Intelligence / Machine Learning Demonstration Projects 2025

Crowdsourcing ideas to bring advances in data science, machine learning, and artificial intelligence into real-world clinical practice.

Reducing Diagnostic Delays in Inflammatory Bowel Disease Through Machine Learning Approaches

Submitted by Goktug Onal on April 3, 2025 - 6:53pm. Last revised by Zeanid Noor on April 30, 2025 - 4:33pm.

Primary Author: Vivek Rudrapatna


1- The UCSF Health Problem

According to the National Academy of Medicine (NAM), about 30% of healthcare spending is wasted on unnecessary services and other inefficiencies. Although the true magnitude of waste at UCSF is not precisely known, it likely exists and meaningfully harms UCSF in an increasingly competitive environment. Diagnostic and treatment errors are one source of this waste. They harm patients and undercut our mission of advancing health worldwide. They also harm the financial health of UCSF, particularly under risk-bearing contracts with payors. Diagnostic and treatment errors also harm UCSF in many indirect ways, such as 1) reducing healthcare access for other patients, 2) harming our reputation as a global leader in medicine, 3) increasing the risk of staff burnout, and 4) creating medico-legal risk from diagnostic or treatment delays.

2- How Might AI Help?

One of the great promises of AI lies in its ability to improve medical decision making, enhancing both patient outcomes and hospital efficiency by reducing waste. While recent years have seen many advances in general AI solutions (e.g., ChatGPT), there remains a significant paucity of models trained to interpret healthcare data correctly. Dedicated AI models for interpreting the EHR are likely necessary given its complexity and its structural differences from the internet-scale text used to train most general AI solutions. In the near term, we foresee a need for “homegrown” AI solutions, given 1) the significant challenges that general AI companies face in accessing sensitive EHR data, and 2) their relative lack of healthcare expertise compared with what is available at UCSF.

We propose to develop a general-purpose system to reduce diagnostic delays, piloting it in inflammatory bowel disease (IBD). The choice of this disease reflects our clinical expertise and directly extends recent work from our group that grapples with, and partially addresses, the challenges of building systems intended for real-world deployment. We envision a successful pilot as the first step toward extending the system broadly to other diseases.

The system we envision will likely take a data-driven approach (using known cases and controls to train a longitudinal classifier), but we may also explore models that incorporate published guidelines and diagnostic algorithms, both to enhance the model and, eventually, to reduce errors by human clinicians.
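A minimal sketch of the case-control setup, assuming each patient is represented as a list of dated, coded events: every patient gets an index date (the diagnosis date for cases, a matched visit date for controls), and only events strictly before that date and within a lookback window are kept, so the classifier never sees the diagnosis itself. The `Event` schema, the 730-day `LOOKBACK_DAYS` value, the helper names, and the example ICD-10-CM codes are hypothetical; the proposal does not specify them.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    """One coded EHR event (hypothetical schema), e.g. an ICD code or drug."""
    day: date
    code: str


# Hypothetical lookback window; the proposal does not define one.
LOOKBACK_DAYS = 730


def history_before(events, index_date, lookback_days=LOOKBACK_DAYS):
    """Keep events strictly before index_date and inside the lookback window.

    Excluding the index date itself keeps the diagnosis event (and anything
    recorded alongside it) out of the model inputs, a common source of leakage.
    """
    start = index_date - timedelta(days=lookback_days)
    return [e for e in events if start <= e.day < index_date]


def build_examples(cases, controls):
    """Build (history, label) pairs for training.

    cases:    list of (events, diagnosis_date) for confirmed IBD patients
    controls: list of (events, index_date) for non-IBD patients; how control
              index dates are matched is an assumption left open here.
    """
    examples = [(history_before(ev, d), 1) for ev, d in cases]
    examples += [(history_before(ev, d), 0) for ev, d in controls]
    return examples
```

How control index dates are matched (for example, by age, sex, and visit timing) is a design decision the proposal leaves open.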

Our model will be trained and validated on readily available, free, de-identified EHR datasets. It will analyze patients' past healthcare interactions, integrating:

ICD codes (historical diagnoses, misdiagnoses, and symptoms).
Medication history (previous prescriptions and treatment patterns).
Laboratory results (CBC, inflammatory markers, etc.).
Imaging reports (radiology findings linked to IBD suspicion).
Clinical notes (NLP-based insights from physician documentation).
Demographics (age, sex, race, and lifestyle factors).
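One way to turn these heterogeneous sources into a fixed-length model input is sketched below, assuming events have already been restricted to the pre-index window: coded events become counts, each lab contributes its most recent value, and demographics become simple numeric or indicator features. The tuple formats, the `featurize` name, and the `source:code` naming scheme are illustrative assumptions, and imaging and note inputs are assumed to arrive as codes from an upstream NLP step. A longitudinal architecture like the one the team describes would keep more of the time structure than this flat summary.

```python
from datetime import date


def featurize(history, labs, index_date, age, sex):
    """Flatten one patient's pre-index history into a dict of named features.

    history: (day, source, code) tuples already restricted to before index_date,
             with source one of "icd", "med", "img", "note" (imaging and note
             codes assumed to come from an upstream NLP step)
    labs:    (day, lab_name, numeric_value) tuples
    Feature names and aggregations are illustrative assumptions.
    """
    feats = {}
    for _, source, code in history:
        key = f"{source}:{code}"
        feats[key] = feats.get(key, 0) + 1  # count of each coded event

    latest = {}
    for day, name, value in sorted(labs):  # chronological, so later values win
        if day < index_date:
            latest[name] = value
    for name, value in latest.items():
        feats[f"lab:{name}:last"] = value

    if history:
        first_day = min(day for day, _, _ in history)
        feats["days_since_first_event"] = (index_date - first_day).days

    feats["demo:age"] = age
    feats[f"demo:sex={sex}"] = 1
    return feats
```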

3- How would an end-user find and use it?

The AI model will generate a probability score for IBD based on these factors, offering transparent and explainable predictions. The output will display:

The likelihood (%) that the patient has IBD.
Key contributing factors used in the AI’s prediction (e.g., chronic diarrhea, weight loss, anemia, previous gastro-related complaints).
Suggested next steps (e.g., referral to a gastroenterologist, additional non-invasive screening).
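For a simple logistic model, the probability score and the list of key contributing factors can both come from the same computation, because each feature adds `weight * value` to the log-odds. The sketch below assumes such a linear model with hypothetical weights; non-linear or longitudinal architectures would need a separate attribution method (e.g., SHAP values) to produce the factor list.

```python
import math


def score_and_explain(features, weights, bias, top_k=3):
    """Return (probability, top contributing factors) for a logistic model.

    In a linear model each feature's contribution to the log-odds is
    weight * value, so the largest positive terms are the factors that pushed
    the score up. The weights passed in are hypothetical, not a trained model.
    """
    contributions = {
        name: weights[name] * value
        for name, value in features.items()
        if name in weights
    }
    logit = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))  # logistic (sigmoid) link
    top = sorted(
        (item for item in contributions.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )[:top_k]
    return probability, top
```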

The model will be deployed within the Electronic Health Record (EHR) system and triggered during a patient’s visit to their physician. Specifically, it will be:

Automatically activated when a patient with relevant symptoms visits a physician.
Displayed in real time within the physician’s workflow, ensuring immediate usability.
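The activation rule can be sketched as a check run at the start of each visit, assuming visit diagnoses and the problem list are available as ICD-10-CM codes. The trigger code set below (unspecified diarrhea, abnormal weight loss, melena, unspecified abdominal pain, iron deficiency anemia) and the exclusion of patients already diagnosed with Crohn's disease (K50) or ulcerative colitis (K51) are illustrative assumptions; the proposal does not define "relevant symptoms".

```python
# Hypothetical trigger codes (ICD-10-CM) standing in for "relevant symptoms";
# a real list would be defined with clinical stakeholders.
TRIGGER_CODES = {"R19.7", "R63.4", "K92.1", "R10.9", "D50.9"}

# Existing Crohn's disease (K50.*) or ulcerative colitis (K51.*) diagnoses.
KNOWN_IBD_PREFIXES = ("K50", "K51")


def should_run_model(visit_codes, problem_list):
    """Decide whether to score a patient at this visit (illustrative rule).

    Runs only when the visit carries at least one trigger symptom and the
    patient does not already have an IBD diagnosis, since the goal is to
    surface undiagnosed cases.
    """
    has_trigger = any(code in TRIGGER_CODES for code in visit_codes)
    known_ibd = any(code.startswith(KNOWN_IBD_PREFIXES) for code in problem_list)
    return has_trigger and not known_ibd
```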

When the AI detects a high risk of IBD, the clinician will receive an alert within the EHR interface. The alert will display:

The probability score (%) of IBD suspicion for that patient.
An optional, expandable menu showing a breakdown of key contributing factors, such as persistent gastrointestinal complaints, abnormal lab findings, or prior treatments.
Suggested next steps, such as direct referral for a GI procedure versus further non-invasive testing.
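A sketch of how a score might be mapped to the alert contents, assuming two hypothetical probability cut-offs: one below which no alert is shown, and one above which direct referral for a GI procedure is suggested instead of non-invasive testing. The `IbdAlert` class, `build_alert`, and both threshold values are placeholders, not values from the proposal; in practice they would be set from validation data.

```python
from dataclasses import dataclass, field

# Hypothetical cut-points; real values would come from validation data.
ALERT_THRESHOLD = 0.20
PROCEDURE_THRESHOLD = 0.60


@dataclass
class IbdAlert:
    probability_pct: int                          # headline score shown to the clinician
    factors: list = field(default_factory=list)   # content of the expandable menu
    next_step: str = ""                           # suggested action


def build_alert(probability, factors):
    """Return an IbdAlert, or None if the score is below the alert threshold."""
    if probability < ALERT_THRESHOLD:
        return None
    if probability >= PROCEDURE_THRESHOLD:
        step = "Consider direct referral for a GI procedure"
    else:
        step = "Consider non-invasive testing (e.g., fecal calprotectin, CRP)"
    return IbdAlert(round(probability * 100), list(factors), step)
```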

The tool will allow clinicians to:

Acknowledge the AI recommendation and either follow or override it.
Order additional tests (e.g., fecal calprotectin, CRP, stool culture) within the same interface.
Refer the patient directly for a GI procedure or for a GI e-consultation.
View explainability details to understand why the AI made the recommendation.
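To feed the feedback loop and adoption metrics described later, each clinician response could be logged in a small record like the one below. The `AlertResponse` fields and the `summarize` helper are a hypothetical schema, sketched under the assumption that acknowledgement, follow/override decisions, orders, referrals, and explanation views are all captured in the EHR interface.

```python
from dataclasses import dataclass, field


@dataclass
class AlertResponse:
    """What a clinician did with one alert (hypothetical logging schema)."""
    acknowledged: bool
    followed: bool = False                         # acknowledged but not followed = override
    orders: list = field(default_factory=list)     # e.g. ["fecal calprotectin", "CRP"]
    referral: str = ""                             # "", "GI procedure", or "GI e-consult"
    viewed_explanation: bool = False


def summarize(responses):
    """Acknowledgement, override, and explanation-view rates across alerts."""
    n = len(responses)
    acked = [r for r in responses if r.acknowledged]
    return {
        "ack_rate": len(acked) / n if n else 0.0,
        # Overrides are measured among acknowledged alerts only.
        "override_rate": sum(not r.followed for r in acked) / len(acked) if acked else 0.0,
        "explain_rate": sum(r.viewed_explanation for r in responses) / n if n else 0.0,
    }
```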

4- Embed a picture of what the AI tool might look like.

5- What Are the Risks of AI Errors?

While AI can improve early IBD detection, it comes with potential risks that must be managed.

1. False Positives (Overdiagnosis)

Risk: AI may flag non-IBD patients, leading to unnecessary referrals and lab tests.
Mitigation: Use explainability tools, confidence thresholds, and allow clinician override to avoid excessive false alarms.
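One way to set a confidence threshold is to scan candidate cut-offs on a held-out validation set and pick the lowest one whose positive predictive value (PPV) meets a target, which keeps as many true cases as possible while bounding false alarms. The sketch below assumes that approach; the function name and the choice of PPV as the constraint are assumptions, and the quadratic scan is only suitable for small validation sets.

```python
def pick_threshold(scores, labels, min_ppv):
    """Lowest score cut-off whose positive predictive value meets min_ppv.

    scores: predicted probabilities on a held-out validation set
    labels: 1 for confirmed IBD, 0 otherwise
    Lower thresholds catch more cases (fewer false negatives) but raise more
    false alarms, so candidates are scanned from low to high and the first
    one with acceptable PPV is returned as (threshold, ppv, sensitivity).
    Returns None if no cut-off reaches min_ppv.
    """
    total_pos = sum(labels)
    for t in sorted(set(scores)):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(flagged)
        ppv = tp / len(flagged)  # flagged is never empty: t is one of the scores
        if ppv >= min_ppv:
            sensitivity = tp / total_pos if total_pos else 0.0
            return t, ppv, sensitivity
    return None
```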

2. False Negatives (Missed Diagnoses)

Risk: AI might miss true IBD cases, delaying treatment.
Mitigation: Regular model updates, feedback loops, and a human-in-the-loop approach help maintain accuracy.

Risk Monitoring & Continuous Improvement

Track error rates (false positives/negatives).
Collect clinician feedback for refinement.
Regularly retrain AI models to improve reliability.
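Tracking false positives and false negatives amounts to computing confusion-matrix rates from logged alerts and later-confirmed diagnoses. A minimal sketch, assuming per-patient boolean outcomes; the `error_rates` and `needs_review` names and the 0.05 tolerance for flagging a retraining review when a rate drifts from its baseline are hypothetical.

```python
def error_rates(alerted, confirmed):
    """Confusion-matrix rates from paired per-patient outcomes.

    alerted:   True if the tool fired for the patient
    confirmed: True if IBD was later confirmed (e.g., by endoscopy/pathology)
    Confirmation for non-alerted patients requires follow-up time, so false
    negatives are only observable retrospectively.
    """
    pairs = list(zip(alerted, confirmed))
    tp = sum(a and c for a, c in pairs)
    fp = sum(a and not c for a, c in pairs)
    fn = sum(c and not a for a, c in pairs)
    tn = sum(not a and not c for a, c in pairs)
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
    }


def needs_review(current, baseline, tolerance=0.05):
    """Flag a retraining review if any rate worsens by more than `tolerance`."""
    # +1: higher is worse; -1: lower is worse.
    direction = {"false_positive_rate": 1, "false_negative_rate": 1, "ppv": -1}
    return any((current[k] - baseline[k]) * sign > tolerance for k, sign in direction.items())
```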

While false positives may cause minor inefficiencies, we expect the benefits of early IBD detection to far outweigh these risks when they are managed properly.

6- How Will We Measure Success?

To evaluate the AI model’s effectiveness, we will perform an embedded randomized controlled trial of the tool, clustered by clinician, and measure the following:
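Clustering by clinician means the unit of randomization is the clinician, and every encounter inherits that clinician's arm, so no clinician sees the alert for some patients and not others. A minimal sketch of a reproducible 1:1 allocation, assuming a simple seeded shuffle; `randomize_clinicians` is a hypothetical helper, the seed is arbitrary, and a real trial might instead stratify or block by clinic.

```python
import random


def randomize_clinicians(clinician_ids, seed=2025):
    """Assign each clinician (the cluster) to 'intervention' or 'control'.

    Sorting first makes the result independent of input order; the fixed
    seed makes the allocation reproducible. With an odd number of
    clinicians, the extra one goes to control.
    """
    ids = sorted(set(clinician_ids))  # de-duplicate and fix order before shuffling
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {cid: ("intervention" if i < half else "control") for i, cid in enumerate(ids)}
```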

1. Existing APeX Data Metrics

✔ Reduction in time to diagnosis (from first symptoms to confirmed IBD diagnosis).
✔ Proportion of direct-to-endoscopy referral cases with positive findings.
✔ Percentage of clinicians interacting with the AI tool in practice (only in the intervention group).
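The first two APeX metrics could be computed per arm from encounter-level records along these lines. The record keys and the `arm_metrics` name are a hypothetical schema, and these are descriptive summaries only; a formal comparison in a clinician-clustered trial would need methods that account for clustering.

```python
from statistics import median


def arm_metrics(records):
    """Per-arm summary of time to diagnosis and direct-to-endoscopy yield.

    Each record is a dict with (hypothetical) keys:
      arm                "intervention" or "control"
      days_to_diagnosis  days from first symptoms to confirmed IBD, or None
      direct_endoscopy   True if referred straight to endoscopy
      positive_findings  True if that endoscopy had positive findings
    """
    out = {}
    for arm in {r["arm"] for r in records}:
        rows = [r for r in records if r["arm"] == arm]
        delays = [r["days_to_diagnosis"] for r in rows if r["days_to_diagnosis"] is not None]
        scoped = [r for r in rows if r["direct_endoscopy"]]
        out[arm] = {
            "median_days_to_diagnosis": median(delays) if delays else None,
            "endoscopy_yield": (
                sum(r["positive_findings"] for r in scoped) / len(scoped) if scoped else None
            ),
        }
    return out
```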

2. Additional Ideal Metrics

✔ Clinician trust and adherence (based on a timed survey of the intervention group).
✔ Improvement in patient outcomes (fewer emergency visits, earlier treatment initiation).
✔ Cost savings from fewer unnecessary tests and delayed diagnoses.

When to Continue or Abandon?

Success: AI adoption increases, diagnostic delays decrease, and patient outcomes improve.
Failure: AI shows persistent false positives/negatives, low clinician adoption, or no impact on diagnosis speed.

If the pilot is successful, we will seek long-term integration into UCSF’s APeX system.

7- Describe your qualifications and commitment

I am an IBD physician and a researcher in clinical data science. As a clinician, I am often at the receiving end of referrals for new or suspected cases of IBD and often note delays in timely diagnosis. As a researcher, I work on computational methods for using EHR data to improve clinical decision making. We have developed, published, and patented methods for reducing diagnostic delays in rare diseases using machine learning (PMID 38946554). More recently, in unpublished work, we have enhanced these methods with novel ML architectures for longitudinal predictive modeling and with automated methods for incorporating domain expertise. If selected for this pilot, I commit 15% of my effort to working with the UCSF AI team to develop, deploy, and test this new tool.

Supporting Documents:

Letter of Support

Comments


Mark Pletcher - April 12, 2025 - 9:58am

Can you clarify the setting in which this tool/alert would activate? Is it for GI specialists or much more broadly (e.g., general internal medicine outpatient visits)? The latter could potentially add more value (catching cases earlier), but may be less acceptable to clinicians who have many problems/complaints they are evaluating and won't necessarily welcome a tool focused on a very specific diagnosis. If you are thinking this is for a general setting, are you involving any generalist stakeholders who can advise you on desirability and usability in their workflows? Alternatively, could this be built into an AI response to a GI consultation request?


Goktug Onal - April 19, 2025 - 5:31pm

Hi Dr Pletcher,

Thanks for your comments and interest.

We envision that the tool would primarily target primary care encounters, but it could potentially apply in broader contexts as well. The rationale is to help non-GI specialists consider this diagnosis and select an appropriate next step. We recognize that the tool might be a slight annoyance to busy PCPs who are managing many other complaints, but we aim to curb this in two ways: 1) because undiagnosed (de novo) IBD is relatively uncommon, the popup would likely not interfere in the vast majority of clinical encounters; and 2) the tool's recommendations will be suppressed if a clinician has already taken a relevant action (a recent GI referral or GI visit).
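The suppression rule in the second point can be sketched as a date-window check, assuming GI referral and GI visit dates are available for the patient. The `is_suppressed` helper and the 365-day `SUPPRESSION_DAYS` window are hypothetical; the reply does not define what counts as recent.

```python
from datetime import date, timedelta

# Hypothetical definition of "recent"; not specified in the proposal.
SUPPRESSION_DAYS = 365


def is_suppressed(today, gi_referral_dates, gi_visit_dates, window_days=SUPPRESSION_DAYS):
    """True if the alert should be hidden because GI care is already under way."""
    cutoff = today - timedelta(days=window_days)
    return any(cutoff <= d <= today for d in [*gi_referral_dates, *gi_visit_dates])
```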

This pilot proposal is intentionally focused on a narrow, specific use case (IBD), reflecting our team's domain expertise. However, if it proves successful, we fully intend to broaden it to many other diseases that could benefit from early diagnostic support (e.g., cancers such as ovarian and pancreatic cancer, and rare diseases). In this pilot, our primary focus will be on minimizing false positives and other disruptions to clinical workflows; secondary measures include time-to-diagnosis savings, cost-reduction estimates, and clinician acceptability. These results will inform future decisions about scaling and institutional investment. Ultimately, we think diagnostic clinical decision support is a very promising use case for AI and one that could bring broad value to UCSF and to our patients.

UCSF Open Proposals