1- The UCSF Health Problem
According to the National Academy of Medicine (NAM), about 30% of healthcare activity is wasted on unnecessary services and other inefficiencies. Although the true magnitude of wasteful activity at UCSF is not precisely known, waste likely not only exists but meaningfully harms UCSF in an increasingly competitive environment. Diagnostic and treatment errors harm patients and undercut our mission of advancing health worldwide. They also harm the financial health of UCSF, particularly under risk-bearing contracts with payors. Diagnostic and treatment errors harm UCSF in other, indirect ways as well: 1) reducing healthcare access for other patients, 2) damaging our reputation as a global leader in medicine, 3) increasing the risk of staff burnout, and 4) creating medico-legal risk from diagnostic or treatment delays.
2- How Might AI Help?
One of the great promises of AI lies in its ability to improve medical decision making, enhancing both patient outcomes and hospital efficiency by reducing waste. While recent years have seen many advances in general AI solutions (e.g., ChatGPT), there remains a paucity of models trained to correctly interpret healthcare data. Dedicated AI models for interpreting the EHR are likely necessary given its complexity, as well as its significant structural differences from the internet-scale text used to train most general AI solutions. In the near term, we foresee a need for “homegrown” AI solutions, given 1) the significant challenges that general AI companies face in accessing sensitive EHR data, and 2) their relative lack of healthcare expertise compared to what is available at UCSF.
We propose to develop a general-purpose system to reduce diagnostic delays, piloting this system in inflammatory bowel disease (IBD). The choice of this disease reflects our clinical expertise and directly extends recent work from our group that grapples with, and partially addresses, the challenges of building systems intended for real-world deployment. We envision a successful pilot as the first step toward extending this approach broadly across diseases.
The system we envision will likely be data-driven (using known cases and controls to train a longitudinal classifier), but we can also explore models that incorporate published guidelines and diagnostic algorithms to enhance the model and eventually reduce clinician errors.
Our model will be trained and validated using freely and readily available de-identified EHR datasets. It will analyze patients' past healthcare interactions, integrating ICD codes (historical diagnoses, misdiagnoses, and symptoms), medication history (previous prescriptions and treatment patterns), laboratory results (CBC, inflammatory markers, etc.), imaging reports (radiology findings linked to IBD suspicion), clinical notes (NLP-based insights from physician documentation), and demographics (age, sex, race, and lifestyle factors).
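As a rough illustration of how these heterogeneous inputs might be combined, the sketch below collapses a patient's dated EHR events into a flat set of named features that a classifier could consume. The `EhrEvent` schema, the `build_features` helper, the two-year lookback window, and the example codes are all assumptions made for illustration; they do not describe an existing UCSF pipeline or the final model design.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class EhrEvent:
    """One de-identified record from a patient's history (hypothetical schema)."""
    patient_id: str
    event_date: date
    source: str                    # e.g. "icd", "medication", "lab", "imaging"
    code: str                      # e.g. an ICD-10 code or a lab name
    value: Optional[float] = None  # numeric result for labs, otherwise None


def build_features(events, as_of, lookback_days=730):
    """Summarize events from the lookback window before `as_of` as named features.

    The 730-day window is an arbitrary placeholder, not a validated choice.
    """
    # Keep only events on or before `as_of` and inside the lookback window.
    window = [e for e in events
              if 0 <= (as_of - e.event_date).days <= lookback_days]

    features = {}
    # Count how often each (source, code) pair occurs in the window.
    counts = Counter(f"{e.source}:{e.code}" for e in window)
    for key, n in counts.items():
        features[f"count:{key}"] = float(n)

    # For labs, keep the most recent numeric value (later dates overwrite earlier).
    labs = sorted((e for e in window if e.source == "lab" and e.value is not None),
                  key=lambda e: e.event_date)
    for e in labs:
        features[f"last:lab:{e.code}"] = e.value
    return features
```

A truly longitudinal model would likely keep the event sequence rather than collapsing it into counts; the flattening here is only meant to show how the different data sources can share one representation.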
3- How would an end-user find and use it?
The AI model will generate a probability score for IBD based on these factors, offering transparent and explainable predictions. The output will display the likelihood (%) that a patient has IBD; the key contributing factors behind the prediction (e.g., chronic diarrhea, weight loss, anemia, previous gastrointestinal complaints); and suggested next steps (e.g., referral to a gastroenterologist, additional non-invasive screening).
The model will be deployed within the Electronic Health Record (EHR) system and triggered during a patient’s visit to their physician. Specifically, it will be:
- Automatically activated when a patient with relevant symptoms visits a physician.
- Displayed in real-time within the physician’s workflow, ensuring immediate usability.
When the AI detects a high risk of IBD, the clinician will receive an alert within the EHR interface. The alert will display:
- The probability score (%) of IBD suspicion for that patient.
- An optional, expandable menu showing a breakdown of key contributing factors, such as persistent gastrointestinal complaints, abnormal lab findings, or prior treatments.
- Suggested next steps, such as direct referral for a GI procedure versus further non-invasive testing.
The tool will allow clinicians to:
- Acknowledge the AI recommendation and either follow or override the suggestion.
- Order additional tests (e.g., fecal calprotectin, CRP, stool culture) within the same interface.
- Refer the patient directly for a GI procedure or request a GI e-consultation.
- View explainability details to understand why the AI made the recommendation.
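The alert contents listed above could be assembled along the following lines. This is a minimal sketch, not the APeX integration itself: the `IbdAlert` structure, the `build_alert` helper, and both threshold values are hypothetical placeholders that would need to be set from validation data and clinician input.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IbdAlert:
    """What the clinician would see (illustrative structure only)."""
    probability_pct: int
    contributing_factors: List[str]
    suggested_next_step: str


def build_alert(probability: float,
                factor_weights: Dict[str, float],
                alert_threshold: float = 0.30,     # placeholder, not validated
                referral_threshold: float = 0.60,  # placeholder, not validated
                top_k: int = 3) -> Optional[IbdAlert]:
    """Return an alert when the model's probability crosses the alert threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < alert_threshold:
        return None  # below threshold: stay silent to limit alert fatigue

    # Show the factors with the largest contribution first.
    top_factors = sorted(factor_weights, key=factor_weights.get, reverse=True)[:top_k]

    if probability >= referral_threshold:
        next_step = "Refer directly for a GI procedure"
    else:
        next_step = "Order further non-invasive testing (e.g., fecal calprotectin)"
    return IbdAlert(round(probability * 100), top_factors, next_step)
```

Returning `None` below the alert threshold keeps the tool silent for low-risk patients, which is one simple way to limit interruptions in a busy visit.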
4- Embed a picture of what the AI tool might look like.
5- What Are the Risks of AI Errors?
While AI can improve early IBD detection, it comes with potential risks that must be managed.
1. False Positives (Overdiagnosis)
- Risk: AI may flag non-IBD patients, leading to unnecessary referrals and lab tests.
- Mitigation: Use explainability tools and confidence thresholds, and allow clinician override, to limit excessive false alarms.
2. False Negatives (Missed Diagnoses)
- Risk: AI might miss true IBD cases, delaying treatment.
- Mitigation: Regular model updates, feedback loops, and a human-in-the-loop approach help limit missed cases.
Risk Monitoring & Continuous Improvement
- Track error rates (false positives/negatives).
- Collect clinician feedback for refinement.
- Regularly retrain AI models to improve reliability.
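The error rates to be tracked can be computed from paired records of the confirmed diagnosis and whether the tool raised an alert. The sketch below uses a hypothetical `error_rates` helper and assumes a confirmed IBD status is eventually available for each patient, which in practice may take considerable follow-up.

```python
import math


def error_rates(has_ibd, flagged):
    """Compute basic monitoring metrics from paired booleans.

    has_ibd[i] -- whether patient i was ultimately confirmed to have IBD
    flagged[i] -- whether the tool raised an alert for patient i
    """
    if len(has_ibd) != len(flagged):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(has_ibd, flagged))
    tp = sum(1 for truth, alert in pairs if truth and alert)
    fn = sum(1 for truth, alert in pairs if truth and not alert)
    fp = sum(1 for truth, alert in pairs if not truth and alert)
    tn = sum(1 for truth, alert in pairs if not truth and not alert)

    def ratio(numerator, denominator):
        # Report NaN rather than failing when a group is empty.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "false_negative_rate": ratio(fn, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "positive_predictive_value": ratio(tp, tp + fp),
    }
```

Because IBD is uncommon among patients with gastrointestinal symptoms, the positive predictive value is likely the number clinicians will feel most directly, so it is worth tracking alongside the raw false positive rate.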
While false positives may lead to minor inefficiencies, the benefits of early IBD detection far outweigh the risks when managed properly.
6- How Will We Measure Success?
To evaluate the AI model’s effectiveness, we will perform an embedded randomized controlled trial of the tool, clustered by clinician, and measure the following:
1. Existing APeX Data Metrics
✔ Reduction in time to diagnosis (from first symptoms to confirmed IBD diagnosis).
✔ Proportion of direct-to-endoscopy referral cases with positive findings.
✔ Percentage of clinicians interacting with the AI tool in practice (intervention group only).
2. Additional Ideal Metrics
✔ Clinician trust and adherence (based on a timed survey of the intervention group).
✔ Improvement in patient outcomes (fewer emergency visits, earlier treatment initiation).
✔ Cost savings from fewer unnecessary tests and fewer delayed diagnoses.
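To make the clinician-clustered design concrete, the sketch below assigns each clinician (rather than each patient) to an arm and then summarizes time to diagnosis per arm. The function names, the 1:1 allocation, and the use of a simple median are illustrative assumptions, not the trial's analysis plan.

```python
import random
from statistics import median


def randomize_clinicians(clinician_ids, seed=0):
    """Assign each clinician (the cluster) to one arm; all their visits follow it."""
    ids = sorted(set(clinician_ids))   # de-duplicate and fix the order before shuffling
    random.Random(seed).shuffle(ids)   # seeded so the allocation is reproducible
    half = len(ids) // 2
    return {cid: ("intervention" if i < half else "control")
            for i, cid in enumerate(ids)}


def median_days_to_diagnosis(cases, assignment):
    """Median days from first symptoms to confirmed diagnosis, per trial arm.

    `cases` is a list of (clinician_id, days_to_diagnosis) pairs.
    """
    by_arm = {"intervention": [], "control": []}
    for clinician_id, days in cases:
        by_arm[assignment[clinician_id]].append(days)
    return {arm: (median(values) if values else None)
            for arm, values in by_arm.items()}
```

A formal analysis would also need to account for correlation among patients seen by the same clinician, for example with a mixed-effects model; the per-arm median here is only a descriptive summary.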
When to Continue or Abandon?
- Success: AI adoption increases, diagnostic delays decrease, and patient outcomes improve.
- Failure: AI shows persistent false positives/negatives, low clinician adoption, or no impact on diagnosis speed.
If successful, we will seek long-term integration into UCSF’s APeX system.
7- Describe your qualifications and commitment
I am an IBD physician and a researcher in clinical data science. As a clinician, I am often on the receiving end of referrals for new or suspected cases of IBD and frequently see delays in diagnosis. As a researcher, I work on computational methods for using EHR data to improve clinical decision making. We have developed, published, and patented methods for reducing diagnostic delays in rare diseases using machine learning (PMID 38946554). More recently, in unpublished work, we have enhanced these methods using novel ML architectures for longitudinal predictive modeling and by incorporating domain expertise through automated methods. If selected for this pilot, I commit 15% of my effort to working with the UCSF AI team to develop, deploy, and test this new tool.
Comments
Can you clarify the setting in which this tool/alert would activate? Is it for GI specialists or much more broadly (e.g., general internal medicine outpatient visits)? The latter could potentially add more value (catching cases earlier), but may be less acceptable to clinicians who are evaluating many problems/complaints and won't necessarily welcome a tool focused on a very specific diagnosis. If you are thinking of a general setting, are you involving any generalist stakeholders who can testify and advise you on desirability and usability in their workflows? Another approach would be to build this into an AI response to a GI consultation request.
Hi Dr Pletcher, Thanks for