Artificial Intelligence / Machine Learning Demonstration Projects 2025

Crowdsourcing ideas to bring advances in data science, machine learning, and artificial intelligence into real-world clinical practice.

Headache Evaluation and Diagnosis - with Generative Artificial INtelligence (HEAD-GAIN): Improving Access


Section 1: The UCSF Health Problem 

Headache disorders are the third-highest cause of disability-adjusted life years worldwide (1) and often affect people during their peak productive years, exacting a financial toll upwards of 20 billion dollars annually (2,3). Accurate and rapid diagnosis of secondary headaches is imperative to prevent neurological morbidity. Moreover, early identification and treatment of primary headache disorders improves outcomes by preventing headaches from progressing into debilitating, chronic conditions (4). A critical decision point in the accurate diagnosis of headaches is whether brain imaging is needed. Obtaining a brain MRI for every person with headache would place unnecessary strain on the health system (5). However, failing to obtain an MRI in a patient with secondary headache can be devastating.

While neurologists are well-equipped to evaluate and diagnose headache disorders, primary and acute care providers are usually the first line of care for patients with headache (4,6). The quality and depth of training that first-line providers receive on the management of headache disorders are highly variable (7-11). Furthermore, a shortage of neurologists results in limited access to specialty headache care (12). The UCSF General Neurology division receives over 12,000 referrals annually, and on average, 22-25% of these referrals are headache-related. Of these headache referrals, 1 in 6 involve secondary headaches whose diagnosis may be delayed by long referral wait times. Based on these data, approximately 3,000 patients with headaches currently stand to benefit from implementation of the HEAD-GAIN tool each year. Furthermore, validation of the HEAD-GAIN tool could inspire the development of similar technologies in other neurological or medical subspecialties; for instance, our team is currently working on an analogous tool for patients with neurodegenerative disease.
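As a quick check of the referral arithmetic above, the following minimal Python sketch recomputes the estimate from the figures quoted in this section. Because the text states "over 12,000" referrals, the results are lower bounds. The count of secondary headaches is derived here from the "1 in 6" figure for illustration only and is not a number reported by the clinic; all names are hypothetical.

```python
# Figures quoted in the text above.
ANNUAL_REFERRALS = 12_000          # "over 12,000", so treat as a lower bound
HEADACHE_SHARE_PCT = (22, 25)      # low and high percent of referrals that are headache-related
SECONDARY_ONE_IN = 6               # "1 in 6" headache referrals are secondary headaches


def estimate_headache_referrals(total, share_pct):
    """Return the (low, high) estimated number of headache referrals per year."""
    low_pct, high_pct = share_pct
    return total * low_pct // 100, total * high_pct // 100


low, high = estimate_headache_referrals(ANNUAL_REFERRALS, HEADACHE_SHARE_PCT)
print(f"Headache referrals per year: {low} to {high}")
# Derived for illustration; not a figure reported in the proposal.
print(f"Of which secondary headaches: {low // SECONDARY_ONE_IN} to {high // SECONDARY_ONE_IN}")
```

The upper end of the range corresponds to the "approximately 3,000 patients" cited above.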

 

Section 2: How AI Addresses the Problem  

LLMs such as OpenAI’s generative pre-trained transformers (GPT models, as used in ChatGPT) are increasingly being studied and implemented throughout medicine, from virtual assistants to clinical decision support (13–16). Headache classification is a common application of generative AI, with pre-visit assessments (PVAs) serving as a source of rich phenotypic data (17). AI systems can be used to differentiate secondary headaches, with one machine learning-based prediction model demonstrating an accuracy of 0.74 (17,18). Some research has already suggested that applying LLMs to headache classification and diagnostic work-up may help to reduce overcrowding in emergency departments and allow providers to improve patient triaging (19).

Despite the promise of applying LLMs to patient care, data on the diagnostic accuracy of generative AI compared to physicians are mixed (20–28). Additionally, generative AI runs the risk of providing biased diagnostic suggestions that affect clinical care (27), especially when trained on a non-diverse, uniform patient population (28). Given these concerns, research into how to appropriately implement these systems into clinical workflows (29,30), including for headache diagnosis and management, remains critical.

There is a great need for a scalable AI tool that 1) helps distinguish primary from secondary headaches, 2) supports physician medical decision-making, and 3) facilitates efficient patient care delivery. Through this study, we hope to develop and validate a generative AI-based tool that can be applied to the diverse patient population at UCSF who need headache care.

To help us more accurately and efficiently triage and diagnose headache patients, we propose to validate and implement a diagnostic and management tool using a large language model (LLM) coupled with a Qualtrics-based pre-visit assessment (PVA). This tool is intended to identify patients at risk of secondary headaches so that they can be scheduled sooner, and to recommend first-line treatments for primary headaches to referring providers.

For this study, participants will complete the Qualtrics-based PVA once. Versa, a HIPAA-compliant LLM, will review the PVA and output a high-level summary, likely diagnosis, and imaging recommendation. Each participant will then see a neurologist, who will take a detailed clinical history and perform a neurologic physical examination. If the AI tool, designated as Headache Evaluation And Diagnosis-with Generative Artificial INtelligence (HEAD-GAIN), demonstrates high efficacy and usability, we will pilot this program on APeX, where we can first apply it to all headache referrals to UCSF Neurology, with the ultimate goal of making this tool accessible to primary care doctors.
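The step in which the LLM reviews the PVA and returns a summary, likely diagnosis, and imaging recommendation could be sketched roughly as below. This is only an illustration under stated assumptions: the prompt wording, the JSON reply format, and the field names (`summary`, `likely_diagnosis`, `headache_category`, `imaging_recommended`) are hypothetical, and the call to Versa itself is deliberately omitted because its interface is not described in this proposal.

```python
import json

# Hypothetical field names for the structured reply; not taken from the proposal.
REQUIRED_KEYS = {"summary", "likely_diagnosis", "headache_category", "imaging_recommended"}


def build_prompt(pva_responses: dict) -> str:
    """Format PVA answers into a prompt that requests a JSON reply (illustrative wording)."""
    answers = "\n".join(f"- {question}: {answer}" for question, answer in pva_responses.items())
    return (
        "You are assisting a neurologist with headache triage.\n"
        "Patient pre-visit assessment:\n"
        f"{answers}\n"
        "Reply with JSON only, using the keys summary, likely_diagnosis, "
        "headache_category (primary or secondary), and imaging_recommended (true or false)."
    )


def parse_reply(reply: str) -> dict:
    """Validate the model's reply; raise ValueError so malformed output goes to human review."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError("reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    if data["headache_category"] not in ("primary", "secondary"):
        raise ValueError("headache_category must be 'primary' or 'secondary'")
    if not isinstance(data["imaging_recommended"], bool):
        raise ValueError("imaging_recommended must be true or false")
    return data
```

Validating the reply before acting on it means that any output the model formats incorrectly is rejected rather than silently triaged.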

 

Section 3: End-User Workflow  

The HEAD-GAIN tool is intended to expedite in-person encounters with headache patients. After validation in the UCSF General Neurology Clinics, this tool may be deployed in primary care settings, where physicians have less specialized training in the diagnosis and management of headache disorders. 

We envision the end-user workflow as follows:

  • APeX automatically scans the referral request for keywords such as “head pain,” “headache,” or “migraine.” If a keyword is detected, a button becomes visible on the patient's chart.

  • The patient coordinator clicks the button to send the patient an automated MyChart message and an SMS text message via CipherHealth containing instructions to complete the PVA.

  • The patient clicks the link in the text or MyChart message, which prompts them to decide whether to consent to engagement with the HEAD-GAIN tool. If the patient consents, they are directed to the Qualtrics-based PVA.

  • The PVA responses are saved to a secure, HIPAA-compliant server.

  • A Python notebook automates the remaining steps. The notebook feeds the PVA responses into UCSF Versa with an engineered prompt that requests a high-level summary, the most likely diagnosis, its categorization as a primary or secondary headache, and an imaging recommendation.

  • If a secondary headache is likely, the referral is marked “urgent” and expedited so that the patient sees a neurologist within 5-7 business days.

  • If the AI-determined diagnosis is a primary headache condition, a curated list of potential first-line treatments for that condition is sent to the referring provider while the referral is pending neurology evaluation.
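The keyword scan and the routing rule in the workflow above can be illustrated with a minimal Python sketch. It does not represent how APeX performs the scan. The regular expression, the function names, the "routine" priority label, and the shape of the returned dictionary are all assumptions made for illustration; only the three keywords, the "urgent" label, and the 5-7 business day target come from the text.

```python
import re

# Keywords named in the workflow; the pattern (plurals, case-insensitivity) is an assumption.
KEYWORD_PATTERN = re.compile(r"\b(head\s+pain|headaches?|migraines?)\b", re.IGNORECASE)


def referral_mentions_headache(referral_text: str) -> bool:
    """Return True if the referral text contains one of the headache keywords."""
    return KEYWORD_PATTERN.search(referral_text) is not None


def route_referral(headache_category: str) -> dict:
    """Map the AI-determined category to the next workflow action (hypothetical structure)."""
    if headache_category == "secondary":
        # Possible secondary headache: expedite the neurology visit.
        return {"priority": "urgent", "target_business_days": (5, 7), "send_treatment_list": False}
    if headache_category == "primary":
        # Primary headache: referral proceeds normally; first-line treatments go to the referrer.
        return {"priority": "routine", "target_business_days": None, "send_treatment_list": True}
    raise ValueError(f"unexpected headache category: {headache_category!r}")


if __name__ == "__main__":
    text = "45F with worsening migraines, please evaluate."
    if referral_mentions_headache(text):
        print(route_referral("secondary"))
```

Raising an error for any unexpected category keeps unrecognized model output from being routed by default.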

To improve patient access and survey completion, future iterations of this tool will use OpenAI's Whisper, an AI/machine learning model for speech recognition and transcription, to deploy the PVA.

Section 4: Image of AI Tool 

 

[Image: screenshot of the HEAD-GAIN tool]

 

Section 5: Possible Sources of Error 

The major concern associated with the HEAD-GAIN tool is the diagnostic error rate for secondary headache. In addition, inaccurate diagnosis of primary headaches could lead to erroneous recommendations for headache treatment.

To mitigate these risks, a study team member will review discrepancies between Versa-determined headache diagnoses and the gold standard of neurologist headache diagnosis. Discrepancies will be reviewed individually and analyzed by prompting Versa to explain its clinical chain of thought. We will then modify the PVA to try to improve the performance of the HEAD-GAIN tool. We will also conduct prompt engineering to optimize the accuracy of the output from UCSF Versa. The concordance rate for the diagnosis of primary versus secondary headache will be calculated, with the aim of achieving a concordance rate of 0.85 after modifications to the PVA and prompt engineering.

Optimization of the PVA and iterative prompt engineering will be combined to increase the sensitivity of HEAD-GAIN for identifying secondary headaches before implementation into clinical workflows.
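The two quantities discussed in this section, the concordance rate and the sensitivity for secondary headaches, could be computed along the following lines. This is a minimal sketch that treats concordance as simple proportion agreement between paired labels; the proposal does not specify the exact statistic, so that choice is an assumption. The function names are hypothetical and the example labels are made up for illustration, not study data.

```python
def concordance_rate(ai_labels, neurologist_labels):
    """Proportion of patients for whom the AI label matches the neurologist label."""
    if len(ai_labels) != len(neurologist_labels) or not ai_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    agreements = sum(ai == neuro for ai, neuro in zip(ai_labels, neurologist_labels))
    return agreements / len(ai_labels)


def secondary_sensitivity(ai_labels, neurologist_labels):
    """Fraction of neurologist-diagnosed secondary headaches that the AI also called secondary."""
    ai_for_true_secondary = [
        ai for ai, neuro in zip(ai_labels, neurologist_labels) if neuro == "secondary"
    ]
    if not ai_for_true_secondary:
        raise ValueError("no neurologist-diagnosed secondary headaches in the sample")
    return sum(ai == "secondary" for ai in ai_for_true_secondary) / len(ai_for_true_secondary)


if __name__ == "__main__":
    # Made-up labels for illustration only.
    ai = ["primary", "secondary", "primary", "primary", "secondary"]
    neuro = ["primary", "secondary", "secondary", "primary", "secondary"]
    TARGET_CONCORDANCE = 0.85  # target stated in the text
    rate = concordance_rate(ai, neuro)
    print(f"Concordance: {rate:.2f} (meets target: {rate >= TARGET_CONCORDANCE})")
    print(f"Sensitivity for secondary headache: {secondary_sensitivity(ai, neuro):.2f}")
```

Reporting sensitivity alongside concordance matters because a tool can agree with neurologists on most patients while still missing a meaningful share of the less common secondary headaches.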

Section 6: Metrics of Success 

Collected in APeX 

  • Validation stage: Number of referrals identified and selected for HEAD-GAIN intervention; Wait times for headache consultations from referral date; Utilization of MRI brain imaging before and after headache referral; Concordance between AI diagnosis and management and neurologist diagnosis and management.

  • Implementation stage: Time to diagnosis of secondary headache starting from referral date; Patient and referring provider satisfaction pre- and post-intervention; Time to initiation of first primary prevention or abortive medication; Proportion of providers that use the HEAD-GAIN tool.

Other Measurements 

  • Validation stage: Concordance of headache diagnosis and imaging recommendation by Versa compared to consultant neurologist. 

  • Implementation stage: Acceptability and usability of the HEAD-GAIN tool for referring providers and consulting neurologists.

Section 7: Qualifications   

The project will be led by Pierre Martin, Technology Chief of the UCSF General Neurology Division. He is an outpatient general neurologist with SmartUser certification for Epic and has an academic focus that involves the design and development of educational technology. He recently secured Innovations Funding for Education to develop an immersive mobile application for medical trainees to learn clinical neuroanatomy via an interactive 3D model. The HEAD-GAIN project team includes two UCSF neurologists, an external neurologist collaborator from Emory University, an AI expert from the UCSF Memory and Aging Center, a UCSF medical student, and a graduate programming student from UCSC. We currently hold biweekly meetings and plan to incorporate time for work-in-progress sessions with the Health AI and AER teams.

Comments

I could see this being useful.  You are proposing that the initial data collection step be done via a Qualtrics survey.  Won't some or all of the information already be in the referral sometimes?  What about patients who can't type well or don't want to fill out a survey?  How accurate would the triage have to be for you to rely on it?  If you can't fully rely on it, would it still save time and effort?