Caring Wisely FY26 Project Contest

Headache Evaluation and Diagnosis - with Generative Artificial INtelligence (HEAD-GAIN): Improving Access, Reducing Overutilization

Submitted by Pierre Martin on March 1, 2025 - 12:00pm Last revised by Rasmyah Hammoudeh on March 11, 2025 - 2:01pm.

Proposal Status:

Review Complete

PROJECT LEAD(S):

- Pierre Martin, MD, MEd

- Andrew Breithaupt, MD

EXECUTIVE SPONSOR(S):

- Maggie Waung, MD, PhD

ABSTRACT

Clinicians must distinguish between primary headaches and secondary headaches that require neuroimaging. Limited access to subspecialty headache care leads to misdiagnoses, delays in management, overutilization of neuroimaging, and increased costs, despite clinical practice guidelines and Choosing Wisely campaigns.^1–3 Large language models (LLMs) are increasingly used for clinical decision support.⁴ We hypothesize that a chatbot-delivered pre-visit assessment (PVA) combined with an LLM can efficiently collect a comprehensive patient history, accurately diagnose headaches, and provide neuroimaging recommendations similar to those of an in-person neurologist. The goal of the intervention is to improve access to headache care and reduce overutilization of neuroimaging, along with associated costs. For the development and validation phase, participants will be randomized to a person-delivered PVA or an automated chatbot-delivered PVA. UCSF Versa, a HIPAA-compliant AI platform, will analyze transcripts to generate a history of present illness, a diagnosis, and a positive or negative recommendation for neuroimaging, for comparison with an in-person neurologist. The chatbot-delivered PVA will later be implemented into clinical workflows by the UCSF General Neurology Clinic, and metrics including time to diagnosis, neuroimaging utilization, and cost of neuroimaging will be calculated for the 6 months before and after implementation.

TEAM

- Project Lead: Pierre Martin, MD, MEd

- Project Lead: Andrew Breithaupt, MD

- Lead Researcher: Psalm Pineo-Cavanaugh, BS

- Lead Developer: Forest Pineo-Cavanaugh, BE

PROBLEM:

Headache disorders affect a wide swath of the population as the third highest cause of disability-adjusted life years worldwide⁵ and often impact people during their peak productive years, exacting a significant financial toll of upwards of $20B annually.^6,7 Accurate and rapid diagnosis of headaches is imperative to treat potentially life-threatening conditions such as subarachnoid hemorrhage, meningitis, or brain tumors. Moreover, early identification and treatment of primary headache disorders (98% of all headaches)⁸ improve outcomes, preventing headaches such as migraine from progressing into debilitating, chronic conditions.⁹ A critical decision point in the accurate diagnosis of headaches is whether brain imaging is needed. If every person with headaches received a brain MRI, this would place unnecessary strain on the health system.¹ However, not obtaining an MRI in a patient with secondary headaches can be devastating.¹⁰

Two-thirds of the 27 million ED visits are “avoidable”¹¹ and there are 3.5 million “potentially preventable” adult inpatient admissions yearly, accounting for $33.7B in the aggregate.¹² Disease management in neurology is particularly expensive because of complex diagnostic procedures, chronic disease management, and indirect costs (e.g., lost productivity, disability accommodations). While neurologist involvement in the ambulatory care setting leads to greater unadjusted allowed third-party payments, it is also associated with increased utilization of both symptom-ameliorating and disease-modifying medications, decreased adverse events, and decreased utilization of both acute and post-acute healthcare resources.¹³ Ordering practices can differ significantly between providers, and the reasons for overutilization are multifold, including a desire to address the concerns of referring clinicians, appeasement of patients, shortcuts in a busy practice, cognitive bias, and defensive medicine.^14,15 Overutilization burdens the healthcare system, can impact insurance coverage, and in turn reduces access to care for patients with more critical neurological conditions. Reduced access to care leads to a worsened prognosis and increased acute care utilization (e.g., ED visits and inpatient admissions).

Standardization and clinical practice guidelines help to narrow practice variation and can reduce cost without a reduction in clinical outcomes.¹⁶ The American Headache Society and American College of Radiology emphasize that neuroimaging is not necessary for uncomplicated headaches that meet ICHD-3 criteria for migraine, do not have “red flag” symptoms/signs, and maintain a normal neurological examination.¹⁴ For example, most brain tumor patients present with multiple symptoms/signs, while only 12% present with isolated headache and 3% are incidental.¹⁷ Despite clinical practice guidelines and Choosing Wisely healthcare campaigns, 12–16% of patients with primary headaches undergo MRI neuroimaging,^1–3 and overutilization in this setting contributes to $1–3B per year in avoidable imaging costs.¹⁴ Moreover, overutilization of neuroimaging can lead to false positives, resultant patient anxiety, and unnecessary follow-up diagnostics, interventions, and consultations with their own associated risks.

Headaches are one of the more common neurological conditions managed in the UCSF General Neurology Clinic. During 2024, the UCSF General Neurology Clinic received between 1,000 and 1,400 new patient referrals per month, of which between 100 and 280 per month were related to headache management. The catchment area for the UCSF General Neurology Clinic is also growing year by year. Hundreds of new headache patients are seen per month between the UCSF General Neurology and UCSF Headache clinics, some presenting with recent neuroimaging and some without. Moreover, there are growing concerns that changes in both federal and state healthcare policies will lead to significant reductions in insurance coverage, funding cuts to Medicaid, and changes in reimbursement rates, all of which will further exacerbate the problem of reduced access to neurological care. This will be particularly true for underserved regions and vulnerable patient populations. Innovative solutions to improve access to neurological care and reduce healthcare costs are imperative.

TARGET:

Goal: We aim to develop, validate, and safely implement a chatbot-delivered PVA combined with an LLM to 1) improve patient access to headache care and 2) reduce overutilization of neuroimaging for headache management.

Expected quantitative benefits:

- Decrease in time to diagnosis

- Decrease in time to diagnostic testing

- Decrease in unnecessary diagnostic tests (i.e., neuroimaging)

- Reduction in healthcare expenses associated with the decrease in unnecessary diagnostic tests (i.e., neuroimaging)

- Reduction in acute care utilization

Expected qualitative benefits:

- Improved and more efficient clinical workflow for patients

GAPS:

While neurologists are well equipped to evaluate and diagnose headache disorders, primary and acute care providers are usually the first line of care for patients with headache.^9,18 The quality and depth of training these providers receive on the management of headache disorders is highly variable,^19,20 as is their awareness of clinical practice guidelines and Choosing Wisely campaigns. Furthermore, the global shortage of neurologists results in limited access to subspecialty headache care, often leading to misdiagnosis and delays in management, which are further compounded in low-resource and rural settings.²¹ There has been increasing research into the potential of large language models, a subset of generative AI, to help providers (neurologists and non-neurologists) more accurately and efficiently triage, diagnose, and manage headache patients.²²

INTERVENTION:
Generative AI with LLMs like OpenAI’s generative pre-trained transformers (ChatGPT-3, ChatGPT-4) is being increasingly studied and implemented throughout medicine, from virtual assistants to clinical decision support.^23–26 Despite the promise of utilizing LLMs for medical purposes, the data on the diagnostic accuracy of generative AI compared to physicians is mixed.^27–31 With respect to LLMs in particular, ChatGPT-3 has been used to develop lists of five differential diagnoses based on ten mock clinical vignettes and contained the correct diagnosis upwards of 80% of the time.³² LLMs may outperform physicians in terms of diagnosis and clinical reasoning in certain circumstances, but may actually hamper physician accuracy when used without appropriate training.³³ That being said, these systems will need to be investigated further given continued concerns about accuracy, reliability and bias in clinical settings.³⁴ LLMs can incorporate clinical practice guidelines³⁵ and can be integrated into clinical workflows for clinical decision support such as structured symptom assessments for patient triage, risk stratification and initial recommendations. As a result, there is ongoing research into how to appropriately implement these systems,^36,37 including for headache diagnosis and management. Headache classification is a common application of generative AI with PVAs serving as a source of rich phenotypic data.³⁸ Technological advances such as enhanced LLMs that incorporate clinical practice guidelines have the opportunity to increase clinical efficiency, improve access to neurological care and promote diagnostic stewardship, all the while decreasing overutilization of resources and driving down costs.

Approach:

Hypothesis: We hypothesize that a chatbot-delivered PVA combined with an LLM can efficiently collect a comprehensive patient history, accurately diagnose headache syndromes, and provide imaging recommendations similar to an in-person neurologist.

Aim 1: Evaluate the diagnostic accuracy and imaging recommendations of an LLM for primary and secondary headaches. We hypothesize that a fine-tuned LLM using the results of a chatbot-delivered PVA will diagnose headache disorders and make imaging recommendations with a concordance of at least 0.8 with a blinded in-person neurologist.

Aim 2: Evaluate clinical efficiency, resource utilization and cost reductions associated with implementation of a chatbot-delivered PVA combined with an LLM for headache management. After implementation for new headache patient referrals, we will calculate the time to diagnosis (by fine-tuned LLM and then in-person neurologist) as well as record if the LLM recommended neuroimaging, if patients underwent neuroimaging and the costs associated with neuroimaging (based on EHR logs and administrative billing records). We will compare the aforementioned metrics during the 6 months before and after implementation of the chatbot-delivered PVA and LLM into clinical workflows.
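The concordance target in Aim 1 can be made concrete with a small calculation. The proposal does not say which concordance statistic will be used, so the sketch below is only an illustration that computes two common candidates, raw percent agreement and Cohen's kappa, for paired LLM and neurologist labels. The function names and the example labels are hypothetical and are not study data.

```python
from collections import Counter


def percent_agreement(llm, neurologist):
    """Fraction of cases in which both raters assign the same label."""
    if len(llm) != len(neurologist) or not llm:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(llm, neurologist))
    return matches / len(llm)


def cohens_kappa(llm, neurologist):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(llm)
    observed = percent_agreement(llm, neurologist)
    llm_counts = Counter(llm)
    neuro_counts = Counter(neurologist)
    labels = set(llm_counts) | set(neuro_counts)
    # Agreement expected by chance, from each rater's marginal label frequencies.
    expected = sum(llm_counts[label] * neuro_counts[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up imaging recommendations for ten hypothetical patients.
llm_labels = ["image", "none", "none", "image", "none",
              "none", "none", "image", "none", "none"]
neuro_labels = ["image", "none", "none", "image", "none",
                "none", "none", "none", "none", "none"]

print(percent_agreement(llm_labels, neuro_labels))
print(cohens_kappa(llm_labels, neuro_labels))
```

Kappa is lower than raw agreement whenever one label dominates, so which statistic the 0.8 threshold refers to matters when most patients are not recommended imaging.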

Methods: Development and Validation

We will recruit 160 participants, which yields a 95% CI with a margin of error of ±0.062 around an expected sensitivity of 0.80. Participants will be adult English-speaking patients scheduled to be seen in the UCSF General Neurology Clinic for headache management within 6 months of enrollment. This study will be conducted in collaboration with the neurology and computer science departments at Emory University, which will provide chatbot and secure server support.
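The sample size figures above can be checked with a short calculation. This is a minimal sketch that assumes a normal-approximation (Wald) interval for a proportion and assumes all 160 participants contribute to the proportion being estimated; the proposal does not state which interval method was used, and the function names are illustrative.

```python
import math


def wald_margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation CI for a proportion p with n cases."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(p, margin, z=1.96):
    """Smallest n whose Wald margin of error does not exceed `margin`."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


margin = wald_margin_of_error(0.80, 160)
print(round(margin, 3))                   # 0.062
print(required_sample_size(0.80, 0.062))  # 160
```

If sensitivity is estimated only among participants whom the neurologist classifies as positive, the effective n is smaller and the interval is wider than this calculation suggests.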

Participants will be randomized to either 1) a person-delivered PVA administered by study staff via Zoom or 2) an automated chatbot-delivered PVA. A chatbot for the PVA has been developed and will be deployed via Emory’s Amazon Web Services (AWS; cloud computing platform for storage, computing, and databases). Participants will undergo 1:1 block randomization between the 2 arms.
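As an illustration of 1:1 block randomization, the sketch below builds an allocation schedule from shuffled blocks, each containing equal numbers of the two arms. The block size of 4, the arm labels, and the seed are assumptions made for the example; the proposal does not specify them.

```python
import random


def block_randomize(n_participants, block_size=4, seed=None):
    """Return an allocation list built from shuffled, balanced 1:1 blocks."""
    if block_size <= 0 or block_size % 2 != 0:
        raise ValueError("block_size must be a positive even number for 1:1 allocation")
    rng = random.Random(seed)
    arms = ("person_pva", "chatbot_pva")  # hypothetical arm labels
    schedule = []
    while len(schedule) < n_participants:
        # Each block holds the same number of assignments to each arm.
        block = list(arms) * (block_size // 2)
        rng.shuffle(block)
        schedule.extend(block)
    # Truncating leaves the final block incomplete if n is not a multiple of block_size.
    return schedule[:n_participants]


schedule = block_randomize(160, block_size=4, seed=2025)
print(schedule.count("person_pva"), schedule.count("chatbot_pva"))
```

Because 160 is a multiple of the block size, the two arms end up exactly balanced, and the imbalance at any point during enrollment never exceeds half a block.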

Audio recordings of the PVAs will be transcribed to text via OpenAI’s Whisper (automated speech recognition system). Recordings and transcripts will be stored on the UCSF Research Analysis Environment (data hosting and collaboration tool). Whisper- and chatbot-generated transcripts of the PVA will be entered into UCSF Versa, a HIPAA-compliant AI platform that allows users to interact with ChatGPT-3.5 and ChatGPT-4.0. UCSF Versa will be prompted to analyze the transcripts, generating both a diagnosis based on International Classification of Headache Disorders 3 (ICHD-3) criteria⁴¹ and a positive or negative recommendation for neuroimaging.
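One way to make the two outputs described above comparable with the neurologist's entries is to ask the model for a structured reply and validate it before storage. The sketch below is purely illustrative: it assumes the model is prompted to reply in JSON with two hypothetical fields, `diagnosis` and `imaging_recommended`, and it does not call UCSF Versa or represent that platform's interface.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PvaAssessment:
    diagnosis: str             # e.g. an ICHD-3 diagnosis label
    imaging_recommended: bool  # positive or negative neuroimaging recommendation


def parse_assessment(raw_reply):
    """Validate a JSON reply (hypothetical format) and convert it to a record."""
    data = json.loads(raw_reply)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    diagnosis = data.get("diagnosis")
    imaging = data.get("imaging_recommended")
    if not isinstance(diagnosis, str) or not diagnosis.strip():
        raise ValueError("reply is missing a diagnosis")
    if not isinstance(imaging, bool):
        raise ValueError("imaging_recommended must be true or false")
    return PvaAssessment(diagnosis.strip(), imaging)


# Made-up reply text, not real model output.
example_reply = '{"diagnosis": "Migraine without aura", "imaging_recommended": false}'
print(parse_assessment(example_reply))
```

Rejecting replies that omit either field keeps incomplete model output from being silently scored as a negative imaging recommendation.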

All patients will receive a detailed clinical history and neurological physical examination, which will be documented by the blinded in-person neurologist. Blinded study staff will review the neurologist’s clinic note for each patient and enter the diagnosis and imaging recommendations into REDCap.

This proposal has already been approved by the UCSF IRB (23-40675). Development of a REDCap Database and person-delivered PVA questionnaire has been completed. Participant recruitment has started; 10 have completed PVAs.

Methods: Implementation and Evaluation

Subsequently, the chatbot-delivered PVA will be implemented into the clinical workflow for new headache patient management in the UCSF General Neurology Clinic. New headache patient referrals will be randomized to either the chatbot-delivered PVA with LLM evaluation or standard of care. Results of the chatbot-delivered PVA and LLM will be sent to primary care physicians for headache management. The time to diagnosis (via LLM or neurologist), neuroimaging recommended and obtained, and cost of the neuroimaging obtained will be calculated for the 6 months before and after clinical implementation, and the results will be compared between the two cohorts.
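The before-and-after metrics described above could be summarized along the following lines. This is a sketch under stated assumptions: the `Referral` record, its field names, and the example values are hypothetical and do not reflect the actual EHR or billing data layout, and the choice of the median for time to diagnosis is an assumption.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Referral:
    period: str             # "pre" or "post" implementation (hypothetical coding)
    days_to_diagnosis: int
    imaging_obtained: bool
    imaging_cost: float     # cost in dollars; 0.0 if no imaging was obtained


def summarize(referrals, period):
    """Summarize time to diagnosis, imaging rate, and imaging cost for one period."""
    rows = [r for r in referrals if r.period == period]
    if not rows:
        raise ValueError(f"no referrals recorded for period {period!r}")
    return {
        "n": len(rows),
        "median_days_to_diagnosis": median(r.days_to_diagnosis for r in rows),
        "imaging_rate": sum(r.imaging_obtained for r in rows) / len(rows),
        "total_imaging_cost": sum(r.imaging_cost for r in rows if r.imaging_obtained),
    }


# Made-up records for illustration only.
referrals = [
    Referral("pre", 30, True, 2000.0),
    Referral("pre", 40, False, 0.0),
    Referral("pre", 50, True, 3000.0),
    Referral("pre", 20, False, 0.0),
    Referral("post", 10, False, 0.0),
    Referral("post", 20, True, 2000.0),
    Referral("post", 15, False, 0.0),
    Referral("post", 25, False, 0.0),
]

for period in ("pre", "post"):
    print(period, summarize(referrals, period))
```

A formal comparison would add a statistical test and adjust for differences in referral volume between the two 6-month windows.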

UCSF General Neurology Clinic:

Provider Characteristics: The clinic employs 9 board-certified neurologists and 1 physician assistant.

Staff Characteristics: There are multiple patient coordinators, nurses and other staff.

Patient Characteristics: Adult patients 18+ years old

Potential Barriers to Implementation:

LLMs are being increasingly used in medicine for various purposes, from administrative tasks to clinical decision support.⁴ That being said, there remain concerns related to the accuracy, reliability, and bias associated with LLMs, as until recently they maintained limited clinical reasoning and produced hallucinations.³⁴ This LLM will be enhanced with the ICHD-3 criteria for headache diagnosis in order to improve diagnostic accuracy, and the initial phase of the project involves validation of the LLM’s diagnostic performance. Moreover, effective engagement with these systems by patients necessitates a level of digital health literacy, which could be a potential source of equity bias. Patients may benefit from educational resources to facilitate engagement with the chatbot-delivered PVA. There is a need for standardized validation procedures and actionable guidelines for healthcare organizations as well as providers to ensure responsible implementation of LLMs in the clinical setting.³⁴

We anticipate that the risks to the patient are minimal. Some patients have reservations about the accuracy, lack of empathy, and potential for privacy breaches associated with AI-integrated technologies.⁴² Patient data is de-identified and we rigorously adhere to IRB protocols and privacy standards to ensure that patient confidentiality is fully maintained. Moreover, while the goal is to reduce the overutilization of neuroimaging for uncomplicated headaches, a proportion of patients may receive a recommendation to obtain neuroimaging that uncovers an incidental finding and leads to subsequent unnecessary testing and avoidable patient anxiety.

PROPOSED EHR MODIFICATIONS:

After the initial validation and implementation, the chatbot-delivered PVA and LLM could be incorporated into the electronic health record. This would require the following features: 1) a MyChart link for patients to access the chatbot-delivered PVA and 2) an automated process to capture the results of the chatbot-delivered PVA with LLM and send them to referring providers and the patient's future neurologist.

RETURN ON INVESTMENT (ROI)

MRIs alone account for 51% of total healthcare expenditures in outpatient neurology.⁴¹ The average cost of an MRI ranges from $1,600 to $8,400⁴² depending on the extent of structures evaluated, insurance coverage, healthcare facility location, healthcare facility type, and need for contrast media. One Choosing Wisely campaign targeting the use of brain MRI in preterm infants was associated with a significant decrease in non-indicated MRIs, with expenditures decreasing from $1.3M in 2006 to $260,000 in 2016.⁴³ Another study found that a Choosing Wisely campaign for headaches was associated with a significant reduction in imaging for uncomplicated headaches, from 10.8% to 6.9% over a 3-year period, which led to significant savings.⁴⁴ Over the long term, indirect cost savings will include reductions in acute care utilization (e.g., ED visits, inpatient admissions), fewer unnecessary diagnostics following up incidental findings, and less lost productivity among patients.

SUSTAINABILITY

The Technology Chief for UCSF’s General Neurology Division will be responsible for ongoing enhancements to the system. The Division Chief for UCSF’s General Neurology Division will be the executive sponsor for oversight and budgeting operational resources.

BUDGET

Faculty Protected Salary Time for Project Implementation: $38,000.00

Research Coordinator Stipend(s): $12,000.00

Supporting Documents:

HEAD GAIN REFERENCES