Artificial Intelligence / Machine Learning Demonstration Projects 2025

Crowdsourcing ideas to bring advances in data science, machine learning, and artificial intelligence into real-world clinical practice.

AI-Augmented Lung Nodule Tracking

Proposal Status: 

The UCSF Problem:

Lung nodules are a common finding on CT; most are benign, but a small proportion represent malignancy. To monitor for potential malignancy, follow-up imaging is often recommended. When there is concern for malignancy, timely follow-up allows earlier diagnosis and treatment, while lack of follow-up may lead to progression and worse prognosis.

Lung nodules are found in two ways: on lung cancer screening CTs, which are performed in patients at high risk for lung cancer, and incidentally on CTs obtained for other indications. At UCSF, we perform around 400 screening CTs a year, a number that is dwarfed by the number of incidentally found nodules. Approximately 65,000 chest CTs for other indications were performed in 2024, and population data indicate that around 31% of chest CT scans contain a nodule. At UCSF, around 4,000 CTs in 2024 had the word “nodule” or “follow-up” in their impression [1]. While not all of these represent lung nodules requiring follow-up, the number is likely in the thousands.

Here at UCSF, we have a lung nodule nurse coordinator who follows up on nodules found on lung cancer screening CTs; these are tracked in an Excel spreadsheet. Unlike peer institutions such as the VA and SFGH, we have no system-wide tracking program for incidentally found nodules. Concerning nodules must be tracked by the ordering physician or the patient’s primary care doctor, which is challenging without a centralized tracking system and risks patients not obtaining adequate follow-up.

 

How might AI help?

Manually reviewing all 65,000 CT scans per year to find concerning nodules would be time-consuming and cost-prohibitive. However, large language models could efficiently and cost-effectively parse the reports and extract the key information needed to determine the optimal follow-up interval. This information can then populate a dashboard, enabling coordinators to track suspicious nodules, identify when a patient is overdue for follow-up, and easily reach out to patients.

 

How would an end-user find it and use it?

Our AI tool would be limited to users who are part of the pulmonary nodule program, including lung nodule coordinators and pulmonologists. We will coordinate directly with the relevant individuals so that they are aware of the AI-enabled dashboard and understand how to use it.

 

Picture of the AI tool:

Using the Versa API, we will extract the information needed for clinical decision-making about lung nodules from radiology reports. This includes the number of nodules and the recommended follow-up interval if one is suggested by the radiologist. For nodules 6mm or greater, which are considered higher risk, we will also extract the nodule’s size, location, and characteristics such as ground-glass, solid, or part-solid, as these change the recommended follow-up interval per the guidelines. This data will be extracted as structured data using OpenAI’s structured outputs and then used to build a dashboard. The dashboard will contain key patient information, including age, MRN, date of CT, the characteristics noted above, whether a repeat CT is scheduled and when, and whether a referral to a relevant subspecialty has been ordered, as shown in Figure 1. This will allow easy filtering of which patients have been connected to care and which may benefit from outreach.
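As a sketch of the extraction step, the structured output returned by the model could be validated before it reaches the dashboard. The field names and schema below are illustrative assumptions, not the final Versa request:

```python
import json

# Illustrative response schema for the structured-output request. Field names
# are our assumption for this sketch; in production this would be sent as the
# JSON-schema `response_format` along with the radiology report text.
NODULE_SCHEMA = {
    "type": "object",
    "properties": {
        "nodule_present": {"type": "boolean"},
        "nodule_count": {"type": "integer"},
        "max_size_mm": {"type": ["number", "null"]},
        "characteristics": {"type": ["string", "null"]},
        "follow_up_months": {"type": ["integer", "null"]},
    },
    "required": ["nodule_present", "nodule_count", "max_size_mm", "follow_up_months"],
}

def parse_extraction(raw: str) -> dict:
    """Validate the model's JSON output and flag guideline-relevant nodules."""
    record = json.loads(raw)
    missing = set(NODULE_SCHEMA["required"]) - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Nodules >= 6 mm are the higher-risk group for which extra fields are tracked.
    record["high_risk"] = (record["max_size_mm"] is not None
                           and record["max_size_mm"] >= 6)
    return record

# Example model output for a report describing two nodules, the largest 8 mm:
sample = ('{"nodule_present": true, "nodule_count": 2, "max_size_mm": 8.0, '
          '"characteristics": "ground-glass", "follow_up_months": 6}')
print(parse_extraction(sample)["high_risk"])  # True
```

Validating against a fixed schema before populating the dashboard gives a natural place to catch malformed or incomplete model output rather than displaying it.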

 

Figure 1: Mock-up of dashboard 

What are the risks of AI errors?

There are two primary risks to introducing AI for lung nodule tracking:

  1. A significant nodule requiring follow-up is not reported: The nodule would then not be included in the monitoring dashboard. However, CT results would still be sent to the ordering provider for follow-up. Since incidental lung nodules are not currently tracked, this workflow would be the same as the current state.
  2. Inaccurate description of the nodule or its characteristics: The model may describe a nodule when one does not exist or incorrectly state a nodule’s characteristics. The dashboard will include a copy of the original report so that the nodule’s existence and characteristics can be verified.

To mitigate these risks, we will perform robust testing on retrospective data prior to launch to assess accuracy and identify any characteristics that lead to a higher risk of hallucination.

Our preliminary testing on 200 chest CT reports from 2024 containing the word “nodule” showed 96.4% accuracy and a 1.5% hallucination rate. We achieved 95% sensitivity and 97.1% specificity in identifying nodules 6mm or greater, a key cutoff in the lung nodule guidelines and in estimating malignancy risk. Precision was 93.5% and the F1 score was 94.3%. Continuous performance monitoring will ensure persistently high accuracy and a low hallucination rate. A lung nodule coordinator will also have access to the full CT report and will serve as the human in the loop to verify all characteristics.
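For reference, the reported metrics follow the standard confusion-matrix definitions. A minimal sketch of the computation (the counts below are made up for illustration, not our validation data):

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard detection metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: scans with a 6mm+ nodule correctly flagged
    specificity = tn / (tn + fp)   # scans without one correctly passed over
    precision = tp / (tp + fp)     # flagged scans that truly contain a 6mm+ nodule
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}

# Hypothetical counts, just to show the computation:
m = detection_metrics(tp=95, fp=3, tn=97, fn=5)
print({k: round(v, 3) for k, v in m.items()})
```

Tracking all four counts (rather than accuracy alone) matters here because missed nodules (false negatives) and spurious nodules (false positives) carry very different clinical costs.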

 

How will we measure success?

We will measure success by:

-       Sensitivity and specificity: We will assess the sensitivity and specificity of the model in detecting nodules requiring follow-up.

-       Percentage of patients who obtain recommended follow-up imaging: We will assess what percentage of patients with lung nodules obtain the recommended follow-up imaging within 3 months of the recommended time frame.

-       Frequency of outreach: We will track how often the lung nodule monitoring team reaches out to patients with lung nodules.

-       Qualitative feedback: We will obtain feedback and measure tool satisfaction, usability, helpfulness, and perceived accuracy among those involved in the lung nodule monitoring program.

-       Time to referral: We will measure the time from identification of the nodule to the next recommended step of care, including interventional pulmonology, oncology, thoracic surgery, and interventional radiology.

-       Equity and bias analysis

 

Describe your qualifications and characteristics:

Cat Blebea, MD: Dr. Blebea is a current pulmonary and critical care fellow and clinical informatics fellow. She frequently manages patients with pulmonary nodules in clinic and has first-hand experience with pulmonary nodule monitoring programs at the VA and SFGH. She also has experience using large language models to improve patient care as a member of the prompt engineering team for the Intelligent InBasket project, which uses large language models to create draft responses to patient messages.

Leo Liu, MD: Dr. Liu is an informaticist and hospitalist at UCSF. He serves as the physician lead for inpatient informatics at St. Mary’s and St. Francis, associate program director for the clinical informatics fellowship, and director of the GME Clinical Informatics, Data Science and Artificial Intelligence Pathway.

References:

1.         Gould MK, Tang T, Liu ILA, et al. Recent Trends in the Identification of Incidental Pulmonary Nodules. Am J Respir Crit Care Med. 2015;192(10):1208-1214. doi:10.1164/rccm.201505-0990OC

 

Summary of Open Improvement Edits: 

Updated proposal with recent preliminary data findings. 

Supporting Documents: 

Comments

I like this idea.  How much staff time would need to be dedicated to monitoring the nodule dashboard and messaging patients?  Also, you are noting that a human would be in the loop to "verify all characteristics"...won't that still take a long time?  How good would the NLP need to be so that you could trust it without verifying?

Thank you for your comments! I’ve done some preliminary work to estimate the volume of patients who might require staff engagement. Using the current model with 95% sensitivity and 97.1% specificity in detecting nodules 6mm or greater (a significant cutoff for management and risk per guidelines), the model found 7,399 CTs in 2024 with nodules 6mm or larger. A radiologist had recommended a repeat scan in less than 12 months for 1,409 of these CTs. Some of those patients will have follow-up CTs scheduled without needing any outreach, and some of these CTs may have been for the same patient. Our next step, currently in progress, is to determine how many of those patients were overdue for follow-up and need outreach. If we assume 25% of patients need outreach, which in my clinical experience seems high but provides a conservative estimate, and that outreach takes 15 minutes per patient, it would take around 2 weeks of staff time per year. While that certainly isn’t insignificant, it does seem feasible.
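The back-of-the-envelope arithmetic behind that estimate (the 1,409 CTs, 25% outreach rate, and 15 minutes per contact are the figures above; the 40-hour work week is my assumption):

```python
# Staff-time estimate using the assumptions stated above.
cts_needing_short_followup = 1409  # CTs with a radiologist-recommended repeat < 12 months
outreach_rate = 0.25               # assumed fraction of patients needing outreach (conservative)
minutes_per_outreach = 15          # assumed time per patient contact

total_hours = cts_needing_short_followup * outreach_rate * minutes_per_outreach / 60
weeks = total_hours / 40           # assuming a 40-hour work week
print(round(total_hours, 1), round(weeks, 1))  # 88.1 2.2 -> ~2 weeks per year
```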

 

Many nodules on CT already have a known cause, such as metastases from a known cancer, so in its pilot phase we’d focus on CTs that already have a follow-up recommendation from the radiologist. That substantially limits the number of chest CTs. At least when I’ve been reviewing the results, it takes me about an hour to review 100 results for accuracy, so verification should only take about 15 minutes per week!

 

It’s hard to give a concrete number on how good it’d have to be to not verify the results. With the limitations inherent in current models and that this impacts patient care, I don’t think there’s a number where I’d feel comfortable saying it does not need verification. It’s possible that might change as more sophisticated models are developed, but I don’t think we’re there yet. Maybe one day! 

Excellent concept and application of AI to significantly upgrade UCSF's current process on an important clinical issue, especially having cared for many patients with lung nodules who unfortunately fell through the cracks, either by not being aware/informed of these findings or by being lost to monitoring efforts. I was surprised to learn UCSF doesn't have a tracking program, while SFGH/SFVA do, as well as that the current platform is an Excel spreadsheet...! This proposal definitely would improve upon time-intensive manual chart reviews and free up the coordinator. I appreciate the cost-effective attribute and plan to assess for equity/bias. I would imagine another metric could be estimated time saved vs the current state. I could envision that this dashboard could ultimately be expanded to include PCPs, as well as scalable to other Epic institutions. This also seems like a great proof-of-concept that could be generalizable to other radiographic findings, incidental or not (ex: meningiomas, adrenal adenomas). Great work here!