Rationale: Inaccurate sample size estimation leads to research studies that enroll too few or too many study participants. The former can result in failure to demonstrate anticipated effects and the latter to excessive costs, due to overuse of research participants and personnel time. While there are several important components of a good sample size calculation – including a compelling research question, an appropriate outcome variable, and an efficient study design – here we focus on improving the accuracy of the quantitative inputs. Examples of such parameters include:
- the variability of responses within and between eligible participants (accounting for correlations among replicate measurements at a given time and/or at different times),
- the mean response under standard-of-care conditions, and
- the prevalence of the disease under study.
At present, the primary sources of parameter values are pilot studies (often too small to provide reliable estimates) and published studies (which often lack the needed values). Estimates of parameter values that are not evidence-based can introduce a large amount of error into the sample-size calculation, making it precise but inaccurate. The proposed web-based application would allow a clinician and biostatistician to identify required values in real time during a consulting visit, revising the query as needed to ensure a viable and cost-efficient study. Given the enormous number of medical studies launched every year, access to accurate inputs to sample size calculations could vastly reduce waste of valuable resources.
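To illustrate how strongly these inputs drive the result, here is a minimal sketch using the standard normal-approximation formula for comparing two group means. It is not the proposed application's method. The function name `n_per_group`, the equal-correlation assumption for replicate measurements, and all numeric inputs are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, m=1, rho=0.0):
    """Participants per group for a two-sided, two-sample comparison of means.

    delta: difference in means to detect
    sd:    standard deviation of a single measurement
    m:     replicate measurements averaged per participant
    rho:   correlation between replicates on the same participant (assumed equal)
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    # Variance of one participant's mean over m equally correlated replicates
    var_mean = sd ** 2 * (1 + (m - 1) * rho) / m
    return ceil(2 * z_total ** 2 * var_mean / delta ** 2)


# Hypothetical inputs: a guessed SD of 10 versus an evidence-based SD of 12
n_guess = n_per_group(delta=5, sd=10)
n_evidence = n_per_group(delta=5, sd=12)
# Four replicates per participant with within-participant correlation 0.5
n_replicates = n_per_group(delta=5, sd=12, m=4, rho=0.5)
print(n_guess, n_evidence, n_replicates)
```

With these made-up inputs, underestimating the standard deviation by one sixth (10 instead of 12) lowers the per-group requirement from 91 to 63, so a study sized on the guess would be underpowered. Averaging correlated replicates recovers some of that cost, but only if the correlation itself is known.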
Plan: The two key ingredients for obtaining a wide range of evidence-based inputs for sample size calculations are the availability of electronic sources of current health information and a convenient means of retrieving relevant information from those databases. During pilot funding, we will demonstrate the feasibility of (1) identifying relevant existing databases; (2) quantifying the improved accuracy of sample-size calculations based on evidence-based parameters (as defined here) relative to those based on other sources of inputs; and (3) producing version 1 of a user-friendly web-based interface to access, summarize, and output evidence-based parameters in useful formats. Beyond pilot funding, additional databases could be tapped, and the breadth of access, summarization, and output could be expanded.
Aim 1: Identify Databases:
- Ideal databases should provide longitudinally tracked individual-level health outcomes related to a wide range of conditions. Two excellent examples appear to be (1) the databases included in the UCSF CTSI Large Dataset Inventory, accessible at no cost through http://accelerate.ucsf.edu/research/celdac (example: National Health Interview Survey), and (2) the Kaiser healthcare database. [Must: Identify key personnel who could provide access, engage their interest, and establish an agreement to collaborate.]
- With clinical partners: Establish a set of commonly expected requests that could be used to evaluate candidate databases on quality and availability of desired data.
- With database partners and computer experts: Optimize access to and manual retrieval of information.
Aim 2: Proof of Concept
- Identify commonly used outcome variables and alternative choices through review of the published literature and discussion with clinical colleagues.
- Identify a selection of recently published high-profile studies that reported the values of key study design parameters. For each: (1) Document the eligibility criteria and parameter values that were used in the study design. (2) Document corresponding values drawn from the selected databases. (3) Examine the effect on the sample size of differences between the two sets of parameter values.
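The comparison in step (3) can be sketched as follows for a binary outcome, using the standard normal-approximation formula for two proportions. The control-group event rates, the 30% relative risk reduction, and the function name `n_per_group_proportions` are hypothetical values chosen for illustration; they are not taken from any published study or database.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p_control, p_treated, alpha=0.05, power=0.80):
    """Participants per group to compare two proportions (two-sided test)."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    variance_sum = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil(z_total ** 2 * variance_sum / (p_control - p_treated) ** 2)


RELATIVE_REDUCTION = 0.30  # hypothetical treatment effect, held fixed


def sample_size_for(control_rate):
    """Sample size when the treated rate is a fixed relative reduction."""
    treated_rate = control_rate * (1 - RELATIVE_REDUCTION)
    return n_per_group_proportions(control_rate, treated_rate)


n_published = sample_size_for(0.30)  # rate assumed in a hypothetical design
n_database = sample_size_for(0.25)   # rate retrieved from a hypothetical database
relative_change = (n_database - n_published) / n_published
print(n_published, n_database, round(relative_change, 3))
```

In this made-up case, a control rate of 0.25 rather than 0.30 raises the requirement from 365 to 464 per group, a change of about 27%.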
Aim 3: Create a web-based user interface, a manual of procedures with useful examples, and useful export files.
- User experience:
o Make retrieval fully dynamic: allow any outcome stored in the database to be retrieved, with retrieval tailored to major eligibility criteria (e.g., age and diagnosis) and design criteria (e.g., frequency of assessment per patient).
o Use drop-down menus, populated by database-specific data dictionaries, to ensure accurate spelling.
o Produce results that are easy for the investigator or biostatistician to manipulate. The software will process all (or a random sample of) eligible values to generate rates (see example appended), means, variances, and correlations, as needed.
o Generate downloadable documentation of queries for the user's later access.
- Developer experience:
o As queries of a database are made, record queries (including search criteria) and results.
o Identify unavailable measures; examine reasons. Automate or prompt search for an alternative database and/or measure.
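As a rough sketch of how the retrieval, summarization, and query-logging items above might fit together, the following example filters records by eligibility criteria, computes the mean, between- and within-patient mean squares, and the intraclass correlation from a one-way analysis of variance, and records each query. The record layout, field names, `run_query`, `QUERY_LOG`, and all data values are hypothetical. The sketch also assumes a balanced design, with the same number of replicates per patient.

```python
from statistics import mean

# Hypothetical longitudinal records; field names and values are illustrative
RECORDS = [
    {"patient": "a", "age": 52, "diagnosis": "hypertension", "sbp": 140},
    {"patient": "a", "age": 52, "diagnosis": "hypertension", "sbp": 144},
    {"patient": "b", "age": 61, "diagnosis": "hypertension", "sbp": 150},
    {"patient": "b", "age": 61, "diagnosis": "hypertension", "sbp": 154},
    {"patient": "c", "age": 47, "diagnosis": "hypertension", "sbp": 130},
    {"patient": "c", "age": 47, "diagnosis": "hypertension", "sbp": 132},
    {"patient": "d", "age": 35, "diagnosis": "diabetes", "sbp": 120},
    {"patient": "d", "age": 35, "diagnosis": "diabetes", "sbp": 124},
]

QUERY_LOG = []  # developer-facing record of every query and its result


def summarize(groups):
    """One-way ANOVA summaries for balanced data (k replicates per patient)."""
    n = len(groups)
    k = len(next(iter(groups.values())))
    if n < 2 or k < 2 or any(len(v) != k for v in groups.values()):
        raise ValueError("sketch needs >= 2 patients with equal replicate counts")
    grand = mean(x for v in groups.values() for x in v)
    between_ms = k * sum((mean(v) - grand) ** 2 for v in groups.values()) / (n - 1)
    within_ms = sum(
        (x - mean(v)) ** 2 for v in groups.values() for x in v
    ) / (n * (k - 1))
    icc = (between_ms - within_ms) / (between_ms + (k - 1) * within_ms)
    return {"n_patients": n, "mean": grand, "between_ms": between_ms,
            "within_ms": within_ms, "icc": icc}


def run_query(records, outcome, diagnosis, min_age, max_age):
    """Select eligible records, summarize the outcome, and log the query."""
    groups = {}
    for r in records:
        if r["diagnosis"] == diagnosis and min_age <= r["age"] <= max_age:
            groups.setdefault(r["patient"], []).append(r[outcome])
    result = summarize(groups) if groups else None  # None marks "unavailable"
    QUERY_LOG.append({"outcome": outcome, "diagnosis": diagnosis,
                      "age_range": (min_age, max_age), "result": result})
    return result


summary = run_query(RECORDS, "sbp", "hypertension", 40, 70)
missing = run_query(RECORDS, "sbp", "asthma", 40, 70)
print(summary, missing, len(QUERY_LOG))
```

Queries that return nothing are logged with a `None` result, which is one simple way to flag unavailable measures for later review.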
Criteria and metrics for success:
Aim 1: Compare the proposed database resources in terms of features, data quality, and costs:
- Are Common Data Elements used?
- Is a data dictionary available for sorting and browsing to find measures available?
- Does the resource have the requested measures?
- How current are the measures?
Aim 2: For a range of recently published studies that reported the values of key study design parameters, evaluate the effect on the sample-size calculation of differences between reported parameter values and values retrieved from our proposed database resource(s). Hypothesis: Evidence-based values will modify the sample size calculation by at least 10%.
Aim 3: Compare the proposed database resource(s) in terms of ease of access and value of information gained:
- Poll users to obtain feedback on ease of use and value of information retrieved, by database.
- Summarize measures queried by frequency.
- Characterize variation among databases with respect to comprehensiveness and ease of use.
- Estimate personnel costs associated with building database access.
Cost: We seek funding to access at least two large free databases by leveraging the CTSI Large Dataset Inventory [Aim 1], to quantify the benefit of evidence-based parameter values on sample-size calculations [Aim 2], and to plan the computational work in fine detail [Aim 3]. Salary support: $100,000 (12 months) for principal faculty and staff.
Collaborators: Joan Hilton (Epidemiology & Biostatistics) will lead the biostatistical aspects of the “App” development. Tracy van Nunnery (Medicine) will lead a team of computer experts who will create the database interfaces. Kirsten Bibbins-Domingo (Medicine) will serve as lead clinical collaborator.
Example interface and output: Dr. Hilton and Mr. Nunnery have ongoing research collaborations that began in 2007. Mr. Nunnery and his team created HERO, the electronic medical record system used at Ward 86 of SFGH, and thus are well acquainted with HIPAA requirements. As an example of their work, users (clinicians) can query HERO to obtain the distribution of any patient characteristic captured by clinicians, limited to user-specified search criteria. The web-based interface for retrieving demographic data is shown below, with the date fields displayed (upper image). The distributions of demographic characteristics were exported to an Excel spreadsheet (lower image).