21c. Missing Data

What to write

How missing data were handled in the analysis

Examples

“Regarding the multiple imputation procedure, briefly, for each outcome, the analysis model used was a linear regression with treatment arm, baseline outcome, and ethnicity (randomization stratifier) as explanatory variables. The imputation models contained all the variables of the analysis model(s) as well as factors associated with missingness: age (identified empirically to predict missingness, P = .03) and adherence (number of doses taken of either vitamin D or placebo, P < .001).”1

“To consider the potential impact of missing data on trial conclusions, we used multiple imputation (data missing at random) and sensitivity analysis (data not missing at random). Multiple imputation by chained equations was performed using the “mi impute chained” command in Stata. We used a linear regression model to impute missing outcomes for the HOS ADL [activities of daily living subscale of the hip outcome score] at eight months post-randomisation. Variables in the imputation model included all covariates in the analysis model (baseline HOS ADL (continuous), age (continuous), and sex). In addition, we included other variables that were thought to be predictive of the outcome (lateral centre-edge angle, maximum α angle, Kellgren-Lawrence grade, and baseline HADS score). Imputations were run separately by treatment arm and based on a predictive mean matching approach, choosing at random one of the five HOS ADL values with the closest predicted scores. Missing data in the covariates that were included in the multiple imputation model were imputed simultaneously (multiple imputation by chained equation approach). Sensitivity analysis was performed using the “rctmiss” command in Stata, and we considered scenarios where participants with missing data in each arm were assumed to have outcomes that were up to 9 points worse than when data were missing at random.”2

“Analyses for the 2 primary outcomes compared each treatment with usual care using multiple imputation to handle missing data and a Bonferroni-corrected 2-tailed type I error of .025. We performed 20 imputations with a fully conditional specification using Proc MI in SAS. Imputation was performed with the following prespecified variables: age, study group, study site, clinic, sex, race and ethnicity, body mass index, exercise frequency at baseline, education, employment status, smoking status, other medical conditions at baseline, number of medications used for spine pain at baseline, duration of pain at baseline, number of previous pain episodes, STarT Back score, baseline ODI, baseline self-efficacy, baseline EQ-5D-5L, and scores for patient-reported outcomes at every follow-up point (ODI [Oswestry Disability Index], cost, Lorig et al self-efficacy scale, and EQ-5D-5L [EuroQol 5-dimensional 5-level questionnaire]). Each imputed data set was analyzed separately using Proc GENMOD in SAS (with an identity link and normally distributed errors for ODI and a log link and Poisson-distributed errors for spine-related spending).”3

“Missing peak V̇o2 [oxygen consumption] data at week 20, regardless of the type of intercurrent event, was imputed using multiple imputation methodology under the missing at random assumption for the primary analysis. Sensitivity analyses were performed by exploring a missing not at random assumption in the imputation of peak V̇o2. The imputation model used a regression multiple imputation, which includes treatment group, baseline respiratory exchange ratio, persistent atrial fibrillation (yes or no), age, sex, baseline peak V̇o2, baseline hemoglobin level, baseline estimated glomerular filtration rate, baseline body weight, baseline KCCQ [Kansas City cardiomyopathy questionnaire] total symptom score, baseline NYHA [New York Heart Association] class, and baseline average daily activity units (refers to 10 hours of wearing during the awake time for ≥7 days unless otherwise specified). Treatment group, persistent atrial fibrillation (yes or no), baseline NYHA class, and sex were treated as categorical variables. Fifty imputed data sets were generated. Each of the imputed data sets was analyzed using the analysis of covariance model of the primary analysis. Least square mean (LSM) treatment difference and the standard error were combined using Rubin’s rules to produce an LSM estimate of the treatment difference, its 95% CI [confidence interval], and P value for the test of null hypothesis of no treatment effect.”4

“Multiple imputation was preplanned for the primary outcome measure in the case of missing data; however, because there were no missing data relating to ventilator-free days, imputation was not required.”5

Explanation

Missing data are common in medical research. Collecting data on all study participants can be challenging even in a trial that has mechanisms to maximise data capture. Missing values can occur in the outcome, in one or more covariates, or (most often) in both. There are many reasons why missing values occur in the outcome. Patients may stop participating in the trial, withdraw consent for further data collection, or fail to attend follow-up visits; any of these could be related to the treatment allocation, specific (prognostic) factors, or experiencing a specific health outcome.6 Missing values could also occur in baseline variables, so that the data needed for the analysis are only partially recorded. Despite the ubiquity of missing data in medical research, the reporting of missing data and how they are handled in the analyses is poor.7,8,14,15

Many trialists exclude patients without an observed outcome. Once any randomised participants are excluded, the analysis is not strictly an intention-to-treat analysis. Most randomised trials have some missing observations. Trialists effectively must choose between omitting the participants without final outcome data, imputing their missing outcome data, or using model based approaches such as fitting a linear mixed model to repeated measures data.16 A complete case (or available case) analysis includes only those participants whose outcome is known. While a few missing outcomes will not cause a problem, many trials have more than 10% of randomised patients with missing outcomes.7,8,14 This common situation will result in loss of power by reducing the sample size, and bias may well be introduced if being lost to follow-up is related to a participant’s response to treatment. There should be concern when the frequency or the causes of dropping out differ between the intervention groups.
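To make the risk of bias concrete, the following Python sketch uses invented numbers (not taken from any trial) in which the two poorest responders in the intervention arm have missing outcomes. It compares the mean difference that would have been seen with full follow-up against the complete case estimate. The data, the arm labels, and the helper names `complete_cases` and `mean_difference` are assumptions made purely for illustration.

```python
from statistics import mean

# Hypothetical outcome scores (higher = better); None marks a missing outcome.
# Assumption: the two poorest responders in the intervention arm dropped out,
# so missingness is related to response to treatment.
full_data = {
    "intervention": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "control": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
}
observed = {
    "intervention": [None, None, 6.0, 8.0, 10.0, 12.0],
    "control": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
}


def complete_cases(values):
    """Keep only participants whose outcome is known."""
    return [v for v in values if v is not None]


def mean_difference(data):
    """Intervention mean minus control mean, using observed outcomes only."""
    return mean(complete_cases(data["intervention"])) - mean(
        complete_cases(data["control"])
    )


true_effect = mean_difference(full_data)          # full follow-up
complete_case_effect = mean_difference(observed)  # dropouts excluded

print(f"Effect with full follow-up: {true_effect:.1f}")
print(f"Complete case effect:       {complete_case_effect:.1f}")
```

With these made-up values the complete case estimate (3.0) is three times the full-data estimate (1.0), solely because dropout was related to response in one arm.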

Participants with missing outcomes can be included in the analysis if their outcomes are imputed (ie, their outcomes are estimated from other information that was collected) or if using a model based approach. Imputing the values of missing data allows the analysis to potentially conform to intention-to-treat analysis but requires strong assumptions, which may be hard to justify. Simple imputation methods are appealing, but their use may be inadvisable as they fail to account for uncertainty introduced by missing data and may lead to invalid inferences (eg, estimated standard errors for the treatment effect will be too small).17 For randomised trials with missing data within repeated measures data, model based approaches such as fitting a linear mixed model can be used to estimate the treatment effect at the final time point which is valid under a missing-at-random assumption. A model is fit at a (limited) number of time points following randomisation, by including fixed effects for time and randomised group and their interaction.16
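The point about standard errors can be illustrated with a small Python sketch. It is a toy example under stated assumptions: the outcome values are invented, "simple imputation" is taken to mean filling every missing value with the observed mean, and the helper name `standard_error` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical observed outcomes; five further participants have no outcome.
observed = [4.0, 6.0, 8.0, 10.0, 12.0]
n_missing = 5


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Standard error based on the observed data alone.
se_observed = standard_error(observed)

# Single mean imputation: every missing value is replaced by the observed mean
# and the filled-in values are then treated as if they were real observations.
filled = observed + [mean(observed)] * n_missing
se_mean_imputed = standard_error(filled)

print(f"SE, observed data only:   {se_observed:.3f}")
print(f"SE, after mean imputation: {se_mean_imputed:.3f}")
```

The filled-in data set gives a smaller standard error (about 0.667 compared with about 1.414) even though no new information was collected, which is the sense in which simple imputation overstates precision.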

Another approach that is sometimes used is known as “last observation carried forward,” in which missing final values of the outcome variable are replaced by the last known value before the participant was lost to follow-up. Although this method might appear appealing through its simplicity, the underlying assumption will rarely be valid, so the method may introduce bias, and makes no allowance for the uncertainty of imputation. The approach of last observation carried forward has been severely criticised.18-20 Sensitivity analyses should be reported to understand the extent to which the results of the trial depend on the missing data assumptions and subsequent analysis (item 21d).21 When the findings from the sensitivity analyses are consistent with the results from the primary analysis (eg, complete case for the primary analysis and multiple imputation for a sensitivity analysis), trialists can be reassured that the missing data assumptions and associated methods had little impact on the trial results.22
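For readers unfamiliar with the mechanics, last observation carried forward can be sketched in a few lines of Python. This is shown only to make the underlying assumption visible, not as a recommended method; the function name `locf` and the visit scores are hypothetical.

```python
def locf(visits):
    """Replace each missing value (None) with the most recent observed value.

    Values missing before any observation has been made stay None.
    """
    filled = []
    last_seen = None
    for value in visits:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical scores at five visits; the participant was lost after visit 3.
scores = [60, 52, 45, None, None]
print(locf(scores))
```

The carried-forward values assume that the participant's outcome stayed exactly at its last recorded level after dropout, and the analysis then treats those values as if they had been observed.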

Regardless of what data are missing, how such data are to be analysed and reported needs to be carefully planned. Authors should describe how missing data were handled in sufficient detail to allow the analysis to be reproduced (in principle; see the box below).

Guidance for reporting analytical approaches to handle missing data (adapted from Hussain et al23)

Methods

  1. Report any strategies used to reduce missing data throughout the trial process.

  2. Report whether and how the original sample size calculation accounted for missing data (item 16a) and the justification for these decisions. Report whether and how the sample size was reassessed during the course of the trial (item 16b).

  3. Report the assumption about the missing data mechanism for the primary analysis and the justification for this choice, for all outcomes. For multiple imputation methods, report24:

    • What variables were included in the imputation procedure?

    • How were non-normally distributed and binary/categorical variables dealt with?

    • If statistical interactions were included in the final analyses (item 21a), were they also included in imputation models?

    • Was imputation done separately by randomised group?

    • How many imputed datasets were created?

    • How were results from different imputed datasets combined?

  4. Report the method used to handle missing data for the primary analysis (eg, complete case, multiple imputation) and the justification for the methods chosen, for all outcomes. State whether auxiliary variables were collected and used and, if so, which ones.

  5. Report the assumptions about the missing data mechanism (eg, missing at random) and methods used to conduct the missing data sensitivity analyses for all outcomes, and the justification for the assumptions and methods chosen.

  6. Report how data that were truncated due to death or other causes were handled with a justification for the method(s) (if relevant).
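The question of how results from different imputed datasets were combined is most often answered with Rubin's rules, which one of the examples above also names. The Python sketch below shows the standard pooling of a point estimate and its standard error; the function name `pool_rubin` and the input values are hypothetical, and the degrees-of-freedom adjustment needed for confidence intervals and P values is deliberately left out.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool per-imputation estimates and standard errors with Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need at least two imputations with matching lengths")

    pooled = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in standard_errors])
    # Between-imputation variance: sample variance of the estimates (m - 1).
    between = variance(estimates)
    # Total variance adds the between-imputation part inflated by (1 + 1/m).
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical treatment effects and standard errors from three imputed datasets.
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(f"Pooled estimate: {pooled_estimate:.2f}, pooled SE: {pooled_se:.3f}")
```

In this toy input the pooled standard error (about 1.528) is larger than any single-dataset standard error (1.0), because the disagreement between imputed datasets is counted as extra uncertainty.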

Results

  1. Report the numbers and proportions of missing data in each trial arm.

  2. Report the reasons for missing data in each trial arm.

  3. Report a comparison of the characteristics of those with observed and missing data.

  4. Report the primary analysis based on the primary assumption about the missing data mechanism, for all outcomes.

  5. Report results of the missing data sensitivity analyses for all outcomes. As a minimum, a summary of the missing data sensitivity analyses should be reported in the main paper with the full results in the supplementary material.
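As a minimal sketch of the first item in this list, the numbers and proportions of missing outcomes per arm can be tabulated as follows. The data structure (a dictionary of outcome lists with `None` for missing values), the arm labels, and the function name `missing_summary` are assumptions for illustration.

```python
def missing_summary(outcomes_by_arm):
    """Return {arm: (n_missing, n_randomised, percent_missing)}."""
    summary = {}
    for arm, values in outcomes_by_arm.items():
        n_missing = sum(value is None for value in values)
        n_total = len(values)
        summary[arm] = (n_missing, n_total, 100 * n_missing / n_total)
    return summary


# Hypothetical primary outcome data; None marks a missing outcome.
outcomes = {
    "intervention": [12.0, None, 9.5, None],
    "control": [11.0, 10.0, None, 8.0],
}

for arm, (n_missing, n_total, percent) in missing_summary(outcomes).items():
    print(f"{arm}: {n_missing}/{n_total} missing ({percent:.1f}%)")
```

Reporting both the count and the denominator for each arm lets readers see at a glance whether the amount of missing data differs between groups.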

Discussion

  1. Discuss the impact of missing data on the interpretation of findings, considering both internal and external validity. For multiple imputation, include whether the variables included in the imputation model make the missing-at-random assumption plausible.

References

1. Gaughran F, Stringer D, Wojewodka G, et al. Effect of vitamin D supplementation on outcomes in people with early psychosis: The DFEND randomized clinical trial. JAMA Network Open. 2021;4(12):e2140858. doi:10.1001/jamanetworkopen.2021.40858
2. Palmer AJR, Ayyar Gupta V, Fernquest S, et al. Arthroscopic hip surgery compared with physiotherapy and activity modification for the treatment of symptomatic femoroacetabular impingement: Multicentre randomised controlled trial. BMJ. Published online February 2019:l185. doi:10.1136/bmj.l185
3. Choudhry NK, Fifer S, Fontanet CP, et al. Effect of a biopsychosocial intervention or postural therapy on disability and health care spending among patients with acute and subacute spine pain: The SPINE CARE randomized clinical trial. JAMA. 2022;328(23):2334. doi:10.1001/jama.2022.22625
4. Lewis GD, Voors AA, Cohen-Solal A, et al. Effect of omecamtiv mecarbil on exercise capacity in chronic heart failure with reduced ejection fraction: The METEORIC-HF randomized clinical trial. JAMA. 2022;328(3):259. doi:10.1001/jama.2022.11016
5. Schlapbach LJ, Gibbons KS, Horton SB, et al. Effect of nitric oxide via cardiopulmonary bypass on ventilator-free days in young children undergoing congenital heart disease surgery: The NITRIC randomized clinical trial. JAMA. 2022;328(1):38. doi:10.1001/jama.2022.9376
6. Akl EA, Shawwa K, Kahale LA, et al. Reporting missing participant data in randomised trials: Systematic survey of the methodological literature and a proposed guide. BMJ Open. 2015;5(12):e008431. doi:10.1136/bmjopen-2015-008431
7. Wood AM, White IR, Thompson SG. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clinical Trials. 2004;1(4):368-376. doi:10.1191/1740774504cn032oa
8. Bell ML, Fiero M, Horton NJ, Hsu CH. Handling missing data in RCTs; a review of the top medical journals. BMC Medical Research Methodology. 2014;14(1). doi:10.1186/1471-2288-14-118
9. Ibrahim F, Tom BDM, Scott DL, Prevost AT. A systematic review of randomised controlled trials in rheumatoid arthritis: The reporting and handling of missing data in composite outcomes. Trials. 2016;17(1). doi:10.1186/s13063-016-1402-5
10. Joseph R, Sim J, Ogollah R, Lewis M. A systematic review finds variable use of the intention-to-treat principle in musculoskeletal randomized controlled trials with missing data. Journal of Clinical Epidemiology. 2015;68(1):15-24. doi:10.1016/j.jclinepi.2014.09.002
11. Kahale LA, Diab B, Khamis AM, et al. Potentially missing data are considerably more frequent than definitely missing data: A methodological survey of 638 randomized controlled trials. Journal of Clinical Epidemiology. 2019;106:18-31. doi:10.1016/j.jclinepi.2018.10.001
12. Kearney A, Rosala-Hallas A, Rainford N, et al. Increased transparency was required when reporting imputation of primary outcome data in clinical trials. Journal of Clinical Epidemiology. 2022;146:60-67. doi:10.1016/j.jclinepi.2022.02.008
13. Khan NA, Torralba KD, Aslam F. Missing data in randomised controlled trials of rheumatoid arthritis drug therapy are substantial and handled inappropriately. RMD Open. 2021;7(2):e001708. doi:10.1136/rmdopen-2021-001708
14. Tan PT, Cro S, Van Vogt E, Szigeti M, Cornelius VR. A review of the use of controlled multiple imputation in randomised controlled trials with missing outcome data. BMC Medical Research Methodology. 2021;21(1). doi:10.1186/s12874-021-01261-6
15. Zhang Y, Flórez ID, Colunga Lozano LE, et al. A systematic survey on reporting and methods for handling missing participant data for continuous outcomes in randomized controlled trials. Journal of Clinical Epidemiology. 2017;88:57-66. doi:10.1016/j.jclinepi.2017.05.017
16. Sullivan TR, Morris TP, Kahan BC, Cuthbert AR, Yelland LN. Categorisation of continuous covariates for stratified randomisation: How should we adjust? Statistics in Medicine. 2024;43(11):2083-2095. doi:10.1002/sim.10060
17. Schafer JL. Multiple imputation: A primer. Statistical Methods in Medical Research. 1999;8(1):3-15. doi:10.1177/096228029900800102
18. Lachin JM. Fallacies of last observation carried forward analyses. Clinical Trials. 2015;13(2):161-168. doi:10.1177/1740774515602688
19. Molnar. 2009;3.
20. Kenward MG, Molenberghs G. Last observation carried forward: A crystal ball? Journal of Biopharmaceutical Statistics. 2009;19(5):872-888. doi:10.1080/10543400903105406
21. Morris TP, Kahan BC, White IR. Choosing sensitivity analyses for randomised trials: principles. BMC Medical Research Methodology. 2014;14(1). doi:10.1186/1471-2288-14-11
22. Food and Drug Administration. E9 (R1) statistical principles for clinical trials: Addendum: Estimands and sensitivity analysis in clinical trials. Guidance for industry. May 2021.
23. Hussain JA, White IR, Johnson MJ, et al. Development of guidelines to reduce, handle and report missing data in palliative care trials: A multi-stakeholder modified nominal group technique. Palliative Medicine. 2022;36(1):59-70. doi:10.1177/02692163211065597
24. Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. BMJ. 2009;338(jun29 1):b2393-b2393. doi:10.1136/bmj.b2393

Citation

For attribution, please cite this work as:
Hopewell S, Chan AW, Collins GS, et al. CONSORT 2025 statement: updated guideline for reporting randomised trials. BMJ. 2025;389:e081123. doi:10.1136/bmj-2024-081123
