21a. Comparing groups

What to write

Statistical methods used to compare groups for primary and secondary outcomes, including harms

Examples

“The primary outcome was analysed using a mixed effects log-binomial model to generate an adjusted risk ratio (RR) and an adjusted risk difference (using an identity link function), including centre as a random effect. Statistical significance of the treatment group parameter was determined (p value generated) through examination of the associated χ2 statistic (obtained from the log-binomial model which produced the RR). Binary secondary outcomes were analysed as per the primary outcome. Time to hCG [human chorionic gonadotrophin] resolution was considered in a competing risk framework to account for participants who had surgical intervention for their ectopic pregnancy. A cumulative incidence function was used to estimate the probability of occurrence (hCG resolution) over time. A Fine and Gray model was then used to estimate a subdistribution adjusted hazard ratio (HR) directly from the cumulative incidence function. In addition, a further Cox proportional hazard model was fitted and applied to the cause-specific (non-surgical resolution) hazard function and used to generate an adjusted HR. Return to menses was analysed using a Cox regression model. Number of hospital visits associated with treatment was analysed using a Poisson regression model, including centre as a random effect to generate an adjusted incidence ratio.”1

“For the primary continuous outcome and secondary outcomes, linear mixed-effect models were used, with outcome measurement (at the two follow-up timepoints) as the dependent variable. The models included fixed effects for timepoint, treatment, timepoint by treatment interactions, the baseline measure of the outcome, and therapist, assuming a linear relationship between baseline and outcome. The dichotomous outcome of recovery in the delusion was analysed using a logistic mixed-effect model. Persecutory delusion conviction was analysed as a continuous and also as a dichotomous (recovery) variable. The models included a random intercept for participant, an unstructured correlation matrix for the residuals, and were fitted using restricted maximum likelihood estimation . . . For each outcome and timepoint, we report the treatment effect estimate as the adjusted mean difference between groups, its SE [standard error], 95% CIs [confidence intervals], and p value. In addition, we report estimates for Cohen’s d effect sizes as the adjusted mean difference of the outcome (between the groups) divided by the sample SD [standard deviation] of the outcome at baseline.”2

“Analyses followed a prespecified statistical analysis plan. The primary outcome (ODQ [Oswestry Disability Questionnaire] score at 18 weeks after randomisation) was compared between groups with a linear regression model, adjusted for baseline ODQ, with centre as a random effect. ODQ score, visual analogue scores (VAS) for back pain, VAS for leg pain, MRM [modified Roland-Morris] outcome score, and COMI [Core Outcome Measures Index] score at all follow-up visits were analysed with a repeated measures mixed-effects model adjusting for baseline outcome measure, treatment group, time (as a continuous variable), and a time-treatment arm interaction (if significant). Centre and participant were random effects in the repeated measures models. A second model adjusted for other prespecified variables, age, sex, duration of symptoms, body-mass index, and size of disc prolapse (as a percentage of the diameter of the spinal canal, categorised as <25%, 25–50%, or >50%).”3

“We analysed the primary outcome (between-group difference in the SPPB [short physical performance battery] at 12 months) using linear mixed models, adjusted for baseline measurements, minimisation variables (age, sex and CKD [chronic kidney disease] category) and a random effect variable for recruitment site. We analysed secondary outcomes using repeated measures mixed models, including all participants and including data from all available timepoints. Models were adjusted for baseline values and the minimisation variables. We conducted time-to-event analyses (time to death, time to commencing renal replacement therapy) using Cox proportional hazards models adjusted for minimisation variables. All participants were included in these analyses, with participants censored at the point of dropout or truncation of follow-up for those not reaching the analysis endpoint before 24 months. For all analyses, we took a two-sided p value of < 0.05 as significant with no adjustment for multiple testing.”4

Explanation

Various methods can be used to analyse data, and it is crucial to ensure that the chosen approach is suitable for the specific context. Specifying the statistical procedures and software used for each analysis is essential, and additional clarification may be required in the results section of the report. Authors should describe the statistical methods in sufficient detail to allow a knowledgeable reader with access to the original data to verify the reported results, as emphasised by the ICMJE (https://www.icmje.org/). It is also important to elaborate on specific aspects of the statistical analysis, such as the intention-to-treat approach.

Details of all statistical analyses are frequently prespecified in a statistical analysis plan, a document that accompanies the trial protocol. In the report of the trial results, authors should detail and justify any deviation from the statistical analysis plan or from the protocol if no statistical analysis plan was developed. They should clarify which analyses were prespecified and which were post hoc.

Most analysis approaches provide an estimate of the treatment effect, representing the difference in outcomes between comparison groups, and authors should also indicate the effect measure (eg, absolute risk) considered. Authors should accompany this with a CI for the estimated effect, delineating a central range of uncertainty regarding the actual treatment effect. The CI may be interpreted as the range of values for the treatment effect that is compatible with the observed data. Typically, a 95% CI is presented, signifying the range anticipated to encompass the true value in 95 of 100 similar studies.
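
To make this concrete, the following minimal Python sketch uses hypothetical counts, not data from the trials quoted above, to compute an unadjusted risk ratio with a Wald-type 95% CI; the adjusted estimates in the examples would instead come from regression models.

```python
# A minimal sketch, using hypothetical counts rather than data from the trials quoted
# above, of how an unadjusted risk ratio and a Wald-type 95% CI can be calculated.
from math import exp, log, sqrt

events_treat, n_treat = 30, 200   # hypothetical events / participants, intervention group
events_ctrl, n_ctrl = 45, 200     # hypothetical events / participants, control group

risk_treat = events_treat / n_treat
risk_ctrl = events_ctrl / n_ctrl
rr = risk_treat / risk_ctrl       # the effect measure: a risk ratio

# standard error of log(RR), then a 95% CI calculated on the log scale and back-transformed
se_log_rr = sqrt(1 / events_treat - 1 / n_treat + 1 / events_ctrl - 1 / n_ctrl)
lower = exp(log(rr) - 1.96 * se_log_rr)
upper = exp(log(rr) + 1.96 * se_log_rr)

print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```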

Study findings can also be assessed in terms of their statistical significance. The P value represents the probability that the observed data (or a more extreme result) could have arisen by chance when the interventions did not truly differ. The statistical significance level that will be used should be reported. In the results section, actual P values (for example, P=0.001) are strongly preferable to imprecise threshold reports such as P<0.05.5,6
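
As a hedged illustration of reporting an exact P value rather than a threshold, the sketch below applies a chi-squared test to the same kind of hypothetical 2x2 table; the counts and the use of scipy are assumptions for illustration, not part of the guideline.

```python
# Hypothetical illustration of reporting an exact P value rather than a threshold
# such as P<0.05, here from a chi-squared test on an illustrative 2x2 table.
from scipy.stats import chi2_contingency

table = [[30, 170],   # intervention group: events, non-events (hypothetical)
         [45, 155]]   # control group: events, non-events (hypothetical)

chi2, p_value, dof, _ = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f} (df = {dof}), P = {p_value:.3f}")
```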

Some trials may use bayesian methods.7-10 In this case, the choices of priors, computational decisions, and any modelling methods used should be described. Most bayesian trials so far have been for early phases of drug development, but this approach can be applicable to any phase. Typically, results are presented as treatment effects along with credible intervals.
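
The sketch below is a deliberately simple illustration of the kind of bayesian summary described here: a beta-binomial posterior for a single event rate with a 95% credible interval. The flat Beta(1, 1) prior and the counts are illustrative assumptions; a real trial analysis would model the comparative treatment effect and justify its choice of priors.

```python
# A minimal bayesian sketch, not tied to any specific trial: a beta-binomial posterior
# for a single event rate, summarised with a 95% credible interval.
from scipy.stats import beta

prior_a, prior_b = 1, 1          # assumed flat Beta(1, 1) prior (an illustrative choice)
events, n = 30, 200              # hypothetical data from one trial group

post_a = prior_a + events
post_b = prior_b + n - events
lower, upper = beta.ppf(0.025, post_a, post_b), beta.ppf(0.975, post_a, post_b)

print(f"Posterior median {beta.median(post_a, post_b):.3f}, "
      f"95% credible interval {lower:.3f} to {upper:.3f}")
```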

Where an analysis lacks statistical power (eg, harms outcomes), authors may prefer descriptive approaches over formal statistical analysis.

While the necessity for covariate adjustments is generally reduced in randomised trials compared with epidemiological studies, considering an adjusted analysis can have value in terms of increased power and precision, particularly if there is an indication that one or more variables may have prognostic value.11 It is preferable for adjusted analyses to be explicitly outlined in the study protocol (item 3). For instance, it is often advisable to make adjustments for stratification variables,12 in keeping with the principle that the analysis strategy should align with the study design. In the context of randomised trials, the decision to make adjustments should not be based on whether there are baseline covariates that are statistically significantly different between randomised groups. Testing for baseline imbalance in covariates should be avoided,11 because, if randomisation is properly conducted, any differences in baseline covariates between treatment arms are by definition due to chance. The rationale for any adjusted analyses and the statistical methods used should be specified, along with the choice of covariates that were adjusted for, how continuous variables were handled (eg, linear, modelled with splines),13 and whether the analysis was planned or post hoc. Reviews of published studies show that reporting of adjusted analyses is inadequate with regard to all of these aspects.14,15
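
As a rough, hedged illustration (the dataset, column names, and coding are hypothetical, and centre is included as a fixed effect rather than the random effect used in the trial examples above), a prespecified adjusted analysis of a continuous outcome might look like the following.

```python
# Illustrative sketch only: an adjusted analysis of a continuous outcome, adjusting for
# the baseline value of the outcome and the stratification variable (centre).
# The file and the column names ("outcome", "group", "baseline", "centre") are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trial_data.csv")   # hypothetical data; group coded 1=intervention, 0=control

model = smf.ols("outcome ~ group + baseline + C(centre)", data=df).fit()

estimate = model.params["group"]                 # adjusted mean difference between groups
ci_low, ci_high = model.conf_int().loc["group"]  # 95% CI for the adjusted difference
print(f"Adjusted difference {estimate:.2f} "
      f"(95% CI {ci_low:.2f} to {ci_high:.2f}), P = {model.pvalues['group']:.3f}")
```

Treating centre as a random effect, as in the quoted trial examples, would require a mixed model (eg, statsmodels' MixedLM) rather than ordinary least squares.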

Multiplicity issues are prevalent in trials and merit special consideration, especially in cases involving multiple primary outcomes, multiple time points stemming from repeated assessments of an outcome, multiple planned analyses for an outcome (such as interim or subgroup analyses, item 21d), or analyses of numerous secondary outcomes (see CONSORT outcomes extension for more details).16 Any methods used to mitigate or account for multiplicity should be described. If no methods have been used to account for multiplicity (eg, not applicable, or not considered), then this should also be reported, particularly when a large number of analyses have been carried out.
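
One common way of accounting for multiplicity across several secondary outcomes is the Holm procedure; the sketch below applies it to a set of hypothetical P values. The values and the statsmodels call are illustrative assumptions, not a recommendation of any particular method.

```python
# A minimal sketch of one way to account for multiplicity across several secondary
# outcomes: the Holm procedure applied to hypothetical unadjusted P values.
from statsmodels.stats.multitest import multipletests

p_values = [0.003, 0.021, 0.047, 0.160]   # hypothetical, one per secondary outcome
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="holm")

for raw, adj, significant in zip(p_values, p_adjusted, reject):
    print(f"raw P = {raw:.3f}, Holm-adjusted P = {adj:.3f}, reject at 0.05: {significant}")
```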

Training

The UK EQUATOR Centre runs training on how to write using reporting guidelines.

Discuss this item

Visit this item’s discussion page to ask questions and give feedback.

References

1.
Horne AW, Tong S, Moakes CA, et al. Combination of gefitinib and methotrexate to treat tubal ectopic pregnancy (GEM3): A multicentre, randomised, double-blind, placebo-controlled trial. The Lancet. 2023;401(10377):655-663. doi:10.1016/s0140-6736(22)02478-3
2.
Freeman D, Emsley R, Diamond R, et al. Comparison of a theoretically driven cognitive therapy (the feeling safe programme) with befriending for the treatment of persistent persecutory delusions: A parallel, single-blind, randomised controlled trial. The Lancet Psychiatry. 2021;8(8):696-707. doi:10.1016/s2215-0366(21)00158-9
3.
Wilby MJ, Best A, Wood E, et al. Surgical microdiscectomy versus transforaminal epidural steroid injection in patients with sciatica secondary to herniated lumbar disc (NERVES): A phase 3, multicentre, open-label, randomised controlled trial and economic evaluation. The Lancet Rheumatology. 2021;3(5):e347-e356. doi:10.1016/s2665-9913(21)00036-9
4.
Clinical and cost-effectiveness of oral sodium bicarbonate therapy for older patients with chronic kidney disease and low-grade acidosis (BiCARB): A pragmatic randomised, double-blind, placebo-controlled trial. BMC Medicine. 2020;18(1). doi:10.1186/s12916-020-01542-9
5.
Lang TA, Secic M. How to report statistics in medicine: Annotated guidelines for authors. The Nurse Practitioner. 1997;22(5):198. doi:10.1097/00006205-199705000-00022
6.
Altman DG, Gore SM, Gardner MJ, Pocock SJ. Statistical guidelines for contributors to medical journals. In: Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with confidence: Confidence intervals and statistical guidelines. 2nd ed. BMJ Books; 2000:171-90.
7.
Berry DA. Interim analyses in clinical trials: Classical vs. Bayesian approaches. Statistics in Medicine. 1985;4(4):521-526. doi:10.1002/sim.4780040412
8.
Bray R, Hartley A, Wenkert D, et al. Why are there not more bayesian clinical trials? Ability to interpret bayesian and conventional statistics among medical researchers. Therapeutic Innovation & Regulatory Science. 2022;57(3):426-435. doi:10.1007/s43441-022-00482-1
9.
Fors M, González P. Current status of bayesian clinical trials for oncology, 2020. Contemporary Clinical Trials Communications. 2020;20:100658. doi:10.1016/j.conctc.2020.100658
10.
Lee JJ, Chu CT. Bayesian clinical trials in action. Statistics in Medicine. 2012;31(25):2955-2972. doi:10.1002/sim.5404
11.
Kahan BC, Jairath V, Doré CJ, Morris TP. The risks and rewards of covariate adjustment in randomized trials: An assessment of 12 outcomes from 8 studies. Trials. 2014;15(1). doi:10.1186/1745-6215-15-139
12.
Sullivan TR, Morris TP, Kahan BC, Cuthbert AR, Yelland LN. Categorisation of continuous covariates for stratified randomisation: How should we adjust? Statistics in Medicine. 2024;43(11):2083-2095. doi:10.1002/sim.10060
13.
Kahan BC, Rushton H, Morris TP, Daniel RM. A comparison of methods to adjust for continuous covariates in the analysis of randomised trials. BMC Medical Research Methodology. 2016;16(1). doi:10.1186/s12874-016-0141-3
14.
Yu LM, Chan AW, Hopewell S, Deeks JJ, Altman DG. Reporting on covariate adjustment in randomised controlled trials before and after revision of the 2001 CONSORT statement: A literature review. Trials. 2010;11(1). doi:10.1186/1745-6215-11-59
15.
Saquib N, Saquib J, Ioannidis JPA. Practices and impact of primary outcome adjustment in randomized controlled trials: Meta-epidemiologic study. BMJ. 2013;347(jul12 2):f4313-f4313. doi:10.1136/bmj.f4313
16.
Butcher NJ, Monsour A, Mew EJ, et al. Guidelines for reporting outcomes in trial reports: The CONSORT-outcomes 2022 extension. JAMA. 2022;328(22):2252. doi:10.1001/jama.2022.21022

Reuse

Most of the reporting guidelines and checklists on this website were originally published under permissive licenses that allowed their reuse. Some were published with proprietary licenses, where copyright is held by the publisher and/or original authors. The original content of the reporting checklists and explanation pages on this website was drawn from these publications with knowledge and permission from the reporting guideline authors, and subsequently revised in response to feedback and evidence from research as part of an ongoing scholarly dialogue about how best to disseminate reporting guidance. The UK EQUATOR Centre makes no copyright claims over reporting guideline content. Our use of copyrighted content on this website falls under fair use guidelines.

Citation

For attribution, please cite this work as:
Hopewell S, Chan AW, Collins GS, et al. CONSORT 2025 statement: updated guideline for reporting randomised trials. BMJ. 2025;389:e081123. doi:10.1136/bmj-2024-081123

Reporting Guidelines are recommendations to help describe your work clearly

Your research will be used by people from different disciplines and backgrounds for decades to come. Reporting guidelines list the information you should describe so that everyone can understand, replicate, and synthesise your work.

Reporting guidelines do not prescribe how research should be designed or conducted. Rather, they help authors transparently describe what they did, why they did it, and what they found.

Reporting guidelines make writing research easier, and transparent research leads to better patient outcomes.

Easier writing

Following guidance makes writing easier and quicker.

Smoother publishing

Many journals require completed reporting checklists at submission.

Maximum impact

From Nobel prizes to null results, articles have more impact when everyone can use them.

Who reads research?

Your work will be read by different people, for different reasons, around the world, and for decades to come. Reporting guidelines help you consider all of your potential audiences. For example, your research may be read by researchers from different fields, by clinicians, patients, evidence synthesisers, peer reviewers, or editors. Your readers will need information to understand, replicate, apply, appraise, synthesise, and use your work.

Cohort studies

A cohort study is an observational study in which a group of people with a particular exposure (e.g. a putative risk factor or protective factor) and a group of people without this exposure are followed over time. The outcomes of the people in the exposed group are compared to the outcomes of the people in the unexposed group to see if the exposure is associated with particular outcomes (e.g. getting cancer or length of life).

Source.

Case-control studies

A case-control study is a research method used in healthcare to investigate potential risk factors for a specific disease. It involves comparing individuals who have been diagnosed with the disease (cases) to those who have not (controls). By analysing the differences between the two groups, researchers can identify factors that may contribute to the development of the disease.

An example would be when researchers conducted a case-control study examining whether exposure to diesel exhaust particles increases the risk of respiratory disease in underground miners. Cases included miners diagnosed with respiratory disease, while controls were miners without respiratory disease. Participants' past occupational exposures to diesel exhaust particles were evaluated to compare exposure rates between cases and controls.

Source.

Cross-sectional studies

A cross-sectional study (also sometimes called a "cross-sectional survey") serves as an observational tool, where researchers capture data from a cohort of participants at a singular point. This approach provides a 'snapshot': a brief glimpse into the characteristics or outcomes prevalent within a designated population at that precise point in time. The primary aim here is not to track changes or developments over an extended period but to assess and quantify the current situation regarding specific variables or conditions. Such a methodology is instrumental in identifying patterns or correlations among various factors within the population, providing a basis for further, more detailed investigation.

Source

Systematic reviews

A systematic review is a comprehensive approach designed to identify, evaluate, and synthesise all available evidence relevant to a specific research question. In essence, it collects all possible studies related to a given topic and design, and reviews and analyses their results.

The process involves a highly sensitive search strategy to ensure that as much pertinent information as possible is gathered. Once collected, this evidence is often critically appraised to assess its quality and relevance, ensuring that conclusions drawn are based on robust data. Systematic reviews often involve defining inclusion and exclusion criteria, which help to focus the analysis on the most relevant studies, ultimately synthesising the findings into a coherent narrative or statistical synthesis. Some systematic reviews will include a meta-analysis.

Source

Systematic review protocols

TODO

Meta analyses of Observational Studies

TODO

Randomised Trials

A randomised controlled trial (RCT) is a trial in which participants are randomly assigned to one of two or more groups: the experimental group or groups receive the intervention or interventions being tested; the comparison group (control group) receive usual care or no treatment or a placebo. The groups are then followed up to see if there are any differences between the results. This helps in assessing the effectiveness of the intervention.

Source

Randomised Trial Protocols

TODO

Qualitative research

Research that aims to gather and analyse non-numerical (descriptive) data in order to gain an understanding of individuals' social reality, including understanding their attitudes, beliefs, and motivation. This type of research typically involves in-depth interviews, focus groups, or field observations in order to collect data that is rich in detail and context. Qualitative research is often used to explore complex phenomena or to gain insight into people's experiences and perspectives on a particular topic. It is particularly useful when researchers want to understand the meaning that people attach to their experiences or when they want to uncover the underlying reasons for people's behaviour. Qualitative methods include ethnography, grounded theory, discourse analysis, and interpretative phenomenological analysis.

Source

Case Reports

TODO

Diagnostic Test Accuracy Studies

Diagnostic accuracy studies focus on estimating the ability of the test(s) to correctly identify people with a predefined target condition, or the condition of interest (sensitivity) as well as to clearly identify those without the condition (specificity).
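
For illustration, sensitivity and specificity can be computed directly from a 2x2 table of index test results against the reference standard; the counts below are hypothetical.

```python
# Illustrative calculation (hypothetical counts) of sensitivity and specificity from a
# 2x2 table of index test results against the reference standard.
true_pos, false_neg = 80, 20    # people with the target condition: detected vs missed
true_neg, false_pos = 150, 50   # people without the condition: correctly ruled out vs not

sensitivity = true_pos / (true_pos + false_neg)   # proportion with the condition identified
specificity = true_neg / (true_neg + false_pos)   # proportion without the condition identified
print(f"Sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")
```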

Prediction Models

Prediction model research is used to test the accuracy of a model or test in estimating an outcome value or risk. Most models estimate the probability of the presence of a particular health condition (diagnostic) or whether a particular outcome will occur in the future (prognostic). Prediction models are used to support clinical decision making, such as whether to refer patients for further testing, monitor disease deterioration or treatment effects, or initiate treatment or lifestyle changes. Examples of well known prediction models include EuroSCORE II for cardiac surgery, the Gail model for breast cancer, the Framingham risk score for cardiovascular disease, IMPACT for traumatic brain injury, and FRAX for osteoporotic and hip fractures.

Source

Animal Research

TODO

Quality Improvement in Healthcare

Quality improvement research is about finding out how to improve and make changes in the most effective way. It is about systematically and rigorously exploring "what works" to improve quality in healthcare and the best ways to measure and disseminate this to ensure positive change. Most quality improvement effectiveness research is conducted in hospital settings, is focused on multiple quality improvement interventions, and uses process measures as outcomes. There is a great deal of variation in the research designs used to examine quality improvement effectiveness.

Source

Economic Evaluations in Healthcare

TODO

Meta Analyses

A meta-analysis is a statistical technique that amalgamates data from multiple studies to yield a single estimate of the effect size. This approach enhances precision and offers a more comprehensive understanding by integrating quantitative findings. Central to a meta-analysis is the evaluation of heterogeneity, which examines variations in study outcomes to ensure that differences in populations, interventions, or methodologies do not skew results. Techniques such as meta-regression or subgroup analysis are frequently employed to explore how various factors might influence the outcomes. This method is particularly effective when aiming to quantify the effect size, odds ratio, or risk ratio, providing a clearer numerical estimate that can significantly inform clinical or policy decisions.
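
As a simplified illustration of the statistical synthesis described here, the sketch below pools three hypothetical study estimates with inverse-variance (fixed-effect) weights and reports an I-squared heterogeneity summary based on Cochran's Q. Real meta-analyses would typically use dedicated software and consider random-effects models; the numbers are assumptions for illustration only.

```python
# A simplified sketch (hypothetical study results) of inverse-variance fixed-effect pooling
# with Cochran's Q and I-squared as a heterogeneity summary.
from math import sqrt

estimates = [-0.22, -0.10, -0.35]   # hypothetical log risk ratios from three studies
std_errors = [0.10, 0.15, 0.20]     # their standard errors

weights = [1 / se ** 2 for se in std_errors]
pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))

q = sum(w * (est - pooled) ** 2 for w, est in zip(weights, estimates))
i_squared = max(0.0, (q - (len(estimates) - 1)) / q) if q > 0 else 0.0

print(f"Pooled log RR {pooled:.3f} (SE {pooled_se:.3f}), I-squared {i_squared:.0%}")
```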

How Meta-analyses and Systematic Reviews Work Together

Systematic reviews and meta-analyses function together, each complementing the other to provide a more robust understanding of research evidence. A systematic review meticulously gathers and evaluates all pertinent studies, establishing a solid foundation of qualitative and quantitative data. Within this framework, if the collected data exhibit sufficient homogeneity, a meta-analysis can be performed. This statistical synthesis allows for the integration of quantitative results from individual studies, producing a unified estimate of effect size. Techniques such as meta-regression or subgroup analysis may further refine these findings, elucidating how different variables impact the overall outcome. By combining these methodologies, researchers can achieve both a comprehensive narrative synthesis and a precise quantitative measure, enhancing the reliability and applicability of their conclusions. This integrated approach ensures that the findings are not only well-rounded but also statistically robust, providing greater confidence in the evidence base.

Why Don't All Systematic Reviews Use a Meta-Analysis?

Systematic reviews do not always include meta-analyses, owing to variations in the data. For a meta-analysis to be viable, the data from different studies must be sufficiently similar, or homogeneous, in terms of design, population, and interventions. When the data show significant heterogeneity, meaning there are considerable differences among the studies, combining them could lead to skewed or misleading conclusions. Furthermore, the quality of the included studies is critical; if the studies are of low methodological quality, merging their results could obscure true effects rather than clarify them.

Protocol

A plan or set of steps that defines how something will be done. Before carrying out a research study, for example, the research protocol sets out what question is to be answered and how information will be collected and analysed.

Source
