26. Numbers analysed, outcomes, and estimation

What to write

Items for each primary and secondary outcome

For each primary and secondary outcome, by group:

  • the number of participants included in the analysis

  • the number of participants with available data at the outcome time point

  • result for each group, and the estimated effect size and its precision (such as 95% CI)

  • for binary outcomes, presentation of both absolute and relative effect size.

Examples

“All principal analyses were based on the intention-to-treat (ITT) principle, analysing participants in the groups to which they were randomly assigned irrespective of compliance with treatment allocation.”1

See Table 1,2 Table 2,2 and Table 3.3

Table 1: Example of good reporting: Summary results for each trial group (binary outcomes). Secondary outcomes of arm symptoms and lymphoedema by treatment group. Data are number (%) unless stated otherwise. Adapted from Bruce et al.2 DN4=Douleur Neuropathique-4 (positive neuropathic pain=score >3); LBCQ=Lymphoedema and Breast Cancer Questionnaire (positive symptoms=arm both heavy and swollen). *Numerical rating scale for acute and chronic postoperative pain; moderate to severe pain=4-10.
Outcome Usual care Exercise Adjusted odds ratio (95% CI)
Acute and chronic postoperative pain*
Moderate to severe, 6 weeks 46/150 (31) 28/153 (18) 1.90 (1.02 to 3.52)
Moderate to severe, 6 months 30/133 (23) 25/145 (17) 1.42 (0.72 to 2.84)
Moderate to severe, 12 months 43/139 (31) 22/135 (16) 2.41 (1.24 to 4.70)
Neuropathic pain, DN4 positive
6 weeks 21/150 (14) 24/153 (16) 0.73 (0.22 to 2.45)
6 months 29/133 (22) 26/145 (18) 1.64 (0.63 to 4.23)
12 months 32/139 (23) 22/135 (16) 1.29 (0.45 to 3.69)
Lymphoedema, LBCQ
6 weeks 20/150 (13) 22/153 (14) 1.07 (0.52 to 2.24)
6 months 32/133 (24) 29/145 (20) 0.82 (0.43 to 1.56)
12 months 36/139 (26) 33/135 (24) 1.17 (0.62 to 2.23)
Table 2: Example of good reporting: Summary results for each trial group (continuous outcomes). Disability of Arm, Shoulder, Hand (DASH) scores by treatment group. Scores adjusted for age, baseline DASH, breast surgery, axillary surgery, radiotherapy, and chemotherapy. Mean differences in upper limb function favour exercise intervention. Adapted from Bruce et al.2 CI=confidence interval; ITT=intention to treat; SD=standard deviation. *Absolute mean difference between treatment groups.
Time point, analysis Usual care Exercise Difference between groups (95% CI)*
No Mean (SD) No Mean (SD) Unadjusted Adjusted
6 months, ITT 125 20.8 (20.1) 134 18.0 (17.1) 2.76 (−1.79 to 7.31) 4.60 (0.30 to 8.90)
12 months, ITT (primary outcome) 138 23.7 (22.9) 132 16.3 (17.6) 7.34 (2.44 to 12.23) 7.81 (3.17 to 12.44)
Table 3: Example of good reporting: Absolute and relative effect sizes. Data are % (number) of participants, unless stated otherwise. Adapted from Sola et al.3 “The risk of oxygen dependence or death was reduced by 16% (95% CI 25% to 7%). The absolute difference was −6.3% (95% CI −9.9% to −2.7%); early administration to an estimated 16 babies would therefore prevent 1 baby dying or being long-term dependent on oxygen.”3 CI=confidence interval.
Primary outcome Early administration (%) (n=1344) Delayed selective administration (%) (n=1346) Risk ratio (95% CI) Risk difference (95% CI)
Death or oxygen dependence at “expected date of delivery” 31.9 (429) 38.2 (514) 0.84 (0.75 to 0.93) −6.3 (−9.9 to −2.7)

Explanation

For each primary and secondary outcome, the number of participants included in each group is an essential element of the analyses. Although the flow diagram (item 22a) should indicate the numbers of participants included in the analysis of the primary outcome, the number of participants with available data will often vary for different outcomes and at different time points.

Missing data can introduce potential bias through different types of participants being included in each treatment group. It can also reduce, through loss of information, the power to detect a difference between treatment groups if one exists (item 21c) and reduce the generalisability of the trial findings.4 It is therefore important to report the number of participants with available data for each primary and secondary outcome and at each time point. Where possible, it is also important to report the reason data were not available, for example, if the participant did not attend follow-up appointments, or if data were truncated because the participant died.4 The extent and causes of missing data vary. For example, a systematic review of palliative care trials estimated that 23% of primary outcome data were not available,5 whereas a review of trials published in four top general medical journals found a median of around 9% of participants with a missing outcome.6

Trial results are often more clearly displayed in a table rather than in the text, as shown in Table 1 and Table 2. For each outcome, results should be reported as a summary of the outcome in each group (eg, the number of participants included in the analysis with or without the event and the denominators, or the mean and standard deviation of measurements), together with the contrast between the groups, known as the effect size. For binary outcomes, the effect size could be the risk ratio (relative risk), odds ratio, or risk difference; for survival time data, it could be the hazard ratio or difference in median survival time; and for continuous data, it is usually the difference in means.
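
To make these effect measures concrete, here is a minimal sketch (Python) using the 6-week pain counts from Table 1 (46/150 events with usual care, 28/153 with exercise). Note that the odds ratios published in Table 1 are adjusted for baseline covariates, so the raw value computed here differs slightly from the 1.90 in the table.

```python
# Unadjusted effect sizes for one row of Table 1
# (moderate to severe pain at 6 weeks).
events_uc, n_uc = 46, 150    # usual care: events / analysed
events_ex, n_ex = 28, 153    # exercise: events / analysed

# Odds ratio: odds of the event under usual care vs exercise
odds_ratio = (events_uc / (n_uc - events_uc)) / (events_ex / (n_ex - events_ex))

# Risk ratio (relative risk): ratio of event proportions
risk_ratio = (events_uc / n_uc) / (events_ex / n_ex)

# Risk difference: absolute difference in event proportions
risk_diff = events_uc / n_uc - events_ex / n_ex

print(f"OR {odds_ratio:.2f}, RR {risk_ratio:.2f}, RD {risk_diff:.1%}")
# → OR 1.97, RR 1.68, RD 12.4%
```

The unadjusted odds ratio (1.97) and the published adjusted value (1.90) agree closely here, but in general they can differ, which is why the adjustment model should be reported alongside the estimate.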

For all outcomes, authors should provide a CI to indicate the precision (uncertainty) of the estimated effect size.7,8 A 95% CI is conventional, but occasionally other levels are used. Most journals require or strongly encourage the use of CIs.9 They are especially valuable for differences that do not meet conventional statistical significance, where they often indicate that the result does not rule out an important clinical difference. The use of CIs has increased markedly in recent years, although not in all medical specialties.10 A common error is the presentation of separate CIs for the outcome in each group rather than for the treatment effect.10 Although P values may be provided in addition to CIs, results should not be reported solely as P values.11,12 Results should be reported for all planned primary and secondary outcomes and at each time point, not just for analyses that were statistically significant or thought to be interesting. Selective reporting within studies is a widespread and serious problem.13,14
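
As an illustration of how such a CI is obtained, the sketch below reproduces the risk ratio and its 95% CI from Table 3 using the standard large-sample method: the interval is constructed on the log scale, where the estimate is approximately normally distributed, and then back-transformed.

```python
import math

# Event counts from Table 3 (OSIRIS trial): death or oxygen dependence
a, n1 = 429, 1344   # early administration
b, n2 = 514, 1346   # delayed selective administration

rr = (a / n1) / (b / n2)                 # risk ratio (relative risk)

# Standard error of log(RR), large-sample approximation
se_log_rr = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)

z = 1.96                                 # normal critical value for a 95% CI
lower = math.exp(math.log(rr) - z * se_log_rr)
upper = math.exp(math.log(rr) + z * se_log_rr)

print(f"risk ratio {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# → risk ratio 0.84 (95% CI 0.75 to 0.93), as reported in Table 3
```

Reporting the interval for the treatment effect itself, as here, avoids the common error noted above of giving a separate CI for each group.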

When the primary outcome is binary, both the relative effect (risk ratio (relative risk) or odds ratio) and the absolute effect (risk difference) should be reported (with CIs) (Table 3), as neither the relative measure nor the absolute measure alone gives a complete picture of the effect and its implications. Different audiences may prefer either relative or absolute risk, but both clinicians and lay people tend to overestimate the effect when it is presented solely in terms of relative risk.15,16,17 The magnitude of the risk difference is less generalisable to other populations than the relative risk, since it depends on the baseline risk in the unexposed group, which tends to vary across populations. For diseases where the outcome is common, a relative risk near unity might nonetheless indicate clinically important differences in public health terms. In contrast, a large relative risk when the outcome is rare may not be so important for public health (although it may be important to an individual in a high-risk category). For both binary and survival time data, expressing the results also as the number needed to treat for benefit or harm can be helpful.18,19
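
The arithmetic linking the absolute effect to the number needed to treat is simple: the NNT is the reciprocal of the absolute risk difference. A sketch using the Table 3 counts, which reproduces the quoted figures:

```python
import math

# Event counts from Table 3 (OSIRIS trial)
a, n1 = 429, 1344   # early administration
b, n2 = 514, 1346   # delayed selective administration
p1, p2 = a / n1, b / n2

rd = p1 - p2                                  # absolute effect (risk difference)
se_rd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lower, upper = rd - 1.96 * se_rd, rd + 1.96 * se_rd

nnt = 1 / abs(rd)                             # number needed to treat

print(f"risk difference {rd:.1%} (95% CI {lower:.1%} to {upper:.1%})")
print(f"number needed to treat ≈ {math.ceil(nnt)}")
# → risk difference -6.3% (95% CI -9.9% to -2.7%); NNT ≈ 16
```

Early administration to an estimated 16 babies would therefore prevent one death or case of long-term oxygen dependence, matching the interpretation quoted with Table 3.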

Training

The UK EQUATOR Centre runs training on how to write using reporting guidelines.

Discuss this item

Visit this item’s discussion page to ask questions and give feedback.

References

1. Beard DJ, Davies L, Cook JA, et al. Rehabilitation versus surgical reconstruction for non-acute anterior cruciate ligament injury (ACL SNNAP): A pragmatic randomised controlled trial. The Lancet. 2022;400(10352):605-615. doi:10.1016/s0140-6736(22)01424-6
2. Bruce J, Mazuquin B, Canaway A, et al. Exercise versus usual care after non-reconstructive breast cancer surgery (UK PROSPER): Multicentre randomised controlled trial and economic evaluation. BMJ. Published online November 2021:e066542. doi:10.1136/bmj-2021-066542
3. The OSIRIS Collaborative Group. Early versus delayed neonatal administration of a synthetic surfactant — the judgment of OSIRIS. The Lancet. 1992;340(8832):1363-1369. doi:10.1016/0140-6736(92)92557-v
4. Hussain JA, White IR, Johnson MJ, et al. Development of guidelines to reduce, handle and report missing data in palliative care trials: A multi-stakeholder modified nominal group technique. Palliative Medicine. 2022;36(1):59-70. doi:10.1177/02692163211065597
5. Hussain JA, White IR, Langan D, et al. Missing data in randomized controlled trials testing palliative interventions pose a significant risk of bias and loss of power: A systematic review and meta-analyses. Journal of Clinical Epidemiology. 2016;74:57-65. doi:10.1016/j.jclinepi.2015.12.003
6. Bell ML, Fiero M, Horton NJ, Hsu CH. Handling missing data in RCTs; a review of the top medical journals. BMC Medical Research Methodology. 2014;14(1). doi:10.1186/1471-2288-14-118
7. Lang TA, Secic M. How to report statistics in medicine: Annotated guidelines for authors. The Nurse Practitioner. 1997;22(5):198. doi:10.1097/00006205-199705000-00022
8. Altman DG. Clinical trials and meta-analyses. In: Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with Confidence. 2nd ed. BMJ Books; 2000:120-38.
9. Annals of Internal Medicine. 1997;126(1):36-47. doi:10.7326/0003-4819-126-1-199701010-00006
10. Altman DG. Confidence intervals in practice. In: Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with Confidence. 2nd ed. BMJ Books; 2000:6-14.
11. Gardner MJ, Altman DG. Confidence intervals rather than P values: Estimation rather than hypothesis testing. BMJ. 1986;292(6522):746-750. doi:10.1136/bmj.292.6522.746
12. Bailar JC, Mosteller F. Guidelines for statistical reporting in articles for medical journals: Amplifications and explanations. Annals of Internal Medicine. 1988;108(2):266-273. doi:10.7326/0003-4819-108-2-266
13. Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA. 2004;291(20):2457. doi:10.1001/jama.291.20.2457
14. Dwan K, Altman DG, Arnaiz JA, et al. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. Siegfried N, ed. PLoS ONE. 2008;3(8):e3081. doi:10.1371/journal.pone.0003081
15. Sorensen L, Gyrd-Hansen D, Kristiansen IS, Nexøe J, Nielsen JB. Laypersons’ understanding of relative risk reductions: Randomised cross-sectional study. BMC Medical Informatics and Decision Making. 2008;8(1). doi:10.1186/1472-6947-8-31
16. Bobbio M, Demichelis B, Giustetto G. Completeness of reporting trial results: Effect on physicians’ willingness to prescribe. The Lancet. 1994;343(8907):1209-1211. doi:10.1016/s0140-6736(94)92407-4
17. Naylor CD, Chen E, Strauss B. Measured enthusiasm: Does the method of reporting trial results alter perceptions of therapeutic effectiveness? Annals of Internal Medicine. 1992;117(11):916-921. doi:10.7326/0003-4819-117-11-916
18. Cook RJ, Sackett DL. The number needed to treat: A clinically useful measure of treatment effect. BMJ. 1995;310(6977):452-454. doi:10.1136/bmj.310.6977.452
19. Altman DG, Andersen PK. Calculating the number needed to treat for trials where the outcome is time to an event. BMJ. 1999;319(7223):1492-1495. doi:10.1136/bmj.319.7223.1492

Reuse

Most of the reporting guidelines and checklists on this website were originally published under permissive licences that allowed their reuse. Some were published with proprietary licences, where copyright is held by the publisher and/or original authors. The original content of the reporting checklists and explanation pages on this website was drawn from these publications with the knowledge and permission of the reporting guideline authors, and subsequently revised in response to feedback and evidence from research as part of an ongoing scholarly dialogue about how best to disseminate reporting guidance. The UK EQUATOR Centre makes no copyright claims over reporting guideline content. Our use of copyrighted content on this website falls under fair use guidelines.

Citation

For attribution, please cite this work as:
Hopewell S, Chan AW, Collins GS, et al. CONSORT 2025 statement: updated guideline for reporting randomised trials. BMJ. 2025;389:e081123. doi:10.1136/bmj-2024-081123

Reporting Guidelines are recommendations to help describe your work clearly

Your research will be used by people from different disciplines and backgrounds for decades to come. Reporting guidelines list the information you should describe so that everyone can understand, replicate, and synthesise your work.

Reporting guidelines do not prescribe how research should be designed or conducted. Rather, they help authors transparently describe what they did, why they did it, and what they found.

Reporting guidelines make writing research easier, and transparent research leads to better patient outcomes.

Easier writing

Following guidance makes writing easier and quicker.

Smoother publishing

Many journals require completed reporting checklists at submission.

Maximum impact

From Nobel Prizes to null results, articles have more impact when everyone can use them.

Who reads research?

Your work will be read by different people, for different reasons, around the world, and for decades to come. Reporting guidelines help you consider all of your potential audiences. For example, your research may be read by researchers from different fields, by clinicians, patients, evidence synthesisers, peer reviewers, or editors. Your readers will need information to understand, replicate, apply, appraise, synthesise, and use your work.

Cohort studies

A cohort study is an observational study in which a group of people with a particular exposure (e.g. a putative risk factor or protective factor) and a group of people without this exposure are followed over time. The outcomes of the people in the exposed group are compared to the outcomes of the people in the unexposed group to see if the exposure is associated with particular outcomes (e.g. getting cancer or length of life).


Case-control studies

A case-control study is a research method used in healthcare to investigate potential risk factors for a specific disease. It involves comparing individuals who have been diagnosed with the disease (cases) to those who have not (controls). By analysing the differences between the two groups, researchers can identify factors that may contribute to the development of the disease.

An example would be when researchers conducted a case-control study examining whether exposure to diesel exhaust particles increases the risk of respiratory disease in underground miners. Cases included miners diagnosed with respiratory disease, while controls were miners without respiratory disease. Participants' past occupational exposures to diesel exhaust particles were evaluated to compare exposure rates between cases and controls.


Cross-sectional studies

A cross-sectional study (also sometimes called a "cross-sectional survey") serves as an observational tool, where researchers capture data from a cohort of participants at a single point in time. This approach provides a 'snapshot': a brief glimpse into the characteristics or outcomes prevalent within a designated population at that precise moment. The primary aim is not to track changes or developments over an extended period but to assess and quantify the current situation regarding specific variables or conditions. Such a methodology is instrumental in identifying patterns or correlations among various factors within the population, providing a basis for further, more detailed investigation.


Systematic reviews

A systematic review is a comprehensive approach designed to identify, evaluate, and synthesise all available evidence relevant to a specific research question. In essence, it collects all possible studies related to a given topic and design, and reviews and analyses their results.

The process involves a highly sensitive search strategy to ensure that as much pertinent information as possible is gathered. Once collected, this evidence is often critically appraised to assess its quality and relevance, ensuring that conclusions drawn are based on robust data. Systematic reviews often involve defining inclusion and exclusion criteria, which help to focus the analysis on the most relevant studies, ultimately synthesising the findings into a coherent narrative or statistical synthesis. Some systematic reviews will include a meta-analysis.


Systematic review protocols

TODO

Meta analyses of Observational Studies

TODO

Randomised Trials

A randomised controlled trial (RCT) is a trial in which participants are randomly assigned to one of two or more groups: the experimental group or groups receive the intervention or interventions being tested; the comparison group (control group) receive usual care or no treatment or a placebo. The groups are then followed up to see if there are any differences between the results. This helps in assessing the effectiveness of the intervention.


Randomised Trial Protocols

TODO

Qualitative research

Research that aims to gather and analyse non-numerical (descriptive) data in order to gain an understanding of individuals' social reality, including understanding their attitudes, beliefs, and motivation. This type of research typically involves in-depth interviews, focus groups, or field observations in order to collect data that is rich in detail and context. Qualitative research is often used to explore complex phenomena or to gain insight into people's experiences and perspectives on a particular topic. It is particularly useful when researchers want to understand the meaning that people attach to their experiences or when they want to uncover the underlying reasons for people's behaviour. Qualitative methods include ethnography, grounded theory, discourse analysis, and interpretative phenomenological analysis.


Case Reports

TODO

Diagnostic Test Accuracy Studies

Diagnostic accuracy studies focus on estimating the ability of the test(s) to correctly identify people with a predefined target condition, or the condition of interest (sensitivity) as well as to clearly identify those without the condition (specificity).

Prediction Models

Prediction model research is used to test the accuracy of a model or test in estimating an outcome value or risk. Most models estimate the probability of the presence of a particular health condition (diagnostic) or whether a particular outcome will occur in the future (prognostic). Prediction models are used to support clinical decision making, such as whether to refer patients for further testing, monitor disease deterioration or treatment effects, or initiate treatment or lifestyle changes. Examples of well known prediction models include EuroSCORE II for cardiac surgery, the Gail model for breast cancer, the Framingham risk score for cardiovascular disease, IMPACT for traumatic brain injury, and FRAX for osteoporotic and hip fractures.


Animal Research

TODO

Quality Improvement in Healthcare

Quality improvement research is about finding out how to improve and make changes in the most effective way. It is about systematically and rigorously exploring "what works" to improve quality in healthcare and the best ways to measure and disseminate this to ensure positive change. Most quality improvement effectiveness research is conducted in hospital settings, is focused on multiple quality improvement interventions, and uses process measures as outcomes. There is a great deal of variation in the research designs used to examine quality improvement effectiveness.


Economic Evaluations in Healthcare

TODO

Meta Analyses

A meta-analysis is a statistical technique that amalgamates data from multiple studies to yield a single estimate of the effect size. This approach enhances precision and offers a more comprehensive understanding by integrating quantitative findings. Central to a meta-analysis is the evaluation of heterogeneity, which examines variations in study outcomes to ensure that differences in populations, interventions, or methodologies do not skew results. Techniques such as meta-regression or subgroup analysis are frequently employed to explore how various factors might influence the outcomes. This method is particularly effective when aiming to quantify the effect size, odds ratio, or risk ratio, providing a clearer numerical estimate that can significantly inform clinical or policy decisions.

How Meta-analyses and Systematic Reviews Work Together

Systematic reviews and meta-analyses function together, each complementing the other to provide a more robust understanding of research evidence. A systematic review meticulously gathers and evaluates all pertinent studies, establishing a solid foundation of qualitative and quantitative data. Within this framework, if the collected data exhibit sufficient homogeneity, a meta-analysis can be performed. This statistical synthesis allows for the integration of quantitative results from individual studies, producing a unified estimate of effect size. Techniques such as meta-regression or subgroup analysis may further refine these findings, elucidating how different variables impact the overall outcome. By combining these methodologies, researchers can achieve both a comprehensive narrative synthesis and a precise quantitative measure, enhancing the reliability and applicability of their conclusions. This integrated approach ensures that the findings are not only well-rounded but also statistically robust, providing greater confidence in the evidence base.

Why Don't All Systematic Reviews Use a Meta-Analysis?

Systematic reviews do not always have meta-analyses, due to variations in the data. For a meta-analysis to be viable, the data from different studies must be sufficiently similar, or homogeneous, in terms of design, population, and interventions. When the data show significant heterogeneity, meaning there are considerable differences among the studies, combining them could lead to skewed or misleading conclusions. Furthermore, the quality of the included studies is critical; if the studies are of low methodological quality, merging their results could obscure true effects rather than reveal them.

Protocol

A plan or set of steps that defines how something will be done. Before carrying out a research study, for example, the research protocol sets out what question is to be answered and how information will be collected and analysed.

