16a. How sample size was determined

What to write

How sample size was determined, including all assumptions supporting the sample size calculation

Examples

“We expected an improvement in PFS [progression free survival], in favor of avelumab, with a hazard ratio (HR) of 0.58. Considering a fixed design with a 2-sided α risk of 5% and a power of 80%, 106 events (progression or death) are needed to demonstrate this difference based on the Schoenfeld method. With an estimated recruitment rate of 3 patients per month, a follow-up period for each patient of 24 months, and a percentage of patients lost to follow-up or not evaluable of 15%, 132 patients had to be randomized, and we planned to enroll a total of 66 patients per group.”1
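Calculations like this can be checked by readers. Below is a minimal Python sketch (assuming scipy is installed) of the Schoenfeld approximation for the required number of events, using the inputs quoted above; the investigators' exact software and rounding conventions may differ.

```python
import math
from scipy.stats import norm

# Inputs quoted in the avelumab example above.
hr = 0.58        # target hazard ratio for PFS
alpha = 0.05     # two-sided type I error
power = 0.80     # 1 - beta

z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96
z_beta = norm.ppf(power)            # ~0.84

# Schoenfeld approximation for the required number of events,
# 1:1 allocation (hence the 0.5 * 0.5 in the denominator).
events = (z_alpha + z_beta) ** 2 / (0.5 * 0.5 * math.log(hr) ** 2)
print(math.ceil(events))            # 106, matching the quoted calculation
```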

“The target sample size was 300 (150 per arm) over a 3-and-a-half-year recruitment period. This was based on an assumed proportion of individuals with clinically meaningful improvement in VA [visual acuity] (>10 letters) of 55% in the standard care arm and a 19% increase in the adjunct group to 75%, with approximately 7% loss to follow-up, at least 90% power and two-sided 5% type 1 error.”2
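A rough check of this binary-outcome calculation is possible with the standard two-proportion normal-approximation formula. This is a sketch only: the extract does not state the investigators' exact method, and their target of 150 per arm is deliberately larger than the minimum this simple formula returns (consistent with "at least 90% power").

```python
import math
from scipy.stats import norm

# Inputs quoted in the ASCOT example above.
p1, p2 = 0.55, 0.75        # proportions improving: standard care vs adjunct
alpha, power = 0.05, 0.90
loss = 0.07                # approximate loss to follow-up

z_a = norm.ppf(1 - alpha / 2)
z_b = norm.ppf(power)

# Normal-approximation formula for two independent proportions.
p_bar = (p1 + p2) / 2
n_per_arm = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
             + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2
# One common way to allow for attrition: divide by the retention fraction.
n_per_arm = math.ceil(n_per_arm / (1 - loss))
print(n_per_arm)           # ~127; the trial's 150 per arm exceeds this minimum
```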

“In order to detect a minimum clinically important difference (MCID) in mean volume of daily PA [physical activity] of 2.1 mg [milligravity] at 12 months, and assuming a standard deviation (SD) of 5.3 mg, power of 80%, and a statistical significance level of 5%, a total of 202 participants were required. Allowing for 20% loss to follow-up and 20% non-compliance of accelerometer/intervention attendance meant that at least 338 participants were required (169 per group). The value of 2.1 mg was chosen as it represents an increase in PA that is equivalent to walking at the threshold between light intensity and moderate intensity (for example, 4 km per hour) for 30 min per day or 10–15 min of brisk walking per day.”3
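The same kind of check works for this continuous outcome, using the standard two-group formula. This sketch assumes a simple normal approximation; the published figures (101 per group before inflation, 169 after) likely reflect a small-sample correction and different rounding.

```python
import math
from scipy.stats import norm

# Inputs quoted in the physical activity example above.
delta = 2.1                # MCID in mean daily PA volume (mg, milligravity)
sd = 5.3                   # assumed standard deviation (mg)
alpha, power = 0.05, 0.80

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
# Standard two-group formula for a continuous outcome.
n_per_group = 2 * z ** 2 * (sd / delta) ** 2        # ~100 per group
# The published 338 total is consistent with combining the 20% loss and
# 20% non-compliance allowances additively: n / (1 - 0.2 - 0.2).
n_inflated = n_per_group / (1 - 0.2 - 0.2)          # ~167 per group
print(math.ceil(n_per_group), math.ceil(n_inflated))
```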

“Sample size was based on the primary outcome measure, HOS ADL [hip outcome score activities of daily living subscale] at eight months post-randomisation, and was calculated using a minimum clinically important difference between groups of 9 points. We estimated the standard deviation to be 14 points; however, summaries presented at a planned interim data monitoring meeting found that the standard deviation was 18 points. A revised calculation (significance level 5%, power 90%, loss to follow-up 20%) gave a sample size of 214 (107 participants in each group). The data monitoring committee approved the sample size increase from 120 to 214 participants.”4
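The revised calculation in this example can be approximated the same way. Again a sketch: the published 107 per group suggests a small-sample (t distribution) correction, adding roughly one participant per group, was applied before the 20% inflation.

```python
import math
from scipy.stats import norm

# Inputs quoted in the revised HOS ADL calculation above.
delta = 9                  # minimum clinically important difference (points)
sd = 18                    # revised standard deviation estimate (points)
alpha, power = 0.05, 0.90
loss = 0.20                # allowance for loss to follow-up

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
n_per_group = 2 * z ** 2 * (sd / delta) ** 2        # ~84 per group
n_per_group = math.ceil(n_per_group / (1 - loss))   # ~106 vs the published 107
print(n_per_group, 2 * n_per_group)
```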

Explanation

Sample size calculations are a key design component of a trial and need careful planning. They must balance ethical and logistical considerations alongside medical and statistical ones, so that the scientific question can be answered reliably, precisely, and in a timely manner without unnecessarily exposing individuals to ineffective or harmful interventions. Calculations are generally based on one primary outcome. A trial should therefore be sufficiently large to have a high probability (power) of detecting a clinically important difference of a prespecified size, if such a difference exists, at the chosen level of statistical significance. The magnitude of the effect has an inverse relationship with the sample size required to detect it; that is, larger sample sizes are needed to detect smaller differences. Moreover, this relationship is not linear: very small differences require enormous sample sizes to be detected with adequate power.
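To make this non-linear relationship concrete, consider the standard normal-approximation formula for comparing means between two equal-sized groups (a sketch, not the method any particular trial must use), where σ denotes the outcome's standard deviation and Δ the target difference:

```latex
% Required size per group for a continuous outcome, two-sided alpha,
% power 1 - beta (normal approximation, equal allocation):
\[
  n_{\text{per group}} = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,\sigma^{2}}{\Delta^{2}}
\]
% Because n is proportional to 1/Delta^2, halving the target difference
% quadruples the required sample size.
```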

All details of how the sample size was determined should be reported to allow replication (in principle). Elements of the sample size calculation that need to be specified are: the primary outcome (and time point) on which the calculation was based (item 14); the anticipated values for the outcome in each trial group at a specific time point (which imply the clinically important target difference between the intervention groups), with the rationale or provenance of all quantities, including any relevant citations; for continuous outcomes, the standard deviation of the measurements5; the statistical test; the α (type I error) value and whether it is two sided; the statistical power (or the β (type II error) value); and the resulting target sample size per trial group (see the box below). Details should be given of any inflation of the sample size to allow for attrition or non-adherence during the study. Any formulas or software packages used for the sample size calculation should also be referenced. Additional reporting considerations apply to crossover trials,6 factorial trials,7 cluster trials,8 multi-arm trials,9 within-person trials,10 and non-inferiority and equivalence trials.11

DELTA2 recommended reporting items for the sample size calculation of a randomised controlled trial with a superiority question*13

Core items

  1. Primary outcome (and any other outcome on which the calculation is based). If a primary outcome is not used as the basis for the sample size calculation, state why

  2. Statistical significance level and power

  3. Express the target difference according to outcome type

    1. Binary—state the target difference as an absolute or relative effect (or both), along with the intervention and control group proportions. If both an absolute and a relative difference are provided, clarify if either takes primacy in terms of the sample size calculation

    2. Continuous—state the target mean difference on the natural scale, common standard deviation, and standardised effect size (mean difference divided by the standard deviation)

    3. Time to event—state the target difference as an absolute or relative difference (or both); provide the control group event proportion, planned length of follow-up, intervention and control group survival distributions, and accrual time (if assumptions regarding these values are made). If both an absolute and relative difference are provided for a particular time point, clarify if either takes primacy in terms of the sample size calculation

  4. Allocation ratio. If an unequal ratio is used, the reason for this should be stated

  5. Sample size based on the assumptions as per above

    1. Reference the formula/sample size calculation approach, if standard binary, continuous, or survival outcome formulas are not used. For a time-to-event outcome, the number of events required should be stated

    2. If any adjustments (eg, allowance for loss to follow-up, multiple testing) that alter the required sample size are incorporated, they should also be specified, referenced, and justified along with the final sample size

    3. For alternative designs, additional input should be stated and justified

    4. Provide details of any assessment of the sensitivity of the sample size to the inputs used (illustrated in the sketch after this box)

Additional items for grant application and trial protocol

  1. Underlying basis used for specifying the target difference (an important or realistic difference)

  2. Explain the choice of target difference—specify and reference any formal method used or relevant previous research

Additional item for trial results paper

  1. Reference the trial protocol.
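Core item 5.4 above asks for any assessment of the sensitivity of the sample size to its inputs. Below is a minimal sketch of such a check, using illustrative values that echo the hip surgery example (where the assumed standard deviation moved from 14 to 18 points); all numbers are hypothetical.

```python
import math
from scipy.stats import norm

# Illustrative sensitivity check: how the required size per group moves
# as the assumed standard deviation varies, holding everything else fixed.
delta = 9                  # assumed target difference (points)
alpha, power = 0.05, 0.90
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

for sd in (14, 16, 18, 20):                        # plausible range for the SD
    n = math.ceil(2 * z ** 2 * (sd / delta) ** 2)
    print(f"SD={sd}: n per group = {n}")           # 51, 67, 85, 104
```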

Transparent reporting of the sample size calculation reveals the power of the trial to readers and gives them a measure by which to assess whether the trial attained its planned size. Any differences from the planned sample size described in the trial registration (item 2), study protocol (item 3), or statistical analysis plan should be explained.

Interim analyses are used in some trials to help decide whether to stop early or to continue recruiting, sometimes beyond the planned trial end (item 16b). If the actual sample size differed from the originally intended sample size for some other reason (eg, because of poor recruitment or revision of the target sample size), an explanation should be given alongside details of the revised sample size. Many reviews have found that few authors report how they determined the sample size.14,15,16,17,18,19,20,21

There is no value in conducting and reporting a post hoc calculation of statistical power using the results of a trial, for example, as a pretext to explain non-significant findings; this may even mislead and confuse readers.22,23
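One way to see why (a sketch for a two-sided z-test): post hoc "observed" power is a one-to-one transformation of the p value, so it adds no information beyond the result itself, and a result at exactly p=0.05 always has observed power of about 50%.

```python
from scipy.stats import norm

# Post hoc ("observed") power for a two-sided z-test is a deterministic
# function of the p value alone.
def post_hoc_power(p, alpha=0.05):
    z_obs = norm.ppf(1 - p / 2)          # |z| implied by the p value
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)

for p in (0.05, 0.20, 0.50):
    print(f"p={p:.2f}: observed power = {post_hoc_power(p):.2f}")
# p=0.05 -> 0.50; larger p values give even lower "power", by construction.
```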

Training

The UK EQUATOR Centre runs training on how to write using reporting guidelines.

Discuss this item

Visit this item’s discussion page to ask questions and give feedback.

References

1.
Taïeb J, Bouche O, André T, et al. Avelumab vs standard second-line chemotherapy in patients with metastatic colorectal cancer and microsatellite instability: A randomized clinical trial. JAMA Oncology. 2023;9(10):1356. doi:10.1001/jamaoncol.2023.2761
2.
Casswell EJ, Cro S, Cornelius VR, et al. Randomised controlled trial of adjunctive triamcinolone acetonide in eyes undergoing vitreoretinal surgery following open globe trauma: The ASCOT study. British Journal of Ophthalmology. 2023;108(3):440-448. doi:10.1136/bjo-2022-322787
3.
Khunti K, Highton PJ, Waheed G, et al. Promoting physical activity with self-management support for those with multimorbidity: A randomised controlled trial. British Journal of General Practice. 2021;71(713):e921-e930. doi:10.3399/bjgp.2021.0172
4.
Palmer AJR, Ayyar Gupta V, Fernquest S, et al. Arthroscopic hip surgery compared with physiotherapy and activity modification for the treatment of symptomatic femoroacetabular impingement: Multicentre randomised controlled trial. BMJ. Published online February 2019:l185. doi:10.1136/bmj.l185
5.
Campbell MJ, Julious SA, Altman DG. Estimating sample sizes for binary, ordered categorical, and continuous outcomes in two group comparisons. BMJ. 1995;311(7013):1145-1148. doi:10.1136/bmj.311.7013.1145
6.
Dwan K, Li T, Altman DG, Elbourne D. CONSORT 2010 statement: Extension to randomised crossover trials. BMJ. Published online July 2019:l4378. doi:10.1136/bmj.l4378
7.
Kahan BC, Hall SS, Beller EM, et al. Reporting of factorial randomized trials: Extension of the CONSORT 2010 statement. JAMA. 2023;330(21):2106. doi:10.1001/jama.2023.19793
8.
Campbell MK, Piaggio G, Elbourne DR, Altman DG. CONSORT 2010 statement: Extension to cluster randomised trials. BMJ. 2012;345:e5661. doi:10.1136/bmj.e5661
9.
Juszczak E, Altman DG, Hopewell S, Schulz K. Reporting of multi-arm parallel-group randomized trials: Extension of the CONSORT 2010 statement. JAMA. 2019;321(16):1610. doi:10.1001/jama.2019.3087
10.
Pandis N, Chung B, Scherer RW, Elbourne D, Altman DG. CONSORT 2010 statement: Extension checklist for reporting within person randomised trials. BMJ. Published online June 2017:j2835. doi:10.1136/bmj.j2835
11.
Piaggio G, Elbourne DR, Pocock SJ, Evans SJW, Altman DG, for the CONSORT Group. Reporting of noninferiority and equivalence randomized trials: Extension of the CONSORT 2010 statement. JAMA. 2012;308(24):2594. doi:10.1001/jama.2012.87802
12.
Glasziou P, Altman DG, Bossuyt P, et al. Reducing waste from incomplete or unusable reports of biomedical research. The Lancet. 2014;383(9913):267-276. doi:10.1016/s0140-6736(13)62228-x
13.
Cook JA, Julious SA, Sones W, et al. DELTA2 guidance on choosing the target difference and undertaking and reporting the sample size calculation for a randomised controlled trial. BMJ. Published online November 2018:k3750. doi:10.1136/bmj.k3750
14.
Bridgman AC, McPhie ML, Voineskos SH, Chan AW, Drucker AM. Reporting of primary outcome measures and sample size calculations in randomized controlled trials in dermatology journals. Journal of the American Academy of Dermatology. 2022;87(4):912-914. doi:10.1016/j.jaad.2021.12.022
15.
Speich B. Adequate reporting of the sample size calculation in surgical randomized controlled trials. Surgery. 2020;167(5):812-814. doi:10.1016/j.surg.2019.10.011
16.
Weinberg T, Wang G, Lam K, Kitchen J, Chan AW. Reporting of sample-size calculations for randomized trials in dermatology: Comparison of publications with registries. British Journal of Dermatology. 2018;180(4):929-930. doi:10.1111/bjd.17332
17.
Amiri. 2020;56.
18.
Dumbrigue HB, Dumbrigue EC, Dumbrigue DC, Chingbingyong MI. Reporting of sample size parameters in randomized controlled trials published in prosthodontic journals. Journal of Prosthodontics. 2019;28(2):159-162. doi:10.1111/jopr.13010
19.
Nontshe M, Khan S, Mandebvu T, Merrifield B, Rodseth R. Sample-size determination and adherence in randomised controlled trials published in anaesthetic journals. Southern African Journal of Anaesthesia and Analgesia. 2018;24(2):40-46. doi:10.1080/22201181.2018.1439602
20.
Copsey B, Thompson JY, Vadher K, et al. Sample size calculations are poorly conducted and reported in many randomized trials of hip and knee osteoarthritis: Results of a systematic review. Journal of Clinical Epidemiology. 2018;104:52-61. doi:10.1016/j.jclinepi.2018.08.013
21.
Lee PH, Tse ACY. The quality of the reported sample size calculations in randomized controlled trials indexed in PubMed. European Journal of Internal Medicine. 2017;40:16-21. doi:10.1016/j.ejim.2016.10.008
22.
Althouse AD. Post hoc power: Not empowering, just misleading. Journal of Surgical Research. 2021;259:A3-A6. doi:10.1016/j.jss.2019.10.049
23.
Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Annals of Internal Medicine. 1994;121(3):200-206. doi:10.7326/0003-4819-121-3-199408010-00008

Reuse

Most of the reporting guidelines and checklists on this website were originally published under permissive licenses that allowed their reuse. Some were published with proprietary licenses, where copyright is held by the publisher and/or original authors. The original content of the reporting checklists and explanation pages on this website was drawn from these publications with the knowledge and permission of the reporting guideline authors, and subsequently revised in response to feedback and evidence from research as part of an ongoing scholarly dialogue about how best to disseminate reporting guidance. The UK EQUATOR Centre makes no copyright claims over reporting guideline content. Our use of copyrighted content on this website falls under fair use guidelines.

Citation

For attribution, please cite this work as:
Hopewell S, Chan AW, Collins GS, et al. CONSORT 2025 statement: updated guideline for reporting randomised trials. BMJ. 2025;389:e081123. doi:10.1136/bmj-2024-081123

