Unmasking P-values
20 May 2020
For example, if we are comparing response rates between two equally sized treatment groups, the parameter is the difference between the response rates, and the null hypothesis states that this difference is zero. If one observed response rates of 20% and 25% in the two groups, and thus an observed difference of 5 percentage points, the P-value measures the probability that a difference at least as large as 5 percentage points would be observed if the response rates in the two groups were in fact identical. For groups of 40 patients, the P-value equals 0.74, indicating that, assuming the two treatments are equally effective, a difference of 5 percentage points or greater in response rate has a 74% chance of being observed. If the observed response rates were 20% and 40%, the P-value would be 0.011, indicating that such a large difference would be unlikely to occur by chance alone in the absence of a true difference.
It is important to note that P-values, as such, also do not tell us anything about the magnitude of the true difference in response rates. In fact, with a sufficiently large sample size, say 800 patients per arm, observed response rates of 20% and 25% would give a P-value of 0.016. This is because P-values are influenced by precision (sample size), and not just by effect size.
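The effect of sample size on the P-value can be illustrated with a short sketch. It assumes a pooled two-proportion z-test, which is only one of several tests that could be applied to such data; the text does not state which test produced its figures, so the values computed here need not match them exactly. The function name `two_proportion_p_value` is hypothetical and chosen for illustration.

```python
import math


def two_proportion_p_value(responders_a, responders_b, n_per_arm):
    """Two-sided P-value from a pooled two-proportion z-test.

    Assumption: both arms have the same size and the normal
    approximation is acceptable. This is an illustrative choice of test.
    """
    rate_a = responders_a / n_per_arm
    rate_b = responders_b / n_per_arm
    # Under the null hypothesis both arms share one response rate,
    # estimated by pooling the two arms.
    pooled = (responders_a + responders_b) / (2 * n_per_arm)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    if se == 0:
        return 1.0  # no variability at all: no evidence against the null
    z = (rate_b - rate_a) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed rates (20% vs 25%), two different sample sizes.
p_small = two_proportion_p_value(8, 10, 40)       # 40 patients per arm
p_large = two_proportion_p_value(160, 200, 800)   # 800 patients per arm

print(f"40 per arm:  P = {p_small:.3f}")
print(f"800 per arm: P = {p_large:.3f}")
```

With the same observed difference, the P-value under this test is far above 0.05 for 40 patients per arm and roughly 0.017 for 800 patients per arm, which illustrates that the P-value reflects precision as well as effect size.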
The formalisation of hypothesis testing, with the notions of the statistical significance level (α) and the alternative hypothesis (Ha), was later developed by Neyman and Pearson (Neyman and Pearson, 1933). Hypothesis testing provides a dichotomous decision as to whether or not the data are compatible “enough” with the null hypothesis (H0, usually of zero effect). If they are not, the test concludes that the data were not generated under the null hypothesis and that the hypothesis of no difference should thus be rejected in favour of the alternative hypothesis (of a non-zero effect). This decision is made by comparing the P-value to a small threshold α, typically set by convention to 0.05. The value of α is chosen to ensure that, on average over a large number of repetitions of the exact same experiment, the error of rejecting H0 when it is in fact true is made no more than 5% of the time. In our examples above with a sample size of 40 patients per arm, the test would not reject H0 for the experiment that observed response rates of 20% and 25% (as the P-value was 0.74), but it would reject the null hypothesis in the experiment with observed response rates of 20% and 40%. However, in the experiment with a sample size of 800 patients per arm, the test would also reject the null hypothesis with observed response rates of 20% and 25%.
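The long-run meaning of α can be sketched with a small simulation: repeat the same experiment many times with identical true response rates in both arms, and count how often the test rejects H0. The sketch below assumes a pooled two-proportion z-test, a true response rate of 20%, 200 patients per arm and 2000 repetitions; all of these are illustrative choices, not taken from the text, and the function names are hypothetical. Because the z-test is an approximation, the simulated rejection rate should land close to 5% rather than exactly at or below it.

```python
import math
import random

ALPHA = 0.05  # conventional significance threshold


def z_test_p_value(responders_a, responders_b, n_per_arm):
    """Two-sided P-value from a pooled two-proportion z-test (equal arms)."""
    pooled = (responders_a + responders_b) / (2 * n_per_arm)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    if se == 0:
        return 1.0  # identical, degenerate arms: nothing to reject
    z = (responders_b - responders_a) / n_per_arm / se
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(true_rate, n_per_arm, repetitions, seed=0):
    """Fraction of simulated experiments that reject H0 when H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rejections = 0
    for _ in range(repetitions):
        # Both arms are drawn with the SAME true response rate,
        # so every rejection is a decision error.
        x_a = sum(rng.random() < true_rate for _ in range(n_per_arm))
        x_b = sum(rng.random() < true_rate for _ in range(n_per_arm))
        if z_test_p_value(x_a, x_b, n_per_arm) < ALPHA:
            rejections += 1
    return rejections / repetitions


rate = rejection_rate(true_rate=0.20, n_per_arm=200, repetitions=2000)
print(f"Share of experiments rejecting a true H0: {rate:.3f}")
```

The printed share estimates how often a true null hypothesis is wrongly rejected under these assumptions; changing `ALPHA` changes that long-run error rate accordingly.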
