Unmasking P-values
20 May 2020
For example, suppose we compare response rates between two equally sized treatment groups. The parameter of interest is the difference between the response rates, and the null hypothesis states that this difference is zero. If one observed response rates of 20% and 25% in the two groups, an observed difference of 5 percentage points, the P-value measures the probability of observing a difference at least as large as 5 percentage points if the response rates in the two groups were in fact identical. For groups of 40 patients, the P-value equals 0.74. This indicates that, if the two treatments were equally effective, a difference in response rate of 5 percentage points or more would be observed 74% of the time. If the observed response rates were 20% and 40%, the P-value would be 0.011, indicating that such a large difference would be unlikely to occur by chance alone in the absence of a true difference.
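One way to make this definition concrete is to simulate it directly. Generate many trials in which both arms truly share the same response rate, then count how often a difference at least as large as the observed one turns up. The sketch below makes three assumptions. It uses 40 patients per arm. It takes the pooled observed rate as the common rate under the null hypothesis. The function name `simulated_p_value` is hypothetical. The article does not state which test produced its figures, so the simulated estimates need not match the values quoted above. They should, however, show the same pattern: a much smaller P-value for 20% versus 40% than for 20% versus 25%.

```python
import random


def simulated_p_value(successes_a, successes_b, n_per_arm, n_sims=10_000, seed=1):
    """Estimate a two-sided P-value by simulating experiments under H0.

    Hypothetical helper: under the null hypothesis both arms share one
    response rate, estimated here (an assumption) by pooling both arms.
    """
    rng = random.Random(seed)
    pooled_rate = (successes_a + successes_b) / (2 * n_per_arm)
    # Arms are equally sized, so comparing responder counts is equivalent
    # to comparing response rates.
    observed_gap = abs(successes_b - successes_a)
    at_least_as_extreme = 0
    for _ in range(n_sims):
        # Draw the number of responders in each arm from the same null rate.
        a = sum(rng.random() < pooled_rate for _ in range(n_per_arm))
        b = sum(rng.random() < pooled_rate for _ in range(n_per_arm))
        if abs(b - a) >= observed_gap:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_sims


# 20% vs 25% (8 vs 10 responders) and 20% vs 40% (8 vs 16) in 40 patients per arm
print(simulated_p_value(8, 10, 40))
print(simulated_p_value(8, 16, 40))
```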
It is important to note that P-values, on their own, say nothing about the magnitude of the true difference in response rates. With a sufficiently large sample size, say 800 patients per arm, the same observed response rates of 20% and 25% would give a P-value of 0.016. This is because P-values depend on the precision of the estimate, which is driven by the sample size, and not only on the size of the effect.
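The effect of sample size can be checked with a standard two-sided two-proportion z-test, which uses a pooled variance and no continuity correction. The article does not name its test, so this choice is an assumption, and `two_proportion_p_value` is a hypothetical helper. The observed rates stay fixed at 20% and 25% while only the number of patients per arm changes. For 800 patients per arm the result is close to the 0.016 quoted above, and the P-value shrinks steadily as the arms grow.

```python
import math


def two_proportion_p_value(rate_a, rate_b, n_per_arm):
    """Two-sided P-value of a pooled two-proportion z-test (no continuity correction)."""
    # With equally sized arms, the pooled rate is the simple mean of the two rates.
    pooled = (rate_a + rate_b) / 2
    se = math.sqrt(pooled * (1 - pooled) * (2 / n_per_arm))
    z = abs(rate_b - rate_a) / se
    # Two-sided standard normal tail area: P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Same observed effect (20% vs 25%), increasing precision.
for n in (40, 200, 800):
    print(n, round(two_proportion_p_value(0.20, 0.25, n), 3))
```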
Hypothesis testing was later formalised by Neyman and Pearson (Neyman and Pearson, 1933), who introduced the notions of a statistical significance level (α) and an alternative hypothesis (Ha). Hypothesis testing provides a dichotomous decision: either the data are compatible "enough" with the null hypothesis (usually of zero effect), or they are not. In the latter case, the test treats the data as incompatible with the null hypothesis, and the hypothesis of no difference is rejected in favour of the alternative hypothesis (of a non-zero effect). This decision is made by comparing the P-value to a small threshold, conventionally (and somewhat arbitrarily) set at α = 0.05. This threshold ensures that, when H0 is in fact true, the test wrongly rejects it no more than 5% of the time, on average over a large number of repetitions of the exact same experiment.
In the examples above with 40 patients per arm, the test would not reject H0 for the experiment with observed response rates of 20% and 25% (P = 0.74), but it would reject the null hypothesis in the experiment with observed response rates of 20% and 40% (P = 0.011). With 800 patients per arm, however, the test would also reject the null hypothesis for observed response rates of 20% and 25% (P = 0.016).
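The 5% guarantee can be seen by simulating many repetitions of a trial in which H0 is true and counting how often a test at α = 0.05 rejects it. The sketch below assumes a common response rate of 20% and a pooled two-proportion z-test. It uses 200 patients per arm rather than 800 to keep the simulation quick. These choices are assumptions, since the article does not say which test it has in mind, and the function names are hypothetical. The observed rejection rate should come out close to 0.05. Because the z-test is an approximation, it can sit slightly above or below that value.

```python
import math
import random


def z_test_p_value(successes_a, successes_b, n_per_arm):
    """Two-sided pooled two-proportion z-test on responder counts (hypothetical helper)."""
    pooled = (successes_a + successes_b) / (2 * n_per_arm)
    if pooled in (0.0, 1.0):
        return 1.0  # no variability in either arm, so no evidence against H0
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = abs(successes_b - successes_a) / n_per_arm / se
    return math.erfc(z / math.sqrt(2))


def type_i_error_rate(true_rate, n_per_arm, alpha=0.05, n_trials=5_000, seed=2020):
    """Fraction of simulated null trials (equal true rates) in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both arms are drawn from the same true response rate, so H0 holds.
        a = sum(rng.random() < true_rate for _ in range(n_per_arm))
        b = sum(rng.random() < true_rate for _ in range(n_per_arm))
        if z_test_p_value(a, b, n_per_arm) < alpha:
            rejections += 1
    return rejections / n_trials


print(type_i_error_rate(0.20, 200))
```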