Unmasking P-values
20 May 2020
For example, if we are comparing response rates between two equally sized treatment groups, the parameter is the difference between the response rates, and the null hypothesis states that this difference is zero. If one observed response rates of 20% and 25% in the two groups, and thus an observed difference of 5%, the P-value measures the probability that a difference at least as large as 5% would be observed if the response rates in the two groups were in fact identical. For groups of 40 patients, the P-value equals 0.74, indicating that (assuming the two treatments were equally effective) a difference of 5% or greater in response rate has a 74% chance of being observed. If the observed response rates were 20% and 40%, the P-value would be 0.011, indicating that such a large difference would be unlikely to occur by chance alone in the absence of a true difference.
It is important to note that P-values, as such, do not say anything about the magnitude of the true difference in response rates. In fact, with a sufficiently large sample size, say 800 patients per arm, the same observed response rates of 20% and 25% would give a P-value of 0.016. This is because P-values are influenced by precision (sample size), and not just by effect size.
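The dependence on sample size can be checked with a short calculation. The sketch below assumes a pooled two-proportion z-test (normal approximation); the article does not state which test produced its figures, so results for small groups may differ from the values quoted above. The function name `two_proportion_p_value` is illustrative only.

```python
import math


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided P-value from a pooled two-proportion z-test.

    Assumption: normal approximation with a pooled variance estimate;
    other tests (e.g. Fisher's exact test) give different values.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail area of the standard normal:
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed response rates (20% vs 25%), more and more patients per arm.
for n in (40, 200, 800, 3200):
    responders_a = round(0.20 * n)
    responders_b = round(0.25 * n)
    p = two_proportion_p_value(responders_a, n, responders_b, n)
    print(f"n per arm = {n:5d}  P-value = {p:.4f}")
```

Under this assumed test, the observed difference stays at 5% in every row while the P-value shrinks as the number of patients per arm grows, which is the point made above: a small P-value may reflect a large sample rather than a large effect.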
The formalisation of hypothesis testing, with the notions of a statistical significance level (α) and of an alternative hypothesis (Ha), was later developed by Neyman and Pearson (Neyman and Pearson, 1933). Hypothesis testing provides a dichotomous decision as to whether or not the data are compatible "enough" with the null hypothesis (H0, usually of zero effect). If they are not, the test concludes that the data were not generated under the null hypothesis and that the hypothesis of no difference should thus be rejected in favour of the alternative hypothesis (of a non-zero effect). This decision is made by comparing the P-value to a small threshold, typically arbitrarily set to α=0.05. The value of α is chosen to ensure that, on average over a large number of repetitions of the exact same experiment, the error of rejecting H0 when it is in fact true is made no more than 5% of the time. In our examples above with a sample size of 40 patients per arm, the test would not reject H0 for the experiment that observed response rates of 20% and 25% (as the P-value was 0.74), but it would reject H0 in the experiment with observed response rates of 20% and 40% (as the P-value was 0.011). However, in the experiment with a sample size of 800 patients per arm, the test would also reject H0 with observed response rates of 20% and 25% (as the P-value was 0.016).
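The "large number of repetitions" interpretation of α can be illustrated by simulation. The sketch below is a minimal illustration under stated assumptions: both arms share a true response rate of 20% (so H0 holds), each arm has 200 patients, and the test is a pooled two-proportion z-test. These choices, and the names `z_test_p_value` and `rejection_rate`, are not taken from the article.

```python
import math
import random


def z_test_p_value(x1, x2, n):
    """Two-sided P-value, pooled two-proportion z-test, equal arm size n."""
    pooled = (x1 + x2) / (2 * n)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation in the data: no evidence against H0
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (x2 / n - x1 / n) / se
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(true_rate, n, alpha=0.05, repetitions=2000, seed=1):
    """Fraction of simulated trials in which H0 is rejected.

    Both arms are drawn with the same true response rate, so every
    rejection counted here is a false positive (type I error).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(repetitions):
        x1 = sum(rng.random() < true_rate for _ in range(n))
        x2 = sum(rng.random() < true_rate for _ in range(n))
        if z_test_p_value(x1, x2, n) < alpha:
            rejections += 1
    return rejections / repetitions


print(rejection_rate(true_rate=0.20, n=200))
```

Because the z-test relies on a normal approximation, the simulated rejection rate should land close to 5% rather than exactly at or below it; an exact test would be needed to guarantee the bound strictly.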