Unmasking P-values

For example, if we are comparing response rates between two equally sized treatment groups, the parameter of interest is the difference between the response rates, and the null hypothesis states that this difference is zero. If one observed response rates of 20% and 25% in the two groups, thus an observed difference of 5%, the P-value measures the probability of observing a difference at least as large as 5% if the response rates in the two groups were in fact identical. For groups of 40 patients, the P-value equals 0.74, indicating that, if the two treatments were equally effective, a difference of 5% or greater in response rate would have a 74% chance of being observed. If the observed response rates were 20% and 40%, the P-value would be 0.011, indicating that such a large difference would be unlikely to occur by chance alone in the absence of a true difference.

It is important to note that P-values, as such, also do not tell us anything about the magnitude of the true difference in response rates. In fact, with a sufficiently large sample size, say 800 patients per arm, observed response rates of 20% and 25% would give a P-value of 0.016. This is because P-values are influenced by precision (sample size), and not just by effect size.
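The dependence on sample size can be illustrated with a short Python sketch. It assumes a pooled two-proportion z-test with a normal approximation; the article does not say which test produced its figures, so the output need not match the quoted values digit for digit. The function name `two_sided_p_value` and the list of sample sizes are illustrative choices, not taken from the article.

```python
import math


def two_sided_p_value(rate_a, rate_b, n_per_arm):
    """Two-sided P-value from a pooled two-proportion z-test (equal arm sizes).

    Assumption: normal approximation, no continuity correction.
    """
    pooled = (rate_a + rate_b) / 2  # pooled response rate under H0
    std_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = abs(rate_a - rate_b) / std_error
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)), the two-sided normal tail area
    return math.erfc(z / math.sqrt(2))


# Same observed rates (20% vs 25%), increasing number of patients per arm
for n in (100, 200, 400, 800):
    print(n, round(two_sided_p_value(0.20, 0.25, n), 3))
```

The observed difference is identical in every row; only the number of patients changes, and the P-value shrinks as the estimate becomes more precise.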

The formalisation of hypothesis testing, with the notions of a statistical significance level (α) and an alternative hypothesis (Ha), was later developed by Neyman and Pearson (Neyman and Pearson, 1933). Hypothesis testing provides a dichotomous decision as to whether or not the data are compatible “enough” with the null hypothesis (usually of zero effect). If they are not, the test concludes that the data were not generated under the null hypothesis and that the hypothesis of no difference should thus be rejected in favour of the alternative hypothesis (of a non-zero effect). This decision is made by comparing the P-value to a small threshold, typically set, somewhat arbitrarily, to α=0.05. The value of α is chosen to ensure that, on average over a large number of repetitions of the exact same experiment, H0 is wrongly rejected no more than 5% of the time when it is in fact true. In our examples above with a sample size of 40 patients per arm, the test would not reject H0 for the experiment that observed response rates of 20% and 25% (as the P-value was 0.74), but it would reject the null hypothesis in the experiment with observed response rates of 20% and 40%. However, in the experiment with a sample size of 800 patients per arm, the test would also reject the null hypothesis with observed response rates of 20% and 25%.
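The role of α as a long-run error rate can be checked with a small simulation, sketched here in Python. It repeats the same two-arm experiment many times with both arms sharing one true response rate (so H0 holds) and counts how often the test rejects. The pooled z-test, the function names, the 200 patients per arm, the number of repetitions and the seed are all assumptions made for illustration; because the z-test is approximate, the rejection rate should come out close to 5% rather than exactly at or below it.

```python
import math
import random


def p_value(responders_a, responders_b, n_per_arm):
    """Two-sided P-value from a pooled two-proportion z-test (assumed test)."""
    pooled = (responders_a + responders_b) / (2 * n_per_arm)
    if pooled in (0.0, 1.0):
        return 1.0  # both arms all-or-nothing: no evidence against H0
    std_error = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = abs(responders_a - responders_b) / n_per_arm / std_error
    return math.erfc(z / math.sqrt(2))


def rejection_rate(true_rate, n_per_arm, alpha=0.05, repetitions=2000, seed=1):
    """Fraction of simulated experiments under H0 in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(repetitions):
        # Both arms are drawn with the same true response rate, so H0 is true
        a = sum(rng.random() < true_rate for _ in range(n_per_arm))
        b = sum(rng.random() < true_rate for _ in range(n_per_arm))
        if p_value(a, b, n_per_arm) < alpha:
            rejections += 1
    return rejections / repetitions


print(rejection_rate(true_rate=0.20, n_per_arm=200))
```

Every rejection counted here is a false positive, since the two simulated treatments are equally effective by construction.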
