Unmasking P-values

A non-significant P-value is often mistaken as evidence that the true effect is zero, that is, that the null hypothesis is true (Goodman 2008). No! A P-value can be non-significant by chance (a false negative error, as shown in figure 2), and it can also be non-significant because the study lacks statistical power. For instance, if the effect of interest were a difference of 15% in response rate, we see in figure 2 that a study of 80 patients would miss that difference over 55% of the time. Moreover, even if the study is reasonably sized and the test result is not a chance error, the true effect is plausibly of some small magnitude rather than exactly null. This is why at EORTC we do not formally compare baseline characteristics in randomized experiments: even small differences in the distribution of major prognostic factors may influence outcome comparisons. The presence of meaningful imbalances in baseline factors is better judged with a clinical eye than with a statistical test!
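
To make the link between sample size and missed effects concrete, here is a minimal sketch of an approximate power calculation for comparing two response rates. It relies on a normal approximation to the two-sided test of two proportions, and every input (response rates of 30% and 45%, 40 patients per arm) is a hypothetical value chosen for illustration, not the design behind figure 2, so the result is not expected to match the figure. The function name `approx_power_two_proportions` is likewise an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_proportions(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Uses a normal approximation with an unpooled standard error.
    All inputs are illustrative assumptions, not values from the article.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in observed response rates
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treatment * (1 - p_treatment) / n_per_arm
    )
    shift = abs(p_treatment - p_control) / se
    # Probability that the test statistic falls in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical example: 30% vs 45% response rate, 40 patients per arm
power = approx_power_two_proportions(0.30, 0.45, n_per_arm=40)
print(f"Approximate power: {power:.1%}")
print(f"Chance of missing the difference: {1 - power:.1%}")
```

The chance of missing a true difference is one minus the power, so under these assumptions increasing `n_per_arm` is what brings the false negative risk down.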

The significance level controls the risk of making a false positive claim of a difference when doing one test. When significance tests accumulate, for example when repeating analyses of accumulating data, when testing for associations between multiple genes and an outcome, or when testing in subgroups, the risk of making at least one false claim increases very quickly. When running 10 independent tests, each at the 0.05 significance level, the chance of at least one false positive is about 40%; with five tests it is already about 23%! This feature, coupled with the great attractiveness of P-values, explains the over-representation of positive claims in published reports (Cristea and Ioannidis 2018). At EORTC, we enforce the definition of a prospective analysis plan that specifies and justifies the hypotheses that will be tested, and we implement measures to control the risk of false positive claims. This is done with alpha-spending functions in interim analyses, error adjustment methods when multiple tests are conducted, or false discovery rate (FDR) control in gene association studies. We also make sure that reports of our studies are interpreted in accordance with the protocols and statistical plans, to avoid opportunistic emphasis on potentially false positive findings.
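
The inflation of the false positive risk with repeated testing can be checked with a few lines of code. The sketch below assumes that the tests are independent and that every null hypothesis is true, in which case the chance of at least one false positive among k tests at level alpha is 1 - (1 - alpha)^k. The function name `familywise_error_rate` is chosen for this illustration only.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent tests.

    Assumes independent tests and that all null hypotheses are true.
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 10):
    print(f"{k:>2} tests at 0.05: {familywise_error_rate(k):.1%}")
# Prints 5.0%, 22.6% and 40.1% for 1, 5 and 10 tests respectively
```

When tests are correlated, as is often the case for subgroup or repeated analyses of the same data, the exact figures differ, but the risk still grows with the number of tests.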
