I was rather surprised to see recently (OK, it was a couple of months ago, but I do have a day job to do as well as writing this blog) that the journal Basic and Applied Social Psychology has banned P values.
That’s quite a bold move. There are of course many problems with P values, about which David Colquhoun has written some sensible thoughts. Those issues seem to be particularly acute in psychology, a field that has something of a problem when it comes to replicating its results. It’s undoubtedly true that many published papers with significant P values haven’t really discovered what they claimed to have discovered, but have simply made type I errors: in other words, they have obtained significant results by chance, rather than because the claimed effect is actually real.
It’s worth reminding ourselves what the conventional test of statistical significance actually means. If we say we have a significant result with P < 0.05, it means that, if there were really no effect at all, there would be less than a 1 in 20 chance of seeing a result at least as extreme as the one we observed. A 1 in 20 chance is not at all rare, particularly when you consider the huge number of papers that are published every day. Many of them are going to contain type I errors.
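If you want to see that in action, here’s a quick simulation sketch in Python. It’s mine, not anything from the journal or from Colquhoun, and the number of studies and sample sizes are arbitrary: run a pile of t-tests on pure noise and count how many come out “significant”.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_experiments = 10_000   # pretend 10,000 studies, none with any real effect
n_per_group = 30         # arbitrary sample size per group

false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(size=n_per_group)   # "treatment" group: pure noise
    b = rng.normal(size=n_per_group)   # "control" group: pure noise
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"'Significant' at P < 0.05: {false_positives / n_experiments:.1%}")
# Expect something very close to 5%, i.e. about 1 in 20.
```

Run that and you should see roughly 5% of the studies reporting a “discovery”, despite there being nothing whatsoever to discover.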
Clearly, something must be done.
However, call me a cynic if you like, but I’m not sure how banning P values (and confidence intervals as well, if you thought just banning P values was radical enough) is going to help. Perhaps if every future article in Basic and Applied Social Psychology included a robust Bayesian analysis instead, that would be an improvement. But I hardly think that’s likely to happen. What is more likely is that researchers will claim to have discovered effects even when they are not conventionally statistically significant, which is surely even worse than where we were before.
I suspect one of the problems with psychology research is that much of it, particularly negative research, goes unpublished. It’s probably a lot easier to get a paper published showing that you have just demonstrated some fascinating psychological effect than one showing that the effect you had hypothesised doesn’t in fact exist.
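To see why that combination of many tested hypotheses and selective publication of significant results is so corrosive, here’s a rough back-of-the-envelope calculation. The numbers are assumptions I’ve plucked out of the air purely for illustration, not estimates of anything: suppose only 10% of the hypotheses people test are actually true, studies have 50% power, and only significant results get written up.

```python
# All three numbers below are assumptions for illustration only.
alpha = 0.05        # significance threshold
power = 0.5         # assumed chance a study detects a real effect
prior_true = 0.1    # assumed fraction of tested hypotheses that are actually true

true_positives = prior_true * power          # real effects, correctly detected
false_positives = (1 - prior_true) * alpha   # no effect, "significant" by chance

significant = true_positives + false_positives   # the results that get published
print(f"False positives among published 'discoveries': "
      f"{false_positives / significant:.0%}")
# With these assumptions, roughly 47% of published significant findings are wrong.
```

On those assumptions, nearly half of the “discoveries” that make it into print are false positives, and with the negative studies missing from the literature there’s no easy way to tell which half.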
This is a problem we know well in my world of clinical trials. There is abundant evidence that positive clinical trials are more likely to be published than negative ones. It’s something the clinical research community has become very much aware of, and has been working quite hard to solve. I wouldn’t say the problem is completely solved yet, but things are a lot better now than they were a decade or two ago.
One relevant factor is the move to prospective trial registration, which seems to be helping to solve the problem of publication bias. Clinical research doesn’t yet have a 100% publication record (though some recent studies do show disclosure rates of over 80%), but I suspect it is still far ahead of the social sciences.
Perhaps a better solution to the replication crisis in psychology would be a system for prospectively registering all psychology experiments, together with a commitment by researchers and journals to publish all results, positive or negative. That wouldn’t necessarily mean more results get replicated, of course, but it would mean we’d be more likely to know when a result has failed to replicate.
I’m not pretending this would be easy. Clinical trials are often multi-million dollar affairs, and the extra bureaucracy involved in trial registration is trivial in comparison with the overall effort. Many psychology experiments are done on a much smaller scale, and the extra bureaucracy would probably add proportionately a lot more to the costs. But personally, I think we’d all be better off with fewer experiments done and more of them being published.
I don’t think the move by Basic and Applied Social Psychology is likely to improve the quality of reporting in that journal. But if it gets us all talking about the limitations of P values, then maybe that’s not such a bad thing.