DOES IT MATTER IF HYPOTHESIZING
IS SECRETLY POST HOC?

Mark Rubin

While no-one is looking, a Texas sharpshooter fires his gun at a barn wall. He then walks up to his bullet holes and paints targets around them. When his friends arrive, he points at the targets and claims that he’s a good shot (de Groot [2014]; Rubin [2017b]). Norbert Kerr ([1998]) discusses an analogous situation in which researchers engage in undisclosed hypothesizing after the results are known, or HARKing. In this case, researchers conduct statistical tests, observe their research results (bullet holes), and then construct post hoc predictions (paint targets) to fit these results. In their research reports, they then pretend that their post hoc hypotheses are actually a priori hypotheses. This questionable research practice is thought to have contributed to the replication crisis in science (Shrout and Rodgers [2018]), and it provides part of the rationale for researchers to publicly pre-register their hypotheses ahead of conducting their research (Wagenmakers et al. [2012]). In my BJPS article, ‘The Costs of HARKing’, I discuss the concept of HARKing from a philosophical standpoint and then undertake a critical analysis of Kerr’s twelve potential costs of HARKing.

I begin by arguing that scientists do not make absolute, dichotomous judgements about theories and hypotheses being ‘true’ or ‘false’. Instead, when accounting for certain phenomena, they make relative judgements about theories and hypotheses being more or less true than other theories and hypotheses. Such judgements can be described as ‘estimates of relative verisimilitude’ (Popper [1985], p. 58; Meehl [1990]; Cevolani and Festa [forthcoming]). HARKers are obliged to provide a theoretical rationale for each of their secretly post hoc hypotheses in the introduction sections of their research reports. I argue that, despite being secretly post hoc, this theoretical rationale provides a result-independent basis for an initial estimate of the relative verisimilitude of a hypothesis.

The reported research results can then provide a second, epistemically independent basis for adjusting this initial estimate of verisimilitude (for a similar view, see Lewandowsky [2019]; Oberauer and Lewandowsky [2019]). Hence, readers can estimate the relative verisimilitude of a hypothesis (a) without taking the reported result into account and (b) after taking the result into account, even if they have been misled about when the researchers constructed the hypothesis. Consequently, readers are able to undertake a valid counterfactual updating of their estimated relative verisimilitude of a hypothesis even though HARKing has occurred. Importantly, there is no ‘double-counting’ (Mayo [2008]) or violation of the use-novelty principle (Worrall [1985], [2014]), because the current result contributes new information to the readers’ initial estimate of relative verisimilitude, which was generated in a result-independent manner.
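To make this two-step estimation concrete, the following is a minimal sketch in Python, my own illustration rather than anything in the article: it models the reader’s estimate of relative verisimilitude as a probability and the reported result as evidence, with all numbers assumed purely for illustration.

```python
# A minimal sketch (my illustration, not the article's formalism): treat the
# reader's estimate of relative verisimilitude as a probability and the
# reported result as evidence. All numbers are assumed for illustration.

def update(prior, p_result_if_closer, p_result_if_not):
    """Bayesian update of a prior belief in a hypothesis given an observed result."""
    evidence = p_result_if_closer * prior + p_result_if_not * (1 - prior)
    return p_result_if_closer * prior / evidence

# (a) Initial, result-independent estimate based only on the theoretical
#     rationale given in the introduction section.
initial_estimate = 0.40

# (b) Adjusted estimate after taking the reported result into account. The
#     result is informative even if the hypothesis was secretly HARKed,
#     because it played no role in forming the initial estimate above.
adjusted_estimate = update(initial_estimate,
                           p_result_if_closer=0.80,  # P(result | hypothesis has greater verisimilitude)
                           p_result_if_not=0.30)     # P(result | rival hypothesis has greater verisimilitude)

print(f"before result: {initial_estimate:.2f}; after result: {adjusted_estimate:.2f}")
```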

To translate this reasoning to the Texas sharpshooter analogy, we need to distinguish HARKing from p-hacking, a distinction that frequentist researchers themselves draw (Simmons et al. [2011]; Rubin [2017a], p. 325). P-hacking occurs when a researcher conducts multiple statistical tests and then selectively reports only those results that support their original, a priori substantive hypothesis. In our analogy, this would be similar to the sharpshooter painting a new target but retaining his substantive claim that he is a good shot. To be a HARKer, a researcher must also change their original a priori hypothesis or create a totally new one. Hence, a more appropriate analogy is a sharpshooter who changes both their statistical hypothesis (the target’s location) and their broader substantive hypothesis (the claim).

For example, imagine that a sharpshooter initially believes the assertion ‘I’m a good shot’, but seeing that she has missed her target, she secretly changes her claim and declares to her friends, ‘I’m a good shot, but I can’t adjust for windy conditions’. Knowing what they know about the sharpshooter, her friends should be able to form an initial opinion about the verisimilitude of this claim (for example, ‘She’s always trained indoors. It makes sense that she hasn’t learned to adjust for windy conditions, and it was windy when she took the shot.’). To support her claim, the sharpshooter might provide her friends with accurate procedural information about (a) the direction in which she was aiming her gun when she took the shot and (b) the speed and direction of the wind at the time of her shot. Her friends would then be able to combine this procedural information with a priori theoretical information about the way gun shots are affected by the wind in order to calculate the predicted location of the sharpshooter’s bullet hole in a result-independent manner. They could then observe the extent to which this predicted location matches the location of the sharpshooter’s bullet hole (and newly painted target). The greater the match, the more they should increase their belief in the (secretly HARKed) hypothesis that the sharpshooter is a good shot but did not adjust for the current windy conditions.
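The friends’ calculation can be sketched in code. This is a toy model under crude assumptions of my own (a constant crosswind that displaces the bullet over a fixed time of flight), with hypothetical values throughout; the point is only that the predicted location is derived without looking at where the bullet actually landed.

```python
# Toy sketch of the friends' result-independent prediction. The physics is
# deliberately crude (constant crosswind over a fixed flight time) and every
# value is hypothetical.
import math

def predicted_impact(aim_x, aim_y, wind_speed, wind_bearing_deg, flight_time):
    """Where the shot should land if the shooter aimed accurately at (aim_x, aim_y)
    but made no adjustment for the wind."""
    drift = wind_speed * flight_time  # displacement attributed to the wind
    dx = drift * math.sin(math.radians(wind_bearing_deg))
    dy = drift * math.cos(math.radians(wind_bearing_deg))
    return aim_x + dx, aim_y + dy

aim = (0.0, 0.0)            # (a) where she says she was aiming: the original target
wind = (4.0, 90.0)          # (b) wind speed (m/s) and bearing (degrees) at the time of the shot
observed_hole = (1.9, 0.1)  # where the bullet (and the newly painted target) actually is

prediction = predicted_impact(*aim, *wind, flight_time=0.5)
mismatch = math.dist(prediction, observed_hole)
print(f"predicted hole: {prediction}; distance from actual hole: {mismatch:.2f} m")
# The smaller the mismatch, the more the friends should raise their estimate of
# the secretly HARKed claim: 'good shot, but no adjustment for windy conditions'.
```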

The second part of my article provides a critical analysis of Kerr’s ([1998]) twelve costs of HARKing. I argue that HARKing conceals the timing of a researcher’s personal hypothesizing but does not conceal the quality of (a) the hypothesizing, (b) the research methodology, or (c) the statistical analysis. Readers can make judgements about the quality of each of these aspects of the research without knowing the timing of the researcher’s hypothesizing. So, even if readers are unaware that a hypothesis has been HARKed, they are still able to criticize (a) the theoretical quality of the HARKed hypothesis, (b) the appropriateness of the methodology for testing that hypothesis, and (c) the appropriateness of the statistical analyses (for example, a lack of correction for multiple testing or a lack of justification for directional tests).
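As one concrete example of point (c), readers can ask whether a reported p-value survives a correction for the number of tests that were actually run, a check that does not depend on knowing when the hypothesis was written. The sketch below uses a simple Bonferroni correction with hypothetical p-values; it is my illustration, not an analysis from the article.

```python
# Illustrative sketch (hypothetical p-values, not from the article): checking
# whether results survive a Bonferroni correction for multiple testing.

def bonferroni_significant(p_values, alpha=0.05):
    """Flag which results remain significant once alpha is divided by the
    number of tests actually conducted."""
    adjusted_alpha = alpha / len(p_values)
    return [p <= adjusted_alpha for p in p_values]

# Five tests were run, but only the first (p = .012) was reported as support.
p_values = [0.012, 0.21, 0.08, 0.44, 0.33]
print(bonferroni_significant(p_values))
# -> [False, False, False, False, False]: p = .012 clears the uncorrected .05
#    threshold but not the corrected threshold of .05 / 5 = .01.
```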

Given the potentially limited costs of HARKing to the scientific process, I argue that it is premature to conclude that HARKing is an important contributor to the replication crisis in science.

FULL ARTICLE

Rubin, M. [2022]: ‘The Costs of HARKing’, British Journal for the Philosophy of Science, 73, doi: 10.1093/bjps/axz050

Mark Rubin
School of Psychology
Newcastle University
Mark.Rubin@newcastle.edu.au

References

Cevolani, G. and Festa, R. [forthcoming]: ‘A Partial Consequence Account of Truthlikeness’, Synthese.

de Groot, A. D. [2014]: ‘The Meaning of “Significance” for Different Types of Research’, Acta Psychologica, 148, pp. 188–94.

Kerr, N. L. [1998]: ‘HARKing: Hypothesizing after the Results Are Known’, Personality and Social Psychology Review, 2, pp. 196–217.

Lewandowsky, S. [2019]: ‘Avoiding Nimitz Hill with More than a Little Red Book: Summing up #PSprereg’, The Psychonomic Society.

Mayo, D. G. [2008]: ‘How to Discount Double-Counting When It Counts: Some Clarifications’, British Journal for the Philosophy of Science, 59, pp. 857–79.

Meehl, P. E. [1990]: ‘Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles That Warrant It’, Psychological Inquiry, 1, pp. 108–41.

Oberauer, K. and Lewandowsky, S. [2019]: ‘Addressing the Theory Crisis in Psychology’, Psychonomic Bulletin and Review.

Popper, K. [1985]: Realism and the Aim of Science: From the Postscript to the Logic of Scientific Discovery, London: Routledge.

Rubin, M. [2017a]: ‘An Evaluation of Four Solutions to the Forking Paths Problem: Adjusted Alpha, Preregistration, Sensitivity Analyses, and Abandoning the Neyman–Pearson Approach’, Review of General Psychology, 21, pp. 321–9.

Rubin, M. [2017b]: ‘When Does HARKing Hurt? Identifying When Different Types of Undisclosed post hoc Hypothesizing Harm Scientific Progress’, Review of General Psychology, 21, pp. 308–20.

Shrout, P. E. and Rodgers, J. L. [2018]: ‘Psychology, Science, and Knowledge Construction: Broadening Perspectives from the Replication Crisis’, Annual Review of Psychology, 69, pp. 487–510.

Simmons, J. P., Nelson, L. D. and Simonsohn, U. [2011]: ‘False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant’, Psychological Science, 22, pp. 1359–66.

Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L. and Kievit, R. A. [2012]: ‘An Agenda for Purely Confirmatory Research’, Perspectives on Psychological Science, 7, pp. 632–8.

Worrall, J. [1985]: ‘Scientific Discovery and Theory-Confirmation’, in J. C. Pitt (ed.), Change and Progress in Modern Science, Dordrecht: Reidel, pp. 301–31.

Worrall, J. [2014]: ‘Prediction and Accommodation Revisited’, Studies in History and Philosophy of Science, 45, pp. 54–61.

© The Author (2021)
