FORMAL MODELS OF INFERENCE FOR THE
VARIETY OF EVIDENCE THESIS

Barbara Osimani & Jürgen Landes

Introduction

According to the variety of evidence thesis (VET), items of evidence from independent lines of investigation are more confirmatory, ceteris paribus, than, for example, replications of analogous studies. Although intuitively plausible, this thesis is known to fail (Bovens and Hartmann [2003]; Claveau [2013]). In our article, we investigate the epistemic dynamics of VET failure by changing the model parameters regarding the ‘reliability’ of the source. The comparison of our results with previous attempts to analyse the VET illustrates that the way we model, and think of, (un)reliability impacts the inferential import of consistent results.

Worked Example

Take a prototypical medical study. In 5% of the cases it will falsely report that the hypothesis under investigation holds, and it has a power of 80% (that is, in 20% of the cases it will report that the hypothesis is false when in fact it holds). Now suppose that the study reports that the hypothesis of interest does hold. How strongly does this report confirm the hypothesis of interest?

Bayesian epistemology provides an answer to this question in terms of the posterior probability of the hypothesis, P(Hyp|E). If P(Hyp) is the prior probability of the hypothesis being true, P(E|Hyp) is the probability of a true positive (80% in our imagined medical study), and P(E|¬Hyp) is the probability of a false positive (5%), then the posterior probability is given by:

$$\displaylines{
P(Hyp|E) = \cfrac{P(Hyp)}{P(Hyp)+(1-P(Hyp))\cdot\cfrac{P(E|\overline{Hyp})}{P(E|Hyp)}} \\
= \cfrac{P(Hyp)}{P(Hyp)+(1-P(Hyp))\cdot\cfrac{5\%}{80\%}} \\
= \cfrac{P(Hyp)}{P(Hyp)+(1-P(Hyp))\cdot0.0625}.}$$

The posterior probability depends only on the prior probability of the hypothesis and the Bayes factor (the ratio of true positives to false positives, namely, 80:5, that is, 16).
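
To make the arithmetic concrete, here is a minimal sketch in Python (the function and parameter names are ours, not from the article) that computes this posterior from a prior, a true-positive rate, and a false-positive rate:

```python
def posterior(prior, tpr, fpr):
    """Posterior P(Hyp | positive report), given the prior P(Hyp),
    the true-positive rate P(E|Hyp), and the false-positive rate P(E|¬Hyp)."""
    bayes_factor = tpr / fpr
    return prior / (prior + (1 - prior) / bayes_factor)

# Prototypical medical study: 80% power, 5% false-positive rate (Bayes factor 16).
for prior in (0.1, 0.3, 0.5):
    print(f"prior {prior:.1f} -> posterior {posterior(prior, 0.80, 0.05):.3f}")
```

On these numbers, even a modest prior of 0.1 rises to roughly 0.64 after a single positive report.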

Due to the possibility of a false positive, there is uncertainty about the truth of the hypothesis, and a second study is performed. Now imagine that you are tasked with commissioning that second study. Would you rather use the same experimental set-up, or a different set-up that is exactly as informative as the first (see Figure 1)? Assuming a positive report, that is, one suggesting that the hypothesis holds, your choice depends on which option is more confirmatory: a varied strategy employing two independent experiments, or a replication strategy using the same experiment twice.

Figure 1. A common modelling choice (see Bovens and Hartmann [2003]; Olsson [2011]; Claveau [2011]; Osimani [2020]) is to model the variable node representing the incoming evidence, REP (for ‘report’), as a ‘descendant’ both of the investigated hypothesis node (HYP) and of the node representing the hypothesis that the instrument is reliable (REL). In this way, (in)consistent evidence is represented as symptomatic not only of the hypothesis being (in)valid, but also of the reliability status of the source.

In reality, of course, things are more complicated. It may be the case, for instance, that the evidence cannot be taken at face value due to biases. In medicine, sponsorship bias is a significant factor that needs to be borne in mind. In our example, this means that the hypothesis ‘this drug is safe and efficacious’ is more likely to be reported in a study plagued by sponsorship bias than in a study without such bias. In light of these considerations, one should prima facie prefer the variation strategy. However, in our article, we show that this is not always rational (in the Bayesian sense).

When the characteristics of the instruments themselves are known with certainty, receiving two confirmatory reports from the same experimental set-up can be more confirmatory than receiving two such reports from different set-ups. Interestingly, which strategy is the more confirmatory is independent of the prior belief that a study suffers from sponsorship bias. Indeed, as it turns out, the crucial, and only, factor this depends on is the comparison of the Bayes factors of the sponsored versus the non-sponsored study. For example, suppose you expect that a sponsored study will always report that the hypothesis holds if the hypothesis is indeed true. Then, if the probability of a false positive in a sponsored study is less than 6.25%, it is more confirmatory to replicate (6.25% being the inverse of the Bayes factor, 16, of the prototypical medical study).
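
A quick way to see the threshold is the following sketch, under the assumptions just stated (the variable names and test values are ours):

```python
# With the instruments' characteristics known and reports conditionally
# independent given Hyp, Bayes factors multiply: replicating set-up A
# yields BF_A * BF_A, while varying yields BF_A * BF_B. Replicating the
# sponsored study is more confirmatory iff its Bayes factor exceeds the
# prototypical study's (the two strategies tie at exactly 6.25%).
bf_prototypical = 0.80 / 0.05          # = 16

def bf_sponsored(fpr):
    # Assumption from the text: a sponsored study always reports
    # positively when the hypothesis is true (true-positive rate 1).
    return 1.0 / fpr

for fpr in (0.05, 0.0625, 0.10):
    strategy = "replicate" if bf_sponsored(fpr) > bf_prototypical else "vary"
    print(f"sponsored false-positive rate {fpr:.4f}: {strategy}")
```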

These epistemic dynamics reveal the central role of our assumptions about the instrumental set-up generating the evidence when we evaluate its diagnostic value. In standard settings, though, such diagnostic value is itself uncertain. Thus evidence is used to update our beliefs both in the investigated hypothesis and in the reliability of the source of the evidence. In our article, we draw on a well-established tradition in Bayesian epistemology that models such problems via Bayes nets representing epistemic entities and their probabilistic relationships (Bovens and Olsson [2000]; Bovens and Hartmann [2002], [2003]; Claveau [2013]; Collins et al. [2018]; Landes [2020]). In particular, we follow Bovens and Hartmann ([2003]) in modelling replication from the same source (for example, the same research group) by having the reports of distinct studies from that source share the node in the net that represents whether the source is reliable (see Figure 1).
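
To illustrate the structure of this modelling choice (not the exact parametrization of our article), here is a sketch in which replication is captured by letting both reports share a single bias node, while variation gives each report an independent one; all numerical values are illustrative assumptions of ours:

```python
def p_report(hyp, biased, q=0.10):
    """P(positive report | hypothesis status, bias status).
    Unbiased: random error only (80% power, 5% false positives, as above).
    Biased: always positive when Hyp is true; false-positive rate q."""
    if biased:
        return 1.0 if hyp else q
    return 0.80 if hyp else 0.05

def likelihood_two_positives(hyp, p_bias, shared):
    """P(two positive reports | hyp): the bias node is shared under
    replication, and drawn independently per study under variation."""
    states = ((True, p_bias), (False, 1.0 - p_bias))
    if shared:
        return sum(p * p_report(hyp, b) ** 2 for b, p in states)
    return sum(p * p_report(hyp, b) for b, p in states) ** 2

def posterior_two_positives(prior, p_bias, shared):
    l_true = likelihood_two_positives(True, p_bias, shared)
    l_false = likelihood_two_positives(False, p_bias, shared)
    return prior * l_true / (prior * l_true + (1.0 - prior) * l_false)

for shared in (True, False):
    label = "replication" if shared else "variation"
    print(label, round(posterior_two_positives(0.3, 0.2, shared), 4))
```

Varying the prior probability of bias and the false-positive rates in this sketch shows how either strategy can come out ahead, which is the phenomenon our article maps out.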

This modelling choice implies that the hypothesis is not the only (ancestral) cause of the evidence reports; the evidence results both from signals coming from nature (that is, the observable outputs of natural phenomena) and from other intervening causes, which may distort such signals in various ways. The different ways in which signal transmission is framed make explicit distinct ways in which the reliability of that transmission may be conceptualized (see also Bonzio et al. [2020]).

For instance, in Bovens and Hartmann ([2003]), a reliable instrument perfectly transmits the signals received from nature: it delivers a positive report with probability one when the hypothesis holds and a negative report with probability one when it does not. In contrast, when the instrument is unreliable, the probability of delivering a positive rather than a negative report is independent of the state of nature; the instrument is a randomizer. Olsson ([2011]) allows reliability to vary on a continuum from fully reliable sources, through randomizing ones, to systematic liars. In Claveau ([2013]), unreliability is modelled as deterministic bias: a positively biased source always delivers positive reports, no matter what.

In our case, bias is no longer deterministic: a positively biased source may also deliver negative reports. And a reliable source is not fully reliable. Hence, the uncertainty in our model is between a source affected by random error only and a source that is positively biased, but not deterministically so. Thus, one thing that distinguishes Olsson’s model and our own from Claveau’s and Bovens and Hartmann’s is that in the latter two, unreliability implies that the source is fully disconnected from reality and delivers positive or negative reports no matter what the truth is. In the former, some dependence of the reports on the true state of nature is warranted across the whole parameter space (see our article for more), or at least across most of it (Olsson [2011]).
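
As a rough summary, and with purely illustrative numbers of our own choosing, the accounts differ in the conditional probabilities of a positive report that they assign:

```python
# Illustrative conditional probabilities of a positive report,
# (P(E|Hyp), P(E|¬Hyp)), under each conceptualization of (un)reliability.
# All numerical values are our own illustrative choices.
models = {
    "Bovens & Hartmann": {"reliable": (1.0, 0.0),     # perfect transmitter
                          "unreliable": (0.5, 0.5)},  # randomizer
    "Claveau":           {"reliable": (1.0, 0.0),
                          "unreliable": (1.0, 1.0)},  # deterministic positive bias
    "Osimani & Landes":  {"random error": (0.80, 0.05),
                          "biased": (0.95, 0.40)},    # biased, but not deterministically
}

def bayes_factor(tpr, fpr):
    return float("inf") if fpr == 0 else tpr / fpr

for name, statuses in models.items():
    for status, (tpr, fpr) in statuses.items():
        print(f"{name:17} {status:13} Bayes factor: {bayes_factor(tpr, fpr):.3g}")
```

On these numbers, a randomizing or deterministically biased source has a Bayes factor of one (its reports carry no information), whereas a non-deterministically biased source retains some evidential connection to the truth.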

However, whereas Olsson aims to answer the sceptical challenge of whether social epistemic practices can be truth-conducive, Bovens and Hartmann, Claveau, and we aim to establish, under different modelling assumptions, the conditions for the failure of the reasonable intuition that evidence from independent sources is more confirmatory than evidence from one and the same source. In each case, the VET fails for different reasons. In Bovens and Hartmann’s model, the failure of the VET rests on the fact that receiving two positive reports from the same instrument is more probable than receiving them from two distinct sources, on the assumption that the hypothesis is true and the instrument is reliable. In our case, the opposite happens: the failure of the VET rests on the fact that receiving two positive reports from the same instrument is more probable than receiving them from two distinct sources, on the assumption that the hypothesis is true and the instrument is positively biased.

Conclusions

The take-home lesson from the comparison of these results is that the confirmatory value of evidence can never be disconnected from our assumptions regarding the set-up from which it emerges. For instance, the decision to formalize the relationship between uncertainty about source reliability and the confirmatory boost of the hypothesis by modelling the evidence as a joint effect of the hypothesis and the reliability nodes (and hence as a ‘collider’) is neither formally nor substantially neutral. One could just as well incorporate such uncertainty directly into the likelihoods of the hypothesis on the evidence, as in Sober ([1989]). More generally, with other configurations, where some of the pieces of evidence may take the place of root nodes (such as motives in legal settings), the confirmatory support of varied evidence is constrained by the dependence relations among the pieces of evidence and between them and the hypothesis (Wheeler [2009]; Wheeler and Scheines [2013]).

The analysis of such structural relations between items of evidence, the investigated hypothesis, and hypotheses regarding the evidence source itself promises to be an essential step forward in understanding (scientific) reasoning and in shaping scientific methodology (Osimani [2020]). We hope our article encourages further progress on the matter.


FULL ARTICLE

Osimani, B. and Landes, J. [2023]: ‘Varieties of Error and Varieties of Evidence in Scientific Inference’, British Journal for the Philosophy of Science, 74, <doi.org/10.1086/714803>.

Barbara Osimani
Università Politecnica delle Marche
barbaraosimani@gmail.com

Jürgen Landes
Ludwig-Maximilians-Universität München
juergen_landes@yahoo.de

References

Bonzio, S., Landes, J. and Osimani, B. [2020]: ‘Reliability: An Introduction’, Synthese, pp. 1–10.

Bovens, L. and Hartmann, S. [2002]: ‘Bayesian Networks and the Problem of Unreliable Instruments’, Philosophy of Science, 69, pp. 29–72.

Bovens, L. and Hartmann, S. [2003]: Bayesian Epistemology, Oxford: Oxford University Press.

Bovens, L. and Olsson, E. J. [2000]: ‘Coherentism, Reliability, and Bayesian Networks’, Mind, 109, pp. 685–719.

Claveau, F. [2011]: ‘Evidential Variety as a Source of Credibility for Causal Inference: Beyond Sharp Designs and Structural Models’, Journal of Economic Methodology, 18, pp. 233–53.

Claveau, F. [2013]: ‘The Independence Condition in the Variety-of-Evidence Thesis’, Philosophy of Science, 80, pp. 94–118.

Collins, P. J., Hahn, U., von Gerber, Y. and Olsson, E. J. [2018]: ‘The Bi-directional Relationship between Source Characteristics and Message Content’, Frontiers in Psychology, 9, p. 18.

Landes, J. [2020]: ‘Variety of Evidence’, Erkenntnis, 85, pp. 183–223.

Olsson, E. J. [2011]: ‘A Simulation Approach to Veritistic Social Epistemology’, Episteme, 8, pp. 127–43.

Osimani, B. [2020]: ‘The Causal Structure of Epistemic Environments’, in C. Littlejohn and M. Lasonen-Aarnio (eds), Routledge Handbook of the Philosophy of Evidence, London: Routledge.

Sober, E. [1989]: ‘Independent Evidence about a Common Cause’, Philosophy of Science, 56, pp. 275–87.

Wheeler, G. [2009]: ‘Focused Correlation and Confirmation’, British Journal for the Philosophy of Science, 60, pp. 79–100.

Wheeler, G. and Scheines, R. [2013]: ‘Coherence and Confirmation through Causation’, Mind, 122, pp. 135–70.

© The Authors (2022)
