The Bayes factor assisted you in optimal guesswork: how much you should, subjectively, favor one hypothesis over the other in light of the available evidence. We also saw that assigning priors is difficult in theory and, above all, in practice. An important point is that, irrespective of priors, we can quantify the data's impact on how we update probabilities using only the likelihood ratio. In other words, we can sidestep the prior problem by simply asking how much the evidence favors one hypothesis over the other, and this measure, the relative evidential weight, is conceptually distinct from subjective belief.
For example, say we have two different hypotheses concerning the probability distribution of some population. One states that the proportion of males is 50%, the other that it is 100%. These are your hypotheses – no priors are involved now. You take 5 random samples, all of which are male. You then calculate the likelihood ratio P(5 males | 100% male)/P(5 males | 50% male) = 1/(0.5^5) = 32. This means that the evidence favors the numerator hypothesis 32 times more than the denominator hypothesis, because that hypothesis predicted the data 32 times more strongly. According to Richard Royall, a strong proponent of the likelihood approach, 32 happens to be a reasonable threshold for “strong relative evidence”, with 8 counting as “fairly strong”.
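As a quick check, here is a minimal Python sketch that reproduces this arithmetic (the hypotheses and sample are those of the example above):

```python
# Likelihood ratio for 5 male draws: H1 (100% male) versus H0 (50% male).
p_h1 = 1.0   # P(male) under the 100%-male hypothesis
p_h0 = 0.5   # P(male) under the 50%-male hypothesis
n_males = 5  # all five sampled individuals are male

likelihood_h1 = p_h1 ** n_males   # 1.0
likelihood_h0 = p_h0 ** n_males   # 0.03125
likelihood_ratio = likelihood_h1 / likelihood_h0

print(likelihood_ratio)  # 32.0 -- "strong" evidence on Royall's scale
```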
We may plot a hypothesis’ likelihood to see how it changes as a function of the sample. For example, if the above example had used a hypothesis of 0.8, this function would be the Bernoulli likelihood 0.8^n * 0.2^(m-n), where m is the sample size and n is the number of observations showing the outcome believed to have probability 80%. Given a sample size, we can also plan how many observations must show a certain outcome in order to reach a certain likelihood ratio. For example, with a sample of 20 people, 15 must show the 80% outcome for the likelihood ratio against the 50% hypothesis to exceed 8 ((0.8^15 * 0.2^5)/(0.5^15 * 0.5^5) > 8). For two hypotheses that differ by more than 0.3, fewer subjects would be required for this strength of evidence.
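A small sketch of this planning calculation, assuming the 0.8-versus-0.5 comparison and the sample of 20 from the example:

```python
# How many of 20 subjects must show the 80% outcome for the likelihood ratio
# (0.8 vs 0.5) to exceed 8?
m = 20            # sample size
p1, p0 = 0.8, 0.5 # the two hypothesized proportions
threshold = 8

for n in range(m + 1):
    lr = (p1**n * (1 - p1)**(m - n)) / (p0**n * (1 - p0)**(m - n))
    if lr > threshold:
        print(f"n = {n}, likelihood ratio = {lr:.1f}")
        break
# prints: n = 15, likelihood ratio = 11.8
```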
We could also hold the sample fixed and let the parameter value vary (giving a likelihood function such as θ^5 * (1 - θ)^5). From this likelihood function we can calculate the equivalent of a credibility interval. Going by Royall’s recommendation that a ratio of 32 constitutes strong evidence, all θ-hypotheses whose likelihood is more than 1/32 of the highest likelihood have roughly the same strength of evidence. The “likelihood interval” therefore indicates the range of hypotheses that the data do not strongly disfavor relative to the best-supported one.
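One rough way to find such an interval numerically, here for the likelihood function θ^5 * (1 - θ)^5 mentioned above (a grid-search sketch, not a general implementation):

```python
import numpy as np

# 1/32 likelihood interval for the likelihood theta^5 * (1 - theta)^5
# (i.e. 5 "successes" and 5 "failures").
theta = np.linspace(0, 1, 100001)
likelihood = theta**5 * (1 - theta)**5

max_like = likelihood.max()                  # attained at theta = 0.5
inside = theta[likelihood >= max_like / 32]  # hypotheses within a factor 32 of the best

print(inside.min(), inside.max())  # roughly 0.146 and 0.854
```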
Again, there are a couple of conceptual points to highlight:
- The ratio requires only two pieces of information: the two likelihoods. It is unaffected by other changes in the population distribution (which is not the case for the Neyman-Pearson approach we will soon describe).
- Again, you can continue to collect data for as long as you want, since the likelihoods depend only on the product of the probabilities of the observed events.
- The evidence can, of course, be misleading, even when we interpret it appropriately. However, the probability of obtaining a ratio of k in favor of the false hypothesis is bounded by 1/k (see the short simulation after this list).
- If a likelihood interval does not include a hypothesis, this is not absolute evidence against it: the evidence is only relative to the other hypotheses, so we cannot reject the excluded hypothesis on those grounds.
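To illustrate the point about misleading evidence, here is a small Monte Carlo sketch (the sample size and the threshold k are arbitrary choices for this illustration) checking that, when the 50% hypothesis is true, a likelihood ratio of at least k in favor of the 80% hypothesis occurs with probability well under 1/k:

```python
import numpy as np

# Monte Carlo check of the universal bound: if the 50% hypothesis is true,
# P(likelihood ratio >= k in favor of the false 80% hypothesis) <= 1/k.
rng = np.random.default_rng(0)
n_sim, m, k = 100_000, 20, 8
p_true, p_false = 0.5, 0.8

# number of "successes" in each simulated sample, generated under the true hypothesis
successes = rng.binomial(m, p_true, size=n_sim)

lr_false_over_true = (p_false**successes * (1 - p_false)**(m - successes)) / \
                     (p_true**successes * (1 - p_true)**(m - successes))

print((lr_false_over_true >= k).mean())  # about 0.02, well below the bound 1/k = 0.125
```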