Physical Probability

1. Laplace’s Urn

It is standard to regard probabilities as real numbers satisfying Kolmogorov’s axioms. For Kolmogorov (1956: 2) a field of probability consists of a system of sets F together with an assignment of numbers, say prob(A), to the sets A in F, satisfying the following.

Let E be a set of events, and F be a set of subsets of E that is a field containing E.

If A is an element of F then prob(A) is a nonnegative real number. Also, prob(E) = 1, and prob(A ∪ B) = prob(A) + prob(B), if A and B are disjoint.

Kolmogorov (1956: 6) also defined conditional probabilities, say prob(A | B), so that they equal prob(A ∩ B)/prob(B) if prob(B) > 0 (which yields, for independent A and B, the multiplication axiom).

And he (1956: 14) gave an axiom that yields countable additivity.
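The finite axioms above can be checked mechanically on a small example. The following is only a minimal sketch, assuming a hypothetical field of probability (the six faces of a fair die, with every subset as an event and the uniform measure); the names `prob`, `cond_prob` and `satisfies_axioms` are illustrative and are not Kolmogorov’s.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite example: E is the set of faces of a fair die.
E = frozenset(range(1, 7))

def powerset(s):
    """Return every subset of s as a frozenset."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

F = powerset(E)  # the field: here, all subsets of E (so it contains E)

def prob(A):
    """Uniform measure (an assumption): size of A over size of E."""
    return Fraction(len(A), len(E))

def cond_prob(A, B):
    """prob(A | B) = prob(A ∩ B) / prob(B), defined only if prob(B) > 0."""
    if prob(B) == 0:
        raise ValueError("prob(B) must be positive")
    return prob(A & B) / prob(B)

def satisfies_axioms():
    """Check nonnegativity, prob(E) = 1, and finite additivity on F."""
    nonneg = all(prob(A) >= 0 for A in F)
    normalized = prob(E) == 1
    additive = all(prob(A | B) == prob(A) + prob(B)
                   for A in F for B in F if not (A & B))
    return nonneg and normalized and additive

even = frozenset({2, 4, 6})
low = frozenset({1, 2})

print(satisfies_axioms())
print(cond_prob(even, low), prob(even))  # equal, so even and low are independent
```

Because `even` and `low` are independent in this example, prob(even ∩ low) equals prob(even)·prob(low), which is the multiplication rule mentioned above. Since the field is finite, countable additivity adds nothing here.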

However, we cannot just take probabilities to be whatever satisfies such axioms because, for example, areas do (probability theory being part of measure theory). At the very least we must specify the intended interpretation of those axioms. But in fact, Kolmogorov’s axioms are not so obvious that we could not end up interpreting slightly different ones (e.g. the adoption of different axioms would be one way to resolve Humphreys’ paradox, see §8; and also see §10). Furthermore there appears to be no a priori reason why different sorts of probability (e.g. objectively physical probabilities, and subjectively personal ones) should be given the same axiomatization.

So the first thing to note, as we consider the possible interpretations of (something like) Kolmogorov’s axioms, is that the probabilities of interest here are those that would exist only under physical indeterminism. In particular, they are not subjective probabilities (arising from our ignorance of, or uncertainty about, initial conditions), such as might arise within statistical mechanics; I shall reiterate this when I address Eagle’s (2004) objections to propensity interpretations, in §9.

Within our physical theories, determinism amounts to the existence of a unique solution to the equations of motion of our mathematical model of the universe—specifying the instantaneous state of our model thereby dictates its states at all other times. Quantum mechanics is an indeterministic theory, and it also appears to be an essentially realistic model of reality. Since my concern is with metaphysics I shall therefore assume that, to a first approximation, physical indeterminism (arising quantum-mechanically) amounts to the possibility that, were the present universe copied exactly, such copies could have non-actual futures.

Now, the reason why we call |φ|² a ‘probability’ distribution in the first place is that hypotheses like φ are tested statistically, via empirical frequencies, and so frequency interpretations will naturally be considered first (in §2, §3, §5 and §7).

But intuitively the displayed frequencies merely provide evidence for the underlying probability (which need not be a rational number, in quantum mechanics), whereas under the most reasonable frequency interpretations (those in which the probability is identified with a limit frequency) the displayed frequencies could not even provide such evidence. So, since (as you will see) frequencies fail to capture the intended interpretation, propensities are considered (in §4, §6, §8 and §9).

Propensities are relatively obscure, philosophically (e.g. there is even a personalistic interpretation of them, rejected in §6), but we do at least have a simple picture of propensity-theoretic (or neo-classical) probability, as something analogous to the classical proportion inside Laplace’s urn (see below). I shall defend nothing more detailed than that picture in what follows because, although several theories of propensity already exist, the paradox of §10 indicates that such details may well have been premature.

Classically (e.g. Laplace 1995), probability was defined as the ratio of the number of favourable possibilities to the total number of possibilities, where those possibilities were all equally likely to occur.

(Since determinism was assumed, two outcomes were equally likely whenever there was no reason to expect one rather than the other, but given indeterminism we can regard them as equally likely in a more objective way; e.g., were a particle coming into being, with two possibilities for its spin, and were Nature completely indifferent about which one was actualised, then each would have an objective probability of ½.)

Anyway, were the possible outcomes of a coin toss (say, H and T) equally likely, the classical probability of heads, say prob(H), would be ½. The probability of getting two heads in two throws would similarly be prob(HH) = ¼, were all four possibilities (TT, TH, HT, and HH) equally likely. And more generally, if prob(A | B) = prob(A), i.e. if A and B are independent, then prob(A and B) = prob(A)·prob(B); and similarly for the other basic axioms.
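The classical ratio just described can be computed by brute enumeration. This is a sketch only, assuming that the listed possibilities are equally likely (which is exactly what the classical definition presupposes); the function name `classical_prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def classical_prob(favourable, possibilities):
    """Number of favourable possibilities over the total number.

    Assumes every listed possibility is equally likely.
    """
    possibilities = list(possibilities)
    hits = sum(1 for outcome in possibilities if favourable(outcome))
    return Fraction(hits, len(possibilities))

one_toss = ["H", "T"]
two_tosses = list(product("HT", repeat=2))  # HH, HT, TH, TT as tuples

prob_H = classical_prob(lambda o: o == "H", one_toss)
prob_HH = classical_prob(lambda o: o == ("H", "H"), two_tosses)

print(prob_H, prob_HH)
```

Here `prob_HH` comes out equal to `prob_H` squared, which is the multiplication rule for two independent tosses.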

The classical approach (and similarly Popper’s neo-classical approach, see §4) also accounts for the empirically important laws of large numbers, via such thought-experiments as Laplace’s urn:

Suppose in an urn a white balls, b black balls, and after having drawn a ball it is put back into the urn; the probability is asked that in n number of draws m white balls and n – m black balls will be drawn. It is clear that the number of cases that may occur at each drawing is a + b. (Laplace 1995: 28)

Since each ball is equally likely to be drawn, the singular probability, say p, of drawing a white ball on the first draw is the number of possibilities for drawing a white ball, i.e. a, divided by that total number, a + b, whence p = a/(a + b).

For 2 draws there are (a + b)² possibilities, and since the draws are independent, the binomial expansion of that number, i.e. (a + b)² = a² + 2ab + b², gives us the desired numerators; e.g. prob(m = n = 2) = a²/(a + b)².

Similarly, for n draws there are (a + b)ⁿ possibilities, whence the required probabilities are ⁿC_m·p^m·(1 – p)^(n – m) (where ⁿC_m is the binomial coefficient).
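The agreement between counting equally likely draw sequences and the binomial formula can be checked directly for a small urn. The following is a rough sketch, with a hypothetical urn of a = 3 white and b = 2 black balls; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def urn_prob_by_counting(a, b, n, m):
    """Classical ratio: sequences of n draws (with replacement) that
    contain exactly m white balls, over all (a + b)**n sequences."""
    balls = ["W"] * a + ["B"] * b
    favourable = sum(1 for seq in product(balls, repeat=n)
                     if seq.count("W") == m)
    return Fraction(favourable, (a + b) ** n)

def urn_prob_by_formula(a, b, n, m):
    """Binomial formula with p = a / (a + b)."""
    p = Fraction(a, a + b)
    return comb(n, m) * p**m * (1 - p)**(n - m)

a, b, n = 3, 2, 4  # hypothetical urn and number of draws
for m in range(n + 1):
    print(m, urn_prob_by_counting(a, b, n, m), urn_prob_by_formula(a, b, n, m))
```

Exact fractions are used so that the two columns can be compared for equality rather than approximately. The enumeration grows as (a + b)ⁿ, so it is only practical for small n.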

Consequently, were n tending towards infinity the frequency of white balls, amongst the drawn balls, would very probably tend towards the internal proportion p (essentially Bernoulli’s theorem).
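The sense of ‘very probably’ here can be illustrated by computing, for increasing n, the exact binomial probability that the frequency of white balls lies within some margin of p. This is a sketch under stated assumptions: the values p = 3/5 and margin 1/10 are hypothetical, and the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def prob_frequency_near_p(p, n, eps):
    """Exact probability that m/n lies within eps of p,
    where m is the number of white balls in n independent draws."""
    total = Fraction(0)
    for m in range(n + 1):
        if abs(Fraction(m, n) - p) <= eps:
            total += comb(n, m) * p**m * (1 - p)**(n - m)
    return total

p = Fraction(3, 5)     # hypothetical internal proportion, e.g. a = 3, b = 2
eps = Fraction(1, 10)  # hypothetical margin around p

results = [prob_frequency_near_p(p, n, eps) for n in (10, 100, 1000)]
for n, r in zip((10, 100, 1000), results):
    print(n, float(r))
```

For these values the printed probabilities increase with n and approach 1, although for no finite n is it certain that the frequency is close to p.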

Laplace (1995: 57) also allowed the basic laws of probability to hold for unfair coins, but unfortunately the classical (unlike the neo-classical) definition of probability could not accommodate them—e.g., if some observations had led us to conjecture that prob(H) = 11/20 then we could deduce that prob(HH) = 121/400, but while a classical model of such an unfair coin might be an urn containing 11 black balls and 9 white balls, the coin itself does not contain 11 equally likely heads.

Still, if prob(H) = 11/20 then, in a long run of tosses, we could expect to find that roughly 11/20 of them were heads (which is why, for propensity-theorists, such an observed frequency would support such a conjecture). Hence some philosophers have considered Finite Frequentism, which regards physical probabilities as frequencies within empirical populations. But Hájek (1997), for example, has amassed 15 arguments against such a simplistic definition, some of which will be considered in §2.