Frequentists usually define prob(A | B) as the limit, as the number of Bs tends to infinity, of the ratio of the number of Bs that are As to the number of Bs. That is essentially von Mises’ definition, and also Reichenbach’s (1971: 69; see §5), although, incidentally, von Mises (1957: 83) thought that Finite Frequentism might also be viable, while for Reichenbach (1971: 348) only the empirical frequencies had any real meaning. (Unfortunately, much of the literature on probability suffers from such ambiguities, as you will see.)
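In symbols, writing B_1, B_2, B_3, … for the Bs taken in some fixed order (notation introduced here only for illustration), the limit definition reads:

```latex
\operatorname{prob}(A \mid B) \;=\; \lim_{n \to \infty} \frac{\#\{\, i \le n : B_i \text{ is an } A \,\}}{n}
```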
Essentially, von Mises modelled sufficiently long (and apparently random; see below) empirical sequences with infinite idealised sequences, stipulating that their limits would obey laws of probability abstracted from the regularities of empirical statistics. In particular, since the frequency of certain attributes is often observed to become increasingly stable as sample size increases, he stipulated that the frequencies in the initial sub-sequences of his infinite sequences (which he called ‘collectives’) would indeed tend to limits. I take a collective to be a (random) ω-sequence with a limit, which is the usual interpretation, but von Mises was a little unclear. For example, although he (1957: 64) said that “A collective is an infinite sequence of observations,” he had earlier (1957: 12) said that “the throws of dice made in the course of a game form a collective,” and also:
[A] collective is a mass phenomenon or a repetitive event, or, simply, a long sequence of observations for which there are sufficient reasons to believe that the relative frequency of the observed attribute would tend to a fixed limit if the observations were indefinitely continued. This limit will be called the probability of the attribute considered within the given collective. (von Mises 1957: 15)
But if we were given, say, the results of 1,000 throws of some apparently symmetrical and homogeneous die, whose results appear to be randomly distributed among the six alternatives, why would we believe that the relative frequency of getting a six would tend to a fixed limit of 1/6 if the observations were indefinitely continued? We surely have sufficient reason to believe that the die would eventually wear away, probably in such a way that it would become asymmetrical from time to time (or stickier on one side or another, etc.), so that the observed frequencies, were the observations continued that far, would probably not tend towards a fixed limit.
Von Mises might reply that the collective only models the observations (much as the idealised bodies of physics model real ones). But not only are there problems even for ideal applications (see below), such a reply would also raise further questions, e.g. which aspects of our empirical situation are to be emphasised in the model, and why? Popper’s (1983: 352-6) answers (which take us away from Frequentism and towards propensities) will be given in §4 (and defended in §5), so for now consider how the absence of principled answers undermines von Mises’ theory.
Fine (1973: 103) notes that, even if we assume that the empirical frequency would converge to a limit, if we are given only the empirical frequency then we do not know how close to it we should expect the limit to be, at least not without knowing more about the sample’s underlying nature (and allowing a definitive role for that is the essence of a propensity-theoretic approach). Similarly:
Presumably the outcome of a long sequence of actual experiments should be considered an initial or early segment of the collective. But any finite segment is compatible with, and does not give any shred of indication of, any limiting value whatsoever. (Hacking 1965: 5)
The trouble is that no amount of experience can, in the literal terms of the theory, give any indication whatever of the limiting value. Nor, if ‘limiting value’ be taken literally, is there any reason for saying observed proportions even approach a limiting value. (Hacking 1965: 6)
Suppose, for example, that a coin is fairly tossed 1,000 times, yielding 251 heads. The propensity-theoretic hypothesis that prob(H) = ¼ for each toss would explain those empirical frequencies (which provide the evidence for it). On von Mises’ theory, however, prob(H) = ¼ just means that each toss belongs to a collective whose limiting frequency of H is ¼, and such a collective might have any initial sub-sequence whatsoever. So, if we are given only that prob(H) = ¼, then we have been given no reason to prefer any of those sub-sequences over any other, and so von Mises’ postulate that prob(H) = ¼ could hardly explain our observations (and why should a randomly selected sub-sequence not contain around 50% heads?).
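The contrast can be made concrete. On a propensity reading, assuming (for illustration) that the tosses are independent with the same chance p of heads, the number of heads in 1,000 tosses is binomially distributed, so each hypothesis assigns a definite probability to the observed 251 heads; a bare limiting-frequency statement assigns no probability to any finite initial segment. The Python sketch below (the function name is ours) compares the two hypotheses by their log-likelihood ratio, working in log space to avoid overflow.

```python
from math import lgamma, log

def log_binomial_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(exactly k heads in n independent tosses, chance p each).

    Assumes independent tosses with a constant chance p (an illustrative
    propensity-style model). Uses lgamma so large n neither overflows nor
    underflows.
    """
    log_comb = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_comb + k * log(p) + (n - k) * log(1 - p)

n_tosses, n_heads = 1000, 251  # the example in the text

log_quarter = log_binomial_pmf(n_heads, n_tosses, 0.25)
log_half = log_binomial_pmf(n_heads, n_tosses, 0.5)

# Positive values mean the data are more probable if prob(H) = 1/4 than if 1/2.
log_ratio = log_quarter - log_half
print(f"log likelihood ratio (1/4 vs 1/2): {log_ratio:.1f}")
print(f"likelihood ratio: about 10^{log_ratio / log(10):.0f}")
```

The binomial coefficient cancels in the ratio, so the result depends only on 251·ln(¼) + 749·ln(¾) − 1000·ln(½), roughly 130 in natural-log units. That is the sense in which the single-case hypothesis "explains" the observed frequency.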
Furthermore, suppose that the coin and the tossing mechanism are examined and that, on the basis of an enormous amount and range of background experience, it is decided that those tosses were probably fair. Such a supposition is not unreasonable: e.g. the next 999,000 tosses could indicate that prob(H) = ½ by giving us a total of half a million heads. Hacking (1965: 114) notes that we might then prefer the hypothesis that the chances had changed, but surely some further physical evidence might indicate that they had not.
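The arithmetic of this scenario is easily checked. The short Python sketch below uses only the counts given in the text (the variable names are ours); it shows that the later tosses would need 499,749 heads, a frequency only slightly above ½, for the overall frequency to be exactly ½, so the first 1,000 tosses barely affect the overall figure.

```python
from fractions import Fraction

first_tosses, first_heads = 1_000, 251   # the initial run in the text
later_tosses = 999_000                   # the further tosses in the text
total_heads = 500_000                    # "a total of half a million heads"

later_heads = total_heads - first_heads
total_tosses = first_tosses + later_tosses

later_freq = Fraction(later_heads, later_tosses)
overall_freq = Fraction(total_heads, total_tosses)

print("heads needed in the later tosses:", later_heads)
print("frequency in the later tosses:", float(later_freq))
print("overall frequency:", overall_freq)
```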
Anyway, the tosses appear to have been fair. Suppose further that no collective corresponds to this new hypothesis, that prob(H) = ½, since the relevant experience might not have included observations of coins like the one in question (e.g. it may have taken the form of a pattern of equations, only some of which were supported by sufficiently long sequences of observations). We therefore have two hypotheses: that prob(H) = ¼ (from observations of the tosses of the coin) and that prob(H) = ½ (from observations of the physics of the coin). Frequentism gives us an a priori reason to choose the former (without giving us a good explanation of that choice), but to which value will future tosses conform? Since our answer clearly ought to depend somehow upon the evidence, we are again moving in the direction of §4 (cf. Popper 1983: 306).
To pick up a point mentioned earlier, the ideal application of von Mises’ theory would be to an infinite sequence of uniform observations. So consider drawing a ball blindly from Laplace’s urn (which contains a proportion p of white balls), noting its colour, returning it to the urn (which is then shaken), and repeating that process endlessly. As we repeatedly draw balls at random, we can expect the empirical frequency of white balls among the drawn balls to tend to p, by Bernoulli’s theorem (see §1). Still, it is surely possible (if extremely improbable) that the empirical frequencies would tend to a different limit, say f. And certainly a merely very long sequence of draws might (much less improbably) yield the frequency f. Were we modelling such a sample using von Mises’ theory, we would take the probability of drawing a white ball to be f; but we might nonetheless have independent reasons for taking the probability (for every draw of this sample) to be p, e.g. we might have noted the proportion of white balls inside the urn, and we might know that the urn had been shaken thoroughly. If so, and if we were to bet on the next equally blind draw from that urn (after that very long sequence), it would clearly be wise to use p, rather than f, to determine the fair odds.
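The behaviour that Bernoulli’s theorem describes can be illustrated by simulation. The Python sketch below models each blind draw (with replacement and shaking) as an independent trial with chance p of white; the value p = 0.3, the seed, and the function name are illustrative assumptions, not part of the original example. Short runs can stray noticeably from p, while long runs usually lie close to it; a finite simulation illustrates, but cannot establish, any limiting behaviour.

```python
import random

def draw_with_replacement(p_white: float, n_draws: int, seed: int = 0) -> float:
    """Simulate n_draws blind draws from an urn whose proportion of white
    balls is p_white, returning the observed frequency of white.

    Each draw is modelled as an independent trial with chance p_white
    (the idealisation of replacing the ball and shaking the urn).
    """
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    whites = sum(1 for _ in range(n_draws) if rng.random() < p_white)
    return whites / n_draws

p = 0.3  # hypothetical proportion of white balls, chosen for illustration
for n in (10, 100, 10_000, 100_000):
    print(f"{n:>7} draws: frequency of white = {draw_with_replacement(p, n):.4f}")
```

Nothing in the simulated frequencies, however, tells us the urn’s proportion: that is known here only because we set `p` ourselves, which is the kind of independent evidence the paragraph above appeals to.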
In short, Frequentism does not allow us to make proper use of all the available evidence. The solution is to introduce the concept of propensity (in the next section). But first, to round off this section, note that collectives are also random, in the sense that (von Mises 1957: 24-5) their limits “must remain the same in all partial [and infinite] sequences which may be selected from the original one in an arbitrary way” (a stipulation that corresponds to the observed impossibility of gambling systems).
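The invariance condition can be illustrated with a finite simulation. In the Python sketch below, the tosses are pseudo-random and modelled as independent fair tosses, and the particular selection rules are hypothetical "gambling systems" invented for illustration. Each rule decides whether to select a toss using only its position and the earlier outcomes, never the outcome itself, as von Mises’ place selections must. The selected subsequences then show roughly the same frequency of heads as the whole sequence, a finite analogue of the invariance stipulated for collectives.

```python
import random

def frequency(seq: list[int]) -> float:
    """Relative frequency of 1s (heads) in a 0/1 sequence."""
    return sum(seq) / len(seq)

def place_select(seq: list[int], rule) -> list[int]:
    """Return the subsequence picked out by a place-selection rule.

    rule(i, history) decides whether to keep toss i using only its index
    and the outcomes before it (history), never the outcome of toss i.
    """
    chosen: list[int] = []
    history: list[int] = []
    for i, outcome in enumerate(seq):
        if rule(i, history):  # decision made before this outcome is seen
            chosen.append(outcome)
        history.append(outcome)
    return chosen

# Hypothetical data: 200,000 pseudo-random fair tosses (1 = heads).
rng = random.Random(1)
tosses = [1 if rng.random() < 0.5 else 0 for _ in range(200_000)]

# Illustrative selection rules (names and rules are ours).
rules = {
    "every toss": lambda i, h: True,
    "every third toss": lambda i, h: i % 3 == 0,
    "after a head": lambda i, h: len(h) >= 1 and h[-1] == 1,
    "after two tails": lambda i, h: len(h) >= 2 and h[-1] == 0 and h[-2] == 0,
}

results = {name: frequency(place_select(tosses, rule)) for name, rule in rules.items()}
for name, freq in results.items():
    print(f"{name:>18}: {freq:.4f}")
```

A rule allowed to inspect the current outcome (e.g. "select only heads") would of course change the frequency; excluding such rules is what makes the condition non-trivial.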
However, surely our empirical observations could only indicate the unreliability of gambling systems, which could be realistically explained by the uncertainty that attaches to each single case. Furthermore, although no finite sequence could exhibit von Mises’ ideal randomness, any finite sequence could be extended into an infinite sequence that did exhibit it, so von Mises would presumably model with a collective any empirical sequence that seemed sufficiently random. But not only is that a vague notion, it is also surely possible (combinatorially) that selections made randomly (i.e. by chance) might not look particularly random.
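How often chance produces something that does not look random can be quantified. Assuming independent fair tosses (an assumption for illustration), the Python sketch below counts, by dynamic programming, the toss sequences of length n that avoid a run of k consecutive heads; the complement is the exact chance that such an apparently patterned run occurs somewhere. The function name is ours. For example, in 100 fair tosses a run of at least five heads is more likely than not.

```python
from fractions import Fraction

def prob_head_run_at_least(n: int, k: int) -> Fraction:
    """Exact probability that n independent fair tosses contain a run of
    at least k consecutive heads (k >= 1)."""
    # ways[j]: number of sequences so far that contain no run of k heads
    # and currently end in exactly j consecutive heads (0 <= j < k).
    ways = [1] + [0] * (k - 1)  # the empty sequence ends in 0 heads
    for _ in range(n):
        new = [0] * k
        new[0] = sum(ways)          # appending a tail resets the run
        for j in range(k - 1):
            new[j + 1] = ways[j]    # appending a head extends the run
        ways = new                  # sequences reaching k heads are dropped
    return 1 - Fraction(sum(ways), 2 ** n)

for n, k in [(10, 5), (100, 5), (100, 7)]:
    p = prob_head_run_at_least(n, k)
    print(f"P(run of >= {k} heads in {n} tosses) = {float(p):.3f}")
```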
Frequentism is no more plausible without a criterion of randomness, however, because our concept of probability only exists because we have seen things that appeared to be random. Intuitively, randomness is evidence for an underlying (and logically prior) chanciness, much as the observed frequencies are evidence for the underlying probabilities. (Incidentally, the two are empirically related because, as Fine (1973: 93) notes, finite sequences that appear random are almost certain to appear to have convergent frequencies.)
And incidentally, we also learn from empirical statistics that we need variety within the collection in order to avoid biased samples, and yet von Mises is silent on that point. Presumably samples should vary like the wider population, which contains those elements whose properties will be predicted via the statistics; but since that population usually includes future occurrences, which are unavailable empirically, the causes of our observations would need to be addressed more directly, as in §4.