
Bayesian Probability Theory and Quantum Mechanics

John Baez

September 12, 2003

It's not at all easy to define the concept of probability. If you ask most people, a coin has probability 1/2 to land heads up if when you flip it a large number of times, it lands heads up close to half the time. But this is fatally vague!

After all, what counts as a "large number" of times? And what does "close to half" mean? If we don't define these concepts precisely, the above definition is useless for actually deciding when a coin has probability 1/2 to land heads up!

Say we start flipping a coin and it keeps landing heads up, as in the play Rosencrantz and Guildenstern are Dead by Tom Stoppard. How many times does it need to land heads up before we decide that this is not happening with probability 1/2? Five? Ten? A thousand? A million?

This question has no good answer. There's no definite point at which we become sure the probability is something other than 1/2. Instead, we gradually become convinced that the probability is higher. It seems ever more likely that something is amiss. But, at any point we could turn out to be wrong. We could have been the victims of an improbable fluke.

Note the words "likely" and "improbable". We're starting to use concepts from probability theory - and yet we are in the middle of trying to define probability! Very odd. Suspiciously circular.

Some people try to get around this as follows. They say the coin has probability 1/2 of landing heads up if over an infinite number of flips it lands heads up half the time. There's one big problem, though: this criterion is useless in practice, because we can never flip a coin an infinite number of times!

Ultimately, one has to face the fact that probability cannot be usefully defined in terms of the frequency of occurrence of some event over a large (or infinite) number of trials. In the jargon of probability theory, the frequentist interpretation of probability is wrong.

Note: I'm not saying probability has nothing to do with frequency. Indeed, they're deeply related! All I'm saying is that we can't usefully define probability solely in terms of frequency.

If you're not convinced yet, consider a statement like this: "Mr. X has a 60% chance of winning the next presidential election". There is no way to determine this by holding the next presidential election a large number of times and checking that Mr. X wins about 60% of the time. Nonetheless, I claim this statement is meaningful. If you don't believe me, argue with the British bookies who post odds and take bets on such events. They make their living by doing this!

Carefully examining such situations, we are led to the Bayesian interpretation of probability.

The basic idea behind the Bayesian interpretation is that probability is not something we start by measuring. Instead, we must start by assuming some probabilities. Then we can use these to calculate how likely various events are. Then we can do experiments to see what actually happens. Finally, we can use this new data to update our assumptions - and the Bayesian interpretation gives some recipes for doing this. But everything begins with the probabilities we assume at the start. This initial assumption is called the "prior probability distribution", or "prior" for short.
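To make this recipe concrete, here is a minimal sketch in Python (my own illustration with made-up numbers, not something from the posts below): pick a prior over a coin's possible biases, observe some flips, and update.

    # A minimal sketch of Bayesian updating for a coin, with made-up numbers.
    # The prior is a discrete distribution over possible biases p = P(heads).
    biases = [i / 10 for i in range(11)]          # candidate biases 0.0, 0.1, ..., 1.0
    prior = [1 / len(biases)] * len(biases)       # flat prior: no bias preferred

    def update(prior, heads, tails):
        """Posterior over the biases after observing `heads` heads and `tails` tails."""
        weights = [pr * p**heads * (1 - p)**tails for p, pr in zip(biases, prior)]
        total = sum(weights)
        return [w / total for w in weights]       # renormalize so the weights sum to 1

    # Observe 8 heads and 2 tails; the posterior shifts toward the higher biases.
    posterior = update(prior, heads=8, tails=2)
    best = max(zip(posterior, biases))
    print(best)                                   # the most probable bias given the data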

Subjective Bayesians argue that one's choice of prior is unavoidably subjective. Objective Bayesians try to find rules for choosing the "right" prior. Personally, I don't think there's a serious conflict here. The choice of prior is subjective, but in some situations there are nice rules to help one choose it. For example, in a situation where your evidence suggests that your coin has a symmetry - the two sides don't seem very different - you can use this to pick a prior which says the chance of it landing heads up equals that of it landing tails up. Of course you could be neglecting the fact that some sneaky guy weighted the coin and it just looks symmetrical. That's life.

What follows is a collection of nice posts on the Bayesian interpretation of probability and its relevance to quantum theory. It turns out that a lot of arguments about the interpretation of quantum theory are at least partially arguments about the meaning of probability! For example, suppose you have an electron in a state where the probability of measuring its spin to be "up" along the z axis is 50%. Then you measure its spin and find it is indeed up. The probability now jumps to 100%. What has happened? Did we "collapse the wavefunction" of the electron by means of some mysterious physical process? Or did we just change our prior based on new information? Bayesianism suggests the latter. This seems to imply that the "wavefunction" of a specific individual electron is just a summary of our assumptions about it, not something we can ever measure. Some people find this infuriating; I find it explains a lot of things that otherwise seem mysterious.

There are a lot of tricky issues here. Quantum theory is more general than classical probability theory. It presents a lot of new puzzles of its own. But, at the very least, we need a clear understanding of what "probability" means before we can tackle these quantum quandaries.

I believe the frequentist interpretation just isn't good enough for understanding the role of probability in quantum theory. This is especially clear in quantum cosmology, where we apply quantum theory to the entire universe. We can't prepare a large number of identical copies of the whole universe to run experiments on!

Besides reading the following posts, you should also look at the Stanford Encyclopedia of Philosophy's article on interpretations of probability theory, especially its comparison of the frequentist and subjective Bayesian interpretations. For more practical aspects of Bayesian statistics, also see Bill Jeffreys' course notes.


From: baez@guitar.ucr.edu (john baez)
Newsgroups: sci.physics
Subject: Re: Many-Worlds FAQ
Date: 8 Nov 1994 20:22:36 GMT
Organization: University of California, Riverside
Message-ID: <39ommc$lhj@galaxy.ucr.edu>
In article <1994Nov8.153821.9429@oracorp.com> daryl@oracorp.com (Daryl McCullough) writes:
>baez@guitar.ucr.edu (john baez) writes:
 
>>People sometimes feel like retroactively projecting the wavefunction
>>down onto the component that fits their observation, *after* they've
>>made the observation.  Any attempt to do this sort of thing is, IMHO,
>>a confused version of what you really are doing when applying quantum
>>mechanics correctly, namely, to *first* model all your assumptions
>>about the state of the universe by a wavefunction, and *then* compute
>>probabilities with that wavefunction.
>>Doing this systematically gives all the right answers to quantum
>>mechanics problems, so there's no need to do anything else.
 
>The "collapsarian" style for calculating amplitudes is to use one's
>current observations to get an up-to-the-minute best fit wave
>function. The Everett style seems to be to fix a wave function once
>and for all, and *never* change it in light of observations. 
This is a really crucial issue, and it's probably due to long discussions with Daryl that I evolved my current position on this issue. I do *not* side with the - no doubt mythological - "Everettistas" described below:
>Here is a sample conversation between two Everettistas, who have fallen
>from a plane and are hurtling towards the ground without parachutes:
 
>    Mike: What do you think our chances of survival are?
 
>    Ron: Don't worry, they're really good. In the vast majority of
>         possible worlds, we didn't even take this plane trip.
Note that anyone who acted that way would be silly, and that their error would have little to do with QUANTUM MECHANICS, but mainly with PROBABILITY THEORY. Probability theory is the special case of quantum mechanics in which one's algebra of observables is commutative. (This becomes a theorem in the context of C*-algebra theory.) Quantum mechanics has special features due to the noncommutativity, but probability theory already exhibits certain subtleties of interpretation, which I believe are a large part of what's at stake here.

Part of the point of Bayesianism is that you start with a "prior" probability measure. Let me just call this the "prior" - I think Bayesians have some bit of jargon like this. (I wish some expert on Bayesianism would step in here and give a 3-paragraph description of its tenets, since I feel unqualified.) In any event, when I say you "*first* model all your assumptions about the state of the universe by a wavefunction, and *then* compute probabilities with that wavefunction," I mean that the wavefunction plays the role of the "prior".

Bayesianism is called "subjective" in that it applies no matter how you get your prior. In other words, you could be a pessimist and wake up in the morning assuming that sometime today a nuclear attack will devastate your town, and constantly be surprised as each hour goes by without an attack. This might or might not be smart, but if you are a good Bayesian you can correctly compute probabilities assuming this prior.

Similarly, you could do good experiments, crappy experiments, or no experiments, and guess a wavefunction based on what you know, guess, or hope, but if you know the rules of quantum mechanics, you can compute probabilities correctly *assuming* the wavefunction. You could even find the wavefunction written on a crumpled-up piece of paper in my trashcan! No problem!

When you compute probabilities, however, you don't just compute "straight" probabilities with respect to the prior, you also compute *conditional* probabilities. I.e., even if you are the above pessimist, you can consider the probability that you'd enjoy a fine after-dinner brandy *given* that the nuclear attack hasn't occurred yet. You might think there is only a 5% chance that you will be alive at this point, but still think that *given* that the attack hasn't occurred, there is a 95% chance that some brandy would be enjoyable.

So suppose Ron and Mike use as their prior some wavefunction based on (approximate) measurements they did of the positions and velocities of all the elementary particles in the world on Tuesday. When they are falling out of the plane on Wednesday, they *could* use the prior to compute the probability that they actually took that plane trip. It might indeed be very low. However, they might find another calculation infinitely more interesting: namely, the conditional probability that they will survive, *given* that they took the trip and fell out of the plane.

Note: if you want, you can think of this process of switching from computing probabilities using the prior to computing conditional probabilities as a mysterious PHYSICAL PROCESS - the "collapse of the wavefunction". This would be wrongheaded, because in fact it is simply a change on *your* part of what you want to compute! If you think of it as a physical process you will be very mystified about things like when and how it occurred!

As I said once upon a time, we can imagine a sleepy physics student attending a physics lecture. At the beginning of the class the professor is working out a problem in which an object has velocity v = 0 at time t = 0. At this point the student drifts off to sleep. Later, he wakes up and finds the professor working out a TOTALLY DIFFERENT PROBLEM in which an object has velocity v = 1 at t = 1. The student doesn't realize that he has been asleep for half an hour, and so he raises his hand and asks "Professor!! At what time t did the acceleration occur??" Asking when the wavefunction collapses is like this.

Now I admit that this view of quantum mechanics takes a while to get used to. In particular, there really *are* issues where quantum mechanics is funnier than classical probability theory. In classical probability theory there are pure states in which *all* observables have definite values. Subconsciously we expect this in quantum mechanics, even though it's not so. So we always want to ask what's "REALLY going on" in quantum mechanics - meaning that we secretly yearn for a wavefunction that is an eigenstate of all observables. If we had such a thing, and we used *it* as a prior, we wouldn't need to worry much about conditional probabilities and the like. But alas there is no such thing, as far as we can tell.


From: youssef@ibm7.scri.fsu.edu (Saul Youssef)
Newsgroups: sci.physics
Subject: Re: Many-Worlds FAQ
Date: 11 Nov 1994 06:43:01 GMT
Organization: Supercomputer Computations Research Institute

John Baez writes:

>This is a really crucial issue, and it's probably due to long
>discussions with Daryl that I evolved my current position on this issue.
>I do *not* side with the - no doubt mythological - "Everettistas"
>described below:

>Here is a sample conversation between two Everettistas, who have fallen
>from a plane and are hurtling towards the ground without parachutes:
>
>    Mike: What do you think our chances of survival are?
>
>    Ron: Don't worry, they're really good. In the vast majority of
>         possible worlds, we didn't even take this plane trip.

>Part of the point of Bayesianism is that you start with a "prior"
>probability measure.  Let me just call this the "prior" - I think
>Bayesians have some bit of jargon like this.  (I wish some expert on
>Bayesianism would step in here and give a 3-paragraph description of its
>tenets, since I feel unqualified.)
:-). I have found this point of view to be very helpful for getting a better understanding of quantum mechanics and even for understanding why people argue about it so much. In fact, the spectrum of interpretations in quantum mechanics has a close analogue in probability theory. The "wave function is real" view is analogous to the "frequentist" view of probability theory where probabilities describe "random phenomena" like rolling dice or radioactive decays and the "wave function represents what you know about the system" view is analogous to the Bayesian view where probability is just a consistent way of assigning likelihoods to propositions independent of whether they have anything to do with a "random process." Just as in quantum mechanics, arguments have raged for many (more than 100) years without any real resolution and, just as in quantum mechanics, when the two camps actually solve the same problem, the mathematics is basically the same. A typical example of this sort of disagreement is Laplace's successful calculation of the probability that Jupiter's mass is within some interval. To a frequentist, the mass of Jupiter is a number. Admittedly, this number is unknown, but it is definitely not a random variable (since there is no "random process changing Jupiter's mass") and so it is utter nonsense to talk about its p.d.f. This may seem like a silly kind of disagreement, but the consequences in terms of what problems can be solved and in terms of understanding what probability theory is all about couldn't be greater.

Most people are more familiar with the frequentist view where you say that if you perform a "random experiment" N times with n successes then the "probability of success" is the large N limit of n/N. You then assume that these probabilities obey Kolmogorov's axioms, and you're all set. The rest of probability theory is solving harder and harder problems. The Bayesian view of things is a bit different and starts this way. Suppose that we want to attach a non-negative real number to pairs of propositions (a,b) and this number is supposed to somehow reflect how likely it is that "b" is true if "a" is known. Let me write this number as a->b [*]. For "->" to be a useful likelihood measure one expects a few modest things of it. For example, if you know a->b, this should determine a->.not.b and the procedure to get from a->b to a->.not.b shouldn't depend on "a" or "b." It turns out that this and just a little bit more is enough to entirely fix probability theory, as shown in an obscure paper by Cox in Am.J.Phys. in 1946. One gets

(a -> b.and.c) = (a -> b)(a.and.b -> c)

(a -> b) + (a -> .not.b) = 1

(a -> .not.a) = 0

which is the Bayesian form of probability theory. You can then trivially show (Bayes Theorem) that

(a.and.b -> c) = (a->c) {(a.and.c -> b)/(a->b)}

if (a->b) is nonzero. This is often used in the following context:

a = "stuff that you know"

b = "more stuff that you found out"

c = "something that you're interested in"

Then if you already know (a->c), Bayes theorem tells you how to find the probability that c is true, given your additional knowledge b, i.e. (a.and.b -> c). For example, suppose that you happen to know that the behavior of a random variable x obeys one of a family of pdfs f(x,t) where t is some unknown parameter. Given a sample of independent x values X = (x1,x2,...,xn), what can you say about t? Using Bayes theorem, it's easy as pie. If "e" is the initial knowledge of the experiment as just described, then you want to calculate

(e.and.X -> t) = (e->t) {(e.and.t->X)/(e->X)}

Here (e->t) is called the "prior" probability that the true pdf is f(.,t). If we have no reason to prefer one value of t over another, we can use the "uniform prior" (e->t) = const. Then, since (e.and.t->X) = (e.and.t->x1.and.x2.and.x3...xn) = f(x1,t)f(x2,t)...f(xn,t),

(e.and.X -> t) = const. Prod{j=1,n} f(xj,t)

and you're done. This is usually called the likelihood method. There are more sophisticated methods for choosing priors in various situations (e.g. "Maximum Entropy") but the basic idea is the same.
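To spell out the likelihood method just described, here is a short sketch (my own, with an assumed Gaussian family and made-up data, not something from the post): take f(x,t) to be a Gaussian with unknown mean t and unit width, use the uniform prior, and evaluate the unnormalized posterior Prod_j f(xj,t) on a grid of t values.

    import math

    # Sketch of the likelihood method: f(x,t) is a Gaussian with unknown mean t
    # and known width 1; the prior (e->t) is taken to be uniform ("const").
    def f(x, t):
        return math.exp(-0.5 * (x - t) ** 2) / math.sqrt(2 * math.pi)

    X = [2.1, 1.7, 2.5, 1.9]                      # made-up sample x1, ..., xn

    # (e.and.X -> t) = const * Prod_j f(xj, t), evaluated on a grid of t values
    ts = [i / 100 for i in range(401)]            # t from 0.00 to 4.00
    post = [math.prod(f(x, t) for x in X) for t in ts]

    best = ts[post.index(max(post))]
    print(best)                                   # peaks at the sample mean, 2.05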

So far, I have left out one important point. In the frequentist view of probability you start off assuming that probabilities have a particular frequency meaning. In the Bayesian view, this must be derived by considering copies of a single experiment and considering the probability that n/N of them have success. You can then get the standard frequency meaning of probabilities provided that you assume (roughly) that probability zero events don't happen.
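Here is a small numerical sketch of that derivation (mine, not from the post): take N independent copies of an experiment whose success probability is p, and watch the probability that the observed fraction n/N lands within a small window around p approach 1 as N grows - which is where the frequency meaning comes from.

    from math import comb, ceil, floor

    def prob_fraction_near(p, N, eps):
        """P( |n/N - p| <= eps ) when n is Binomial(N, p)."""
        lo, hi = ceil((p - eps) * N), floor((p + eps) * N)
        return sum(comb(N, n) * p**n * (1 - p)**(N - n)
                   for n in range(max(lo, 0), min(hi, N) + 1))

    # The observed fraction concentrates around p as the number of copies grows.
    for N in (10, 100, 1000):
        print(N, prob_fraction_near(p=0.5, N=N, eps=0.05))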

Note that because of this frequency meaning, probability theory is not just a piece of mathematics. It is really a physical theory about the world which might or might not be correct. From this point of view, it is tempting to try to explain quantum phenomena by modifying probability theory. As far as I can tell, this idea actually works and has more consequences than "just another interpretation" of quantum mechanics.

>Bayesianism is called "subjective" in that it applies no matter how you
>get your prior.  In other words, you could be a pessimist and wake up in
>the morning assuming that sometime today a nuclear attack will
>devastate your town, and constantly be surprised as each hour goes by
>without an attack.  This might or might not be smart, but if you are a
>good Bayesian you can correctly compute probabilities assuming this
>prior.
That's right. Of course, you can get the wrong answer if you have the wrong prior, but this is viewed as progress! From the Bayesian point of view, science progresses by finding out that your prior isn't working. For example, your prior may include a physical theory that is wrong.
>When you compute probabilities, however, you don't just compute
>"straight" probabilities with respect to the prior, you also compute
>*conditional* probabilities.
In the Bayesian view, all probabilities are conditional since they all depend on what you know. This is also true in Kolmogorov's system but only within a fixed sample space.
>Note: if you want, you can think of this process of switching from
>computing probabilities using the prior to computing conditional
>probabilities as a mysterious PHYSICAL PROCESS - the "collapse of the
>wavefunction".  This would be wrongheaded, because in fact it is simply
>a change on *your* part of what you want to compute!  If you think of
>it as a physical process you will be very mystified about things like
>when and how it occurred!
Yes! As I've said, probably too many times, it's like wondering what physical process causes a probability distribution to "collapse" when you flip a coin.
>Note that anyone who acted that way would be silly, and that their error
>would have little to do with QUANTUM MECHANICS, but mainly with
>PROBABILITY THEORY.  Probability theory is the special case of quantum
>mechanics in which one's algebra of observables is commutative.  (This
>becomes a theorem in the context of C*-algebra theory.)
Could you post or email me the reference for this theorem?
>Now I admit that this view of quantum mechanics takes a while to get
>used to.  In particular, there really *are* issues where quantum
>mechanics is funnier than classical probability theory.  In classical
>probability theory there are pure states in which *all* observables have
>definite values.
Notice that a statement like: the coin is in "state" (1/2,1/2) would be very bad language from the Bayesian point of view since (1/2,1/2) represents what you know and not some physical property of the coin. One of the reasons that this point of view "takes getting used to" in quantum mechanics is that the language of standard quantum theory constantly reinforces the idea that Psi is the "state of the system."
>Subconsciously we expect this in quantum mechanics,
>even though it's not so.  So we always want to ask what's "REALLY going
>on" in quantum mechanics - meaning that we secretly yearn for a
>wavefunction that is an eigenstate of all observables.  If we had such a
>thing, and we used *it* as a prior, we wouldn't need to worry much about
>conditional probabilities and the like.  But alas there is no such
>thing, as far as we can tell.
That would be like yearning for the "true probability distribution" for coin flippage. But the fact that there isn't any such thing independent of your state of knowledge doesn't mean that there isn't something REALLY going on (e.g. a REAL copper penny being flipped by a real human being). In spite of non-commuting observables and Bell's theorem and all its variations, I don't think that it has quite been shown that there can't be something "REALLY going on", as you say.

By the way, Ed Jaynes is writing a book on Bayesian probability theory which is easily readable by undergraduates. For some reason, its current draft is available on the web at:

http://www.math.albany.edu:8008/JaynesBook.html

Jaynes is a very interesting guy and his stuff is always worth paying attention to.

[*] (a->b) is often written P(b|a).


From: youssef@d0sgi6.fnal.gov (Saul Youssef)
Newsgroups: sci.physics
Subject: Re: Many-Worlds FAQ
Date: 20 Nov 1994 19:03:25 GMT
Organization: Fermi National Accelerator Laboratory, Batavia IL
Message-ID: <3ao6ht$jrk@fnnews.fnal.gov>

John Baez writes:

>The Bayesian approach to this question - when is a
>prior distribution "right"? - seems to me to be the most
>clear-headed one... though I still have a lot more to learn about what
>the Bayesians actually say, and the different flavors of Bayesianism.
I'm interested in flavors too and especially if someone could comment on how the Jaynesian version is related to the rest of the field or to add to my list of references:

"Maximum Entropy and Bayesian Methods", ed. J.Skilling, Kluwer, 1988 [this is one of a series of conferences]

"Physics and Probability", Essays in honor of E.T.Jaynes, ed: W.T.Grandy, and P.W. Milonni, Cambridge, 1993.

"Probability Theory - the Logic of Science," E.T.Jaynes, http://www.math.albany.edu:8008/JaynesBook.html


From: bill@clyde.as.utexas.edu (William H. Jefferys)
Newsgroups: sci.physics
Subject: Re: Many-Worlds FAQ
Date: 21 Nov 1994 15:12:06 GMT
Organization: McDonald Observatory, University of Texas @ Austin
Message-ID: <3aqdc6$22j@geraldo.cc.utexas.edu>

In article <3ao6ht$jrk@fnnews.fnal.gov>, Saul Youssef <youssef@d0sgi6.fnal.gov> wrote:

John Baez writes:

>The Bayesian approach to this question - when is a
>prior distribution "right"? - seems to me to be the most
>clear-headed one... though I still have a lot more to learn about what
>the Bayesians actually say, and the different flavors of Bayesianism.

>I'm interested in flavors too and especially if someone could comment on
>how the Jaynesian version is related to the rest of the field or to add
>to my list of references:

The late Harold Jeffreys was a proponent of the "objectivist" school; Jaynes frequently mentions Jeffreys in his writings. Jeffreys wrote a treatise, _Theory of Probability_, which is sometimes in print (Oxford), in which he sets out his views.

Jack Good (I.J. Good) discusses the classification of Bayesians in an article reprinted in his book, _Good Thoughts_.

Bill


From: KJM@mfs1.ballarat.edu.au (Kevin Moore)
Newsgroups: sci.physics
Subject: Re: Bayesianism and QM... REALLY!
Date: Mon, 28 Nov 1994 15:47:16
Organization: University of Ballarat
Message-ID: <KJM.185.000FCA23@mfs1.ballarat.edu.au>

In article <1994Nov23.022750.27712@oracorp.com> daryl@oracorp.com (Daryl McCullough) writes:

[deletia]

>It is this insistence on a unified, "natural" approach that I don't
>like. The impression I get (and I guess I don't really know enough
>about it to make any strong claims) is that once you've decided on
>your subjective prior distribution(s), from then on, you just turn the
>crank. I'd rather have the privilege of, at any time, deciding to toss
>out my old idea of what the probabilities are, and use instead a new
>(possibly unrelated) set of probabilities. I don't want to be told
>how my probabilistic guesstimates are supposed to change with time.

It's true: Bayesianism requires you to update your probabilities every time you incorporate new data, and to do so in a manner consistent with the data. Why is this seen by frequentists as a Bad Thing? Because of the different interpretations Bayesians and frequentists put on the word "probability".

[deletia]

>Maybe Bayesianism doesn't dictate what *prior* to use (although I
>thought that some Bayesians insist that you use the maximum entropy
>principle).

Me, for one.

>But it seems to me that they *do* dictate how my
>probability distributions can change with time.
Indeed, but at this point, you should ask what it is that you mean by probability? Do you interpret it as the degree of belief in a proposition, as the expected frequency of occurrence of an event in an infinite sequence of trials, or some loose mish-mash of the two?
>Maybe I'm wrong about this, but if so, then what *does* Bayesianism do?
>What is the "Bayesian approach", if it doesn't in any way
>constrain how one does probability? It just tells you what kind of
>attitude you should have towards probability?
Bayesianism uses the first of the above interpretations of probability, using Cox's theorem as validation, although the approach dates much further back, to the days of Laplace. From your comments objecting to being told how your probabilities should change with time (!) I suspect you use the second interpretation, which puts you firmly in the frequentist camp.

In a reasonable world, there is room for both approaches, provided people are clear about what they are calculating. The problems occur most often when people try to do things using a frequentist approach to data analysis which simply isn't appropriate. For further reading, I suggest Berger and Sellke's paper from a few years ago (Journal of the American Statistical Association, March 1987, Vol. 82, No. 397, pp. 112 ff). A most stimulating exchange of views followed it - some of which confirms that we don't live in a reasonable world.

Cheers,

Kevin Moore

kjm@ballarat.edu.au

Nice book: [1] Peter Walley, Statistical Reasoning with Imprecise Probabilities.


From: bweiner@electron.rutgers.edu (Ben Weiner)
Newsgroups: sci.physics
Subject: Re: Bayesianism and QM... REALLY!
Date: 29 Nov 1994 01:44:43 -0500
Organization: Rutgers University
Message-ID: <3beikr$rr4@electron.rutgers.edu>

daryl@oracorp.com (Daryl McCullough) writes:

>It is this insistence on a unified, "natural" approach that I don't
>like. The impression I get (and I guess I don't really know enough
>about it to make any strong claims) is that once you've decided on
>your subjective prior distribution(s), from then on, you just turn the
>crank.
Pretty much. You assume a prior distribution and calculate the probability of the distribution given the data (well, you often calculate the posterior distribution, i.e. the probability distribution given your prejudices (your prior) and the data).
>I'd rather have the privilege of, at any time, deciding to toss
>out my old idea of what the probabilities are, and use instead a new
>(possibly unrelated) set of probabilities. I don't want to be told
>how my probabilistic guesstimates are supposed to change with time.
To quote somebody else (John Baez?): "Huh?" The change-with-time part seems to be simply the rules for how the addition of new data changes your posterior distribution. If you believe in what you're doing, then acquiring new data does change your beliefs. It should; after all, if you get enough data you might find out your prejudices (your prior) were wrong. Looking at the math, you find that if you have enough data it doesn't matter what you take for the prior, which is comforting.

Of course, if you like, you can change your mind and substitute a new prior distribution, and recalculate. If that is what you want we've got it! Call now to order ...

[John Baez wrote:]

>Huh?  As Bill Taylor explained, the whole point of Bayesianism, or what
>he more descriptively calls "universal subjective priorism", is that 
>there's no point to saying your subjective probabilities are "wrong"!
>You seem in fact to be arguing FOR Bayesianism in what you just wrote.
>This was the whole point of my parables in which one is told ones prior
>distribution by a little bird, or a fox.  Given the prior one can
>compute probabilities, but without a prior one can't compute the
>probability that someone else's predictions are right or not, so one
>can't "judge" a prior without reference to some other prior.

>Maybe Bayesianism doesn't dictate what *prior* to use (although I
>thought that some Bayesians insist that you use the maximum entropy
>principle). But it seems to me that they *do* dictate how my
>probability distributions can change with time.

Only in the sense that it dictates how to calculate the posterior distribution given the prior distribution and the data. This part is plain arithmetic and is generally not the controversial part of Bayesianism (that is, even an anti-Bayesian would say: well, if you must do it, do it that way). The anti-Bayesian critique usually centers on the subjectivity inherent in the choice of prior. Bayesians counter this with the "at least we have our prejudices up front" argument. (An example: using chi-squared to fit a multi-parameter model is equivalent to Bayesian inference on data with Gaussian noise and uniform priors for the parameters.)
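Here is a quick sketch of that equivalence (my own illustration, with made-up numbers): for data with Gaussian noise of known width and a uniform prior, the posterior is proportional to exp(-chi-squared/2), so the parameter value that minimizes chi-squared is exactly the one that maximizes the posterior.

    import math

    # Made-up data y_i = a * x_i + noise, with noise of known width sigma = 1.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [0.1, 1.2, 1.9, 3.1]

    def chi2(a):
        return sum((y - a * x) ** 2 for x, y in zip(xs, ys))

    def posterior(a):                  # uniform prior: posterior proportional to exp(-chi2/2)
        return math.exp(-0.5 * chi2(a))

    grid = [i / 1000 for i in range(2001)]        # candidate slopes a from 0 to 2
    a_chi2 = min(grid, key=chi2)
    a_bayes = max(grid, key=posterior)
    print(a_chi2, a_bayes)                        # the same value: the two methods agree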

>Maybe I'm wrong about
>this, but if so, then what *does* Bayesianism do? What is the
>"Bayesian approach", if it doesn't in any way constrain how one does
>probability? It just tells you what kind of attitude you should have
>towards probability?
Oddly enough, that change of attitude can change how you calculate, in certain examples. A common example is that if you use classical methods to fit for a parameter from the data, you are generally assuming a uniform prior on the parameter, i.e. 0 is as likely as any other value. If you don't know a priori that the effect represented by the parameter exists, that is you want to discriminate between the null hypothesis and "the parameter exists with value X," maybe that is the wrong prior to use. Maybe you should go Bayesian and put half the prior probability on the null hypothesis, parameter = 0, and distribute the rest as you see fit over nonzero values.

This can have a big effect on the results: data which show that a parameter is statistically significantly different from zero with a classical analysis can be consistent with the nonexistence of the parameter in a Bayesian analysis. Philip Morrison claims that this may be responsible for much of the hoo-ha over the ephemeral "fifth force" and the retired Princeton professor who thinks he's shown that ESP exists. (Take that, S*rf*tti!)
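A rough sketch of how that can happen numerically (my own construction with invented numbers, not Morrison's actual analysis): give the null hypothesis "parameter = 0" half the prior probability and spread the other half over nonzero values. Data that sit two standard errors from zero, and so look "significant" classically, can still leave most of the posterior probability on the null once the sample is large.

    import math

    # n unit-noise measurements, so the sample mean has standard error 1/sqrt(n).
    # Prior: probability 1/2 on "parameter = 0", and 1/2 spread as a Gaussian of
    # width w over nonzero values (a made-up choice of prior).
    def posterior_prob_null(mean, n, w=1.0):
        se = 1 / math.sqrt(n)
        # likelihood of the observed mean if the parameter is exactly 0
        like_null = math.exp(-0.5 * (mean / se) ** 2) / (se * math.sqrt(2 * math.pi))
        # marginal likelihood if the parameter is drawn from the Gaussian of width w
        s = math.sqrt(se ** 2 + w ** 2)
        like_alt = math.exp(-0.5 * (mean / s) ** 2) / (s * math.sqrt(2 * math.pi))
        return like_null / (like_null + like_alt)      # equal prior odds

    # A "2 sigma" result: the observed mean is always twice its standard error.
    for n in (10, 1000, 100000):
        print(n, round(posterior_prob_null(mean=2 / math.sqrt(n), n=n), 3))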

Bayesian analysis also makes your life infinitely simpler, in the sense that you don't have to run around remembering a zillion different classical-statistical formulae for the case of normal distribution with known mean and unknown variance, unknown mean and known variance, and so on.

All this Bayesian stuff used to intimidate me until I got hold of a good introductory book and worked out a few examples, then I realized it was actually easier than the other statistics. By now there are a number of sources for learning about it: I recommend Peter Lee's "Bayesian Statistics: An Introduction." (Oxford U.P.)


From: wft@math.canterbury.ac.nz (Bill Taylor)
Newsgroups: sci.physics
Subject: Re: Bayesianism and QM.
Date: 30 Nov 1994 06:08:27 GMT
Organization: Department of Mathematics, University of Canterbury
Message-ID: <3bh4sr$6ik@cantua.canterbury.ac.nz>

The most recent post, an excellent one by Ben Weiner, makes a good point more elegantly than I had done earlier...

>Bayesian analysis also makes your life infinitely simpler, in the
>sense that you don't have to run around remembering a zillion
>different classical-statistical formulae for the case of normal
>distribution with known mean and unknown variance, unknown mean and
>known variance, and so on.
Quite so; this uniformity of approach is one of the most appealing things about it. So is the simplicity of getting results, whether numerically with hardware or theoretically (when conjugate priors are used, though this smacks of objective Bayesianism - see below!).

But Oh Dear! Ben, I was sorry to see you using "infinitely" there, when you just mean "very much"! Surely the media-man-in-the-street hasn't corrupted your linguistic habits so terribly! You know how we *loathe* these non-technical "infinity" uses; hardly admissible hyperbole! ;-)

Many others have made good comments on this ongoing thread, but I'll just add a few answers to some outstanding points.

Firstly to deal with a couple of irritants...

Mike Price:

>and it is not apparent (to me, anyhow) that the assumption of
>probability isn't already lurking somewhere under a stone, in your
>definition, even excluding the explicit use of "average".

Well this is surely no problem. There are plenty of places where "averages" are taken (i.e. arithmetic means) that have nothing to do with probability.

Saul Youssef:

>about as useful as speaking of negative natural numbers!

>That's an odd point to make since negative numbers are very useful!

You made this comment in another thread a while ago, Saul, very similarly. You'll note I did say negative *natural* numbers; not just negative numbers. Sure negative numbers are useful; but negative natural numbers, like negative probabilities, are about as useful as square circles.

However, all was forgiven, Saul, when I saw this..

>it's those annoying Ayn Randians from news groups outside of our own
>galaxy who are really getting to me...

<snort.> Right on; they really are from another universe, where greed masquerading as conscience is considered OK!

More seriously though, I was struck by this comment from John Baez:

>You can make observations and use them to GUESS a probability
>distribution, but this has an irreducible element of subjectivity in
>it.
Highlighting the word "guess" seems a little unwise; as the use of this word suggests you vaguely feel there's *still* a *real* prior out there somewhere; one you can never know but try to come close to. As we've all insisted, this is totally wrong thinking; the true-blue (or subjective) Bayesian can make no sense of this idea. Your prior isn't a guess at anything, it's just your own quantification of your own uncertainty.

And this connects to another point, well emphasized by Saul:

>Many people think of probability theory as a something applying to
>problems that have "random" elements like dice, roulette wheels or
>radioactive decays.  The main insight of the Bayesians is that [it has]
>nothing in particular to do with "random" processes (whatever they
>are) and apply in a vastly greater domain.
Exactly so; as many others have also observed. Bayesian ideas are used in any situation where there is *uncertainty*, whether of a probabilistic nature or otherwise.

I have a nice anecdote about this. Dennis Lindley was giving a talk to us; and started at a just-freshly-cleaned whiteboard, by writing up in large caps the word UNCERTAINTY. He turned to his notes, fiddled with them for a second or two, and turned back to the board to start talking - when he checked suddenly, amidst scattered chuckles. The ink in UNCERTAINTY had melted into the residual dampness on the whiteboard, and produced a delightfully fuzzy splodge that could still just be read as "uncertainty", but with overtones of Heisenberg-like clouds of ink particle probabilities. Very apt!! Dennis said he must try and get it set up like that for any future talks...

---

Now there's a whole sub-thread stemming from this plaint by Daryl McCullough:

>I don't want to be told
>how my probabilistic guesstimates are supposed to change with time.
This has already been answered well:- it's the natural probabilistic way they ought to change, if you have any probability models at all. But I can see how you might still feel grumpy about this. Why shouldn't you go back and change your prior if it looks like the subsequent data are making it look *really* stupid!

This is tough to answer. For one thing, if your prior was so silly as to have zero probabilities in it, (or zero-density intervals, in the continuous case), then you may *have* to. F'rinstance, if you declared that there was *zero* prior chance of a six turning up on a dice - but then a six *did* turn up; well, you're completely stuffed! You just have to go back and start again without the silly zeros. And it'd be much the same if you had the prior not quite zero but about 10^(-35). It'd still take billions of sixes turning up before you'd posteriorly admit there was a reasonable chance of getting some sixes. Clearly that was a silly prior. (Not *wrong*, note, just silly; even by your own standards.) The Bayesian, like anyone else, has to use some common sense and start with reasonable priors that admit a fair chance to anything that could remotely happen. For a coin, for instance, one would have most of the probability peaked near .5, but with reasonable amounts smeared out toward 0 and 1, with perhaps some small (but not invisible) probability masses at 0 and 1; to allow for the outside chance of a two-header.
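For concreteness, here is a small sketch (my own, with invented weights) of the kind of coin prior just described: a bump peaked near 0.5 with some probability smeared toward the ends, plus small lumps at exactly 0 and 1 for the two-tailer and two-header possibilities. Updating on an unbroken run of heads shows the lump at 1 taking over only gradually.

    import math

    # A coin prior: a bump peaked at 0.5 over the interior biases, plus small
    # point masses at exactly 0 and 1 (all weights are made up).
    grid = [i / 100 for i in range(1, 100)]                 # biases 0.01 ... 0.99
    prior = {p: math.exp(-((p - 0.5) / 0.2) ** 2) for p in grid}
    prior[0.0] = prior[1.0] = 0.01 * sum(prior.values())    # lumps for trick coins
    total = sum(prior.values())
    prior = {p: w / total for p, w in prior.items()}

    def update_on_heads(dist, n_heads):
        """Posterior after observing n_heads heads in a row."""
        weights = {p: w * p**n_heads for p, w in dist.items()}
        z = sum(weights.values())
        return {p: w / z for p, w in weights.items()}

    for n in (5, 20, 100):
        post = update_on_heads(prior, n)
        print(n, round(post[1.0], 3))       # posterior weight on "two-headed coin"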

But I think Daryl might still have a complaint. Even after all this, he may still feel it sensible to be able to go back and change your prior *after* having seen some data. But I think this would be (reasonably) regarded as fairly irrational by a Bayesian. Of course, as Paul Budnik says:

>However it is well known that
>people do not make rational decisions about betting. That is why it is
>possible to win at the race track if you do enough homework and have
>enough self discipline.
Quite so! Though the way to win off this irrationality is to be the bookie! The thing is - the scientist is supposed to have got a bit beyond this kind of "hot streak" irrational thinking; and the Bayesian is just saying that a common-sense prior and Bayes-theorem posteriors is the way to do it.

Otherwise, we could have this scenario. Daryl is trying to determine a coin's propensity for coming up heads. He starts with a prior with a peak tight around 0.5 . Then the first three tosses come up tails, so he goes back and changes his prior to one much more spread out on the low end; but then 7 of the next 8 tosses come up heads, so he goes back and starts with a new prior spread out more on the upper side; then the next few tosses...

Clearly this is an extreme example. But I hope, Daryl, you might agree that your desire to "go back and change the prior" would probably be essentially like this, even in messier situations. What you're really doing is (mentally) estimating posteriors from some (perhaps uncrystallized) subjective prior.

It may be, of course, that Daryl's real worries are as in his comment:

>I thought that some Bayesians insist that you use the maximum entropy principle
Saul dealt with this - it *may* be useful sometimes to do this, but (the true-blue Bayesian would say) *only* if you think it is! No-one says you have to; except the red-hot objective Bayesian, who may be the nearest to a religious nutter in this debate. There is one big problem with "objective priors", which usually boil down to uniform - or as near as one can get to it when strict uniform would be "improper". John Baez was obliquely referring to it, I think:
>any computation of probabilities is a computation w.r.t. some probability
>measure [...]

>entropy is defined relative to a prior distribution [...]

This is a variant of Laplace's paradox, about which much has been written. I've not read it myself; Jim Berger is an objective Bayesian, I gather, and when he was talking here I tried to get him to expound on it, but he skirted away from it. The trouble is, as Laplace first observed, that a uniform prior is no longer uniform if you just re-parametrize the underlying "observation space" in some non-linear way. An objectivist may insist that there is usually only one "natural" way to do the parametrization, but this is far from clear! In estimating the spread in a population, is sigma-squared (the variance) or sigma (the standard deviation) more natural? It could make a substantial difference when you assign your "natural" objective prior.
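A tiny numerical illustration of that reparametrization trouble (mine, not from the post): draw sigma uniformly on (0,1) and look at where the corresponding values of sigma-squared land - they are anything but uniform.

    import random

    # If sigma is given an "objective" uniform prior on (0, 1), what does that
    # imply about sigma squared?
    sigmas = [random.random() for _ in range(100_000)]      # sigma ~ Uniform(0, 1)
    variances = [s * s for s in sigmas]                     # the implied sigma^2 values

    # Count how many land in each tenth of (0, 1): the histogram is far from flat.
    counts = [0] * 10
    for v in variances:
        counts[min(int(v * 10), 9)] += 1
    print(counts)     # piled up near 0: uniform in sigma is not uniform in sigma^2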

That example is not a great one, but multivariate situations are notoriously susceptible to changes in parametrization. And there could be worse, if John's remarks above are extended in an obvious mathematical way:- it may even be that the "natural" parametrization is (measure-theoretically) singular with respect to the "obvious" one! Like a "thick Cantor" distribution. This would play havoc with things if you started with the "wrong" objective prior. This may seem mere silliness and nit-pickery, but maybe not. I have this haunting recollection (i.e. no hint of references!), of having seen a paper about the orbits of asteroids. (Falsely reminiscent of a classic paper of I.J.Good on Bode's law!) Apparently the gravitational resonance of Jupiter *not only* tends to judder them into bands where the orbital frequency is a simple ratio of Jupiter's, *but also* has the effect that the perturbational width of the band surrounding frequency p/q is of order 1/q^2. So when all these are unioned then intersected the result is a Cantor set of bands of positive measure! Well obviously I've recalled it all awry - but it was something like that, anyway. I remember it was the only case of a Cantor set I'd ever seen that looked remotely natural. Sorry for the vague nature of this example - maybe it rings a bell with someone else?

------

Both Saul Youssef and Daryl McCullough veered onto another sub-thread that I approached above:- ultra-tiny probabilities, and how to handle them.

Daryl:

>The only meaning that can be given to the claim
>that "heads and tails each have probability 1/2" is in terms of the
>limits of infinite runs of coin-flips - it is roughly equivalent to
>saying that the probability of an infinite run for which the relative
>frequency of the occurrence of heads is not 1/2 is zero.  However,
>that simply makes the meaning of "probability 1/2" dependent on the
>meaning of "probability 0". But what is the meaning of *that*?
It doesn't really seem to have much to do with coherence, as you suggest. I don't think this *is* a problem for the Bayesian, though it may be a concern for the frequentist.

Saul:

>Then the probability that n/N successes are observed is binomial and in
>the large N limit concentrated around n/N=p.  This means nothing, yet,
>of course, unless we add the assumption (roughly) that "probability
>zero events don't happen.
Exactly so. This gets to the heart of what seems to me to be the *only* proper way to do frequentist statistics - well anyway it's the way I do it myself in the only context where I personally do statistics, i.e. testing the effectiveness of my own card-shuffling methods!

But there is a serious point. I had a colleague who used to say that when a client came to you with a statistical problem, he wanted an answer couched in similar terms to the way he presented it - and these would typically NOT include the words probability, likelihood, or whatever. He wants something more definite. Now such a client may be thought unreasonable; but I'm not so sure. Let's consider an easy example.

There is an observation taken from a probability distribution we all *know* to be Uniform on [A, A+1]. Only A is uncertain. One observation is taken, and it turns out to be 2.7. We can safely declare that A is in the range 1.7 to 2.7! No mucking about with confidence intervals or priors or significance or anything else the client doesn't want to hear about! Great.

Of course no client beyond kindergarten level is going to ask for our opinion in that scenario! But if we can make every statistical situation come out in a similarly definite way, we would really be onto something. And this is not necessarily absurd. In the standard situation where we observe X-bar of a sample of size n from Normal(mu, 1), we can do it. Just give a confidence interval of 99.9999 % confidence! (And this can be done by merely (say) tripling the usual 95% one.) This is now "certain to be right", as in the uniform example! "Certain", as in the excerpts above, meaning merely in the thermodynamic sense that "we'll never get it wrong"; or better, that other non-statistical effects (such as broken equipment, universal insanity, nuclear war, etc) will swamp this tiny statistical uncertainty. Of course, getting near-perfect hypothesis tests or confidence intervals like the above, may entail HUGE useless intervals, or whatever; but not necessarily always so.
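To put a rough number on that "tripling" remark (my own check using standard normal quantiles, not part of the original post): the half-width of the interval for the mean of a Normal(mu, 1) sample scales with the normal quantile, which grows only slowly as the confidence level approaches 1.

    from statistics import NormalDist

    n = 25                                        # a made-up sample size
    se = 1 / n ** 0.5                             # standard error of X-bar for Normal(mu, 1)

    for conf in (0.95, 0.999999):
        z = NormalDist().inv_cdf(0.5 + conf / 2)    # two-sided normal quantile
        print(conf, round(z, 2), round(z * se, 3))  # quantile and interval half-width

    # The 99.9999% interval comes out only about 2.5 times wider than the 95% one,
    # so tripling the usual interval is more than enough to buy near-certainty.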

Not once in your whole career will you ever make a mistake this way! Of course, it may be objected, as Daryl says:

>there is no consistent way to treat small but nonzero probabilities
>(say, a 1 in a million chance) as zero, because they have a tendency of
>adding up.

But if you're only going to do a thousand of these in your career (and that's busy!), you'll still never be wrong! Daryl would want to complain that we might set up a computer simulation to do millions of these tests in a row; and *now* we'd be in trouble. But the frequentist merely answers - that's changing the experiment half-way through! We just reset the confidence level so that *not one* of these million simulations will come out with the wrong answer (with huge likelihood). This re-setting the levels is a bit reminiscent of the Copenhagenist constantly shifting the point where the "collapse" occurs, every time the opposition presses him with a tighter experiment.

-----

Well, all this has little to do with QM or physics any more. But while I'm waffling on, I'll just pick up on another comment from Ben Weiner:

>The one thing that still bugs me is the relation of Bayesian approaches
>to non-parametric statistics.  Or the lack of same ...
Absolutely! This is a constant embarrassment to Bayesians, as Lindley is the first to admit. I did once attend a Bayesian non-parametric seminar, but it was rather disappointing. The matters attended to were far from the usual non-parametric concerns. Non-parametrics were one of the few things that ever struck me as being a form of "absolute" statistics. They still strike me as having an almost magical way of getting something out of almost nothing.

Simple example: You are to observe 3 independent measurements from some probability distribution which you haven't a CLUE what it might be! (Very un-Bayesian idea!) You know only that it's continuous; (unnecessary technical convenience.) If you want to estimate the MEDIAN, (not the mean), of the underlying distribution, you can make an "exact" statement:-

With a "75% chance", the median is between the top one and the bottom one! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is, as I say, a beautiful "absolute" result; requiring not the slightest hint of any prior on the underlying distribution! If only all statistics could be as neat as this!
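A quick simulation sketch (mine, using an arbitrary continuous distribution) confirming the 75% figure: the true median fails to lie between the smallest and largest of 3 independent draws only when all three fall on the same side of it, which happens with probability 2 x (1/2)^3 = 1/4.

    import math
    import random

    # The underlying distribution is arbitrary; only continuity matters.
    # Use Exponential(1), whose true median is ln 2.
    true_median = math.log(2)

    trials, hits = 100_000, 0
    for _ in range(trials):
        xs = [random.expovariate(1.0) for _ in range(3)]
        if min(xs) < true_median < max(xs):
            hits += 1

    print(hits / trials)                          # close to 0.75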

Naturally, the true-blue Bayesian would throw up his hands in horror at this example, for several reasons; but especially he would say I've changed...

P(3 observations straddle the true median | median) = .75 , (true), to...

P(true median is within 3 observations | observations) = .75 , (false).

True, from his every-probability-is-conditional standpoint I have committed heresy - the conditionals have changed.

But really, most people would think it sensible to regard these as being essentially the same statement. Certainly, after a hard day, in fading light, it's hard to distinguish...

P(3 observations straddle the true median) = .75 and

P(true median is within the 3 observations) = .75 .

I would say, (and I think all but fully committed Bayesians would say), the second is just a re-phrasing of the first...

-------------------------------------------------------------------------
Bill Taylor            wft@math.canterbury.ac.nz
-------------------------------------------------------------------------
Of COURSE you're entitled to your own opinions, just keep them to yourself.


From: youssef@d0sgi6.fnal.gov (Saul Youssef)
Newsgroups: sci.physics
Subject: Re: Bayesianism and QM.
Date: 30 Nov 1994 22:53:44 GMT
Organization: Fermi National Accelerator Laboratory, Batavia IL
Message-ID: <3bivpo$8o0@fnnews.fnal.gov>
John Baez writes:
>Personally, I don't mind talking sometimes as if there were a "real
>prior out there", because I'm perfectly aware of the limitations of
>this way of talking.  This is exactly what I think I'm doing when I
>write down a wavefunction psi and say to myself "this is the state
>of the electron" instead of "this is the state I am assuming
>for the electron".  The former is quicker and easier to say.  It's
>also not clear that it ever meant anything other than the latter!
I know that this seems like a philosophical point, but I think that this particular issue has big consequences. For instance... I'm sure that none of this confuses you personally, but my impression is that most people learning QM take the state view without realizing that they are assuming something and then get all confused. Also... By the way, I know of a paper or two where people do take the Bayesian view of wavefunctions and try to improve them using maxent e.g. Casona, Rossignoli and Plastino, Phys.Rev.C 45(1992)1162. If anyone knows of others, I would be interested to hear about it.

Cheers, Saul


Date: Fri, 2 Dec 94 09:00:41 CST
From: bill@clyde.as.utexas.edu (William H. Jefferys)
Message-Id: <9412021500.AA04838@clyde.as.utexas.edu>
To: baez@ucrmath.ucr.edu
Subject: Re: Bayesianism and QM.
In-Reply-To: <3bmbrf$s3m@galaxy.ucr.edu>
Organization: McDonald Observatory, University of Texas @ Austin

In article <3bmbrf$s3m@galaxy.ucr.edu>, John Baez writes:

>It's so nice when people get your oblique references.  Speaking of
>references, do you (or anyone out there) have any good references to the
>concept of an "improper prior" (by which I guess you must mean
>an infinite measure, like Lebesgue measure on the real line, which can't
>be normalized to give a probability measure) in probability or
>statistics?  I happen to be working on a paper which deals with some
>crazy ideas, some of which concern the role this notion plays in quantum
>theory (where operator algebraists call it a "weight").  I
>know Don Page has thought about this kind of thing in the context of
>quantum cosmology, but what I'd like to see is what statisticians think
>about it.  You can get into big trouble with them (e.g., the paradoxes
>where you "pick a random real number"), but there still seem
>to be cases (like the above) where one wants to think of them as a kind
>of prior.
John, most books on Bayesian theory will discuss such "improper priors". They are very commonly used as indifference priors. See, e.g., Berger's book on decision theory. Jaynes' manuscript also discusses them.

Cheers, Bill


From: KJM@mfs1.ballarat.edu.au (Kevin Moore)
Newsgroups: sci.physics
Subject: Re: Bayesianism and QM.
Date: Tue, 6 Dec 1994 13:19:00
Organization: University of Ballarat
Message-ID: <KJM.193.000D517D@mfs1.ballarat.edu.au>
Someone was asking about Bayesian approaches to non-parametric statistics. In trying to find a reference for John Baez, I came across this:

"Bayesian Non-Parametric Statistics", Stephen F. Gull and John Fielden, in Maximum Entropy and Bayesian Methods in Applied Statistics, James H. Justice (ed.), Cambridge University Press, 1986.

I haven't read it yet myself, but the person who was asking might like to follow it up.

Kevin Moore

P.S. The book seems to have been typeset in Chi-writer (yuck!)


From: KJM@mfs1.ballarat.edu.au (Kevin Moore)
Newsgroups: sci.physics
Subject: Improper Priors (Attn John Baez))
Date: Tue, 17 Jan 1995 09:47:16
Organization: University of Ballarat
Message-ID: <KJM.207.0009C9FF@mfs1.ballarat.edu.au>
Summary: Reference to improper priors

I tried mailing this in reply to a saved news article, but it bounced a couple of times, so apologies all.

John asked for a reference to improper priors, a subject which is more argued about than published about, it seems.

An old one is by Ed Jaynes in "Bayesian Analysis in Econometrics and Statistics" ed. A. Zellner, North-Holland (Amsterdam) (1980).

Jaynes also has a bit about this in his chapter on "Paradoxes of Probability Theory" in his pre-print available on http://omega.albany.edu:8008/JaynesBook.html

It's a start :-)

Kevin Moore

kjm@ballarat.edu.au


John,

Some standard sources of conventional Bayesianism would include:

H. Jeffreys, _Theory of Probability_ (Oxford). Jeffreys was of the objectivist school, as is Jaynes.

J.O. Berger, _Statistical Decision Theory and Bayesian Analysis, Second Edition_ (Springer-Verlag)

L.J. Savage, _The Foundations of Statistics_ (Dover reprint). Savage was a leading subjectivist Bayesian.

Vic Barnett, _Comparative Statistical Inference_ (Wiley), which is a balanced view of both Bayesian and frequentist statistics.

Also, Ed Jaynes is in the process of writing a book (which of course flogs his point of view and, being by him, is fairly polemical). He encourages distribution of the manuscript and is interested in comments. I have the LaTeX source of a (I won't say _the_) recent version. Some chapters are fairly incomplete. If you are interested I could deposit it in our FTP area for you. It is pretty thick (1", double-sided).

Cheers, Bill


John,

I thought you might also be interested in the following books, which are written from a historical point of view:

First and foremost, Stephen M. Stigler's _The History of Statistics_ (Harvard/Belknap 1986). This only goes up to 1900.

Then, there's _The Empire of Chance_, by G. Gigerenzer and 5 other authors (Cambridge 1989).

Cheers, Bill


From: bill@bessel.as.utexas.edu (William H. Jefferys)
Subject: Re: Frankly,my dear......was: Fermat's Last Theorem
To: jbaez@math.mit.edu
Message-Id: <9301121600.AA15147@bessel.as.utexas.edu.as.utexas.edu>
Newsgroups: sci.math,sci.logic
Organization: McDonald Observatory, University of Texas @ Austin

John,

You might enjoy reading Persi Diaconis' article, "Bayesian Numerical Analysis", in _Statistical Decision Theory and Related Topics IV, Vol 1_, p. 163 (1988). Also, John Skilling's article, "The Eigenvalues of Mega-Dimensional Matrices" in _Maximum Entropy and Bayesian Methods_, p. 455 (1989). While these are a little off from what you propose, there is enough common ground to make them interesting, I think.

Cheers, Bill


With age and experience in research come the twin dangers of dwindling into a philosopher of science while being enlarged into a dotard. - C. Truesdell, An Idiot's Fugitive Essays on Science.

© 2003 John Baez
baez@math.removethis.ucr.andthis.edu
