Last time I ended with a formula for the 'Gibbs distribution': the probability distribution that maximizes entropy subject to constraints on the expected values of some observables.
This formula is well-known, but I'd like to derive it here. My argument won't be up to the highest standards of rigor: I'll do a bunch of computations, and it would take more work to state conditions under which these computations are justified. But even a nonrigorous approach is worthwhile, since the computations will give us more than the mere formula for the Gibbs distribution.
I'll start by reminding you of what I claimed last time. I'll state it in a way that removes all unnecessary distractions, so go back to Part 20 if you want more explanation.
We seek a probability distribution $p$ on a measure space $(\Omega, \mu)$ that maximizes entropy subject to the requirement that some integrable functions $A^1, \dots, A^n$ on $\Omega$ have specified expected values $q^1, \dots, q^n$:

\[
\int_\Omega A^i \, p \, d\mu \;=\; q^i, \qquad i = 1, \dots, n
\]

(Unlike last time, now I'm writing the indices on these observables and their expected values as superscripts, so the Einstein summation convention will work nicely in what follows.)

I claimed that the maximizer is the Gibbs distribution

\[
p \;=\; \frac{e^{-\lambda_i A^i}}{Z}
\]

where

\[
Z \;=\; \int_\Omega e^{-\lambda_i A^i} \, d\mu
\]

is the partition function and the numbers $\lambda_i$ are chosen to make the constraints hold. Furthermore, suppose we write $S(q)$ for the entropy of this maximizing distribution as a function of $q = (q^1, \dots, q^n)$. Then I also claimed that

\[
\lambda_i \;=\; \frac{\partial S(q)}{\partial q^i}
\]

where, as always,

\[
S(p) \;=\; -\int_\Omega p \ln p \, d\mu
\]

is the entropy of a probability distribution $p$.
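For instance, here's the most familiar special case, just to fix intuition (the symbols $H$, $E$ and $\beta$ are my labels for this example, not part of the general claim): take a single observable $A^1 = H$, the energy, constrained to have expected value $q^1 = E$. Then the Gibbs distribution reduces to the Boltzmann distribution

\[
p \;=\; \frac{e^{-\beta H}}{\int_\Omega e^{-\beta H} \, d\mu}
\]

where the single Lagrange multiplier $\lambda_1 = \beta$ plays the role of inverse temperature.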
Let's show this is true!
We can solve this problem using Lagrange multipliers. We need one Lagrange multiplier, say $\lambda_i$, for each constraint

\[
\int_\Omega A^i \, p \, d\mu \;=\; q^i
\]

But there's one more constraint lurking here: for $p$ to be a probability distribution, it must satisfy

\[
\int_\Omega p \, d\mu \;=\; 1
\]

To deal with this we need an extra Lagrange multiplier, say $\gamma$.

So, that's what we'll do! We'll look for critical points of this function on the space of suitably nice functions $p$ on $\Omega$:

\[
-\int p \ln p \;-\; \lambda_i \left( \int A^i p \;-\; q^i \right) \;-\; \gamma \left( \int p \;-\; 1 \right)
\]

Here I'm using some tricks to keep things short. First, I'm dropping the dummy variable $x \in \Omega$ and the measure $d\mu$ from my integrals: for example, $\int p \ln p$ means $\int_\Omega p(x) \ln p(x) \, d\mu(x)$. Second, I'm using the Einstein summation convention, so $\lambda_i A^i$ means $\sum_{i=1}^n \lambda_i A^i$.
Okay, now let's do the variational derivative required to find a critical point of this function. When I was a math major taking physics classes, the way physicists did variational derivatives seemed like black magic to me. Then I spent months reading how mathematicians rigorously justified these techniques. I don't feel like making a massive digression into this right now, so I'll just do the calculations — and if they seem like black magic, I'm sorry!
We need to find $p$ obeying

\[
\frac{\delta}{\delta p(x)} \left[ \, -\int p \ln p \;-\; \lambda_i \left( \int A^i p - q^i \right) \;-\; \gamma \left( \int p - 1 \right) \right] \;=\; 0
\]

or in other words

\[
-\frac{\delta}{\delta p(x)} \int p \ln p \;-\; \lambda_i \, \frac{\delta}{\delta p(x)} \int A^i p \;-\; \gamma \, \frac{\delta}{\delta p(x)} \int p \;=\; 0
\]

First we need to simplify this expression. The only part that takes any work, if you know how to do variational derivatives, is the first term. Since the derivative of $z \ln z$ is $\ln z + 1$, we get

\[
\frac{\delta}{\delta p(x)} \int p \ln p \;=\; \ln p(x) + 1
\]

The second and third terms are easy, so we get

\[
-\ln p(x) - 1 \;-\; \lambda_i A^i(x) \;-\; \gamma
\]
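If the variational derivative still feels like black magic, here's the finite-dimensional analogue, which you can check by ordinary calculus — a little sanity check, not part of the argument: for a distribution $p = (p_1, \dots, p_m)$ on a finite set,

\[
\frac{\partial}{\partial p_k} \left( -\sum_{j=1}^m p_j \ln p_j \right) \;=\; -\ln p_k - 1
\]

which exactly matches the $-\ln p(x) - 1$ above.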
Thus, we need to solve this equation:

\[
-\ln p \;-\; 1 \;-\; \lambda_i A^i \;-\; \gamma \;=\; 0
\]

That's easy to do:

\[
p \;=\; e^{-1-\gamma} \, e^{-\lambda_i A^i}
\]
Good! It's starting to look like the Gibbs distribution!
We now need to choose the Lagrange multipliers $\lambda_i$ and $\gamma$ to make the constraints hold. To ensure that $p$ integrates to 1,

we must choose $\gamma$ so that

\[
\int_\Omega e^{-1-\gamma} \, e^{-\lambda_i A^i} \, d\mu \;=\; 1
\]

or in other words

\[
e^{-1-\gamma} \;=\; \frac{1}{\int_\Omega e^{-\lambda_i A^i} \, d\mu}
\]

Plugging this into our earlier formula

\[
p \;=\; e^{-1-\gamma} \, e^{-\lambda_i A^i}
\]

we get this:

\[
p \;=\; \frac{e^{-\lambda_i A^i}}{\int_\Omega e^{-\lambda_i A^i} \, d\mu}
\]
Great! Even more like the Gibbs distribution! All that's left is to choose the multipliers $\lambda_i$ so that the remaining constraints hold:

\[
\int_\Omega A^i \, p \, d\mu \;=\; q^i
\]
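If you like, you can check all this numerically. Here's a small sanity check — the 4-point space, the observable values in `A`, the target mean `q`, and the helper `mean_A` are all just arbitrary choices of mine for illustration. We maximize entropy with a generic constrained optimizer and compare with the closed-form Gibbs distribution:

```python
# Maximize entropy on a 4-point space subject to one expectation constraint,
# then compare with the closed-form Gibbs distribution p = exp(-lam*A)/Z.
import numpy as np
from scipy.optimize import minimize, brentq

A = np.array([0.0, 1.0, 2.0, 5.0])   # one observable on a 4-point space
q = 1.5                              # its desired expected value

def neg_entropy(p):
    # We minimize -S(p) = sum_x p(x) ln p(x).
    return np.sum(p * np.log(p))

constraints = [
    {'type': 'eq', 'fun': lambda p: np.sum(p) - 1.0},  # p integrates to 1
    {'type': 'eq', 'fun': lambda p: p @ A - q},        # <A> = q
]
result = minimize(neg_entropy, x0=np.full(4, 0.25),
                  bounds=[(1e-10, 1.0)] * 4, constraints=constraints)

# Closed form: tune lam so the Gibbs mean of A equals q.
def mean_A(lam):
    w = np.exp(-lam * A)
    return (w @ A) / w.sum()

lam = brentq(lambda l: mean_A(l) - q, -20.0, 20.0)
gibbs = np.exp(-lam * A)
gibbs /= gibbs.sum()

print(result.x)  # numerical maximizer
print(gibbs)     # Gibbs distribution -- agrees up to solver tolerance
```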
By the way, you must have noticed the "1" that showed up here:

\[
\frac{\delta}{\delta p(x)} \int p \ln p \;=\; \ln p(x) + 1
\]

It buzzed around like an annoying fly in the otherwise beautiful calculation, but eventually went away. This is the same irksome "1" that showed up in Part 19. Someday I'd like to say a bit more about it.
Now, where were we? We were trying to show that

\[
p \;=\; \frac{e^{-\lambda_i A^i}}{\int_\Omega e^{-\lambda_i A^i} \, d\mu}
\]

maximizes entropy subject to our constraints. So far we've shown it

is a critical point. It's clear that entropy is a concave function of $p$, since $-z \ln z$ is concave, while our constraints are linear in $p$;

so this critical point must in fact be a maximum.

There's a second thing to show:

\[
\frac{\partial S(q)}{\partial q^i} \;=\; \lambda_i
\]

This is interesting! It's saying our Lagrange multipliers are precisely the rates at which the maximized entropy changes as we change the constraint values,

where $S(q)$ denotes the entropy of the Gibbs distribution whose expected values are $q = (q^1, \dots, q^n)$.
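In the special case from before — a single energy observable $H$ with expected value $E$ and multiplier $\beta$ (again my illustrative labels) — this equation becomes a fact you already know from thermodynamics:

\[
\frac{\partial S}{\partial E} \;=\; \beta \;=\; \frac{1}{T}
\]

in units where Boltzmann's constant is 1.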
There are two ways to show this: the easy way and the hard way. The easy way is to reflect on the meaning of Lagrange multipliers, and I'll sketch that way first. The hard way is to use brute force: just compute the partial derivatives $\partial S(q)/\partial q^i$ and check that they equal $\lambda_i$.
So, suppose we're trying to find a critical point of a smooth function $f \colon \mathbb{R}^n \to \mathbb{R}$ subject to the constraint

\[
g \;=\; c
\]

for some smooth function $g \colon \mathbb{R}^n \to \mathbb{R}$

and constant $c$. To do this we look for points where

\[
\nabla f \;=\; \lambda \, \nabla g
\]

for some number $\lambda$, the Lagrange multiplier.

This works because the above equation says the derivative of $f$ vanishes in every direction tangent to the level surface $g = c$.

Geometrically this means we're at a point where the gradient of $f$ points at right angles to this level surface.

Thus, to first order we can't change $f$ by moving within the level surface: we're at a constrained critical point.

But also, if we start at a point where

\[
\nabla f \;=\; \lambda \, \nabla g
\]

and we begin moving in any direction, the function $f$ changes at $\lambda$ times the rate at which $g$ changes. So, as we loosen the constraint by increasing $c$, the optimal value of $f$ changes at exactly the rate $\lambda$.
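Here's that last step written out, under my own simplifying assumption that the constrained critical point $x(c)$ depends smoothly on $c$:

\[
\frac{d}{dc} \, f(x(c)) \;=\; \nabla f \cdot x'(c) \;=\; \lambda \, \nabla g \cdot x'(c) \;=\; \lambda \, \frac{d}{dc} \, g(x(c)) \;=\; \lambda \, \frac{d}{dc} \, c \;=\; \lambda
\]

using $g(x(c)) = c$ in the second-to-last step.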
Our situation is more complicated, since our functions are defined on the infinite-dimensional space of suitably nice functions on $\Omega$, and we have several constraints rather than one. But the same principle applies.

So, when we are at a solution of our constrained entropy-maximization problem, varying the constraint value $q^i$ changes the maximized entropy at a rate given by the corresponding Lagrange multiplier:

\[
\frac{\partial S(q)}{\partial q^i} \;=\; \lambda_i
\]

But this is just what we needed to show!
Now for the hard way. We start by solving our constrained entropy-maximization problem using Lagrange multipliers. As already shown, we get the Gibbs distribution

\[
p \;=\; \frac{e^{-\lambda_i A^i}}{\int_\Omega e^{-\lambda_i A^i} \, d\mu}
\]

Then we'll compute the entropy of this distribution, viewing it as a function $S(q)$ of the constraint values $q = (q^1, \dots, q^n)$.

Then we'll differentiate this with respect to $q^i$ and check that we get $\lambda_i$.
Let's try it! The calculation is a bit heavy, so let's write the partition function as

\[
Z \;=\; \int_\Omega e^{-\lambda_i A^i} \, d\mu
\]

so that

\[
p \;=\; \frac{e^{-\lambda_i A^i}}{Z}
\]

and the entropy is

\[
S(q) \;=\; -\int_\Omega p \ln p \, d\mu \;=\; \int_\Omega \frac{e^{-\lambda_i A^i}}{Z} \, \Big( \lambda_i A^i + \ln Z \Big) \, d\mu
\]
This is the sum of two terms. The first term

\[
\int_\Omega \frac{e^{-\lambda_i A^i}}{Z} \, \lambda_i A^i \, d\mu
\]

is $\lambda_i$ times the expected value of $A^i$ in the Gibbs distribution, summed over $i$:

\[
\lambda_i \int_\Omega A^i \, p \, d\mu \;=\; \lambda_i q^i
\]

The second term is easier:

\[
\int_\Omega \frac{e^{-\lambda_i A^i}}{Z} \, \ln Z \, d\mu \;=\; \ln Z
\]

since

\[
\int_\Omega \frac{e^{-\lambda_i A^i}}{Z} \, d\mu \;=\; 1
\]
Putting together these two terms we get an interesting formula for the entropy:

\[
S(q) \;=\; \lambda_i q^i \;+\; \ln Z
\]
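As a quick sanity check — my check, assuming $\mu(\Omega)$ is finite — take no observables at all, so the sum $\lambda_i q^i$ is empty. Then the Gibbs distribution is uniform, $Z = \mu(\Omega)$, and indeed

\[
S \;=\; -\int_\Omega \frac{1}{\mu(\Omega)} \, \ln\!\frac{1}{\mu(\Omega)} \, d\mu \;=\; \ln \mu(\Omega) \;=\; \ln Z
\]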
This formula is one reason this brute-force approach is actually worthwhile! I'll say more about it later.
But for now, let's use this formula to show what we're trying to show, namely

\[
\frac{\partial S(q)}{\partial q^j} \;=\; \lambda_j
\]

For starters,

\[
\frac{\partial S}{\partial q^j} \;=\; \frac{\partial}{\partial q^j} \Big( \lambda_i q^i + \ln Z \Big) \;=\; \frac{\partial \lambda_i}{\partial q^j} \, q^i \;+\; \lambda_j \;+\; \frac{\partial}{\partial q^j} \ln Z
\]

where we played a little Kronecker delta game with the second term:

\[
\lambda_i \, \frac{\partial q^i}{\partial q^j} \;=\; \lambda_i \, \delta^i_j \;=\; \lambda_j
\]

Now we just need to compute the third term. The chain rule gives

\[
\frac{\partial}{\partial q^j} \ln Z \;=\; \frac{\partial \lambda_i}{\partial q^j} \, \frac{\partial}{\partial \lambda_i} \ln Z
\]

and since

\[
\frac{\partial}{\partial \lambda_i} \ln Z \;=\; \frac{1}{Z} \int_\Omega \frac{\partial}{\partial \lambda_i} \, e^{-\lambda_k A^k} \, d\mu \;=\; -\int_\Omega A^i \, \frac{e^{-\lambda_k A^k}}{Z} \, d\mu \;=\; -q^i
\]

we get

\[
\frac{\partial}{\partial q^j} \ln Z \;=\; -\frac{\partial \lambda_i}{\partial q^j} \, q^i
\]
Ah, you don't know how good it feels, after years of category theory, to be doing calculations like this again!
Now we can finish the job we started:

\[
\frac{\partial S}{\partial q^j} \;=\; \frac{\partial \lambda_i}{\partial q^j} \, q^i \;+\; \lambda_j \;-\; \frac{\partial \lambda_i}{\partial q^j} \, q^i \;=\; \lambda_j
\]
Voilà!
So the Lagrange multipliers $\lambda_i$ are precisely the partial derivatives of the entropy with respect to the constraint values $q^i$. Thus, they are rich in meaning. From what we've seen earlier, they are 'surprisals'. They are analogous to momentum in classical mechanics and have the meaning of intensive variables in thermodynamics:
|   | Classical Mechanics | Thermodynamics | Probability Theory |
|---|---------------------|----------------|--------------------|
| q | position | extensive variables | probabilities |
| p | momentum | intensive variables | surprisals |
| S | action | entropy | Shannon entropy |
We also got a nice formula for the maximized entropy:

\[
S(q) \;=\; \lambda_i q^i \;+\; \ln Z
\]

(We proved this formula with the $\lambda_i$ regarded as functions of the $q^i$, chosen to make the constraints hold.)
This formula suggests that the logarithm of the partition function is important — and it is! It's closely related to the concept of free energy — even though 'energy', free or otherwise, doesn't show up at the level of generality we're working at now.
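To spell out that connection in the familiar special case (my illustration again, with one energy observable $H$, mean $E$, and multiplier $\beta = 1/T$): the entropy formula becomes $S = \beta E + \ln Z$, so

\[
-\frac{1}{\beta} \ln Z \;=\; E - TS \;=\; F
\]

the Helmholtz free energy.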
This formula should also remind you of the tautological 1-form on the cotangent bundle $T^\ast \mathbb{R}^n$:

\[
p_i \, dq^i
\]
It should remind you even more of the contact 1-form on the contact manifold $\mathbb{R}^{2n+1}$:

\[
\alpha \;=\; -dS \;+\; p_i \, dq^i
\]

Here $\mathbb{R}^{2n+1}$ has coordinates $q^1, \dots, q^n, p_1, \dots, p_n$ and $S$, with $p_i$ thought of as the variable conjugate to $q^i$.
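One of those hints can already be made precise with what we've done — a small computation of mine, using the relation we just proved: on the submanifold of $\mathbb{R}^{2n+1}$ where $S = S(q)$ and $p_i = \partial S / \partial q^i$, which is exactly what maximum entropy hands us with $p_i = \lambda_i$, the contact form vanishes:

\[
\alpha \;=\; -dS \;+\; p_i \, dq^i \;=\; -\frac{\partial S}{\partial q^i} \, dq^i \;+\; p_i \, dq^i \;=\; 0
\]

So the maximum-entropy relations carve out a Legendrian submanifold, just as the equations of state do in thermodynamics.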
So, it's clear there's a lot more to say: we're seeing hints of things here and there, but not yet the full picture.
You can read a discussion of this article on Azimuth, and make your own comments or ask questions there!