SURPRISE: it's called SURPRISAL!
This is a well-known concept in information theory. It's also called 'information content'.
Let's see why. First, let's remember the setup. We have a manifold

$$Q = \left\{ q \in \mathbb{R}^n : \; q_i > 0, \; \sum_{i=1}^n q_i = 1 \right\}$$

whose points $q$ are nowhere-vanishing probability distributions on the set $\{1, \dots, n\}$. There is a function $f \colon Q \to \mathbb{R}$ called the Shannon entropy, defined by

$$f(q) = -\sum_{i=1}^n q_i \ln q_i$$

For each point $q \in Q$ we get a cotangent vector

$$p = (df)_q$$
As mentioned last time, this is the analogue of momentum in probability theory. In the second half of this post I'll say more about exactly why. But first let's compute it and see what it actually equals!
Let's start with a naive calculation, acting as if the probabilities $q_i$ were independent variables. Then the components of $p$ would be

$$p_i = \frac{\partial f}{\partial q_i}$$

so using the definition of the Shannon entropy we have

$$p_i = -\frac{\partial}{\partial q_i} \sum_{j=1}^n q_j \ln q_j = -\ln q_i - 1$$

Now, the quantity $-\ln q_i$ is called the surprisal of the outcome $i$: it measures how surprised you should be when an event of probability $q_i$ occurs.
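To make the naive calculation concrete, here's a quick finite-difference check (a sketch in Python; the test distribution is an arbitrary illustrative choice):

```python
import numpy as np

def shannon_entropy(q):
    """f(q) = -sum_i q_i ln q_i, treating the q_i as free variables."""
    return -np.sum(q * np.log(q))

q = np.array([0.1, 0.2, 0.3, 0.4])
h = 1e-6

# Central-difference partial derivatives, pretending each q_i can
# vary independently (ignoring the constraint that they sum to 1).
for i in range(len(q)):
    e = np.zeros_like(q)
    e[i] = h
    numeric = (shannon_entropy(q + e) - shannon_entropy(q - e)) / (2 * h)
    exact = -np.log(q[i]) - 1
    print(f"i={i}: numeric {numeric:.6f} vs -ln(q_i) - 1 = {exact:.6f}")
```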
Of course 'surprise' is a psychological term, not a term from math or physics, so we shouldn't take it too seriously here. We can derive the concept of surprisal from three axioms:

1. The surprisal of an event depends only on its probability $q$, via some function $S(q)$.
2. Less probable events are more surprising: $S$ is monotonically decreasing.
3. The surprisal of two independent events is the sum of their surprisals: $S(q_1 q_2) = S(q_1) + S(q_2)$.

It follows from work on Cauchy's functional equation that

$$S(q) = -C \ln q$$

for some constant $C > 0$; we take $C = 1$.
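As a sanity check on the additivity axiom, and on the choice of units, here's a tiny numerical illustration (a sketch in Python; the probabilities are arbitrary illustrative values):

```python
import numpy as np

# Surprisal with C = 1: measured in nats (natural log).
surprisal = lambda q: -np.log(q)

# Additivity: the surprisal of two independent events is the sum
# of their surprisals, since ln(q1*q2) = ln(q1) + ln(q2).
q1, q2 = 0.5, 0.25
assert np.isclose(surprisal(q1 * q2), surprisal(q1) + surprisal(q2))

# Using log base 2 instead measures surprisal in bits:
print(-np.log2(0.5))   # a fair coin flip is worth exactly 1 bit
```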
So far, so good. But what about the irksome "-1" in our formula?
Luckily it turns out we can just get rid of this! The reason is that the probabilities $q_i$ are not independent variables: they sum to 1. The manifold $Q$ is a submanifold of $\mathbb{R}^n$, and its tangent space at any point $q$ is

$$T_q Q = \left\{ v \in \mathbb{R}^n : \; \sum_{i=1}^n v_i = 0 \right\}$$

The cotangent space is the dual of the tangent space. The dual of a subspace $V \subseteq W$ is the quotient space $W^\ast / V^\circ$, where $V^\circ$ is the annihilator of $V$: the space of linear functionals on $W$ that vanish on $V$. The cotangent space at $q$ is thus

$$T_q^\ast Q = (\mathbb{R}^n)^\ast / (T_q Q)^\circ$$

Of course, we can identify the dual of $\mathbb{R}^n$ with $\mathbb{R}^n$ itself using the standard inner product. From this, you can see that a linear functional on $\mathbb{R}^n$ vanishes on $T_q Q$ if and only if its corresponding vector $w \in \mathbb{R}^n$ is a multiple of $(1, 1, \dots, 1)$. So, we get

$$T_q^\ast Q \cong \mathbb{R}^n / \{ c \, (1, 1, \dots, 1) : \; c \in \mathbb{R} \}$$

In words: we can describe cotangent vectors to $Q$ as vectors in $\mathbb{R}^n$, with two such vectors giving the same cotangent vector precisely when they differ by a multiple of $(1, 1, \dots, 1)$.
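Here's a quick numerical illustration of this quotient description (a sketch in Python with arbitrary illustrative vectors): adding a multiple of $(1, 1, \dots, 1)$ to a covector representative doesn't change how it pairs with tangent vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A tangent vector to Q: any vector whose components sum to zero.
v = rng.normal(size=n)
v -= v.mean()                    # now sum(v) == 0 (up to rounding)

# Two representatives of the same cotangent vector:
# they differ by a multiple of (1, 1, ..., 1).
w = rng.normal(size=n)
w_shifted = w + 7.3 * np.ones(n)

# The pairing <w, v> is unchanged, since sum(v) == 0.
print(np.dot(w, v), np.dot(w_shifted, v))   # equal up to rounding
```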
This suggests that our naive formula

$$p_i = -\ln q_i - 1$$

is on the right track, but we're free to get rid of the constant 1 if we want! And that's true.
To check this rigorously, we need to show

$$(df)_q (v) = \sum_{i=1}^n (-\ln q_i) \, v_i$$

for all $v \in T_q Q$. Here's the check:

$$(df)_q (v) = \sum_{i=1}^n \frac{\partial f}{\partial q_i} \, v_i = \sum_{i=1}^n (-\ln q_i - 1) \, v_i = \sum_{i=1}^n (-\ln q_i) \, v_i$$

where in the second to last step we used our earlier calculation:

$$\frac{\partial f}{\partial q_i} = -\ln q_i - 1$$

and in the last step we used

$$\sum_{i=1}^n v_i = 0$$

which holds for any tangent vector $v \in T_q Q$.
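The same check can be done numerically (a sketch in Python; the distribution and tangent vector are arbitrary illustrative choices). We differentiate the entropy along the line $q + t v$, which stays in $Q$ for small $t$ because the components of $v$ sum to zero:

```python
import numpy as np

def shannon_entropy(q):
    """f(q) = -sum_i q_i ln q_i."""
    return -np.sum(q * np.log(q))

q = np.array([0.1, 0.2, 0.3, 0.4])

# A tangent vector: its components sum to zero, so q + t*v still sums to 1.
v = np.array([0.05, -0.02, -0.04, 0.01])

# Directional derivative (df)_q(v), by central differences...
t = 1e-6
numeric = (shannon_entropy(q + t * v) - shannon_entropy(q - t * v)) / (2 * t)

# ...versus the pairing with the surprisal covector (no "-1" needed).
exact = np.sum(-np.log(q) * v)
print(numeric, exact)   # agree up to rounding error
```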
Now let's take stock of where we are. We can fill in the question marks in the charts from last time, and combine those charts while we're at it.
| | Classical Mechanics | Thermodynamics | Probability Theory |
|---|---|---|---|
| $q$ | position | extensive variables | probabilities |
| $p$ | momentum | intensive variables | surprisals |
| $S$ | action | entropy | Shannon entropy |
What's going on here? In classical mechanics, action is minimized (or at least the system finds a critical point of the action). In thermodynamics, entropy is maximized. In the maximum entropy approach to probability, Shannon entropy is maximized. This leads to a mathematical analogy that's quite precise. For classical mechanics and thermodynamics, I explained it in earlier posts. Those posts may give a more approachable introduction to what I'm doing now: bringing probability theory into the analogy, with a big emphasis on symplectic and contact geometry.
Let me spell out a bit of the analogy more carefully:

Classical mechanics. In classical mechanics, we have a manifold $Q$ whose points are positions of some system, and a function $S \colon Q \to \mathbb{R}$. What's this? It's basically action: $S(q)$ is the action of the least-action path ending at the position $q$. There is a cotangent vector

$$p = (dS)_q$$

The components of this vector are the momenta corresponding to the positions.

Thermodynamics. In thermodynamics, we have a manifold $Q$ whose points are values of the extensive variables of some system, and a function $S \colon Q \to \mathbb{R}$: the entropy. There is a cotangent vector

$$p = (dS)_q$$

The components of this vector are the intensive variables corresponding to the extensive variables.

Probability theory. In probability theory, we have a manifold $Q$ whose points are probability distributions, and a function $S \colon Q \to \mathbb{R}$: the Shannon entropy. There is a cotangent vector

$$p = (dS)_q$$

The components of this vector are the surprisals corresponding to the probabilities.
In all three cases, the pair $(q, p)$ lives in the cotangent bundle $T^\ast Q$, which is a symplectic manifold. There is also a contact manifold $T^\ast Q \times \mathbb{R}$, where the extra coordinate keeps track of the value of $S$. We can then decree that this extra coordinate equals $S(q)$ while $p = (dS)_q$: the points obeying these equations form a Legendrian submanifold of the contact manifold.
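Here's a numerical sketch of that decree (in Python, assuming the contact form is $\alpha = dS - \sum_i p_i \, dq_i$; sign conventions vary). Along any curve lifted to the submanifold where the extra coordinate equals $S(q)$ and $p = (dS)_q$, the form $\alpha$ pairs to zero with the curve's velocity, by the chain rule:

```python
import numpy as np

def shannon_entropy(q):
    return -np.sum(q * np.log(q))

def surprisal_covector(q):
    return -np.log(q)        # a representative of p = (dS)_q

# A curve q(t) in Q: a distribution plus t times a sum-zero vector.
q0 = np.array([0.1, 0.2, 0.3, 0.4])
v = np.array([0.05, -0.02, -0.04, 0.01])
q = lambda t: q0 + t * v

# Lift the curve to the contact manifold as (q(t), p(t), S(q(t))).
t, h = 0.3, 1e-6
dq_dt = (q(t + h) - q(t - h)) / (2 * h)
dS_dt = (shannon_entropy(q(t + h)) - shannon_entropy(q(t - h))) / (2 * h)

# The contact form dS - sum_i p_i dq_i vanishes along the lifted curve.
alpha = dS_dt - np.dot(surprisal_covector(q(t)), dq_dt)
print(alpha)   # ~ 0 up to rounding error
```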
There's a lot more to do with these ideas, and I'll continue next time.
You can read a discussion of this article on Azimuth, and make your own comments or ask questions there!