October 29, 2010

Information Geometry (Part 4)

John Baez

Before moving on, I'd like to clear up an important point, which had me confused for a while. A Riemannian metric must be symmetric:

$$ g_{ij} = g_{ji} $$

When I first started thinking about this stuff, I defined the Fisher information metric to be the so-called 'covariance matrix':

$$ g_{ij} = \langle (X_i - \langle X_i \rangle) \;(X_j- \langle X_j \rangle)\rangle$$

where $X_i$ are some observable-valued functions on a manifold $M$, and the angle brackets mean "expectation value", computed using a mixed state $\rho$ that also depends on the point in $M$.

The covariance matrix is symmetric in classical mechanics, since then observables commute, so:

$$ \langle AB \rangle = \langle BA \rangle $$

But it's not symmetric in quantum mechanics! After all, suppose $q$ is the position operator for a particle, and $p$ is the momentum operator. Then according to Heisenberg

$$ qp = pq + i $$

in units where Planck's constant is 1. Taking expectation values, we get:

$$ \langle qp \rangle = \langle pq \rangle + i$$

and in particular:

$$ \langle qp \rangle \ne \langle pq \rangle $$

We can use this to get examples where $g_{ij}$ is not symmetric.
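Here's a quick numerical illustration (my own sketch, not part of the original argument), using the Pauli matrices $\sigma_x, \sigma_y$ as observables and a made-up diagonal density matrix. Both observables have expectation zero in this state, so $\langle AB \rangle$ and $\langle BA \rangle$ are exactly the off-diagonal entries of the covariance matrix:

```python
import numpy as np

# Made-up example: a diagonal density matrix rho and two
# non-commuting observables A = sigma_x, B = sigma_y.
rho = np.diag([0.7, 0.3]).astype(complex)
A = np.array([[0, 1], [1, 0]], dtype=complex)      # sigma_x
B = np.array([[0, -1j], [1j, 0]], dtype=complex)   # sigma_y

exp_AB = np.trace(rho @ A @ B)   # <AB> = 0.4i
exp_BA = np.trace(rho @ B @ A)   # <BA> = -0.4i
```

The two expectation values differ by $\langle AB - BA \rangle = 0.8i$, so using the raw covariance matrix as $g_{ij}$ would give $g_{12} \ne g_{21}$.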

However, it turns out that the real part of the covariance matrix is symmetric, even in quantum mechanics—and that's what we should use as our Fisher information metric.

Why is the real part of the covariance matrix symmetric, even in quantum mechanics? Well, suppose $\rho$ is any density matrix, and $A$ and $B$ are any observables. Then by definition

$$ \langle AB \rangle = \mathrm{tr} (\rho AB) $$

so taking the complex conjugate of both sides

$$ \langle AB\rangle^* = \mathrm{tr}(\rho AB)^* = \mathrm{tr}((\rho A B)^*) = \mathrm{tr}(B^* A^* \rho^*)$$

where I'm using an asterisk both for the complex conjugate of a number and the adjoint of an operator. But our observables are self-adjoint, and so is our density matrix, so we get

$$ \mathrm{tr}(B^* A^* \rho^*) = \mathrm{tr}(B A \rho) = \mathrm{tr}(\rho B A) = \langle B A \rangle $$

where in the second step we used the cyclic property of the trace. In short:

$$ \langle AB\rangle^* = \langle BA \rangle $$

If we take real parts, we get something symmetric:

$$ \mathrm{Re} \langle AB\rangle = \mathrm{Re} \langle BA \rangle $$
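If you want to see these identities in action, here's a sketch that checks them with randomly generated self-adjoint matrices and a random density matrix (all the choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2           # self-adjoint

def random_density(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    P = M @ M.conj().T                    # positive semidefinite
    return P / np.trace(P).real           # unit trace

n = 4
rho, A, B = random_density(n), random_hermitian(n), random_hermitian(n)
exp_AB = np.trace(rho @ A @ B)
exp_BA = np.trace(rho @ B @ A)

assert np.isclose(exp_AB.conjugate(), exp_BA)   # <AB>* = <BA>
assert np.isclose(exp_AB.real, exp_BA.real)     # real parts are symmetric
```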

So, if we redefine the Fisher information metric to be the real part of the covariance matrix:

$$ g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle) \; (X_j- \langle X_j \rangle)\rangle $$

then it's symmetric, as it should be.

Last time I mentioned a general setup using von Neumann algebras that handles the classical and quantum situations simultaneously. That applies here! Taking the real part has no effect in classical mechanics, so we don't need it there—but it doesn't hurt, either.

Taking the real part never has any effect when $i = j$, either, since the expected value of the square of an observable is a nonnegative number:

$$ \langle (X_i - \langle X_i \rangle)^2 \rangle \ge 0$$

This has two nice consequences.

First, we get

$$ g_{ii} = \langle (X_i - \langle X_i \rangle)^2 \rangle \ge 0 $$

and since this is true in any coordinate system, our would-be metric $g$ is indeed nonnegative. It'll be an honest Riemannian metric whenever it's positive definite.

Second, suppose we're working in the special case discussed in Part 2, where our manifold is an open subset of $\mathbb{R}^n$, and $\rho$ at the point $x \in \mathbb{R}^n$ is the Gibbs state with $\langle X_i \rangle = x_i$. Then all the usual rules of statistical mechanics apply. So, we can compute the variance of the observable $X_i$ using the partition function $Z$:

$$ \langle (X_i - \langle X_i \rangle)^2 \rangle = \frac{\partial^2}{\partial \lambda_i^2} \ln Z $$

In other words,

$$ g_{ii} = \frac{\partial^2}{\partial \lambda_i^2} \ln Z $$

But since this is true in any coordinate system, we must have

$$ g_{ij} = \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z $$

(Here I'm using a little math trick: two symmetric bilinear forms whose diagonal entries agree in any basis must be equal. We've already seen that the left side is symmetric, and the right side is symmetric by a famous fact about mixed partial derivatives.)
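Concretely, the trick is the polarization identity: a symmetric bilinear form is determined by its values on 'diagonal' vectors, since

$$ g(u,v) = \frac{1}{2}\Big( g(u+v,\, u+v) - g(u,u) - g(v,v) \Big) $$

So two symmetric bilinear forms that agree on all diagonal values must agree everywhere.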

However, I'm pretty sure this cute formula

$$ g_{ij} = \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \ln Z $$

only holds in the special case I'm talking about now, where points in $\mathbb{R}^n$ are parametrizing Gibbs states in the obvious way. In general we must use

$$ g_{ij} = \mathrm{Re} \langle (X_i - \langle X_i \rangle)(X_j- \langle X_j \rangle)\rangle $$

or equivalently,

$$ g_{ij} = \mathrm{Re} \, \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})$$

Okay. So much for cleaning up Last Week's Mess. Here's something new. We've seen that whenever $A$ and $B$ are observables (that is, self-adjoint),

$$ \langle AB\rangle^* = \langle BA \rangle $$

We got something symmetric by taking the real part:

$$ \mathrm{Re} \langle AB\rangle = \mathrm{Re} \langle BA \rangle $$


and indeed,

$$ \mathrm{Re} \langle AB \rangle = \frac{1}{2} \langle AB + BA \rangle $$

But by the same reasoning, we get something antisymmetric by taking the imaginary part:

$$ \mathrm{Im} \langle AB\rangle = -\mathrm{Im} \langle BA \rangle $$

and indeed,

$$ \mathrm{Im} \langle AB \rangle = \frac{1}{2i} \langle AB - BA \rangle $$

Commutators like $AB-BA$ are important in quantum mechanics, so maybe we shouldn't just throw out the imaginary part of the covariance matrix in our desperate search for a Riemannian metric! Besides the symmetric tensor on our manifold $M$:

$$ g_{ij} = \mathrm{Re} \, \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})$$

we can also define a skew-symmetric tensor:

$$ \omega_{ij} = \mathrm{Im} \, \mathrm{tr} (\rho \; \frac{\partial \ln \rho}{\partial \lambda_i} \frac{\partial \ln \rho}{\partial \lambda_j})$$

This will vanish in the classical case, but not in the quantum case!
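Here's a sketch of both tensors computed numerically for a made-up two-parameter family of $3 \times 3$ Gibbs-type states $\rho(\lambda) \propto e^{-\lambda_1 X_1 - \lambda_2 X_2}$. The observables and parameter values are invented, and the derivatives of $\ln \rho$ are approximated by finite differences of the matrix logarithm; the checks confirm that $g$ comes out symmetric and $\omega$ antisymmetric:

```python
import numpy as np

# Invented non-commuting Hermitian observables on a 3-level system:
X1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 2]], dtype=complex)
X2 = np.array([[1, 0, -1j], [0, -1, 0], [1j, 0, 0]], dtype=complex)

def rho(lam):
    vals, vecs = np.linalg.eigh(lam[0] * X1 + lam[1] * X2)
    r = (vecs * np.exp(-vals)) @ vecs.conj().T   # exp(-lam.X)
    return r / np.trace(r).real                  # normalize

def logm(M):
    # matrix logarithm of a positive Hermitian matrix via eigendecomposition
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.log(vals)) @ vecs.conj().T

lam, h = np.array([0.3, 0.5]), 1e-5
dlog = []
for i in range(2):
    e = np.zeros(2); e[i] = h
    dlog.append((logm(rho(lam + e)) - logm(rho(lam - e))) / (2 * h))

r = rho(lam)
T = np.array([[np.trace(r @ dlog[i] @ dlog[j]) for j in range(2)]
              for i in range(2)])
g, w = T.real, T.imag

assert np.allclose(g, g.T, atol=1e-8)    # g is symmetric
assert np.allclose(w, -w.T, atol=1e-8)   # omega is antisymmetric
```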

If you've studied enough geometry, you should now be reminded of things like 'Kähler manifolds' and 'almost Kähler manifolds'. A Kähler manifold is a manifold that's equipped with a symmetric tensor $g$ and a skew-symmetric tensor $\omega$ which fit together in the best possible way. An almost Kähler manifold is something similar, but not quite as nice. We should probably see examples of these arising in information geometry! And that could be pretty interesting.

But in general, if we start with any old manifold $M$ together with a function $\rho$ taking values in mixed states, we seem to be making $M$ into something even less nice. It gets a symmetric bilinear form $g$ on each tangent space, and a skew-symmetric bilinear form $\omega$, and they vary smoothly from point to point... but they might be degenerate, and I don't see any reason for them to 'fit together' in the nice way we need for a Kähler or almost Kähler manifold.

However, I still think something interesting might be going on here. For one thing, there are other situations in physics where a space of states is equipped with a symmetric $g$ and a skew-symmetric $\omega$. They show up in 'dissipative mechanics'—the study of systems whose entropy increases.

To conclude, let me remind you of some things I said in week295 of This Week's Finds. This is a huge digression from information geometry, but I'd like to lay out the puzzle pieces in public view, in case it helps anyone get some good ideas.

I wrote:

• Hans Christian Öttinger, Beyond Equilibrium Thermodynamics, Wiley, 2005.

I thank Arnold Neumaier for pointing out this book! It considers a fascinating generalization of Hamiltonian mechanics that applies to systems with dissipation: for example, electrical circuits with resistors, or mechanical systems with friction.

In ordinary Hamiltonian mechanics the space of states is a manifold and time evolution is a flow on this manifold determined by a smooth function called the Hamiltonian, which describes the energy of any state. In this generalization the space of states is still a manifold, but now time evolution is determined by two smooth functions: the energy and the entropy! In ordinary Hamiltonian mechanics, energy is automatically conserved. In this generalization that's also true, but energy can go into the form of heat... and entropy automatically increases!

Mathematically, the idea goes like this. We start with a Poisson manifold, but in addition to the skew-symmetric Poisson bracket {F,G} of smooth functions on some manifold, we also have a symmetric bilinear bracket [F,G] obeying the Leibniz law

[F,GH] = [F,G]H + G[F,H]

and this positivity condition:

[F,F] ≥ 0

The time evolution of any function is given by a generalization of Hamilton's equations:

dF/dt = {H,F} + [S,F]

where H is a function called the "energy" or "Hamiltonian", and S is a function called the "entropy". The first term on the right is the usual one. The new second term describes dissipation: as we shall see, it pushes the state towards increasing entropy.

If we require that

[H,F] = {S,F} = 0

for every function F, then we get conservation of energy, as usual in Hamiltonian mechanics:

dH/dt = {H,H} + [S,H] = 0

But we also get the second law of thermodynamics:

dS/dt = {H,S} + [S,S] ≥ 0

Entropy always increases!
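Here's a toy numerical sketch of the GENERIC idea (my own example, not taken from the book): a damped harmonic oscillator with state (q, p, e), where e is internal heat energy. Take H = (q² + p²)/2 + e and S = e, with the usual symplectic structure on (q, p) and a symmetric positive semidefinite friction matrix M chosen so that M∇H = 0. The equations of motion become q̇ = p, ṗ = -q - γp, ė = γp², so energy is conserved while entropy increases:

```python
import numpy as np

gamma, dt, steps = 0.5, 1e-3, 20000

L = np.array([[0., 1., 0.],    # Poisson structure: symplectic on (q, p),
              [-1., 0., 0.],   # with the internal energy e inert
              [0., 0., 0.]])

def M(x):
    q, p, e = x                # friction matrix: symmetric, positive
    return gamma * np.array([[0., 0., 0.],       # semidefinite, and
                             [0., 1., -p],       # M @ grad_H(x) == 0
                             [0., -p, p * p]])

def grad_H(x):                 # H = (q^2 + p^2)/2 + e
    return np.array([x[0], x[1], 1.0])

grad_S = np.array([0., 0., 1.0])   # S = e

def H(x):
    return 0.5 * (x[0]**2 + x[1]**2) + x[2]

x = np.array([1.0, 0.0, 0.0])
H0, S_vals = H(x), [x[2]]
for _ in range(steps):
    # Euler integration of dx/dt = L grad_H + M grad_S
    x = x + dt * (L @ grad_H(x) + M(x) @ grad_S)
    S_vals.append(x[2])

assert abs(H(x) - H0) < 0.02                              # energy conserved
assert all(b >= a for a, b in zip(S_vals, S_vals[1:]))    # entropy increases
```

Here L∇S = 0 and M∇H = 0 are the degeneracy conditions playing the role of [H,F] = {S,F} = 0, so dH/dt = 0 while dS/dt = γp² ≥ 0.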

Öttinger calls this framework "GENERIC"—an annoying acronym for "General Equation for the NonEquilibrium Reversible-Irreversible Coupling". There are lots of papers about it. But I'm wondering if any geometers have looked into it!

If we didn't need the equations [H,F] = {S,F} = 0, we could easily get the necessary brackets starting with a Kähler manifold. The imaginary part of the Kähler structure is a symplectic structure, say ω, so we can define

{F,G} = ω(dF,dG)

as usual to get Poisson brackets. The real part of the Kähler structure is a Riemannian structure, say g, so we can define

[F,G] = g(dF,dG)

This satisfies

[F,GH] = [F,G]H + G[F,H]


and the positivity condition:

[F,F] ≥ 0

Don't be fooled: this stuff is not rocket science. In particular, the inequality above has a simple meaning: when we move in the direction of the gradient of F, the function F increases. So adding the second term to Hamilton's equations has the effect of pushing the system towards increasing entropy.
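A short numerical check of that last point, with an arbitrary made-up metric g and function F: flowing along the g-gradient never decreases F.

```python
import numpy as np

g = np.diag([1.0, 2.0])                # an arbitrary made-up metric

def F(x):                              # any smooth function with a maximum
    return -((x[0] - 1.0)**2 + (x[1] + 2.0)**2)

def dF(x):                             # its gradient
    return np.array([-2.0 * (x[0] - 1.0), -2.0 * (x[1] + 2.0)])

x, dt = np.array([3.0, 0.5]), 1e-2
vals = [F(x)]
for _ in range(500):
    x = x + dt * g @ dF(x)             # dx/dt = g(dF, -): gradient flow
    vals.append(F(x))

assert all(b >= a for a, b in zip(vals, vals[1:]))   # F never decreases
```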

Note that I'm being a tad unorthodox by letting ω and g eat cotangent vectors instead of tangent vectors—but that's no big deal. The big deal is this: if we start with a Kähler manifold and define brackets this way, we don't get [H,F] = 0 or {S,F} = 0 for all functions F unless H and S are constant! That's no good for applications to physics. To get around this problem, we would need to consider some sort of degenerate Kähler structure—one where ω and g are degenerate bilinear forms on the cotangent space.

Has anyone thought about such things? They remind me a little of "Dirac structures" and "generalized complex geometry"—but I don't know enough about those subjects to know if they're relevant here.

This GENERIC framework suggests that energy and entropy should be viewed as two parts of a single entity—maybe even its real and imaginary parts! And that in turn reminds me of other strange things, like the idea of using complex-valued Hamiltonians to describe dissipative systems, or the idea of "inverse temperature as imaginary time". I can't tell yet if there's a big idea lurking here, or just a mess....

You can read a discussion of this article on Azimuth, and make your own comments or ask questions there!

© 2011 John Baez