While preparing this talk, I discovered a cool fact. I doubt it's new, but I haven't exactly seen it elsewhere. I came up with it while trying to give a precise and general statement of 'Fisher's fundamental theorem of natural selection'. I won't start by explaining that theorem, since my version looks rather different than Fisher's, and I came up with mine precisely because I had trouble understanding his. I'll say a bit more about this at the end.
Here's my version:
> The square of the rate at which a population learns information is the variance of its fitness.
This is a nice advertisement for the virtues of diversity: more variance means faster learning. But it requires some explanation!
Let's start by assuming we have $n$ different kinds of self-replicating entities, with populations $P_1, \dots, P_n$ that are smooth functions of time.
I'll call them replicators of different species.
Let's suppose each population $P_i$ grows at a rate equal to that population times its 'fitness':

$$\frac{dP_i}{dt} = f_i(P_1, \dots, P_n) \, P_i$$

Here the fitness $f_i$ of the $i$th species can be any smooth function of all the populations. This equation is important, so we want a short way to write it. I'll often write it simply as

$$\dot{P}_i = f_i P_i$$

leaving the dependence of the fitness on the populations implicit.
Next, let $p_i$ be the probability that a randomly chosen replicator belongs to the $i$th species:

$$p_i = \frac{P_i}{\sum_j P_j}$$
Starting from our equation describing how the populations evolve, we can figure out how these probabilities evolve. The answer is called the replicator equation:

$$\frac{dp_i}{dt} = \Big( f_i(P_1, \dots, P_n) - \langle f(P_1, \dots, P_n) \rangle \Big) \, p_i$$

Here $\langle f \rangle$ is the mean fitness of the population:

$$\langle f \rangle = \sum_i f_i \, p_i$$

In what follows I'll abbreviate the replicator equation as follows:

$$\dot{p}_i = \big( f_i - \langle f \rangle \big) \, p_i$$
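As a sanity check, here's a minimal numerical sketch in Python with NumPy (the species count, fitnesses, and populations are made up for illustration). It takes one tiny Euler step of the population equation $\dot{P}_i = f_i P_i$ and confirms that the normalized populations change as the replicator equation predicts:

```python
import numpy as np

# Made-up example: 3 species with constant fitnesses.
f = np.array([1.0, 2.0, 3.0])
P = np.array([5.0, 3.0, 2.0])    # initial populations

dt = 1e-6
# Evolve populations: dP_i/dt = f_i * P_i  (one tiny Euler step)
P_new = P + dt * f * P

p     = P / P.sum()              # probabilities before the step
p_new = P_new / P_new.sum()      # probabilities after the step

# Numerical time derivative of the probabilities
p_dot_numeric = (p_new - p) / dt

# Replicator equation prediction: p_dot_i = (f_i - <f>) p_i
mean_fitness = np.dot(f, p)
p_dot_replicator = (f - mean_fitness) * p

print(np.allclose(p_dot_numeric, p_dot_replicator, atol=1e-4))  # True
```

The same check works with fitnesses that depend on the populations, as long as they are evaluated at the current populations at each step.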
Okay, now let's figure out how fast the probability distribution

$$p(t) = \big( p_1(t), \dots, p_n(t) \big)$$

changes with time. For this we need to choose a way to measure the length of the velocity vector

$$\dot{p}(t) = \big( \dot{p}_1(t), \dots, \dot{p}_n(t) \big)$$
And here information geometry comes to the rescue! We can use the Fisher information metric, which is a Riemannian metric on the space of probability distributions.
I've talked about the Fisher information metric in many ways in this series. The most important fact is that as a probability distribution $p(t)$ changes with time, its speed

$$\left\| \frac{dp}{dt} \right\|$$

as measured using the Fisher information metric can be seen as the rate at which information is learned. I'll explain that later. Right now I just want a simple formula for the Fisher information metric. Suppose $p(t)$ is a probability distribution on our finite set of species, depending smoothly on time. Then

$$\left\| \frac{dp}{dt} \right\|^2 = \sum_i \frac{\dot{p}_i^2}{p_i}$$
Using this we can calculate the speed at which $p(t)$ moves when it obeys the replicator equation. Substituting $\dot{p}_i = (f_i - \langle f \rangle) p_i$ into the formula above, we get

$$\left\| \frac{dp}{dt} \right\|^2 = \sum_i \frac{\big( (f_i - \langle f \rangle) p_i \big)^2}{p_i} = \sum_i \big( f_i - \langle f \rangle \big)^2 p_i$$

The answer has a nice meaning, too! It's just the variance of the fitness: that is, the square of its standard deviation.

So, if you're willing to buy my claim that the speed $\| dp/dt \|$ is the rate at which our population learns information, we've shown that the square of the rate at which a population learns information is the variance of its fitness.
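Here's a quick numerical check of this identity, with a made-up probability distribution and made-up fitnesses:

```python
import numpy as np

# Made-up example: a probability distribution and fitnesses for 4 species.
p = np.array([0.1, 0.2, 0.3, 0.4])
f = np.array([0.5, 1.0, 1.5, 2.5])

mean_fitness = np.dot(f, p)

# Velocity given by the replicator equation: p_dot_i = (f_i - <f>) p_i
p_dot = (f - mean_fitness) * p

# Squared speed in the Fisher information metric: sum_i p_dot_i^2 / p_i
fisher_speed_sq = np.sum(p_dot**2 / p)

# Variance of the fitness: sum_i (f_i - <f>)^2 p_i
variance = np.sum((f - mean_fitness)**2 * p)

print(np.isclose(fisher_speed_sq, variance))  # True
```

The agreement is exact up to floating-point rounding, since the two expressions are algebraically equal.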
Now, how is this related to Fisher's fundamental theorem of natural selection? First of all, what is Fisher's fundamental theorem? Here's what Wikipedia says about it:
> It uses some mathematical notation but is not a theorem in the mathematical sense. It states:
>
> "The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time."
>
> Or in more modern terminology:
>
> "The rate of increase in the mean fitness of any organism at any time ascribable to natural selection acting through changes in gene frequencies is exactly equal to its genetic variance in fitness at that time."
>
> Largely as a result of Fisher's feud with the American geneticist Sewall Wright about adaptive landscapes, the theorem was widely misunderstood to mean that the average fitness of a population would always increase, even though models showed this not to be the case. In 1972, George R. Price showed that Fisher's theorem was indeed correct (and that Fisher's proof was also correct, given a typo or two), but did not find it to be of great significance. The sophistication that Price pointed out, and that had made understanding difficult, is that the theorem gives a formula for part of the change in gene frequency, and not for all of it. This is a part that can be said to be due to natural selection.
Price's paper is here:

• George R. Price, Fisher's 'fundamental theorem' made clear, *Annals of Human Genetics* **36** (1972), 129–140.
I don't find it very clear, perhaps because I didn't spend enough time on it. But I think I get the idea.
My result is a theorem in the mathematical sense, though quite an easy one. I assume a population distribution evolves according to the replicator equation and derive an equation whose right-hand side matches that of Fisher's original equation: the variance of the fitness.
But my left-hand side is different: it's the square of the speed of the corresponding probability distribution, where speed is measured using the 'Fisher information metric'. This metric was discovered by the same guy, Ronald Fisher, but I don't think he used it in his work on the fundamental theorem!
Something a bit similar to my statement appears as Theorem 2 of this paper:

• Marc Harper, Information geometry and evolutionary game theory.

and for that theorem he cites:

• Ethan Akin, *The Geometry of Population Genetics*, Lecture Notes in Biomathematics **31**, Springer, Berlin, 1979.
However, his Theorem 2 really concerns the rate of increase of fitness, like Fisher's fundamental theorem. Moreover, he assumes that the probability distribution $p(t)$ evolves according to the replicator equation with fitness functions of a special sort, while my version makes no such assumption.

The key to generalizing Fisher's fundamental theorem is thus to focus on the speed at which the probability distribution $p(t)$ moves, rather than the rate of increase of fitness.
I explained this back in Part 7, but that explanation seems hopelessly technical to me now, so here's a faster one, which I created while preparing my talk.
The information of a probability distribution $q$ relative to a probability distribution $p$ is

$$I(q, p) = \sum_i q_i \ln\!\left( \frac{q_i}{p_i} \right)$$

It says how much information you learn if you start with the hypothesis $p$, which says the probability of the $i$th event is $p_i$, and then update it to the hypothesis $q$.
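In code, relative information (also known as the Kullback–Leibler divergence) is a one-liner. Here's a small sketch with made-up hypotheses:

```python
import numpy as np

def relative_information(q, p):
    """I(q, p) = sum_i q_i ln(q_i / p_i): the information gained
    by updating hypothesis p to hypothesis q."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return np.sum(q * np.log(q / p))

p = np.array([0.5, 0.3, 0.2])   # original hypothesis (made up)
q = np.array([0.4, 0.4, 0.2])   # updated hypothesis (made up)

print(relative_information(q, p) > 0.0)            # True: positive when q differs from p
print(np.isclose(relative_information(p, p), 0.0)) # True: no update, nothing learned
```

Note that this sketch assumes all entries of $p$ and $q$ are strictly positive, so the logarithms are finite.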
Now suppose you have a hypothesis that's changing with time in a smooth way, given by a time-dependent probability distribution $p(t)$. Then a fundamental fact is that

$$\left. \frac{d}{dt} I\big( p(t), p(t_0) \big) \right|_{t = t_0} = 0$$

for all times $t_0$. To first order, you're never learning anything.

However, as long as the velocity $\dot{p}(t_0)$ is nonzero, we have

$$\left. \frac{d^2}{dt^2} I\big( p(t), p(t_0) \big) \right|_{t = t_0} > 0$$

so we can say the information you've learned grows quadratically for times $t$ near $t_0$. To second order, you're always learning something... unless your opinions are fixed.
This lets us define a 'rate of learning'---that is, a 'speed' at which the probability distribution $p(t)$ moves:

$$\left\| \dot{p}(t_0) \right\|^2 = \left. \frac{d^2}{dt^2} I\big( p(t), p(t_0) \big) \right|_{t = t_0}$$

In other words:

$$I\big( p(t), p(t_0) \big) \approx \tfrac{1}{2} \left\| \dot{p}(t_0) \right\|^2 (t - t_0)^2$$

for $t$ near $t_0$, where the length is given by the Fisher information metric. Indeed, this formula can be used to define the Fisher information metric. From this definition we can easily work out the concrete formula I gave earlier.
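We can check this second-order behavior numerically. Here's a sketch using a made-up straight-line path of probability distributions, $p(t) = p(0) + t\,v$ with the components of $v$ summing to zero:

```python
import numpy as np

# A made-up smooth path of probability distributions:
# p(t) = p0 + t * v, where v sums to zero so p(t) stays normalized.
p0 = np.array([0.5, 0.3, 0.2])
v  = np.array([0.1, -0.04, -0.06])   # velocity; components sum to zero

def rel_info(q, p):
    return np.sum(q * np.log(q / p))

# Squared speed at t=0 in the Fisher information metric
speed_sq = np.sum(v**2 / p0)

for t in [1e-2, 1e-3]:
    info = rel_info(p0 + t * v, p0)
    # To first order the information vanishes;
    # to second order it grows like (1/2) ||v||^2 t^2.
    print(np.isclose(info, 0.5 * speed_sq * t**2, rtol=1e-1))  # True
```

As $t$ shrinks, the ratio between the relative information and $\tfrac{1}{2}\|v\|^2 t^2$ approaches 1, just as the expansion predicts.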
In summary: as a probability distribution moves around, the relative information between the new probability distribution and the original one grows approximately as the square of time, not linearly. So, to talk about a 'rate at which information is learned', we need to use the above formula, involving a second time derivative. This rate is just the speed at which the probability distribution moves, measured using the Fisher information metric. And when we have a probability distribution describing how many replicators are of different species, and it's evolving according to the replicator equation, this speed is also just the variance of the fitness!
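Putting the pieces together, here's a numerical sketch (with made-up constant fitnesses and initial populations) confirming that under the replicator equation, the information learned over a short time $t$ grows like $\tfrac{1}{2}\,\mathrm{Var}(f)\, t^2$:

```python
import numpy as np

f  = np.array([1.0, 2.0, 4.0])     # made-up constant fitnesses
P0 = np.array([4.0, 4.0, 2.0])     # made-up initial populations

def rel_info(q, p):
    return np.sum(q * np.log(q / p))

p0 = P0 / P0.sum()
mean_f = np.dot(f, p0)
variance = np.sum((f - mean_f)**2 * p0)

t = 1e-3
# With constant fitnesses, dP_i/dt = f_i P_i has the exact solution
# P_i(t) = P_i(0) * exp(f_i * t).
p_t = P0 * np.exp(f * t)
p_t /= p_t.sum()

# Information learned should grow as (1/2) * Var(f) * t^2.
print(np.isclose(rel_info(p_t, p0), 0.5 * variance * t**2, rtol=1e-1))  # True
```

Here the variance of the fitness plays exactly the role of the squared speed in the Fisher information metric, as in the main result above.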