
I'll start by explaining 'Nash equilibria' for 2person games. These are situations where neither player can profit by changing what they're doing. Then I'll introduce 'mixed strategies', where the players can choose among several strategies with different probabilities. Then I'll introduce evolutionary game theory, where we think of each strategy as a species, and its probability as the fraction of organisms that belong to that species.
Back in Part 9, I told you about the 'replicator equation', which says how these fractions change with time thanks to natural selection. Now we'll see how this leads to the idea of an 'evolutionarily stable strategy'. And finally, we'll see that when evolution takes us toward such a stable strategy, the amount of information the organisms have 'left to learn' keeps decreasing!
We can describe a certain kind of twoperson game using a payoff matrix, which is an $n \times n$ matrix $A_{ij}$ of real numbers. We think of $A_{ij}$ as the payoff that either player gets if they choose strategy $i$ and their opponent chooses strategy $j.$
Note that in this kind of game, there's no significant difference between the 'first player' and the 'second player': either player wins an amount $A_{ij}$ if they choose strategy $i$ and their opponent chooses strategy $j.$ So, this kind of game is called symmetric even though the matrix $A_{ij}$ may not be symmetric. Indeed, it's common for this matrix to be antisymmetric, meaning $A_{ij} =  A_{ji},$ since in this case what one player wins, the other loses. Games with this extra property are called zerosum games. But we won't limit ourselves to those!
We say a strategy $i$ is a symmetric Nash equilibrium if
$$ A_{ii} \ge A_{ji} $$
for all $j.$ This means that if both players use strategy $i,$ neither gains anything by switching to another strategy.
For example, suppose our matrix is
$$ \left( \begin{array}{rr} 1 & 12 \\ 0 & 3 \end{array} \right) $$
Then we've got the Prisoner's Dilemma exactly as described last time! Here strategy 1 is cooperate and strategy 2 is defect. If a player cooperates and so does his opponent, he wins
$$ A_{11} = 1 $$
meaning he gets one month in jail. We include a minus sign because 'winning a month in jail' is not a good thing. If the player cooperates but his opponent defects, he gets a whole year in jail:
$$ A_{12} = 12$$
If he defects but his opponent cooperates, he doesn't go to jail at all:
$$ A_{21} = 0$$
And if they both defect, they both get three months in jail:
$$ A_{22} = 3$$
You can see that defecting is a Nash equilibrium, since
$$ A_{22} \ge A_{12}$$
So, oddly, if our prisoners know game theory and believe Nash equilibria are best, they'll both be worse off than if they cooperate and don't betray each other.
So far we've been assuming that with 100% certainty, each player chooses one strategy $i = 1,2,3,\dots, n.$ Since we'll be considering more general strategies in a minute, let's call these pure strategies.
Now let's throw some probability theory into the stew! Let's allow the players to pick different pure strategies with different probabilities. So, we define a mixed strategy to be a probability distribution on the set of pure strategies. In other words, it's a list of $n$ nonnegative numbers
$$ p_i \ge 0 $$
that sum to one:
$$ \displaystyle{ \sum_{i=1}^n p_i = 1 } $$
Say I choose the mixed strategy $p$ while you, my opponent, choose the mixed strategy $q.$ Say our choices are made independently. Then the probability that I choose the pure strategy $i$ while you chose $j$ is
$$ p_i q_j$$
so the expected value of my winnings is
$$ \displaystyle{ \sum_{i,j = 1}^n p_i A_{ij} q_j }$$
or using vector notation
$$ p \cdot A q $$
where the dot is the usual dot product on $\mathbb{R}^n.$
We can easily adapt the concept of Nash equilibrium to mixed strategies. A mixed strategy $q$ is a symmetric Nash equilibrium if for any other mixed strategy $p,$
$$ q \cdot A q \ge p \cdot A q $$
This means that if both you and I are playing the mixed strategy $q,$ I can't improve my expected winnings by unilaterally switching to the mixed strategy $p.$ And neither can you, because the game is symmetric!
If this were a course on game theory, I would now do some examples. But it's not, so I'll just send you to page 6 of Sandholm's paper: he looks at some famous games like 'hawks and doves' and 'rock paper scissors'.
We're finally ready to discuss evolutionarily stable strategies. To do this, let's reinterpret the 'pure strategies' $i = 1,2,3, \dots n$ as species. Here I don't necessarily mean species in the classic biological sense: I just mean different kinds of selfreplicating entities, or replicators. For example, they could be different alleles of the same gene.
Similarly, we'll reinterpret the 'mixed strategy' $p$ as describing a mixed population of replicators, where the fraction of replicators belonging to the $i$th species is $p_i.$ These numbers are still probabilities: $p_i$ is the probability that a randomly chosen replicator will belong to the $i$th species.
We'll reinterpret the payoff matrix $A_{ij}$ as a fitness matrix. In our earlier discussion of the replicator equation, we assumed that the population $P_i$ of the $i$th species grew according to the Lotka–Volterra equation
$$ \displaystyle{ \frac{d P_i}{d t} = f_i(P_1, \dots, P_n) P_i }$$
where the fitness function $f_i$ is any smooth function of the populations of each kind of replicator.
But in evolutionary game theory it's common to start by looking at a simple special case where
$$ \displaystyle{f_i(P_1, \dots, P_n) = \sum_{j=1}^n A_{ij} p_j }$$
where
$$ \displaystyle{ p_j = \frac{P_j}{\sum_k P_k} }$$
is the fraction of replicators who belong to the $j$th species.
What does this mean? The idea is that we have a wellmixed population of game players—or replicators. Each one has its own pure strategy—or species. Each one randomly roams around and 'plays games' with each other replicator it meets. It gets to reproduce at a rate proportional to its expected winnings.
This is unrealistic in all sorts of ways, but it's mathematically cute, and it's been studied a lot, so it's good to know about. Today I'll explain evolutionarily stable strategies only in this special case. Later I'll go back to the general case.
Suppose that we select a sample of replicators from the overall population. What is the mean fitness of the replicators in this sample? For this, we need to know the probability that a replicator from this sample belongs to the $i$th species. Say it's $q_j.$ Then the mean fitness of our sample is
$$ \displaystyle{ \sum_{i,j=1}^n q_i A_{ij} p_j }$$
This is just a weighted average of the fitnesses in our earlier formula. But using the magic of vectors, we can write this sum as
$$ q \cdot A p $$
We already saw this type of expression in the last section! It's my expected winnings if I play the mixed strategy $q$ and you play the mixed strategy $p$.
John Maynard Smith defined $q$ to be evolutionarily stable strategy if when we add a small population of 'invaders' distributed according to any other probability distribution $p,$ the original population is more fit than the invaders.
In simple terms: a small 'invading' population will do worse than the population as a whole.
More precisely:
$$ q \cdot A ((1\epsilon)q + \epsilon p) > p \cdot A ((1\epsilon)q + \epsilon p) $$
for all mixed strategies $p \ne q$ and all sufficiently small $\epsilon > 0 .$ Here
$$ (1\epsilon)q + \epsilon p $$
is the population we get by replacing an $\epsilon$sized portion of our original population by invaders.
Puzzle: Show that $q$ is an evolutionarily stable strategy if and only these two conditions hold for all mixed stategies $p$:
$$ q \cdot A q \ge p \cdot A q$$
and also, for all $p \ne q,$
$$ q \cdot A q = p \cdot A q \; \Rightarrow \; q \cdot A p > p \cdot A p$$
The first condition says that $q$ is a symmetric Nash equilibrium. In other words, the invaders can't on average be better playing against the original population than members of the original population are. The second says that if the invaders are just as good at playing against the original population, they must be worse at playing against each other! The combination of these conditions means the invaders won't take over.
Again, I should do some examples... but instead I'll refer you to page 9 of Sandholm's paper, and also these course notes:
Now comes the punchline... but with a slight surprise twist at the end. Last time we let
$$ P = (P_1, \dots , P_n)$$
be a population that evolves with time according to the Lotka–Volterra equation, and we let $p$ be the corresponding probability distribution. We supposed $q$ was some fixed probability distribution. We saw that the relative information
$$ I(q,p) = \displaystyle{ \sum_i \ln \left(\frac{q_i}{ p_i }\right) q_i } $$
obeys
$$ \displaystyle{ \frac{d}{dt} I(q,p) = (p  q) } \cdot f(P) $$
where $f(P)$ is the vector of fitness functions. So, this relative information can never increase if
$$ (p  q) \cdot f(P) \le 0 $$
for all $P$.
We can adapt this to the special case we're looking at now. Remember, right now we're assuming
$$ \displaystyle{f_i(P_1, \dots, P_n) = \sum_{j=1}^n A_{ij} p_j }$$
so
$$ f(P) = A p $$
Thus, the relative information will never increase if
$$ (p  q) \cdot A p \le 0$$
or in other words,
$$ \forall p \quad q \cdot A p \ge p \cdot A p \quad \qquad \qquad \qquad \qquad \qquad \quad (1) $$
Now, this looks very similar to the conditions for an evolutionary stable strategy as stated in the Puzzle above. But it's not the same! That's the surprise twist.
Remember, the Puzzle says that $q$ is an evolutionarily stable state if
$$ \forall p \quad q \cdot A q \ge p \cdot A q \quad \qquad \qquad \qquad \qquad \qquad \quad (2)$$
and also
$$ \forall p \ne q \quad q \cdot A q = p \cdot A q \; \Rightarrow \; q \cdot A p > p \cdot A p \qquad \; (3)$$
Note that condition (1), the one we want, is neither condition (2) nor condition (3)! This drove me crazy for almost a day.
I kept thinking I'd made a mistake, like mixing up $p$ and $q$ somewhere. You've got to mind your p's and q's in this game!
But the solution turned out to be this. After Maynard Smith came up with his definition of 'evolutionarily stable state', another guy came up with a different definition:
For him, an evolutionarily stable strategy $q$ is one such that
$$ \forall p \quad q \cdot A q \ge p \cdot A q \quad \qquad \qquad \qquad \qquad \qquad \quad (2)$$
and also
$$ \forall p \quad q \cdot A p \ge p \cdot A p \quad \qquad \qquad \qquad \qquad \qquad \quad (1) $$
Condition (1) is stronger than condition (3), so he renamed Maynard Smith's evolutionarily stable strategies weakly evolutionarily stable strategies. And condition (1) guarantees that the relative information $I(q,p)$ can never increase. So, now we're happy.
Except for one thing: why should we switch from Maynard Smith's perfectly sensible concept of evolutionarily stable state to this new stronger one? I don't really know, except that
and
So, it's a small mystery for me to mull over. If you have any good ideas, let me know.
You can read a discussion of this article on Azimuth, and make your own comments or ask questions there!
