Price 1.0

A very simple model

Suppose that we have a population of 2 individuals. We start from a population in which individual no 1 has "the" gene, while the other, individual no 2, does not. In Price equation terms, this means that the variable $q_{i}$, which denotes the frequency of the gene in individual i, takes the values $q_{1}=1$ and $q_{2}=0$.

The next generation of 2 individuals is drawn in a very simple way: first we draw the first individual, and then the second. That makes it a Wright-Fisher process. At both draws the new individual can be the offspring of individual no 1 or of individual no 2. Reproduction is asexual, so the offspring is a perfect copy of its parent. Now here comes the important thing. We assume that at both draws, independently of each other, the probability that no 1 from the parent population - the one with the gene - is drawn for reproduction is p, while the probability that no 2 - the one without the gene - is drawn is 1 − p. That gives us four possible transitions.
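The four transitions and their probabilities can be listed mechanically. Below is a minimal Python sketch, assuming the two draws are independent, as in a Wright-Fisher process. The function name `transitions` and the value p = 3/5 are illustrative choices of ours, not part of the model; exact fractions keep the probabilities readable.

```python
from fractions import Fraction
from itertools import product

def transitions(p):
    """List the four possible next generations of the two-individual model.

    A transition is a pair (parent of offspring 1, parent of offspring 2).
    Individual 1 carries the gene, individual 2 does not; at every draw,
    individual 1 is chosen with probability p and individual 2 with 1 - p.
    """
    chance = {1: p, 2: 1 - p}
    return [(parents, chance[parents[0]] * chance[parents[1]])
            for parents in product((1, 2), repeat=2)]

p = Fraction(3, 5)  # hypothetical value, for illustration only
for parents, prob in transitions(p):
    print(parents, prob)
```

With p = 3/5 this prints (1, 1) 9/25, (1, 2) 6/25, (2, 1) 6/25 and (2, 2) 4/25. The probabilities sum to 1, and the two middle transitions both leave one offspring to each parent.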

Four possible transitions

What the new generation will be is a random variable. What the current generation is, is not; the current generation is simply given. We can, however, think of a hypothetical chance experiment that, combined with the actual random transition, gives us two random variables between which we can define a covariance. So here is the recipe.

1) Draw one of the individuals from the parent population (this is the hypothetical random variable).

2) Draw the next generation (this is the actual random thing that happens in the transition).

The random variable X is now defined as the genotype of the drawn parent: X = 1 if it carries the gene and X = 0 if it does not. That is a simple, albeit hypothetical, random variable:

$\mathbb{P}\left( X=1\right) =\mathbb{P}\left( X=0\right) =\frac{1}{2}$.

Then we define the random variable Y as the number of offspring that the drawn parent begets. This is a slightly more complicated random variable, because the chances depend on which of the two possible parents has been chosen. So we get a (not too long) list of conditional probabilities.

$\mathbb{P}\left( Y=0 | X=0\right) =p^{2}$

$\mathbb{P}\left( Y=1 | X=0\right) = 2p \left( 1-p\right)$

$\mathbb{P}\left( Y=2 | X=0\right) = \left( 1-p\right) ^{2}$

The conditional probabilities given X = 1 are the mirror image, because every new individual is the offspring of exactly one of the two parents: if it is not the offspring of the one, it is the offspring of the other.

$\mathbb{P}\left( Y=0|X=1\right) = \left( 1-p\right) ^{2}$

$\mathbb{P}\left( Y=1|X=1\right) =2p\left( 1-p\right)$

$\mathbb{P}\left( Y=2|X=1\right) =p^{2}$
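These six conditional probabilities can be checked by brute force: for each of the four transitions, count how many offspring the chosen parent begets. A Python sketch, assuming the same independent draws; `offspring_distribution` is a hypothetical helper name, and parent 1 stands for X = 1, parent 2 for X = 0.

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(p, parent):
    """Return {k: P(Y = k | chosen parent)} for k = 0, 1, 2.

    parent == 1 is the gene carrier (X = 1), parent == 2 the non-carrier (X = 0).
    """
    chance = {1: p, 2: 1 - p}
    dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    for parents in product((1, 2), repeat=2):
        y = parents.count(parent)  # offspring begotten by the chosen parent
        dist[y] += chance[parents[0]] * chance[parents[1]]
    return dist

p = Fraction(3, 5)  # illustrative value
# Compare the enumeration with the closed-form formulas in the text
for parent, formulas in ((2, {0: p**2, 1: 2*p*(1-p), 2: (1-p)**2}),
                         (1, {0: (1-p)**2, 1: 2*p*(1-p), 2: p**2})):
    print(parent, offspring_distribution(p, parent) == formulas)
```

Both lines print True, which confirms the two lists above for this value of p.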

X and Y are now fully defined, so we can compute the covariance of these two actual random variables.

Let us first look at expectations...

$\mathbb{E}\left[ X\right] =\frac{1}{2}$
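Written out, with the distribution of X given above:

$\mathbb{E}\left[ X\right] =0\cdot \mathbb{P}\left( X=0\right) +1\cdot \mathbb{P}\left( X=1\right) =0\cdot \frac{1}{2}+1\cdot \frac{1}{2}=\frac{1}{2}$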


The expectation of Y is 1: the total number of offspring is always 2, and the parent is picked with equal probability from the two individuals, so ...

$\mathbb{E}\left[ Y\right] =1$
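In detail, conditioning on the genotype of the chosen parent and using the conditional probabilities listed earlier:

$\mathbb{E}\left[ Y|X=1\right] =0\cdot \left( 1-p\right) ^{2}+1\cdot 2p\left( 1-p\right) +2\cdot p^{2}=2p$

$\mathbb{E}\left[ Y|X=0\right] =0\cdot p^{2}+1\cdot 2p\left( 1-p\right) +2\cdot \left( 1-p\right) ^{2}=2\left( 1-p\right)$

$\mathbb{E}\left[ Y\right] =\frac{1}{2}\cdot 2p+\frac{1}{2}\cdot 2\left( 1-p\right) =1$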


... and ...

$\mathbb{E}\left[ XY\right] = p$
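Only the outcomes with X = 1 contribute to the product XY, so, using the conditional probabilities given X = 1:

$\mathbb{E}\left[ XY\right] =\mathbb{P}\left( X=1\right) \left[ 0\cdot \left( 1-p\right) ^{2}+1\cdot 2p\left( 1-p\right) +2\cdot p^{2}\right] =\frac{1}{2}\cdot 2p=p$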


The covariance is then

$Cov\left( X,Y\right) =\mathbb{E}\left[ XY\right] -\mathbb{E}\left[ X\right] \mathbb{E}\left[ Y\right] =p-\frac{1}{2}\cdot 1=p-\frac{1}{2}$
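The covariance can also be checked exactly by enumerating every combination of the hypothetical draw of a parent (probability 1/2 each) and the four transitions. A Python sketch under the model's assumptions; `cov_xy` is a hypothetical helper name, and exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

def cov_xy(p):
    """Exact Cov(X, Y) for the two-individual model, by full enumeration.

    X: genotype of a parent picked uniformly at random (1 = carrier, 0 = not).
    Y: number of offspring that this parent begets in the next generation.
    """
    chance = {1: p, 2: 1 - p}   # per-draw probability of each parent
    genotype = {1: 1, 2: 0}     # individual 1 carries the gene
    e_x = e_y = e_xy = Fraction(0)
    for chosen in (1, 2):                          # hypothetical draw of a parent
        for parents in product((1, 2), repeat=2):  # actual next generation
            prob = Fraction(1, 2) * chance[parents[0]] * chance[parents[1]]
            x, y = genotype[chosen], parents.count(chosen)
            e_x += prob * x
            e_y += prob * y
            e_xy += prob * x * y
    return e_xy - e_x * e_y

for p in (Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    print(p, cov_xy(p))
```

For p = 1/4, 1/2 and 9/10 this prints covariances of -1/4, 0 and 2/5, that is, p - 1/2 each time.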

The Price equation for this very simple model

An interesting question would be: is it good to have the gene?

Or, if we do statistics: suppose we know that the process has this form, but we don't know the value of p. Then the question becomes: is p > 1/2 or, equivalently, is Cov(X,Y) > 0? Below we will try to answer that question, but first we will look at what the Price equation does with all of this.

For a given transition, the Price equation constitutes an identity. The frequency of the gene in the parent population is, in general, defined as

$Q_{1}=\frac{1}{N}\sum_{i}q_{i}$

where N is the number of individuals and $q_{i}$ is the frequency of the gene in individual i. For this particular model N = 2, so $Q_{1}=\frac{1}{2}\left( 1+0\right) =\frac{1}{2}$.

The frequency of the gene in the offspring population is

$Q_{2}=\frac{\sum_{i=1}^{N}z_{i}q_{i}}{\sum_{i=1}^{N}z_{i}}$

where $z_{i}$ is the number of times individual i in the parent population was drawn for reproduction, that is, its number of offspring. The Price equation 1.0 is then the following identity:

$\triangle Q=Q_{2}-Q_{1}=\frac{N}{\sum_{i}z_{i}}\left[ \frac{\sum_{i}z_{i}q_{i}}{N}-\left( \frac{\sum_{i}z_{i}}{N}\right) \left( \frac{\sum_{i}q_{i}}{N}\right) \right]$
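To see why this is an identity, multiply the factor in front into the bracket; each of the two terms collapses to one of the frequencies defined above:

$\frac{N}{\sum_{i}z_{i}}\cdot \frac{\sum_{i}z_{i}q_{i}}{N}=\frac{\sum_{i}z_{i}q_{i}}{\sum_{i}z_{i}}=Q_{2}$

$\frac{N}{\sum_{i}z_{i}}\left( \frac{\sum_{i}z_{i}}{N}\right) \left( \frac{\sum_{i}q_{i}}{N}\right) =\frac{\sum_{i}q_{i}}{N}=Q_{1}$

The right-hand side is therefore $Q_{2}-Q_{1}$ for every possible transition, whatever the values of the $z_{i}$ (as long as $\sum_{i}z_{i}>0$).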


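In this simple case N = 2, and since two offspring are drawn, $\sum_{i}z_{i}=2$ as well; the factor in front of the bracket equals 1 and the identity reduces to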

$\triangle Q=\left[ \frac{\sum_{i}z_{i}q_{i}}{2}-\frac{\sum_{i}q_{i}}{2}\right]$
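Because the identity holds for every transition, it can be checked on each possible outcome. A Python sketch, assuming the two-individual model with $q_{1}=1$, $q_{2}=0$; `price_terms` is a hypothetical helper that returns both sides of the general identity.

```python
from fractions import Fraction

def price_terms(z, q):
    """Return (Q2 - Q1, right-hand side of the Price identity) for one transition.

    z[i] is the number of offspring of parent i, q[i] its gene frequency.
    """
    n = len(q)
    sum_zq = sum(zi * qi for zi, qi in zip(z, q))
    q1 = Fraction(sum(q), n)
    q2 = Fraction(sum_zq, sum(z))
    bracket = Fraction(sum_zq, n) - Fraction(sum(z), n) * Fraction(sum(q), n)
    return q2 - q1, Fraction(n, sum(z)) * bracket

q = [1, 0]  # individual no 1 carries the gene, no 2 does not
# The three possible offspring counts (z1, z2) when N = 2 offspring are drawn
for z in ([2, 0], [1, 1], [0, 2]):
    print(z, *price_terms(z, q))
```

This prints [2, 0] 1/2 1/2, then [1, 1] 0 0, then [0, 2] -1/2 -1/2: both sides agree in every case, and they take exactly the values by which Q can change.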

Confusion

Confusion starts because George Price named the right-hand side of this equation Cov(z,q). The claim of the Price equation - which we dispute - is that the equation reveals that a change in frequency, $\triangle Q$, is explained by the covariance between having the gene and the number of offspring, Cov(z,q). The central issue is that this Cov(z,q) is discussed as if it were a real covariance. Is that correct?


If you simulate a few transitions, you will find that $\triangle Q$ can take different values; the next generation is a random draw, so the frequency Q can go up by 1/2, remain what it is, or go down by 1/2. The value of this so-called covariance simply follows that change in frequency (and because it is not a real covariance, we will put it in quotation marks from now on). This creates an illusion of understanding: whatever happens, the "covariance" changes along with the outcome of the chance experiment. In other words, the data explain the data.
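A few transitions can be simulated directly. The Python sketch below is an illustration under the model's assumptions; the helper names, the seed and the true value p = 0.6 are hypothetical choices. For each realisation it prints the offspring counts, $\triangle Q$ computed as $Q_{2}-Q_{1}$, and the quoted "Cov(z,q)".

```python
import random
from fractions import Fraction

q = [1, 0]  # gene frequencies: individual no 1 carries the gene, no 2 does not

def draw_generation(p, rng):
    """Draw two offspring independently; return offspring counts [z1, z2].

    Each draw picks individual no 1 (index 0) with probability p,
    otherwise individual no 2 (index 1).
    """
    z = [0, 0]
    for _ in range(2):
        z[0 if rng.random() < p else 1] += 1
    return z

def quoted_cov(z, q):
    """The "Cov(z,q)" of one realisation: (sum z_i q_i - sum q_i) / N.

    This equals the bracket of the Price identity when sum(z) == N, as here.
    """
    return Fraction(sum(zi * qi for zi, qi in zip(z, q)) - sum(q), len(q))

p = 0.6                     # hypothetical true value; Cov(X, Y) = p - 1/2 = 0.1
rng = random.Random(2024)   # fixed seed so that reruns give the same draws
for _ in range(5):
    z = draw_generation(p, rng)
    q2 = Fraction(sum(zi * qi for zi, qi in zip(z, q)), sum(z))
    delta_q = q2 - Fraction(sum(q), len(q))   # Q2 - Q1
    print(z, delta_q, quoted_cov(z, q))
```

Whatever the seed, the last two columns are identical in every row and jump between -1/2, 0 and 1/2, while Cov(X,Y) stays 0.1 throughout.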

This is, of course, not OK; the true and well-defined covariance is

$Cov\left( X,Y\right) = p-\frac{1}{2}$

This does not depend on the realisation. Cov(X,Y) is a property of the chance experiment, whereas "Cov(z,q)" is computed from a single realisation of it.

What would a probability theorist do?

A probability theorist would try to figure out the transition probabilities for the suggested model. That is exactly what we did in the description of the model, where we determined the transition probabilities and Cov(X,Y).

What would a statistician do?

If you are interested in statistics, which is only natural if you care about evolution, and you do not know the true model that generates the transitions, then you would want to find out what p is. Is p > 1/2? Then having the gene is good for reproduction. Is p < 1/2? Then it is not. Is p = 1/2? Then it is neither good nor bad. So how do we find out? Well, the thing with statistics is that you will never find out for sure. All we can do is look for ways to estimate p, or design tests of whether and how p differs from 1/2, that help us make statements about the true model with some confidence. Here are a few such statements.

Estimation of p

One of the reasons that so many are convinced of the value of the Price equation is that it almost hits the nail on the head (even though it misses the head completely). So here we go with what sometimes feels like an indication that the Price equation is not so wrong after all. If you sense this in yourself, just recall how much a single realisation of "Cov(z,q)" varies from one draw to the next.

If you draw a new generation, then that new generation comes with a "Cov(z,q)". With probability $p^{2}$ both offspring descend from individual no 1, which implies "Cov(z,q)" = +1/2; with probability $2p\left( 1-p\right)$ each parent has one offspring, which implies "Cov(z,q)" = 0; and with probability $\left( 1-p\right) ^{2}$ both descend from individual no 2, which implies "Cov(z,q)" = -1/2. So "Cov(z,q)" is itself a random variable, whose characteristics we can study. Two natural characteristics are its expected value and its variance. The expectation of this so-called covariance is, luckily,

$\mathbb{E}\left[ ^{\prime \prime }Cov\left( z,q\right) ^{\prime \prime }\right] =p-\frac{1}{2}.$
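The derivation is short. Writing $z_{1}$ for the number of offspring of individual no 1, we have $^{\prime \prime }Cov\left( z,q\right) ^{\prime \prime }=\frac{z_{1}}{2}-\frac{1}{2}$, which takes the values $+\frac{1}{2}$, $0$ and $-\frac{1}{2}$ with probabilities $p^{2}$, $2p\left( 1-p\right)$ and $\left( 1-p\right) ^{2}$:

$\mathbb{E}\left[ ^{\prime \prime }Cov\left( z,q\right) ^{\prime \prime }\right] =\frac{1}{2}p^{2}+0\cdot 2p\left( 1-p\right) -\frac{1}{2}\left( 1-p\right) ^{2}=\frac{1}{2}\left( 2p-1\right) =p-\frac{1}{2}$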


Unfortunately, the realisation varies quite a bit, as we saw above. This is reflected in the variance of "Cov(z,q)":

$Var\left[ ^{\prime \prime }Cov\left( z,q\right) ^{\prime \prime }\right] = \frac{1}{2}p\left( 1-p\right)$
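Since "Cov(z,q)" takes the values $+\frac{1}{2}$, $0$ and $-\frac{1}{2}$ with probabilities $p^{2}$, $2p\left( 1-p\right)$ and $\left( 1-p\right) ^{2}$, and its expectation is $p-\frac{1}{2}$:

$\mathbb{E}\left[ \left( ^{\prime \prime }Cov\left( z,q\right) ^{\prime \prime }\right) ^{2}\right] =\frac{1}{4}p^{2}+\frac{1}{4}\left( 1-p\right) ^{2}$

$Var\left[ ^{\prime \prime }Cov\left( z,q\right) ^{\prime \prime }\right] =\frac{1}{4}\left( p^{2}+\left( 1-p\right) ^{2}\right) -\left( p-\frac{1}{2}\right) ^{2}=-\frac{1}{2}p^{2}+\frac{1}{2}p=\frac{1}{2}p\left( 1-p\right)$

Equivalently, $z_{1}$ is binomial with 2 trials and success probability p, so the variance is $\frac{1}{4}\cdot 2p\left( 1-p\right)$.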


This variance can be reduced, though, if we take a larger sample. Suppose that we have individuals i = 1,...,n with the gene, and individuals i = n+1,...,2n without it. We draw the next generation in the same way as before, in a slightly more general form: 2n individuals are drawn, and at every draw individual i has a fixed probability of being drawn to reproduce. For i = 1,...,n this probability is $\frac{p_{1}}{np_{1}+np_{0}}$, and for i = n+1,...,2n it is $\frac{p_{0}}{np_{1}+np_{0}}$. The probability that a single draw picks one of the n gene carriers is then $p=\frac{p_{1}}{p_{1}+p_{0}}$; for n = 1 this is exactly the two-individual model above. For this transition, one can show that

$\mathbb{E}\left[ \frac{\sum_{i=1}^{2n}z_{i}q_{i}}{2n}-\frac{\sum_{i=1}^{2n}q_{i}}{2n}\right] =p-\frac{1}{2}$

and

$\lim_{n\rightarrow \infty }Var\left[ \frac{\sum_{i=1}^{2n}z_{i}q_{i}}{2n}-\frac{\sum_{i=1}^{2n}q_{i}}{2n}\right] =0$
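Both limits can be illustrated by Monte Carlo. The Python sketch below assumes, as an interpretation, that p is the per-draw probability of picking a gene carrier, $p=\frac{p_{1}}{p_{1}+p_{0}}$. The number of offspring of carriers is then binomial with 2n trials and success probability p, so the variance of "Cov(z,q)" should be $\frac{p\left( 1-p\right) }{2n}$. The weights, seed and sample sizes are hypothetical, and `sample_cov_large` is a helper name of ours.

```python
import random
from statistics import mean, pvariance

def sample_cov_large(n, p1, p0, rng):
    """One realisation of "Cov(z,q)" = (sum z_i q_i - sum q_i) / (2n).

    Individuals 0..n-1 carry the gene (q = 1), n..2n-1 do not (q = 0).
    Each of the 2n draws picks individual i with probability proportional
    to p1 for carriers and p0 for non-carriers.
    """
    q = [1] * n + [0] * n
    weights = [p1] * n + [p0] * n
    z = [0] * (2 * n)
    for i in rng.choices(range(2 * n), weights=weights, k=2 * n):
        z[i] += 1
    return (sum(zi * qi for zi, qi in zip(z, q)) - sum(q)) / (2 * n)

p1, p0 = 3.0, 2.0            # hypothetical weights, so p = p1 / (p1 + p0) = 0.6
p = p1 / (p1 + p0)
rng = random.Random(42)      # fixed seed for reproducibility
for n in (1, 10, 100):
    draws = [sample_cov_large(n, p1, p0, rng) for _ in range(5000)]
    # columns: n, sample mean, sample variance, predicted variance p(1-p)/(2n)
    print(n, round(mean(draws), 3), round(pvariance(draws), 5),
          round(p * (1 - p) / (2 * n), 5))
```

The mean should stay close to p - 1/2 = 0.1 for every n, while the simulated variance should track $\frac{p\left( 1-p\right) }{2n}$ and shrink as n grows.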

If we indeed define

$^{\prime \prime }Cov\left( z,q\right)^{\prime \prime }$ as $\frac{1}{2n}\left(\sum_{i=1}^{2n}z_{i}q_{i}- \sum_{i=1}^{2n}q_{i}\right)$

then

$^{\prime\prime }Cov\left( z,q\right) ^{\prime \prime }+\frac{1}{2}$

is an unbiased estimator of p - because

$\mathbb{E}\left[ ^{\prime \prime }Cov\left( z,q\right) ^{\prime \prime }\right] =p-\frac{1}{2}$.

The quantity $^{\prime\prime }Cov\left( z,q\right) ^{\prime \prime }$ is also regularly referred to as the sample covariance. It is very important to realise that this is not the true covariance; the covariance is a fixed property of two random variables, while the sample covariance is a random variable itself.
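The sample covariance meant here uses divisor N rather than N − 1: $\frac{1}{N}\sum_{i}\left( z_{i}-\bar{z}\right) \left( q_{i}-\bar{q}\right)$. This coincides with "Cov(z,q)" as defined above whenever the number of offspring equals the number of parents, so that $\bar{z}=1$, as in this model. A Python sketch comparing the two; the helper names and the offspring counts in `z` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def sample_cov_population(z, q):
    """Sample covariance with divisor N: (1/N) * sum (z_i - mean z)(q_i - mean q)."""
    n = len(z)
    z_bar, q_bar = Fraction(sum(z), n), Fraction(sum(q), n)
    return sum((zi - z_bar) * (qi - q_bar) for zi, qi in zip(z, q)) / n

def quoted_cov(z, q):
    """The "Cov(z,q)" defined above: (sum z_i q_i - sum q_i) / N."""
    return Fraction(sum(zi * qi for zi, qi in zip(z, q)) - sum(q), len(q))

# Hypothetical realisation with 2n = 6 parents: three carriers, three non-carriers,
# and six offspring in total (so the mean offspring number is 1).
q = [1, 1, 1, 0, 0, 0]
z = [2, 0, 2, 1, 0, 1]
print(sample_cov_population(z, q), quoted_cov(z, q))
```

Both functions return 1/6 for this realisation. If the offspring total differed from the number of parents, the two would no longer agree.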

Testing properties of p

The sample covariance is typically also used as input for tests of the type:

$H_{0} : p=\frac{1}{2}$

$H_{1} : p\neq \frac{1}{2}.$
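As an illustration of such a test, here is one standard choice, an exact two-sided binomial test; it is our choice, not something the text prescribes. It takes p to be the probability that a single draw picks a gene carrier. In the 2n-individual model, "Cov(z,q)" fixes the number K of offspring of gene carriers via K = n + 2n·"Cov(z,q)", and under $H_{0}$, K is binomial with 2n trials and success probability 1/2. `p_value_h0_half` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def p_value_h0_half(cov, n):
    """Exact two-sided p-value for H0: p = 1/2 in the 2n-individual model.

    cov is the observed "Cov(z,q)" = (sum z_i q_i - sum q_i) / (2n), so the number
    of offspring of gene carriers is K = n + 2n * cov. Under H0, K ~ Binomial(2n, 1/2),
    which is symmetric around n; we add the probabilities of all outcomes at least
    as far from n as the observed K.
    """
    m = 2 * n
    k_obs = n + m * Fraction(cov)
    if k_obs.denominator != 1:
        raise ValueError("cov is not a realisable value (K must be an integer)")
    dist = abs(k_obs - n)
    extreme = sum(comb(m, k) for k in range(m + 1) if abs(k - n) >= dist)
    return Fraction(extreme, 2 ** m)

# n = 10 and an observed "Cov(z,q)" of 1/4, i.e. 15 of the 20 offspring from carriers
pv = p_value_h0_half(Fraction(1, 4), 10)
print(pv, round(float(pv), 3))
```

This prints 5425/131072 0.041: under $H_{0}$, a split at least this uneven has a probability of about 4%.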

So how does the Price equation help here?

It does not. For determining the properties of the model, the Price equation did not help, and in fact cannot be of use. We found the transition probabilities just by doing probability calculations. For doing statistics the Price equation did not help either. Yes, the Price equation applied to one actual draw does contain the sample covariance of that draw, and the sample covariance is a useful thing, but we do not need the Price equation to know that the sample covariance is useful. The sample covariance has been around since the late nineteenth century, when Karl Pearson (1857 – 1936) introduced it, along with the distinction between covariance and sample covariance. It was there well before George Price was born, and it is used for parameter estimation and hypothesis testing in all other fields of science, where no one has ever heard of G.R. Price. And, most importantly, using the sample covariance is obviously not the same as using the Price equation. It is using the sample covariance.