3. Inbreeding

Inbreeding occurs when mates are more closely related than they would be if they had been chosen at random from the population. Related individuals have one or more ancestors in common, so the extent of inbreeding is related to the amount of ancestry that is shared by the parents of the inbred individuals. Alternatively stated, the degree of inbreeding of an individual is determined by the proportion of genes that his parents have in common.

An immediate consequence of this sharing of parental genes is that the inbred individual will frequently inherit the same gene from each parent (identical by descent). Thus inbreeding increases the amount of homozygosity. One observable effect is that recessive genes, previously hidden by heterozygosity with dominant alleles, will be expressed.

Since most such genes are harmful in one way or another, inbreeding usually leads to a decrease in size, fertility, vigor, yield, and fitness (inbreeding depression). There may also be loci segregating in a population where a heterozygote is fitter than either homozygote. In this case, inbreeding also leads to decreased fitness.

Another consequence of consanguineous mating is greater genetic variability within the population due to gene concentration in individuals. This often increases phenotypic variability. Inbreeding may follow two patterns: random consanguineous mating within a population or subdivision into smaller groups.

An extreme example is continued self-fertilization, dividing the population into one-individual subpopulations. Similarly, repeated sib mating may lead to isolated populations of size two. Random mating within isolated subpopulations increases overall homozygosity. Despite random mating, homozygosity increases due to drift in allele frequencies.

As an extreme case, self-fertilization (random combination of gametes in a population of one) causes gene frequencies at previously heterozygous loci to shift from $\frac{1}{2}$ to 0 or 1.

Inbreeding can be measured using Wright's (1922) inbreeding coefficient $f$, which reflects the proportion of heterozygosity lost or the probability that two alleles at a locus are identical by descent. Other properties of the population can also be related to $f$.

3.1 Decrease in Heterozygosity with Inbreeding

The effect of continued inbreeding is best illustrated through self-fertilization. In such populations, homozygous parents produce progeny identical to themselves, and heterozygous parents produce $\frac{1}{2}$ heterozygotes and $\frac{1}{4}$ each of the two homozygotes. Starting from genotype proportions $D$, $H$, and $R$ for $AA$, $Aa$, and $aa$, successive generations of selfing give:

\[ \begin{array}{c c c c} \hline \text{Generation} & AA & Aa & aa \\ \hline 0 & D & H & R \\ 1 & D+\frac{H}{4} & \frac{H}{2} & R+\frac{H}{4} \\ 2 & D+\frac{3H}{8} & \frac{H}{4} & R+\frac{3H}{8} \\ 3 & D+\frac{7H}{16} & \frac{H}{8} & R+\frac{7H}{16} \\ 4 & D+\frac{15H}{32} & \frac{H}{16} & R+\frac{15H}{32} \\ \text{Limit} & D+\frac{H}{2} & 0 & R+\frac{H}{2} \\ \hline \end{array} \]

The heterozygosity is halved each generation. If $H_0$ is the initial heterozygosity, then after $t$ generations:

\[ H_t = \frac{H_0}{2^t} \]

The homozygosity increases correspondingly, and in the limit:

\[ \begin{align} \text{Proportion AA} &= D + \frac{H}{2} \\ \text{Proportion aa} &= R + \frac{H}{2} \end{align} \]

While genotype frequencies change with inbreeding, allele frequencies remain constant. Inbreeding restructures but does not shift the gene pool.

For a panmictic population with genotypes $AA$, $Aa$, $aa$ in proportions $p^2$, $2pq$, $q^2$, continued selfing ultimately yields homozygous $AA$ and $aa$ lines in proportions $p$ and $q$. Allele frequencies do not change; only the genotype composition does. Less extreme forms of inbreeding show the same pattern, but more slowly.
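The selfing scheme described above can be traced numerically. The following is a minimal sketch, not part of the original text: the function name `selfing` and the starting allele frequency $p = 3/5$ are illustrative assumptions. It applies the rule that heterozygotes yield $\frac{1}{4}$, $\frac{1}{2}$, $\frac{1}{4}$ of the three genotypes and checks that the allele frequency stays fixed.

```python
from fractions import Fraction

def selfing(D, H, R, generations):
    """Genotype proportions (AA, Aa, aa) for generations 0..generations of selfing."""
    rows = [(D, H, R)]
    for _ in range(generations):
        # Heterozygotes yield 1/4 AA, 1/2 Aa, 1/4 aa; homozygotes breed true.
        D, H, R = D + H / 4, H / 2, R + H / 4
        rows.append((D, H, R))
    return rows

# Hypothetical starting population in Hardy-Weinberg proportions.
p = Fraction(3, 5)   # assumed frequency of allele A
q = 1 - p
rows = selfing(p * p, 2 * p * q, q * q, 4)
for t, (D, H, R) in enumerate(rows):
    # D + H/2 is the frequency of allele A; it should not change.
    print(t, D, H, R, "freq(A) =", D + H / 2)
```

Exact fractions are used so that the halving of $H$ and the constancy of the allele frequency can be compared without rounding error.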

For continued brother-sister mating:

\[ \begin{array}{c|c|c|c|c} \hline \text{Generation} & \text{Relative Heterozygosity} & \text{Decrease in Heterozygosity} & \text{Rate}~\frac{H_t}{H_{t-1}} & \frac{H_{t-1}-H_t}{H_{t-1}} \\ \hline 0 & 1 & 0 & 1 & 0 \\ 1 & \frac{2}{2} & 0 & 1 & 0 \\ 2 & \frac{3}{4} & \frac{1}{4} & 0.750 & 0.250 \\ 3 & \frac{5}{8} & \frac{3}{8} & 0.833 & 0.167 \\ 4 & \frac{8}{16} & \frac{8}{16} & 0.800 & 0.200 \\ 5 & \frac{13}{32} & \frac{19}{32} & 0.812 & 0.188 \\ 6 & \frac{21}{64} & \frac{43}{64} & 0.808 & 0.192 \\ \text{Limit} & 0 & 1 & \lambda=0.809 & 0.191 \\ \hline \end{array} \]

The heterozygosity follows a Fibonacci pattern in the numerator with denominator doubling each generation. Wright (1951) called $\frac{H_t}{H_0}$ the panmictic index $P$; then $1 - P = f$, the inbreeding coefficient. We use lowercase $f$ here to reserve $F$ for multilocus cases. The ratio $\frac{H_t}{H_{t-1}}$ quickly approaches a limit $\lambda$.
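The values in the sib-mating table are consistent with the recurrence $H_t = \frac{1}{2}H_{t-1} + \frac{1}{4}H_{t-2}$. The text does not write this recurrence out, so it is used here as an assumption that reproduces the tabulated fractions. Under that assumption $\lambda$ is the larger root of $\lambda^2 = \frac{1}{2}\lambda + \frac{1}{4}$, that is, $(1+\sqrt{5})/4$. The sketch below (function name hypothetical) regenerates the table and the ratio $H_t/H_{t-1}$.

```python
from fractions import Fraction
import math

def sib_mating_heterozygosity(generations):
    """Relative heterozygosity H_t / H_0 under continued full-sib mating."""
    h = [Fraction(1), Fraction(1)]  # generations 0 and 1, as in the table
    for t in range(2, generations + 1):
        # Assumed recurrence: H_t = H_{t-1}/2 + H_{t-2}/4
        h.append(h[t - 1] / 2 + h[t - 2] / 4)
    return h

h = sib_mating_heterozygosity(6)
for t in range(1, 7):
    ratio = h[t] / h[t - 1]
    print(t, h[t], "f =", 1 - h[t], "ratio =", round(float(ratio), 3))

# Limiting ratio implied by the recurrence.
lam = (1 + math.sqrt(5)) / 4
print("lambda =", round(lam, 3))
```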

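\[ \begin{array}{l l l} \hline \text{Feature} & \text{Selfing} & \text{Sib mating} \\ \hline \text{Heterozygosity decline} & \text{Halves each generation} & \text{Declines about 20\% per generation} \\ \text{Genotype frequency change} & \text{Fast} & \text{Slower} \\ \text{Allele frequency change} & \text{None} & \text{None} \\ \text{Population structure effect} & \text{Becomes a set of fully homozygous lines} & \text{Subgroups drift toward homozygosity} \\ \hline \end{array} \]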


3.2 Wright's Inbreeding Coefficient, $f$

Wright's (1922) original derivation of the inbreeding coefficient, $f$, was through correlation analysis. An alternative approach using only probability rules has been developed by Haldane and Moshinsky (1939), Cotterman (1940), and Malécot (1948). They distinguish between two ways in which an individual can be homozygous for a given locus. The two homologous genes may be:

Alike in state, that is to say, indistinguishable by any effect they produce (or perhaps, when molecular genetics has become sufficiently precise, alike in their nucleotide sequence).
Identical by descent, in that both are derived from the same gene in a common ancestor.

We follow the notation of Cotterman in designating an individual whose two homologous genes are identical by descent as autozygous. If the two alleles are of independent origin (as far as known from our pedigree information), the individual is allozygous. The effect of inbreeding is to increase that part of the homozygosity that is due to autozygosity. An individual can be homozygous without being autozygous, if the two homologous genes are alike in state but not identical by descent. Conversely, an autozygous individual can be heterozygous for this locus if one of the two alleles has mutated since their common origin, although this is negligibly rare if only a small number of generations is being considered.

The inbreeding coefficient, $f$, is defined as the probability that the individual is autozygous for the locus in question. Alternatively stated, it is the probability that a pair of alleles in the two gametes that unite to form the individual are identical by descent.

An individual with inbreeding coefficient $f$ has a probability $f$ that the two genes at a particular locus are identical by descent and a probability $1-f$ that they are not, and are therefore independent. If they are independent, the frequencies of the genotypes will be given by the binomial formula. If they are identical, the frequencies of the gene pairs will be simply the frequencies of the alleles in the population. Thus, for two alleles, $A_1$ and $A_2$, with frequencies $p_1$ and $p_2 \: (p_1 + p_2 = 1)$, the genotype frequencies are:

\[ \begin{array}{lcc} \hline & \text{Allozygous} & \text{Autozygous} \\ \hline \text{Homozygous, } A_1A_1: & p_1^2(1-f) & p_1 f \\ \text{Heterozygous, } A_1A_2: & 2p_1p_2(1-f) & \\ \text{Homozygous, } A_2A_2: & p_2^2(1-f) & p_2 f \\ \text{Total} & 1-f & f \\ \hline \end{array} \]

Notice that when $f = 0$ these formulas reduce to the usual Hardy-Weinberg proportion. When $f=1$ the population is completely homozygous. Thus $f$ ranges from 0 in a randomly mating population to 1 with complete homozygosity. How to compute $f$ from a pedigree will be shown later. Multiple alleles introduce no difficulty. The genotype frequencies are a natural extension of the results for two alleles. The frequencies are:

\[ A_iA_i: p_i^2 (1-f) + p_i f \]

for homozygous genotypes, and

\[ A_iA_j: 2p_ip_j (1-f) \]

for heterozygous genotypes. The expected proportion of heterozygous genotypes with inbreeding coefficient $f$, $H_f$, is given by:

\[ H_f = \sum_{i \ne j} p_i p_j (1-f) = H_0 (1-f) \]

Hence:

\[ f = \frac{H_0 - H_f}{H_0} \]

where $H_0$ is a constant equal to the proportion of heterozygotes expected with random mating $(f=0)$. The summation is over all combinations of values of $i$ and $j$ except those where they are equal. This proves the assertion made earlier that the inbreeding coefficient measures the fraction by which the heterozygosity has been reduced.

We have written the formula as if, when $f=0$, the population is in Hardy-Weinberg proportions. However, for any measured $f$ (as determined, for example, from a pedigree), the heterozygosity, $H$, is $H_0(1-f)$, where $H_0$ is whatever the heterozygosity would have been in the absence of the observed inbreeding. To be concrete, the inbreeding coefficient for the child of a cousin marriage is $\frac{1}{16}$ (as we shall show later); therefore the child of such a marriage is $\frac{15}{16}$ as heterozygous as if his parents had the same relationship as a random pair in this population.

There is a simple relationship between the correlation coefficient, $r$, and the inbreeding coefficient, $f$. If we assign numerical values to each allele, then the inbreeding coefficient, $f$, is the correlation between these values in a pair of uniting gametes. In fact, Wright's original derivation of the inbreeding coefficient was through correlation methods.

The relationship between $r$ and $f$ can be shown in the following way. For convenience, we assign the value 1 to allele $A_1$ and 0 to allele $A_2$, though we would get the same result with any values. The calculations are shown below. Since the sum of the genotype frequencies is equal to 1, the weighted sum and the mean of any quantity are the same. For example, the sum (and mean) of the egg value, $X$, is $[p_2^2(1-f)+p_2f](0)+[p_1p_2(1-f)](1)+ [p_1p_2(1-f)](0)+[p_1^2(1-f)+p_1f](1)$, which after some algebraic simplification reduces to $p_1$. The other calculations are given in the table, using the standard formula for calculation of $r$.

\[ \begin{array}{c c c c c c c c} \hline \text{Egg} & \text{Sperm} & \text{Frequency} & X & Y & X^2 & Y^2 & XY\\ \hline A_2 & A_2 & p_2^2(1-f)+p_2f & 0 & 0 & 0 & 0 & 0 \\ A_1 & A_2 & p_1p_2(1-f)& 1 & 0 & 1 & 0 & 0\\ A_2 & A_1 & p_2p_1(1-f) & 0 & 1 & 0 & 1 & 0\\ A_1 & A_1 & p_1^2(1-f)+p_1f & 1 & 1 & 1 & 1 & 1\\ \hline \text{Sum or mean} & & 1 & p_1 & p_1 & p_1 & p_1 & p_1^2(1-f)+p_1f\\ \hline \end{array} \]

\[ \bar X = p_1 p_2(1 - f) + p_1^2 (1 - f) + p_1 f = p_1 \]

Likewise, $\bar Y = \text{mean}(X^2) = \text{mean}(Y^2) = p_1$

\[ \begin{align} r_{xy} &= \frac{\overline{XY}-\bar X \bar Y}{\sqrt{(\text{mean}(X^2)- \bar X^2)(\text{mean}(Y^2)- \bar Y^2)}} \\ r_{xy} &= \frac{p_1^2(1-f)+p_1f-p_1^2}{p_1 - p_1^2} = f \end{align} \]
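The two results of this section, $H_f = H_0(1-f)$ and $r = f$, can be checked numerically for any number of alleles. The sketch below is illustrative only: the function names, the three allele frequencies, and the allele values are assumptions made for the example. It builds the frequency of every ordered (egg, sperm) pair from the formulas $p_i p_j (1-f)$ and $p_i^2(1-f) + p_i f$, then computes the heterozygosity and the correlation between the values of the uniting gametes.

```python
from fractions import Fraction

def gamete_pair_freqs(p, f):
    """Frequency of each ordered (egg, sperm) allele pair with inbreeding f."""
    n = len(p)
    return {(i, j): p[i] * p[j] * (1 - f) + (p[i] * f if i == j else 0)
            for i in range(n) for j in range(n)}

def heterozygosity(freqs):
    """Total frequency of pairs carrying two different alleles."""
    return sum(w for (i, j), w in freqs.items() if i != j)

def gamete_correlation(freqs, values):
    """Correlation between egg value X and sperm value Y."""
    def mean(g):
        return sum(w * g(values[i], values[j]) for (i, j), w in freqs.items())
    mx = mean(lambda x, y: x)
    my = mean(lambda x, y: y)
    cov = mean(lambda x, y: x * y) - mx * my
    var_x = mean(lambda x, y: x * x) - mx * mx
    # Var(Y) equals Var(X) by symmetry, so sqrt(Var X * Var Y) = Var X.
    return cov / var_x

# Hypothetical three-allele example with arbitrary allele values.
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
values = [1, 0, 5]
f = Fraction(1, 16)

freqs = gamete_pair_freqs(p, f)
h0 = heterozygosity(gamete_pair_freqs(p, Fraction(0)))
print("H_f == H_0 (1 - f):", heterozygosity(freqs) == h0 * (1 - f))
print("r == f:", gamete_correlation(freqs, values) == f)
```

Because exact fractions are used, both comparisons are exact identities rather than approximate agreements.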

The demonstration can be made without restriction as to the number of alleles and letting the contributions of the alleles differ. Let $a_i$ be the value assigned to allele $A_i$, measured as a deviation from the mean, so that $\sum_i p_i a_i = 0$. Then

\[ V_x = \sum_i p_i a_i^2 \]

since the variance of the egg value is the sum of the squares of the allele values, each weighted by its frequency. $V_y$ is the same. The covariance is

\[ COV_{XY} = (1-f) \left[ \sum_i p_i^2 a_i^2 + \sum_{i \ne j }p_i p_j a_i a_j\right] + f \sum_i p_i a_i^2 \]

But the quantity in brackets is equal to $\left[\sum_i p_i a_i \right]^2$, which is equal to 0, because the sum of the deviations from the mean is 0. Therefore,

\[ COV_{XY} = f \sum_i p_i a_i^2 \]

The correlation coefficient, being the ratio of the covariance to the geometric mean of the two variances (which in this case are the same), is $f$, as was to be shown:

\[ r_{XY} = \frac{COV_{XY}}{V_x} = f \]

Furthermore, if the genic values are summed over $k$ loci the covariance will be:

\[ f \sum_k \sum_i p_{ik} a_{ik}^2 \]

where $p_{ik}$ and $a_{ik}$ are the frequency and value of the $i$th allele at the $k$th locus. This is $f$ times the variance. Hence $f$ is the expected value of the correlation between the genetic values of two uniting gametes, regardless of the number of loci and number of alleles under consideration.

The equivalence of $r$ and $f$ suggests an interpretation of the correlation coefficient. If a measurement can be thought of as being the sum of a number of elements, then the correlation coefficient is the measure of the fraction of these elements that are common to the two measurements, the other elements being chosen at random. This interpretation is useful in many branches of science. In quantitative genetics, the elements can obviously be interpreted as cumulatively acting genes.

[Plot: equivalence of the inbreeding coefficient ($f$) and the correlation coefficient ($r$).]

3.3 Coefficients of Consanguinity and Relationship

We have used the inbreeding coefficient of an individual $ I $, denoted $ f_I $, to give the probability that two homologous genes in that individual are identical by descent. Or, as just shown, this is the correlation between the genetic value of the two gametes that united to produce the individual. Since inbreeding of the progeny depends on the consanguinity of the parents, we can use the inbreeding coefficient as a measure of this.

We define the coefficient of consanguinity, $ f_{IJ} $, of two individuals $ I $ and $ J $ as the probability that two homologous genes drawn at random, one from each of the two individuals, will be identical by descent. This is clearly the same as the inbreeding coefficient of a progeny produced by these two individuals. Hence, the inbreeding coefficient of an individual is the same as the coefficient of consanguinity of its parents (Malécot, 1948).

There is a bewildering plethora of alternative names for this coefficient. Malécot, who introduced the idea, called it the coefficient de parenté. Falconer (1960) calls it the coancestry. Kempthorne (1957) translated parenté into parentage. Malécot himself has, on at least one occasion, translated it into kinship.

A different measure of relatedness, introduced much earlier and still widely used, is Wright's (1922) coefficient of relationship, $ r_{IJ} $, defined as:

\[ r_{IJ} = \frac{2f_{IJ}}{\sqrt{(1+f_I)(1+f_J)}} \]

For two individuals that are not inbred, the coefficient of relationship is exactly twice the coefficient of consanguinity.
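As a small worked illustration of Wright's formula, the sketch below evaluates $r_{IJ}$ for a few cases. The function name is hypothetical. The first-cousin value $f_{IJ} = 1/16$ is the one quoted in the text; the full-sib value $f_{IJ} = 1/4$ and the inbred-parent values are standard or purely illustrative figures introduced here as assumptions.

```python
import math

def coefficient_of_relationship(f_ij, f_i=0.0, f_j=0.0):
    """Wright's r_IJ = 2 f_IJ / sqrt((1 + f_I)(1 + f_J))."""
    return 2 * f_ij / math.sqrt((1 + f_i) * (1 + f_j))

# Non-inbred first cousins: r is exactly twice the consanguinity.
r_cousins = coefficient_of_relationship(1 / 16)

# Non-inbred full sibs (assumed f_IJ = 1/4).
r_sibs = coefficient_of_relationship(1 / 4)

# Same consanguinity, but both individuals inbred (illustrative f = 1/8):
# the denominator grows, so r falls below 2 * f_IJ.
r_inbred = coefficient_of_relationship(1 / 16, f_i=1 / 8, f_j=1 / 8)

print(r_cousins, r_sibs, r_inbred)
```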

As we shall show later, the coefficient of relationship is the correlation between the genic, or genetic, values of the two individuals. If the genes act without dominance or epistasis, and there is no effect of the environment on the trait being measured, this is the expected correlation. We shall also show later the effect of dominance on the correlation between relatives.


3.4 Computation of $f$ from Pedigrees

The procedure for computing the inbreeding or consanguinity coefficient from a pedigree follows directly from the definition of $f$. In this pedigree individual $I$ is inbred because both his parents are descended from a single common ancestor, $A$. All unrelated ancestors, which are irrelevant to the inbreeding of $I$, are omitted from the pedigree. We ask for the probability that $I$ is autozygous; i.e., that the homologous genes contributed to $I$ by gametes $b$ and $e$ are both descended from the same gene in ancestor $A$.

We shall use the notation $Prob(b=c)$ to mean the probability that $b$ and $c$ carry identical genes for the locus under consideration. $Prob(b=c)=1/2$, since the gene in $b$ has an equal chance of having come from $C$ or from $B$'s other parent. Likewise, $Prob(c=a)=1/2$.

The probability that $a$ and $a'$ carry identical genes may be obtained as follows. Let the two alleles in $A$ be called $W$ and $Z$. Then there are four equally likely possibilities for gametes $a$ and $a'$: (1) $W$ and $W$, (2) $Z$ and $Z$, (3) $W$ and $Z$, and (4) $Z$ and $W$. In the first two cases they are identical, so the probability is $1/2$ that $a$ and $a'$ get the same gene from $A$. However, there is an additional possibility if $A$ is inbred, for in this case the two alleles $W$ and $Z$ may both be descended from some more remote ancestor not shown in the figure. The probability that $A$ is autozygous is, by definition, the inbreeding coefficient of $A$, $f_A$, and it applies to the half of the cases in which $a$ and $a'$ carry different genes of $A$. If $A$ is inbred,

\[ Prob(a=a')=\frac{1}{2} + \frac{1}{2}f_A=\frac{1}{2}(1+f_A) \]

If $A$ is not inbred,

\[ Prob(a=a')=\frac{1}{2} \]

Continuing around the path $BCADE$, $Prob(a'=d)=Prob(d=e)=1/2$. Summarizing, $b$ and $e$ will carry identical genes only if $b$, $c$, $a$, $a'$, $d$, and $e$ all do so. Therefore, since all these probabilities are independent,

\[ f_I = f_{BE}=Prob(b=e)=\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2}(1+f_A) \cdot \frac{1}{2} \cdot \frac{1}{2}= \left(\frac{1}{2}\right)^5(1+f_A) \]

where the five factors correspond to $b=c$, $c=a$, $a=a'$, $a'=d$, and $d=e$, respectively. If $A$ is not inbred (and according to the information given in this pedigree she is not) the inbreeding coefficient of $I$ is simply $(1/2)^5$. Notice that whether $B$, $C$, $D$, or $E$ is inbred is irrelevant, since, for example, the probability that $c$ and $b$ are identical is independent of the gene contributed by $B$'s other parent. The general rule is that the contribution of a path of relationship through a common ancestor $A$ is $(1/2)^n (1+f_A)$, where $n$ is the number of individuals in the path from one parent to the ancestor and back through the other parent.

In more complicated pedigrees there may be multiple paths through an ancestor or more than one common ancestor. The contributions to the inbreeding coefficient of $I$ from the various paths are as follows. The common ancestor in a path is underlined.

\[ \begin{array}{|c|c|} \hline \text{Path} & \text{Contribution to f} \\ \hline ABD\underline{E}GH & \left(\frac{1}{2}\right)^6 = \frac{1}{64} \\ ACD\underline{E}GH & \left(\frac{1}{2}\right)^6 = \frac{1}{64} \\ ABD\underline{F}H & \left(\frac{1}{2}\right)^5 = \frac{1}{32} \\ ACD\underline{F}H & \left(\frac{1}{2}\right)^5 = \frac{1}{32} \\ \hline \end{array} \]

As we are considering only a single locus, the paths are all mutually exclusive; if $I$ is autozygous for a pair of genes inherited through one path it cannot at the same time be autozygous for a pair inherited through another. Therefore the total probability of autozygosity is the sum of the probabilities for the separate paths, in this case $3/32$.

This pedigree was not complicated by the common ancestor of any path being inbred. Individual $A$ is inbred, but this is irrelevant since $A$ is not a common ancestor. Only inbreeding of $E$ or $F$ would matter. This complication arises in the next pedigree, where there are several inbred individuals and two of these, $B$ and $D$, are common ancestors of one or more paths. We begin by noting the inbreeding coefficients of $D$ and $B$. The inbreeding coefficient of $D$ is $f_D=\frac{1}{8}$, and that of $B$ is $f_B=\left(\frac{1}{2}\right)^3(1+f_D)=\frac{9}{64}$. The components of $f_I$ through the various paths are:

\[ \begin{array}{|c|c|c|} \hline \text{Path} & \text{Contribution to f} & \text{Value} \\ \hline AEF & \left(\frac{1}{2}\right)^3 & 0.1250 \\ ABF & \left(\frac{1}{2}\right)^3(1+f_B) & 0.1426 \\ ABEF & \left(\frac{1}{2}\right)^4 & 0.0625 \\ AEBF & \left(\frac{1}{2}\right)^4 & 0.0625 \\ ABCDEF & \left(\frac{1}{2}\right)^6(1+f_D) & 0.0176 \\ AEDCBF & \left(\frac{1}{2}\right)^6(1+f_D) & 0.0176 \\ \hline & f_I = f_{AF} & 0.4278 \\ \hline \end{array} \]

Notice that the inbreeding of $B$ is taken into consideration in path $ABF$, where $B$ is the common ancestor, but ignored in the other paths, where $B$ is not the common ancestor. A path such as $ABCDEBF$ is not included because $B$ enters twice; the contribution of this path is already included in the path $ABF$ by the term $(1+f_B)$. To summarize: The inbreeding coefficient of an individual $I$, or the coefficient of consanguinity of his parents, $J$ and $K$, is the sum of a series of terms, one for each path leading from a parent to a common ancestor and back through the other parent. The general formula is

\[ f_I = f_{JK} = \sum\left[\left(\frac{1}{2}\right)^n(1+f_A)\right] \]

where the summation is over all possible paths, $n$ is the number of individuals in the path (counting $J$ and $K$ but not $I$), and $f_A$ is the inbreeding coefficient of the common ancestor at the apex of this path. A path cannot pass through the same individual twice. No reversal of direction is permitted except at the common ancestor; always go against the arrows when going from one parent to the ancestor, and with them when coming back through the other. To avoid counting the same path twice, it is helpful to adopt the convention of starting all paths with the same parent (the male, say) and ending with the other.

In the earlier literature the procedure given for computing $f$ was to count the number of steps between individuals in a path rather than the number of individuals. The results are of course the same either way. We count individuals because this follows more naturally from our derivation than the earlier form, and because it is easily adapted to X-linked genes.

An X-chromosome gene that is in a gamete produced by a male must be the same as was in the egg from which this male came. Therefore the probability of identity by descent in these two gametes is $1$, rather than $1/2$ as it would be for a female or an autosomal locus. Hence each male in a path multiplies the probability of identity through this path by $1$ rather than $1/2$, and the effect is as if the males were not counted at all. Furthermore, a male does not receive an X chromosome from his father, so a path involving two successive males makes no contribution to the probability of identity of X-chromosomal loci. Therefore the rule for obtaining the inbreeding coefficient for a sex-linked locus in females is: Proceed as usual except that only females in a path are counted and any path with two successive males is omitted entirely.
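The general autosomal formula for $f_I$ can be sketched as a short function that sums $(1/2)^n(1+f_A)$ over a list of paths. This is only an illustration: the function name and the representation of a path as a pair `(n, f_A)` are assumptions, and the paths themselves must still be read off the pedigree by hand. The two examples use the path lengths from the tables above; $f_D = 1/8$ is the value implied by the tabulated contributions and is treated here as an assumption.

```python
from fractions import Fraction

def inbreeding_from_paths(paths):
    """Sum (1/2)^n (1 + f_A) over paths given as (n, f_A) pairs.

    n   : number of individuals in the path (parents included, I excluded)
    f_A : inbreeding coefficient of the common ancestor of that path
    """
    return sum(Fraction(1, 2) ** n * (1 + f_a) for n, f_a in paths)

# First pedigree: two paths of 6 and two of 5 individuals, ancestors not inbred.
first = [(6, 0), (6, 0), (5, 0), (5, 0)]
f_first = inbreeding_from_paths(first)

# Second pedigree: common ancestors B and D are inbred.
f_D = Fraction(1, 8)                      # assumed, implied by the table values
f_B = inbreeding_from_paths([(3, f_D)])   # (1/2)^3 (1 + f_D)
second = [(3, 0), (3, f_B), (4, 0), (4, 0), (6, f_D), (6, f_D)]
f_second = inbreeding_from_paths(second)

print(f_first, f_B, f_second, float(f_second))
```

Working in exact fractions shows that the tabulated total is the sum of individually rounded contributions, so it can differ from the exact value in the last decimal place.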

3.5 Phenotypic Effects of Consanguineous Matings

In the previous section, the frequency of the recessive gene causing phenylketonuric feeble-mindedness was given as approximately $1/100$. Therefore with random mating the frequency of persons homozygous for the gene is the square of this, or $1/10000$. We now inquire how much this is enhanced with consanguineous marriage. The probability of an affected child is $p^2(1-f)+pf$, where $p$ is the frequency of the recessive allele and $f$ is the inbreeding coefficient of the child. If the parents are cousins, their coefficient of consanguinity, or the inbreeding coefficient of their child, is $1/16$. With $p=1/100$ and $f=1/16$, the expected frequency of homozygous recessives is $115/160000$, or approximately $7/10000$, a 7-fold increase compared with the risk when the parents are unrelated.

\[ \begin{array}{c|cc|c} \hline & \text{Frequency of affected when} & & \\ \text{Gene Frequency} & f=0 & f=1/16 & \text{Ratio} \\ \hline 0.1 & 0.01 & 0.016 & 1.6 \\ 0.01 & 0.0001 & 0.00072 & 7.2 \\ 0.005 & 0.000025 & 0.000335 & 13.4 \\ 0.001 & 0.000001 & 0.000063 & 63 \\ \hline \end{array} \]
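The entries of this table follow directly from the expression $p^2(1-f)+pf$. A minimal sketch that recomputes them (the function name is an assumption for illustration; the gene frequencies and $f = 1/16$ are those of the table):

```python
from fractions import Fraction

def affected_frequency(p, f):
    """Expected frequency of homozygous recessives: p^2 (1 - f) + p f."""
    return p * p * (1 - f) + p * f

f_cousin = Fraction(1, 16)
gene_freqs = [Fraction(1, 10), Fraction(1, 100), Fraction(1, 200), Fraction(1, 1000)]

for p in gene_freqs:
    random_mating = affected_frequency(p, 0)
    cousins = affected_frequency(p, f_cousin)
    # Columns: gene frequency, f = 0, f = 1/16, ratio of the two.
    print(float(p), float(random_mating), float(cousins),
          float(cousins / random_mating))
```

The ratio simplifies to $(1-f) + f/p$, which makes it plain why the relative risk grows as the gene becomes rarer.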

What proportion of the persons affected with recessive traits come from consanguineous marriages? This proportion, $K$, may be obtained by dividing the number of affected from consanguineous marriages by the total number of affected. If $c$ is the proportion of consanguineous marriages in a population of size $N$, the number of affected from consanguineous marriages is $Nc[p^2(1-f)+pf]$, where $f$ is the coefficient of consanguinity of the parents and $p$ is the recessive gene frequency. The total number of affected in the population is $N[p^2(1-\bar{f})+p\bar{f}]$, where $\bar{f}$ is the mean inbreeding coefficient in the population. Therefore

\[ K = \frac{c[p^2(1-f)+pf]}{p^2(1-\bar{f})+p\bar{f}} = \frac{c[p+(1-p)f]}{p-p\bar{f}+\bar{f}} \]

or, since $p\bar{f}$ is usually very small

\[ K = \frac{c[p+(1-p)f]}{p+\bar{f}} \]

approximately.
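To get a feel for the size of $K$, the exact and approximate expressions can be evaluated side by side. The numbers below are hypothetical and chosen only for illustration: 1% of marriages between first cousins ($f = 1/16$), a gene frequency of $0.01$, and a mean inbreeding coefficient $\bar{f} = cf$, which assumes cousin marriages are the only source of inbreeding. The function names are likewise assumptions.

```python
def proportion_exact(c, p, f, f_bar):
    """K = c [p + (1 - p) f] / (p - p f_bar + f_bar)."""
    return c * (p + (1 - p) * f) / (p - p * f_bar + f_bar)

def proportion_approx(c, p, f, f_bar):
    """Same expression with the small term p * f_bar dropped."""
    return c * (p + (1 - p) * f) / (p + f_bar)

# Hypothetical inputs (not data from the text).
c = 0.01          # proportion of marriages that are between first cousins
p = 0.01          # recessive gene frequency
f = 1 / 16        # consanguinity of first cousins
f_bar = c * f     # mean inbreeding if cousin marriages are the only source

K_exact = proportion_exact(c, p, f, f_bar)
K_approx = proportion_approx(c, p, f, f_bar)
print(K_exact, K_approx)
```

With these inputs, cousin marriages make up 1% of all marriages but contribute several percent of the affected children, and the approximation differs from the exact value only slightly.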

The most common consanguineous marriage is between first cousins. For $f=1/16$

\[ K = \frac{c(1+15p)}{16[p+\bar{f}(1-p)]} \]

or approximately

\[ K \approx \frac{c(1+15p)}{16(p+\bar{f})} \]

This shows that, even though consanguineous marriages are very rare, a substantial fraction of diseases caused by recessive genes comes from such marriages if $p \le 0.1$. Consanguinity of the parents is one of the strongest kinds of evidence of recessive inheritance. On the other hand, if the recessive gene is common, the increased incidence with consanguinity is very slight. Cystic fibrosis of the pancreas appears to be due to a simple recessive factor, yet there is no appreciable rise in incidence from consanguineous marriages, because the allele frequency is so high. The parental-consanguinity rate is much higher for recessive traits where the gene frequency is low.

Most recessive genes are carried concealed in the heterozygous condition. We can get some idea of the total number of such genes carried by normal persons through a study of consanguineous marriages. A study by Jan A. Böök showed that about 16% of the children of first-cousin marriages in Sweden had a genetic disease, and if diseases of more doubtful etiology were included the number rose to 28%. The corresponding figures for the control population with unrelated parents were 4% and 6%. Thus cousin marriage, by these data, entails an increased risk of 12% to 22% of having a child with a detectable genetic defect. Since the child of a cousin marriage has an inbreeding coefficient of $1/16$, we reason that a completely homozygous individual would have 16 times as many diseases, or approximately 2 to 3.5. This is the number of recessive factors per gamete (since a homozygous individual may be regarded as a doubled gamete), so the number per zygote is between 4 and 7. These figures are based on rather limited data, but they furnish a rough idea of the magnitude. The conclusion is that the average human carries hidden the equivalent of some half a dozen deleterious recessive genes that, if made homozygous, would cause a detectable disease.

We can also estimate the amount of genetic weakness that is carried hidden in a heterozygous individual, but which would be expressed as inviability if he were made homozygous. Sutter and Tabah (1958 and earlier) found from a demographic study in two rural provinces in France that children of cousin marriages died before adulthood about 25% of the time, whereas the death rate from unrelated parents was about 12%. Thus, in this environment, cousin marriage increased the risk of death by about 0.13. Making the same calculation as above (i.e., multiplying by $16 \times 2$) we estimate that the average individual in this population carries $32 \times 0.13$, or about 4, hidden "lethal equivalents." We say "lethal equivalents" because one cannot distinguish between 4 full lethal genes and 8 genes with 50% probability of causing death, or any system where the product of the number of genes and the average effect is 4.

The data on human inbreeding effects have not been very reproducible. The large body of data from Japan shows significant heterogeneity from city to city. There is danger of confounding inbreeding effects with the effects of social concomitants of consanguineous marriages. For all these reasons, we cannot place too much reliance on the numerical values of the previous paragraph. It is also to be expected that what is lethal in one environment may be merely detrimental in a better one. In much of the world, there has been a substantial rise in the standard of living and a decrease in the death rate. This means that the number of lethal equivalents is decreasing.

In Drosophila, where the measures are precise and reproducible, there are about two lethal equivalents per fly. About $2/3$ of the viability depression from inbreeding is attributable to monogenic lethals; the rest is the cumulative effect of a much larger number of genes with individually small effects.
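The extrapolation used in both estimates above is the same piece of arithmetic: divide the excess risk observed at inbreeding coefficient $f$ by $f$ to get the load per gamete, then double it for the zygote. A minimal sketch, using only the percentages quoted in the text (the function name is an assumption):

```python
def hidden_load(excess_risk, f):
    """Return (per-gamete, per-zygote) load extrapolated from excess risk at f."""
    per_gamete = excess_risk / f     # risk expected at complete homozygosity
    per_zygote = 2 * per_gamete      # a zygote carries two gametes
    return per_gamete, per_zygote

f_cousin = 1 / 16

# Genetic disease in children of first cousins: excess risk 12% to 22%.
disease_low = hidden_load(0.16 - 0.04, f_cousin)
disease_high = hidden_load(0.28 - 0.06, f_cousin)

# Pre-adult mortality: excess risk about 0.13.
lethal_equivalents = hidden_load(0.25 - 0.12, f_cousin)

print(disease_low, disease_high, lethal_equivalents)
```

The linear extrapolation from $f = 1/16$ to $f = 1$ is itself an assumption of the argument, which is one reason the text treats the resulting numbers as rough magnitudes only.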

3.6 The Effect of Inbreeding on Quantitative Characters

We consider first a theoretical model that is applicable to any measurable trait, such as height, weight, yield, survival, or fertility. For initial simplicity, a single locus with only two alleles is assumed. The model is summarized as follows:

\[ \begin{array}{cccc} \hline \text{Genotype} & A_1A_1 & A_1A_2 & A_2A_2 \\ \text{Frequency} & p_1^2(1-f)+p_1f & 2p_1p_2(1-f) & p_2^2(1-f)+p_2f \\ \text{Phenotype} & Y-A & Y+D & Y+A \\ \hline \end{array} \]

In this model $ Y $ is the residual phenotype when the $ A $ locus is not considered. Genotype $ A_2A_2 $ adds an amount $ A $ to the phenotype, and $ A_1A_1 $ subtracts an equal amount. (We could just as well assume that both genotypes add to the residual, or that both subtract; this model is chosen arbitrarily and for algebraic simplicity. The same result would be obtained in any case.) If there were no dominance, the phenotype of $ A_1A_2 $ would be $ Y $. Under this circumstance, the amount by which the phenotype is changed by substituting an $ A_2 $ for an $ A_1 $ is always $ A $, which we can call the additive effect of the $ A $ locus. $ D $ is a measure of dominance. When $ D=0 $, there is no dominance; when $ D=A $, $ A_2 $ is completely dominant; when $ D=-A $, $ A_2 $ is completely recessive, or $ A_1 $ dominant; when $ D>A $, there is overdominance, the heterozygote having a higher phenotypic value than either homozygote. We now obtain an expression for the mean phenotype $ \bar{Y} $. This will be given by summing the products of each phenotype and its frequency. (Since the frequencies add up to one, the sum is the same as the mean.)

\[ \bar{Y} = (Y-A)[p_1^2(1-f)+p_1f] + (Y+D)2p_1p_2(1-f) + (Y+A)[p_2^2(1-f)+p_2f] \]

Which leads after some algebraic rearrangement to

\[ \begin{align} &\bar{Y} = Y + A(p_2 - p_1) + 2p_1p_2 \cdot D - 2p_1p_2 \cdot Df \\ &\bar{Y} = G - Hf \end{align} \]

where $ G = Y + A(p_2 - p_1) + 2 p_1 p_2 D $ and $ H = 2p_1 p_2 D $. $ G $ is the average phenotype with random mating and $ G-H $ is the average with complete homozygosity. Notice that $ H $ is positive if $ D $ is positive.

This equation brings out two important facts about the phenotypic consequences of inbreeding. The first is that in the absence of dominance $ (D=0) $, there is no mean change with inbreeding. The second is that the mean is a linear function of $ f $. This means that whatever the level of dominance, as measured by $ D $, the change with inbreeding is proportional to the inbreeding coefficient. As long as $ D $ is positive (i.e., the heterozygote has a larger phenotype than the mean of the two homozygotes), inbreeding will produce a decline.

There are two possible causes of the inbreeding decline that is so universally observed: (1) favorable genes tend to be dominant or partially dominant $ (0 < D \le A) $, and (2) the heterozygote has a higher phenotype than either homozygote $ (0 < A < D) $. Notice that the observation of a linear decline in a quantitative trait cannot discriminate between these possibilities, for it would be expected with either type of gene action, or any mixture of the two. To discriminate between them will require other kinds of evidence.

The extension to more than two alleles is straightforward and will not be given here. We shall, however, consider the extension to more than one locus. Consider a model with two loci, each with two alleles. We shall let $ A $ stand for the additive effect of the $ A $ locus and $ B $ for that of the $ B $ locus, and let $ r_1 $ and $ r_2 $ be the frequencies of alleles $ B_1 $ and $ B_2 $. If there is no interaction between loci $ A $ and $ B $, the model is as follows:

\[ \begin{array}{cc|ccc} \hline \text{Genotype} & & A_1A_1 & A_1A_2 & A_2A_2 \\ \hline & \text{Frequency} & p_1^2(1-f)+p_1f & 2p_1p_2(1-f) & p_2^2(1-f)+p_2f\\ \hline B_1B_1 & r_1^2(1-f)+r_1f & Y-A-B & Y+D_A-B & Y+A-B\\ B_1B_2 & 2r_1r_2(1-f) & Y-A+ D_B & Y+D_A+D_B & Y+A+D_B\\ B_2B_2 & r_2^2(1-f)+r_2f & Y-A+ B & Y+D_A+B & Y+A+B \\ \hline \end{array} \]

If the $ A $ and $ B $ loci are independent and in gametic phase equilibrium, the frequency of any of the nine classes in the table is given by the product of the frequencies at the borders. We get the mean phenotype as before, by multiplying each phenotype value within the table by its frequency and summing the nine products. After simplification this leads to

\[ \bar{Y} = Y+A(p_2 - p_1) + 2p_1p_2D_A + B(r_2 - r_1) + 2 r_1 r_2 D_B - 2(p_1 p_2 D_A + r_1 r_2 D_B)f \]

As before, there is a linear relation to the inbreeding coefficient (unless $ D_A = D_B = 0 $), as might be expected from knowledge that this is true for either locus by itself. We now complicate the model by assuming an interaction between the two loci. In population genetics, the word epistasis is given a meaning broader than its classical meaning so as to include all levels of nonadditive effects between loci. Any circumstance where a substitution at the $ A $ locus has a different effect depending on the genotype at the $ B $ locus is an example of epistasis. A simple way to construct such a model is to add interaction terms to each value already given so that the phenotypes are now:

\[ \begin{array}{cccc} \hline \text{Genotype} & A_1A_1 & A_1A_2 & A_2A_2 \\ \hline B_1B_1 & Y-A-B+I & Y+D_A-B-L & Y+A-B-I\\ B_1B_2 & Y-A+ D_B -K & Y+D_A+D_B+J & Y+A+D_B+K\\ B_2B_2 & Y-A+ B-I & Y+D_A+B+L & Y+A+B+I \\ \hline \end{array} \]

Notice that there are nine parameters, $ Y, A, B, D_A, D_B, I, J, K, L $ which correspond to the nine phenotypes so there is a complete specification of the phenotypes when the parameters are given, and vice versa.

\[ \bar{Y} = \sum (\text{frequency} \times \text{phenotype}) \]

The formula for $ \bar{Y} $ may be written as

\[ \bar{Y} = G - Hf + Mf^2 \]

Where

\[ \begin{align} G =\ & Y + A(p_2 - p_1) + B(r_2 - r_1) + 2 D_A \cdot p_1 p_2 + 2 D_B \cdot r_1 r_2 \\ & + I(r_1 - r_2)(p_1 - p_2) + 2K \cdot r_1 r_2 (p_2 - p_1) \\ & + 2L \cdot p_1 p_2(r_2 - r_1) + 4 J \cdot p_1 p_2 \cdot r_1 r_2 \end{align} \]

\[ M = 4J \cdot p_1 p_2 \cdot r_1 r_2 \]

\[ \begin{align} H =\ & 2\big[ p_1 p_2 D_A + r_1 r_2 D_B \\ & + p_1 p_2 (r_2 - r_1)L + r_1 r_2 (p_2 - p_1)K \\ & + 4 p_1 p_2 r_1 r_2 J \big] \end{align} \]

In this model, $ A $ and $ B $ represent the additive effects of the two loci, and $ D_A $ and $ D_B $ the dominance effects, as before. $ I $ is the effect of pure epistasis without dominance; in other words, the interaction of the additive effect of $ A $ with the additive effect of $ B $. $ K $ is a measure of interaction and dominance; it is the effect of the $ A $ locus on the dominance of the $ B $ locus. Likewise, $ L $ is the effect of the $ B $ locus on the dominance of the $ A $ locus; that is, the interaction of the additive effect of the $ B $ locus with the dominance effect of the $ A $ locus. Finally, $ J $ is the epistatic effect of the two dominance effects; it is the interaction of dominance at the $ A $ locus with dominance at the $ B $ locus.

From the equation, we can see that if all the terms involving dominance effects $ D_A, D_B, J, K, L $ are zero, there is no inbreeding effect, since the coefficients of $ f $ and $ f^2 $ are $ 0 $. Thus epistasis alone, without dominance, does not produce an inbreeding decline. If $ J=0 $, the inbreeding change is linear in $ f $. In order for the coefficient of $ f^2 $ to be other than $ 0 $, there must be interaction between the two dominance effects.

If $ A, B, D_A, D_B, K, L $ are all positive, then the genes with subscript 2 are associated with increased performance (or yield, or fitness, or whatever is being measured). This also generally means that the alleles with subscript 2 will be more frequent than those with subscript 1. Then if $ J $ is positive there will be diminishing epistasis during inbreeding. By this, we mean that the curve is concave upward and that homozygosity for two loci reduces performance by less than the sum of the individual effects. Conversely, if $ J $ is negative (assuming $ H>0 $), the epistasis is reinforcing. That is, the deleterious effect of two loci is more than cumulative. This is sometimes called synergistic.
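The quadratic form $ \bar{Y} = G - Hf + Mf^2 $ can be checked numerically. The Python sketch below sums frequency times phenotype over the nine genotype classes and compares the result with the closed form. The function names and the parameter values are illustrative assumptions, not taken from the text.

```python
def genotype_freqs(p1, f):
    """Frequencies of the (11, 12, 22) genotypes at one locus with inbreeding f."""
    p2 = 1.0 - p1
    return (p1 * p1 * (1 - f) + p1 * f,
            2 * p1 * p2 * (1 - f),
            p2 * p2 * (1 - f) + p2 * f)


def mean_by_summation(p1, r1, f, Y, A, B, DA, DB, I, J, K, L):
    """Mean phenotype as the sum of frequency x phenotype over nine classes."""
    fa = genotype_freqs(p1, f)  # A1A1, A1A2, A2A2
    fb = genotype_freqs(r1, f)  # B1B1, B1B2, B2B2
    # Rows are B genotypes, columns are A genotypes, as in the epistatic table.
    pheno = [
        [Y - A - B + I, Y + DA - B - L, Y + A - B - I],
        [Y - A + DB - K, Y + DA + DB + J, Y + A + DB + K],
        [Y - A + B - I, Y + DA + B + L, Y + A + B + I],
    ]
    return sum(fb[i] * fa[j] * pheno[i][j] for i in range(3) for j in range(3))


def mean_closed_form(p1, r1, f, Y, A, B, DA, DB, I, J, K, L):
    """Mean phenotype from G - H f + M f^2."""
    p2, r2 = 1.0 - p1, 1.0 - r1
    G = (Y + A * (p2 - p1) + B * (r2 - r1) + 2 * DA * p1 * p2 + 2 * DB * r1 * r2
         + I * (r1 - r2) * (p1 - p2) + 2 * K * r1 * r2 * (p2 - p1)
         + 2 * L * p1 * p2 * (r2 - r1) + 4 * J * p1 * p2 * r1 * r2)
    H = 2 * (p1 * p2 * DA + r1 * r2 * DB + p1 * p2 * (r2 - r1) * L
             + r1 * r2 * (p2 - p1) * K + 4 * p1 * p2 * r1 * r2 * J)
    M = 4 * J * p1 * p2 * r1 * r2
    return G - H * f + M * f * f


# Arbitrary illustrative values (assumptions for this sketch only).
PARAMS = dict(Y=10.0, A=1.0, B=0.5, DA=0.4, DB=0.3, I=0.2, J=-0.6, K=0.1, L=0.05)

for f in (0.0, 0.25, 0.5, 1.0):
    direct = mean_by_summation(0.3, 0.2, f, **PARAMS)
    closed = mean_closed_form(0.3, 0.2, f, **PARAMS)
    print(f, round(direct, 6), round(closed, 6))
```

Setting `J=0` in the parameters makes the mean change by equal steps for equal steps in $ f $, in line with the statement that the inbreeding change is then linear.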

3.7 Some Examples of Inbreeding Effects

The following numerical examples show how specific levels of dominance and epistasis may be constructed by choosing appropriate values for the parameters. In these examples, the gene frequencies are all $ \frac{1}{2} $. The values are contrived so that fully homozygous individuals ($ f=1 $) have an average phenotype of 10.

Models 1 and 2 show that, whether or not there is epistasis, there is no inbreeding effect without dominance; the phenotype is 10, independent of $ f $. Models 3 and 4 show that the rate of decline with inbreeding by itself cannot distinguish among different levels of dominance. Model 5 is an extreme form of reinforcing epistasis; model 6 is an extreme form of diminishing epistasis. Models 6 and 9 show that quite different systems of gene action can give the same quadratic equation.

Models 7 and 8 show that epistasis, though a necessary condition for a nonlinear inbreeding effect, is not sufficient. In both, there is epistasis, yet the equations are the same; moreover, they are the same as those of models 3 and 4. Only when $ J \neq 0 $ is there a nonlinear effect, as models 5, 6, and 9 show.

These examples assume equal frequencies of the two alleles at each locus. Usually, this is not the case. Let us assume that the higher phenotypic value is desirable, which means that for most models $ A_2 $ and $ B_2 $ will tend to be more frequent than their alleles. Consider, as an example, that $ p_1 = r_1 = 0.1 $. Then the equations of models 5 and 6 become:

Model 5, complementary recessive genes: $ \bar{Y} = G - 0.007f - 0.032f^2 $
Model 6, complementary dominant genes: $ \bar{Y} = G - 0.356f + 0.032f^2 $

Model 5 shows that with a trait depending on simultaneous homozygosity for two rare recessives, there will be very little inbreeding effect when $ f $ is small, because the coefficient of $ f $ is small and the term in $ f^2 $ is negligible until $ f $ itself becomes large. Thus, for detrimental traits depending on multiple homozygosities, inbreeding effects tend to be nonlinear: very little effect of slight inbreeding, but an accelerating effect at very high levels.

With model 6, on the other hand, the quadratic term never becomes important and the linear term dominates for all values of $ f $. Thus, for rare genes with duplicate effects, the inbreeding effect is linear for all practical purposes. When the trait considered is survival, it is often natural to measure epistasis as deviations from independence rather than from additivity. Survival probabilities are multiplicative if the genes act independently. It is often advantageous to transform to logs, or to measure fitness in Malthusian parameters.

We summarize this section by three conclusions:

1. If there is no dominance, there is no mean change in phenotype with inbreeding regardless of the amount of epistasis (models 1 and 2). If $ D_A, D_B, J, K $, and $ L $ are 0, then $ \bar{Y} $ is not a function of $ f $.
2. If there is dominance, but no epistasis, the effect of inbreeding on the phenotype is linear in $ f $. Usually, inbreeding leads to a change in quantitative measures, and if there is no epistasis, this change is proportional to $ f $ (models 3 and 4). If $ I=J=K=L=0 $, the term in $ f^2 $ drops out.
3. If there is both dominance and epistasis, the inbreeding effect may be quadratic in $ f $ (or higher order if more than two loci are involved). With reinforcing epistasis, the inbreeding effect is greater than if the loci were additive; with diminishing epistasis, the effect is less (models 5 and 6). However, the change in average phenotype with inbreeding is not necessarily quadratic (models 7 and 8); as long as $ J=0 $, the inbreeding effect is linear.

Reliable data on the results of inbreeding uncomplicated by the effects of selection are rare. There is also difficulty in choosing an appropriate scale of measurement if the linearity of the inbreeding effect is to be tested. Some of the best data come from maize. The yield was measured for inbred lines, crosses, and randomly mated progeny from a field of crosses.

	Hybrid average f = 0 (G)	Inbred average f = 1 (G - H)	f	Expected Yield (G - Hf)	Observed Yield
10 two-way hybrids	62.8	23.7	0.500	43.3	44.2
4 three-way hybrids	64.2	23.8	0.375	49.1	49.3
10 four-way hybrids	64.1	25.0	0.250	54.3	54.0

Three kinds of crosses were tested: two-way, three-way, and four-way (or double-cross) hybrids. Let $ A, B, C, D $ stand for four lines that have been self-fertilized long enough to be regarded as completely homozygous.

Two-way crosses: first generation hybrids between two lines, e.g., $ A \times B $.
Three-way crosses: between a hybrid and a different line, e.g., $ (A \times B) \times C $.
Four-way crosses: between two hybrids, e.g., $ (A \times B) \times (C \times D) $.

In a field of two-way hybrids allowed to pollinate at random, the probability that two alleles have come from the same parental inbred line is $ \frac{1}{2} $. Assuming the parent lines to be autozygous, two alleles from the same line are identical; hence, the progeny from random pollination have an inbreeding coefficient of $ \frac{1}{2} $. For a four-way cross, the probability is $ \frac{1}{4} $. For the three-way cross, the probability of two alleles both coming from $ C $ is $ \frac{1}{4} $, from $ A $ is $ \frac{1}{16} $, and from $ B $ is $ \frac{1}{16} $; $ f $ is the sum, or $ \frac{3}{8} $.

The data show close agreement with the expected values. Since the inbreeding effect is so nearly linear in the inbreeding coefficient, this implies that epistasis is not very important in corn yield. Either the genes at different loci act additively on yield, or opposite interactive effects cancel each other.
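The inbreeding coefficients and expected yields in the maize table can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and the calculation assumes, as the text does, that each parental line is completely homozygous, so that $ f $ is the sum of the squared genetic contributions of the lines.

```python
from fractions import Fraction


def f_random_pollination(line_shares):
    """f of progeny from random pollination within a field of crosses.

    line_shares: fraction of the genes of each plant that comes from each
    fully homozygous inbred line. Two uniting gametes trace to the same
    line with probability equal to the sum of the squared shares.
    """
    return sum(s * s for s in line_shares)


def expected_yield(G, G_minus_H, f):
    """Expected yield G - H f, given the hybrid and inbred averages."""
    H = G - G_minus_H
    return G - H * float(f)


half, quarter = Fraction(1, 2), Fraction(1, 4)
f_two = f_random_pollination([half, half])               # A x B
f_three = f_random_pollination([quarter, quarter, half])  # (A x B) x C
f_four = f_random_pollination([quarter] * 4)              # (A x B) x (C x D)

print(f_two, f_three, f_four)
print(expected_yield(62.8, 23.7, f_two),
      expected_yield(64.2, 23.8, f_three),
      expected_yield(64.1, 25.0, f_four))
```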

3.8 Regular Systems of Inbreeding

The methods described above can be used to derive recurrence relations for $ f $ in successive generations under regular systems of inbreeding.

1. Self-fertilization

\[ f_t = \frac{1}{2}(1 + f_{t-1}) \]

Where $ f_t $ is the inbreeding coefficient in generation $ t $ and $ f_{t-1} $ is the coefficient one generation earlier. To obtain the change in heterozygosity, we use the relation $ f_t = \frac{H_0 - H_t}{H_0} $, where $ H_t $ and $ H_0 $ are the proportions of heterozygosity in generation $ t $ and initially ($ t = 0 $). This leads to:

\[ H_t = \frac{1}{2} H_{t-1} \]

\[ H_t = \left( \frac{1}{2} \right)^2 H_{t-2} \]

\[ H_t = \left( \frac{1}{2} \right)^t H_0 \]

This confirms the result stated previously: with self-fertilization, the amount of heterozygosity is reduced by one-half each generation. After 10 generations, only $ \frac{1}{1024} $ of the loci that were previously heterozygous remain heterozygous.

2. Sib Mating

The inbreeding coefficient $ f_I $ is the probability that two homologous genes in an individual $ I $ are identical by descent. The coefficient of consanguinity $ f_{IJ} $ is the probability that two homologous genes, one from individual $ I $ and one from individual $ J $, are identical by descent.

Let $ f_t $ be the inbreeding coefficient in generation $ t $ and $ g_t $ the coefficient of consanguinity between siblings. Then:

\[ f_t = g_{t-1} \]

\[ g_t = \frac{1}{4}(1 + f_{t-1}) + \frac{1}{2} g_{t-1} \]

Eliminating $ g $, we get:

\[ f_t = \frac{1}{4}(1 + 2f_{t-1} + f_{t-2}) \]

Letting $ f_t = \frac{H_0 - H_t}{H_0} $, we derive a recurrence for heterozygosity:

\[ H_t = \frac{1}{2} H_{t-1} + \frac{1}{4} H_{t-2} \]

If $ H_0 = H_1 = \frac{1}{2} $, we get the sequence: $ \frac{3}{8}, \frac{5}{16}, \frac{8}{32}, \frac{13}{64}, \ldots $
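The sequence can be generated directly from the recurrence. The sketch below uses exact fractions; the function name and its default starting values are illustrative assumptions.

```python
from fractions import Fraction


def sib_mating_heterozygosity(n, h0=Fraction(1, 2), h1=Fraction(1, 2)):
    """Return [H_0, ..., H_n] under H_t = H_{t-1}/2 + H_{t-2}/4."""
    hs = [h0, h1]
    for _ in range(2, n + 1):
        hs.append(hs[-1] / 2 + hs[-2] / 4)
    return hs[:n + 1]


hs = sib_mating_heterozygosity(30)
print(hs[2:6])                    # the first terms of the sequence
print(float(hs[30] / hs[29]))     # ratio of successive values in late generations
```

The numerators of the unreduced fractions follow the Fibonacci numbers, and the ratio of successive values settles near a constant.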

To generalize, let $ f_t = 1 - h_t $ and $ g_t = 1 - k_t $, then:

\[ h_t = k_{t-1} \]

\[ k_t = \frac{1}{4} h_{t-1} + \frac{1}{2} k_{t-1} \]

In matrix form:

\[ \begin{pmatrix} h_t \\ k_t \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ \frac{1}{4} & \frac{1}{2} \end{pmatrix} \begin{pmatrix} h_{t-1} \\ k_{t-1} \end{pmatrix} \]

\[ \begin{pmatrix} h_t \\ k_t \end{pmatrix} = \left( \begin{pmatrix} 0 & 1 \\ \frac{1}{4} & \frac{1}{2} \end{pmatrix} \right)^t \begin{pmatrix} h_0 \\ k_0 \end{pmatrix} \]

To solve directly, we compute the characteristic equation of the matrix:

\[ \left| \begin{array}{cc} -\lambda & 1 \\ \frac{1}{4} & \frac{1}{2} - \lambda \end{array} \right| = 0 \Rightarrow \lambda^2 - \frac{1}{2} \lambda - \frac{1}{4} = 0 \]

The largest (and only positive) root is:

\[ \lambda = \frac{1 + \sqrt{5}}{4} \approx 0.809 \]

As $ t $ grows, we have:

\[ H_t \approx \lambda H_{t-1} \]

That is, the heterozygosity decreases approximately as:

\[ H_t = 0.809 H_{t-1} \]

Assuming a constant rate of decrease, we get:

\[ \frac{H_t}{H_{t-1}} = \frac{H_{t-1}}{H_{t-2}} = \lambda \]

To compare two mating systems, we equate their decay in heterozygosity over time. If system 1 takes $ t_1 $ generations and system 2 takes $ t_2 $ generations to produce the same reduction, then:

\[ \lambda_1^{t_1} = \lambda_2^{t_2} \Rightarrow \frac{t_2}{t_1} = \frac{\log \lambda_1}{\log \lambda_2} \]

Example: for self-fertilization $ \lambda_1 = 0.5 $, and for sib mating $ \lambda_2 = 0.809 $:

\[ \frac{t_2}{t_1} = \frac{\log 0.5}{\log 0.809} = \frac{-0.3010}{-0.0921} \approx 3.27 \]

So, asymptotically, one generation of self-fertilization equals about 3.27 generations of sib mating.
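As a numerical check on the root and on the comparison of the two systems, here is a short Python sketch (the function names are hypothetical):

```python
import math


def dominant_root_sib_mating():
    """Larger root of lambda^2 - lambda/2 - 1/4 = 0."""
    return (1 + math.sqrt(5)) / 4


def equivalent_generations(lam_fast, lam_slow):
    """Generations of the slower system that match one generation of the faster.

    Solves lam_slow ** t = lam_fast for t.
    """
    return math.log(lam_fast) / math.log(lam_slow)


lam_sib = dominant_root_sib_mating()
print(round(lam_sib, 3))
print(round(equivalent_generations(0.5, lam_sib), 2))
```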

This method of comparing mating systems was introduced by Fisher (1949). Other properties of mating systems can also be analyzed, though it quickly becomes complex unless matrix methods (Haldane 1937a; Fisher 1949) are applied.

4. Partial Self-fertilization

All the examples discussed thus far lead eventually to complete homozygosity. This is not always the case, and we shall now consider one such example. This is a simple, yet important, case where a certain fraction of individuals in each generation are self-fertilized and the remainder are mated at random — a situation common in several plant species.

Let $ S $ be the fraction of the population produced by self-fertilization; then $ 1 - S $ is the fraction produced by random mating.

The recurrence for inbreeding becomes:

\[ f_t = S \left[ \frac{1 + f_{t-1}}{2} \right] + (1 - S)(0) \]

\[ f_t = \frac{S}{2}(1 + f_{t-1}) \]

This assumes that the plants undergoing self-fertilization each generation are a random sample of the population — for example, there's no tendency for the progeny of self-fertilized plants to also be self-fertilized.

Substituting $ f_t = \frac{H_0 - H_t}{H_0} $, we get:

\[ H_t = H_0(1 - S) + \frac{S}{2} H_{t-1} \]

Subtracting $ \frac{2(1 - S)}{2 - S} H_0 $ from both sides and simplifying:

\[ H_t - \frac{2(1 - S)}{2 - S} H_0 = \frac{S}{2} \left[ H_{t-1} - \frac{2(1 - S)}{2 - S} H_0 \right] \]

\[ H_t - \frac{2(1 - S)}{2 - S} H_0 = \left( \frac{S}{2} \right)^2 \left[ H_{t-2} - \frac{2(1 - S)}{2 - S} H_0 \right] \]

\[ H_t - \frac{2(1 - S)}{2 - S} H_0 = \left( \frac{S}{2} \right)^t \left[ H_0 - \frac{2(1 - S)}{2 - S} H_0 \right] \]

Since $ \left( \frac{S}{2} \right)^t \to 0 $ as $ t \to \infty $, the heterozygosity approaches a limit where it is a fraction $ \frac{2(1 - S)}{2 - S} $ of its original value.

The rate of approach is such that the departure from equilibrium is multiplied by $ \frac{S}{2} $ each generation; that is, it decreases by a fraction $ 1 - \frac{S}{2} $. Notice that when $ S = 1 $, we recover the formula for complete self-fertilization.

This situation is striking in that unless $ S $ is large, there is almost no cumulative effect. Most of the change occurs in the first generation. For example, with 10% self-fertilization, heterozygosity is reduced by 5% in the first generation, but even at equilibrium the reduction is only 5.3%!

Heterozygosity decreases each generation but stabilizes at equilibrium:

\[ \lim_{t \to \infty} H_t = \frac{2(1 - S)}{2 - S} H_0 \]
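The 10% example and the closed-form solution can be checked by iterating the recurrence for heterozygosity. The following is a minimal Python sketch with hypothetical function names; $ H_0 $ is set to 1 so that the results are fractions of the original heterozygosity.

```python
def heterozygosity_partial_selfing(S, t, H0=1.0):
    """Iterate H_t = H_0 (1 - S) + (S / 2) H_{t-1} for t generations."""
    H = H0
    for _ in range(t):
        H = H0 * (1 - S) + 0.5 * S * H
    return H


def heterozygosity_closed_form(S, t, H0=1.0):
    """Equilibrium value plus a departure that shrinks by S/2 per generation."""
    H_eq = 2 * (1 - S) / (2 - S) * H0
    return H_eq + (S / 2) ** t * (H0 - H_eq)


S = 0.1
print(heterozygosity_partial_selfing(S, 1))    # after one generation
print(heterozygosity_closed_form(S, 50))       # effectively at equilibrium
```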

Repeated Backcrossing to the Same Strain

Frequently a plant breeder may wish to introduce one or more dominant genes from an extraneous source into a standard variety. For example, they may have a highly desirable variety except for its susceptibility to a disease. The resistant gene may exist in another strain that is less desirable in other respects. The breeder can introduce this gene by crossing the two strains and then repeatedly backcrossing resistant plants to the susceptible strain. In this way, the resistant gene is inserted into a genetic background that becomes more like the susceptible strain with each backcross. As another example, a mouse breeder may wish to introgress a new histocompatibility gene into a standard inbred strain.

In recurrent backcrossing, the number of loci containing genes from both strains is reduced by half each generation. Thus, after $ t $ generations, a fraction equal to $ 1 - 0.5^t $ of the genome is derived from the recurrent parental strain. After seven generations, less than 1% of the loci contain a gene from the other parental strain. If the recurrent parental strain is homozygous, heterozygosity reduces by half each generation, as in self-fertilization.

However, genes that are linked to the resistance or histocompatibility gene will tend to remain heterozygous. The question of how large a linked region remains after several generations of backcrossing was investigated by Haldane (1936) and Fisher (1949).

Consider a chromosome segment on one side of the selected factor with length $ 100x $ map units. If there is no interference, the probability of no crossover in this segment in one generation is $ e^{-x} $, and the probability of no crossover in $ t $ generations is $ e^{-tx} $. The chance of a crossover in the small interval from $ x $ to $ x + dx $ in one generation is $ dx $, assuming $ dx $ is small enough to ignore multiple crossovers, so the chance of one there at some time during $ t $ generations is approximately $ t \cdot dx $. Therefore, the probability that the crossover nearest the selected locus during the $ t $ generations falls in the interval $ dx $ is $ e^{-tx} t \cdot dx $. The expected length of the intact segment is:

\[ \bar{x} = \int_0^{\infty} x \, t e^{-tx} \, dx = \frac{1}{t} \]

or $ \frac{100}{t} $ map units.

For example, after 20 generations, the average segment remaining intact would be 5 units on each side of the selected locus, or 10 units total. The derivation assumes no interference, but interference has little effect over short regions.
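The value of $ \frac{100}{t} $ map units can be confirmed by numerical integration of $ x $ against the density $ t e^{-tx} $. The sketch below uses a simple midpoint rule; the function name, the upper limit, and the step count are arbitrary choices made for illustration.

```python
import math


def mean_intact_length(t, upper=50.0, steps=200_000):
    """Midpoint-rule estimate of the integral of x * t * exp(-t x) on [0, upper].

    The result is in Morgans (multiply by 100 for map units). The upper
    limit truncates a tail that is negligible for t >= 1.
    """
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += x * t * math.exp(-t * x) * dx
    return total


print(100 * mean_intact_length(20))  # map units on one side after 20 generations
```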

A closely related question is: what is the mean number of backcross generations that a gene will remain linked to the selected locus, when the recombination probability is $ r $? The probability that the genes remain linked for $ t - 1 $ generations and recombine in generation $ t $ is $ (1 - r)^{t-1} r $. Then the expected number of generations until recombination is:

\[ \bar{t} = r + 2(1 - r)r + 3(1 - r)^2 r + 4(1 - r)^3 r + \ldots \]

Letting $ y = 1 - r $, we write:

\[ \bar{t} = r (1 + 2y + 3y^2 + 4y^3 + \ldots) \]

The expression in parentheses is the derivative, with respect to $ y $, of the geometric series:

\[ 1 + y + y^2 + y^3 + \ldots = \frac{1}{1 - y} \]

Taking the derivative:

\[ \bar{t} = r \cdot \frac{d}{dy} \left( \frac{1}{1 - y} \right) = r \cdot \frac{1}{(1 - y)^2} \]

Since $ y = 1 - r $, we substitute back:

\[ \bar{t} = \frac{1}{r} \]

The values of $ x $ and $ r $ are not necessarily the same, but become increasingly similar as $ x $ becomes small enough that multiple crossovers are negligible.
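The result $ \bar{t} = \frac{1}{r} $ can also be checked by summing the series term by term. In this Python sketch the function name and the truncation point are assumptions; for very small $ r $ more terms would be needed before the neglected tail becomes negligible.

```python
def mean_generations_linked(r, max_gen=10_000):
    """Partial sum of t * (1 - r)**(t - 1) * r for t = 1 .. max_gen."""
    return sum(t * (1 - r) ** (t - 1) * r for t in range(1, max_gen + 1))


for r in (0.5, 0.1, 0.05):
    print(r, mean_generations_linked(r), 1 / r)
```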

Inbreeding with Two Loci

The idea of the inbreeding coefficient can be extended to cover multiple loci. The principal item of interest is that inbreeding may cause the association of two or more recessive traits. This can happen in either of two ways: (1) with unlinked loci but nonuniform inbreeding, and (2) with uniform inbreeding and linkage — and, of course, because of both linkage and nonuniform inbreeding. Our treatment follows very closely the method of Haldane (1949). We consider first unlinked loci in linkage equilibrium.

Let $ p_i $ be the frequency of allele $ A_i $ and $ r_k $ that of allele $ B_k $ at the independent locus. The frequency of $ A_i A_i $ homozygotes is $ p_i^2(1-f) + p_i f = p_i^2 + p_i(1-p_i)f $, with a similar formula for $ B_k B_k $, $ r_k^2 + r_k(1-r_k)f $.

Since the loci are in linkage equilibrium, the frequency of $ A_i A_i B_k B_k $ is the product of these two quantities. Dropping the subscripts and writing $ q = 1 - p $ and $ s = 1 - r $, this is:

\[ (p^2 + pqf)(r^2 + rsf) = p^2r^2+(pqr^2 + p^2rs)f + pqrsf^2 \]

In a population with different values of $ f $ from individual to individual the frequency of the double homozygote is:

\[ p^2 r^2 + (pqr^2 + p^2rs) \bar{f} + pq rs \bar{f^2} \]

Using the variance identity:

\[ \bar{f^2} = \bar{f}^2 + V_f \]

Hence:

\[ P(AABB) = (p^2 + pq \bar{f})(r^2 + r s \bar{f}) + pqrs V_f \]

Suppose that in the human population two recessive genes each have a frequency of 0.1, and that 1% of the marriages are between cousins while the rest are random. Then $ \bar{f} = \frac{1}{1600} $, $ V_f = \frac{1}{25600} - \left(\frac{1}{1600}\right)^2 = 3.87 \cdot 10^{-5} $.
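To see how much the variance term adds in this example, the frequency of the double homozygote can be evaluated directly. This is a sketch with a hypothetical function name; it takes $ f = \frac{1}{16} $ for the progeny of cousins and $ f = 0 $ for the rest, as in the example.

```python
def double_homozygote_freq(p, r, f_values, weights):
    """Return (mean f, variance of f, frequency of the double homozygote).

    f_values and weights describe the distribution of f among individuals.
    """
    q, s = 1 - p, 1 - r
    f_mean = sum(w * f for f, w in zip(f_values, weights))
    f_sq_mean = sum(w * f * f for f, w in zip(f_values, weights))
    v_f = f_sq_mean - f_mean ** 2
    uniform_part = (p * p + p * q * f_mean) * (r * r + r * s * f_mean)
    return f_mean, v_f, uniform_part + p * q * r * s * v_f


f_mean, v_f, freq = double_homozygote_freq(0.1, 0.1, [1 / 16, 0.0], [0.01, 0.99])
print(f_mean, v_f, freq)
```

With these numbers the term in $ V_f $ is small compared with the product term, since both $ \bar{f} $ and $ V_f $ are small.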

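To consider the second case, we extend the inbreeding coefficient $ f $ to two loci. Let $ F $ be the probability that both loci carry alleles identical by descent. Then $ f - F $ is the probability that a given locus is autozygous while the other is not, and $ 1 - 2f + F $ is the probability that neither is.

The frequency of the double homozygote is then: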

\[ \begin{aligned} &P(AABB) = prF + pr^2(f - F) + p^2 r(f - F) + p^2r^2(1 -2f + F) \\ &P(AABB) = (p^2 + fpq)(r^2 + frs) + \phi pq rs \end{aligned} \]

Where $ \phi = F - f^2 $.

Comparing $ \phi $ with $ V_f $, the effects are approximately additive:

\[ P(AABB) = (p^2 + \bar{f}pq)(r^2 + \bar{f} rs) + (\phi + V_f) pq \cdot rs \]

Computing $ F $ from a pedigree requires special methods. The table below gives $ F $, $ f $, and $ \phi $ for several relationships between the parents, where $ c $ is the recombination fraction between the two loci and $ d = 1 - c $.

Relationship of parents	$ F $	$ f $	$ \phi = F - f^2 $
Identical	$ \frac{1}{2}(c^2+d^2) $	$ \frac{1}{2} $	$ \frac{1}{4}(c-d)^2 $
Parent-offspring	$ \frac{1}{4}(c^2+d^2)d $	$ \frac{1}{4} $	$ \frac{1}{16}(d - c)(3d^2 + c^2) $
Full sibs	$ \frac{1}{8}(2d^4+ 2c^2d^2 + c^2) $	$ \frac{1}{4} $	$ \frac{1}{16}(4d^4 + 4c^2d^2 + 2c^2 - 1) $
Half sibs	$ \frac{1}{8}(c^2+d^2)d^2 $	$ \frac{1}{8} $	$ \frac{1}{64}(8c^2d^2 + 8d^4 - 1) $
Uncle-niece	$ \frac{1}{16}(2d^4+ 2c^2d^2+ c^2)d $	$ \frac{1}{8} $	$ \frac{1}{64}(8d^5 + 8c^2d^3 + 4c^2d - 1) $
First cousins	$ \frac{1}{32}(2d^4 + 2c^2d^2 + c^2)d^2 $	$ \frac{1}{16} $	$ \frac{1}{256}(16d^6 + 16c^2d^4 + 8c^2d^2 -1) $
Double half-cousins	$ \frac{1}{128}(8d^6+ 8c^2d^4 + c^2) $	$ \frac{1}{16} $	$ \frac{1}{256}(16d^6+ 16c^2d^4 + 2c^2 - 1) $

For example, assuming $ p = r = 0.01 $, frequency of $ AA $ or $ BB $ in cousin progeny is 0.00072 compared with 0.00010 in random mating. With 10% recombination frequency, $ P(AABB) = 5.2 \cdot 10^{-7} + \phi pqrs = 34.6 \cdot 10^{-7} $.
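The numbers in this example can be reproduced from the first-cousin entry for $ F $, namely $ \frac{1}{32}(2d^4 + 2c^2d^2 + c^2)d^2 $ with $ f = \frac{1}{16} $. The Python sketch below uses hypothetical function names and takes $ c $ as the recombination fraction with $ d = 1 - c $.

```python
def phi_first_cousins(c):
    """phi = F - f^2 for the progeny of first cousins, recombination fraction c."""
    d = 1 - c
    F = (2 * d ** 4 + 2 * c ** 2 * d ** 2 + c ** 2) * d ** 2 / 32
    f = 1 / 16
    return F - f * f


def double_homozygote_linked(p, r, f, phi):
    """P(AABB) = (p^2 + f p q)(r^2 + f r s) + phi p q r s."""
    q, s = 1 - p, 1 - r
    return (p * p + f * p * q) * (r * r + f * r * s) + phi * p * q * r * s


p = r = 0.01
single = p * p + (1 / 16) * p * (1 - p)      # frequency of AA (or BB) with cousin parents
unlinked = double_homozygote_linked(p, r, 1 / 16, phi_first_cousins(0.5))
linked = double_homozygote_linked(p, r, 1 / 16, phi_first_cousins(0.1))
print(single, unlinked, linked)
```

With free recombination ($ c = \frac{1}{2} $) the value of $ \phi $ is zero and the double homozygote frequency is just the product of the single-locus frequencies; with 10% recombination it is several times larger.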

Effect of Inbreeding on the Variance

In the previous sections, we considered the effect of inbreeding on quantitative traits, especially on the mean. In addition to its effect on the population mean, inbreeding also has an effect on the variance. Consider again the single-locus model. It is convenient to let $ Y = 0 $, as this does not change the conclusions and saves some troublesome algebra. To assess the total effect, we first find the contribution of a single locus.

Genotype	A₁A₁	A₁A₂	A₂A₂
Frequency	$ p_1^2(1-f) + p_1f $	$ 2p_1p_2(1-f) $	$ p_2^2(1-f)+ p_2f $
Contribution to total phenotype	$ -A $	$ D $	$ A $

We write the mean as a function of $ f $:

\[ \bar{Y}_f = (1-f)(p_2^2A + 2p_1p_2 D - p_1^2A) + f(p_2A - p_1A) \]
\[ \bar{Y}_f = (1-f)\bar{Y}_0 + f\bar{Y}_1 \]

Where $ \bar{Y}_0 $ is the mean with random mating $ (f = 0) $, and $ \bar{Y}_1 $ with complete homozygosity $ (f = 1) $.

This can be written as:

\[ \bar{Y} = \bar{Y}_0 + (\bar{Y}_1 - \bar{Y}_0)f \]

This shows that the mean phenotype is a linear function of $ f $. We now write the expression for the contribution of this locus to the total population variance:

\[ V_f = (1-f)(p_1^2A^2 + 2p_1p_2D^2 + p_2^2A^2) + f(p_1A^2 + p_2A^2) - \bar{Y}^2 \]
\[ V_f = (1-f)(V_0 + \bar{Y}_0^2) + f(V_1 + \bar{Y}_1^2) - [(1-f)\bar{Y}_0 + f\bar{Y}_1]^2 \]
\[ V_f = (1-f)V_0 + fV_1 + f(1-f)(\bar{Y}_0 - \bar{Y}_1)^2 \]

This shows that, unlike the mean, the variance is not a linear function of $ f $, but quadratic.

No dominance case:

If $ D = 0 $:

\[ \bar{Y}_0 = \bar{Y}_1 = A(p_2 - p_1) \]
\[ V_0 = 2p_1p_2A^2,\quad V_1 = 4p_1p_2A^2 = 2V_0 \]

Then:

\[ V_f = (1-f)V_0 + fV_1 = V_0(1+f) \]

This shows a linear increase in variance when there is no dominance. If the population is subdivided into inbred strains, the within-group variance decreases and the between-group variance increases.
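Both the quadratic formula for $ V_f $ and the special case $ V_f = V_0(1+f) $ can be checked by computing the variance directly from the genotype frequencies. This is a sketch only; the function names and the parameter values used in the printout are illustrative assumptions.

```python
def mean_and_variance(p1, A, D, f):
    """Mean and variance of the single-locus contribution (-A, D, A) with Y = 0."""
    p2 = 1 - p1
    freqs = (p1 * p1 * (1 - f) + p1 * f,
             2 * p1 * p2 * (1 - f),
             p2 * p2 * (1 - f) + p2 * f)
    values = (-A, D, A)
    mean = sum(fr * v for fr, v in zip(freqs, values))
    var = sum(fr * v * v for fr, v in zip(freqs, values)) - mean ** 2
    return mean, var


def variance_from_endpoints(p1, A, D, f):
    """(1 - f) V_0 + f V_1 + f (1 - f) (Ybar_0 - Ybar_1)^2."""
    m0, v0 = mean_and_variance(p1, A, D, 0.0)
    m1, v1 = mean_and_variance(p1, A, D, 1.0)
    return (1 - f) * v0 + f * v1 + f * (1 - f) * (m0 - m1) ** 2


for f in (0.0, 0.25, 0.5, 1.0):
    print(f, mean_and_variance(0.3, 1.0, 0.6, f)[1], variance_from_endpoints(0.3, 1.0, 0.6, f))
```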

However, with dominance, the equations become more complex. A special case is when there are rare recessive alleles. In this case, variance may initially increase due to the unmasking of recessive alleles during inbreeding before eventually declining as homozygosity becomes prevalent.

The Inbreeding Effect of a Finite Population

As was mentioned in the introduction to this chapter, there is a decrease of heterozygosity in a finite population even if there is random mating within the group. Each generation may be regarded as being made up of $2N$ gametes drawn from the previous generation and which combine to make the $N$ individuals of this generation. The gene frequency will therefore change somewhat, the amount depending on the smallness of the population. The change will have a variance of $ \frac{p(1-p)}{2N} $, where $p$ is the frequency of the gene under consideration in the parents.

It might not seem obvious that random changes in gene frequency, which can be in either direction, will on the average cause a net decrease in heterozygosity and increase in homozygosity. One way to visualize this is to regard each of the $2N$ genes at a locus in the $N$ parents as individually labeled. The effect of random processes in drawing (with replacement) from these is that some will be omitted entirely while others will be drawn more than once. Thus there will be a certain amount of identity next generation. If the process continues long enough, all the genes will be descended from a single individual gene and complete autozygosity will be attained.

The distribution of the probability of various gene frequencies during this process is a difficult problem. On the other hand, the average change in heterozygosity is uniform and easily derived. It can be expressed as a function of the inbreeding coefficient.

Consider first a population with completely random mating, including self-fertilization. As before, we consider a single locus. Imagine that the progeny are produced by drawing random pairs of gametes from an infinite pool to which each parent had contributed equally (or, if the pool is finite, the drawing is with replacement). Two gametes then have a chance $ \frac{1}{2N} $ of carrying identical genes, since the $N$ diploid parents have $2N$ genes at this locus. Two gametes have a chance of $ 1 - \frac{1}{2N} $ of carrying different parental genes.

In the first case, the probability of the genes being identical is of course 1. In the second case, the probability of their being identical is $ f_{t-1} $, the inbreeding coefficient of an average individual in the previous generation. This is because two alleles were drawn at random from the parent generation, and since the parents were the result of random mating, the probability of any two alleles being identical is the same as that for two in the same zygote; the latter is, by definition, $ f_{t-1} $.

Therefore:

\[ f_t = \frac{1}{2N} + \left( 1 - \frac{1}{2N} \right) f_{t-1} \]

Recalling that $ H_t $, the heterozygosity at time $t$, is $ H_0(1 - f_t) $, we obtain by substitution:

\[ H_t = \left( 1 - \frac{1}{2N} \right)^t H_0 \]

The result is very simple: despite the complexities in the changes in gene frequencies, the average heterozygosity decreases by a fraction $ \frac{1}{2N} $ each generation.

As stated earlier, this formula assumes completely random mating (i.e., random combination of gametes) including the possibility of self-fertilization. Note that when $N = 1$, the results agree with those for self-fertilization.
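The recurrence for $ f_t $ and the closed form for heterozygosity can be compared with a few lines of Python. This sketch assumes the idealized model of the text (random union of gametes, including self-fertilization); the function name is hypothetical.

```python
def inbreeding_finite_population(N, t, f0=0.0):
    """Iterate f_t = 1/(2N) + (1 - 1/(2N)) f_{t-1} for t generations."""
    f = f0
    for _ in range(t):
        f = 1 / (2 * N) + (1 - 1 / (2 * N)) * f
    return f


N, t = 50, 10
f_t = inbreeding_finite_population(N, t)
print(1 - f_t, (1 - 1 / (2 * N)) ** t)        # relative heterozygosity, two ways
print(inbreeding_finite_population(1, 3))     # N = 1 behaves like self-fertilization
```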

Hierarchical Structure of Populations

At the beginning of this chapter we emphasized that inbreeding can occur under two quite different circumstances. In both cases there is a decrease in heterozygosity measured by $ f $, but some of the consequences are quite different.

In the first case we can have a large population within which isolated consanguineous matings occur. The inbreeding coefficient measures the average decrease in heterozygosity, and the equations given earlier give the expected genotypic frequencies for given gene frequencies. As soon as random mating occurs the inbreeding coefficient returns to zero. An example is the case of partial self-fertilization; as soon as self-fertilization is prevented, the original heterozygosity is restored.

On the other hand, there may be inbreeding because of restriction of population number, even though mating is at random within the population. The average heterozygosity within the population is reduced and the individuals become more closely related, both measured by $ f $ (i.e., $ f_I $ and $ f_{IJ} $). But the increased homozygosity within the population is not due to departure from Hardy-Weinberg ratios, but to changes in the gene frequencies. The change is such as to make the value of $ 2p_1p_2 $ (or $ \sum p_ip_j $ in case of multiple alleles) decrease in proportion to $ f $. The formulae hold only in the sense of giving the genotype frequencies averaged over a whole series of such populations.

In this situation the loss of heterozygosity within the population is permanent and could be restored only by crossing with other populations. Repeated self-fertilization and sib mating constitute extreme cases of small populations.

There are circumstances in which the two effects are combined. For example, we might inquire about the inbreeding coefficient of an animal whose parents were sibs in an isolated population of effective size $ N $. His homozygosity will be greater than if his parents were sibs in a large population. We now derive a procedure to handle this situation. The conclusions were first reached by Wright (1943, 1951) by a different method.

Let $ S $ be a subpopulation derived by isolating a finite number of individuals from a large total population $ T $. For example, $ S $ could be a breed or a strain isolated from a foundation stock $ T $. Or, $ S $ could be one of a series of geographically isolated subpopulations of a large population $ T $. Let $ I $ be an individual within subpopulation $ S $.

We now define $ f_{IS} $ as the probability that two homologous genes in $ I $ are derived from the same gene in a common ancestor within the subpopulation. Let $ f_{ST} $ be the probability that two homologous genes, chosen at random from the subpopulation, are both descended from a gene in the subpopulation. We let $ f_{IT} $ be the overall probability of identity in individual $ I $.

The probability of nonidentity is then the product of two terms: $ (1 - f_{IS}) $, the probability that the two genes do not both come from a gene in a known common ancestor in the population, and $ (1 - f_{ST}) $, the probability that if the two genes are randomly chosen from within the population they will not be identical because of some remote relationship. Therefore,

\[ 1 - f_{IT} = (1 - f_{IS})(1 - f_{ST}) \]

\[ f_{IT} = f_{ST} + (1 - f_{ST})f_{IS} \]

For example, what is the inbreeding coefficient of a child whose parents were cousins on an island whose population is descended from a shipwreck 10 generations ago? Assume for simplicity that there were 50 survivors, equally divided between the two sexes, and that the population has remained of this size and sex distribution since that time.

\[ 1 - f_{ST} = \left(1 - \frac{1}{100}\right)^{10} = 0.9045 \]

\[ 1 - f_{IS} = 1 - \frac{1}{16} = 0.9375 \]

\[ \begin{aligned} 1 - f_{IT} &= (1 - f_{IS})(1 - f_{ST}) \\ &= (0.9045)(0.9375) = 0.848 \\ f_{IT} &= 0.152 \end{aligned} \]

This compares with $ 0.0625 $ from cousin marriage in an infinite population.
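The arithmetic of the shipwreck example can be checked with a short Python sketch. The function names `f_st_after_isolation` and `combine` are illustrative choices, not notation from the text, and the sketch assumes, as the example does, a constant population of 50 with equal numbers of the two sexes.

```python
def f_st_after_isolation(n_effective: int, generations: int) -> float:
    """Identity accumulated by drift alone: 1 - (1 - 1/(2N))^t."""
    return 1.0 - (1.0 - 1.0 / (2 * n_effective)) ** generations


def combine(f_is: float, f_st: float) -> float:
    """Overall coefficient from 1 - f_IT = (1 - f_IS)(1 - f_ST)."""
    return 1.0 - (1.0 - f_is) * (1.0 - f_st)


f_st = f_st_after_isolation(n_effective=50, generations=10)
f_is = 1.0 / 16.0  # first-cousin parents
f_it = combine(f_is, f_st)

print(f"f_ST = {f_st:.4f}")
print(f"f_IT = {f_it:.3f}")  # about 0.152, as in the worked example
```

Setting `f_st` to zero in `combine` returns the familiar $ \frac{1}{16} $ for cousin marriage in a very large population.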

Students of animal breeding will enjoy Wright's application of these methods to the history of Shorthorn cattle. He showed that there was a substantial increase in the inbreeding coefficient $ f_{IT} $ of British Shorthorns, almost entirely due to $ f_{ST} $ and hardly at all because of consanguineous mating within the breed $ (f_{IS}) $.

Similar analyses are possible in human isolates if accurate pedigree records are available. If they are not, it is sometimes possible to get reasonably satisfactory information from marriage records. Since a person's name is inherited as if it were linked to the father's $ Y $ chromosome, marriages between persons of the same surname (isonymous) can be used as indications of consanguinity. Using this procedure on the Hutterite population, Crow and Mange (1965) were able to show that $ f_{ST} $ in this population is quite appreciable, 4%, but that $ f_{IS} $ is not significantly different from 0. In other words, the increased homozygosity is due almost entirely to the small effective size of the population and not at all to nonrandom marriage within the isolate.

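The relation $ 1 - f_{IT} = (1 - f_{IS})(1 - f_{ST}) $ can be rewritten as

\[ 1 - f_{IS} = \frac{H_0(1 - f_{IT})}{H_0(1 - f_{ST})} \]

where $ H_0 $ is the heterozygosity expected under random mating in the total population $ T $. This shows that $ 1 - f_{IS} $ is a measure of the heterozygosity of an individual in the subpopulation relative to that of an individual derived from random mating in the same subpopulation.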

As was mentioned earlier, the inbreeding coefficient can be described in terms of correlation as well as in terms of gene identity. In fact, Wright's original definition and derivation were through correlation analysis. One advantage of the correlation interpretation, as opposed to the probabilistic one, is that negative values have a meaning. This is especially useful in this section. $ f_{IS} $ is the correlation between homologous genes in an individual relative to genes chosen at random from his subpopulation. $ f_{IT} $ is the correlation between homologous genes in an individual relative to the whole population. $ f_{ST} $ is the correlation between randomly chosen genes in the subpopulation relative to the total population.

$ f_{ST} $ is necessarily positive, but the others need not be. For example, if there is specific avoidance of mating between related individuals within the subpopulation, $ f_{IS} $ may be negative.
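To see how a negative within-subpopulation coefficient can arise, the relation $ 1 - f_{IT} = (1 - f_{IS})(1 - f_{ST}) $ can be solved for $ f_{IS} $. The following Python sketch does this; the function name `f_is_from` and the input values are hypothetical and chosen only for illustration.

```python
def f_is_from(f_it: float, f_st: float) -> float:
    """Solve 1 - f_IT = (1 - f_IS)(1 - f_ST) for f_IS."""
    if f_st >= 1.0:
        raise ValueError("f_ST must be less than 1")
    return 1.0 - (1.0 - f_it) / (1.0 - f_st)


# Hypothetical case 1: all excess homozygosity is due to small population size.
print(f_is_from(f_it=0.04, f_st=0.04))  # zero: mating within the isolate is random

# Hypothetical case 2: relatives avoid mating, so f_IT falls below f_ST.
print(f_is_from(f_it=0.03, f_st=0.05))  # negative
```

Whenever the overall coefficient is smaller than the subdivision coefficient, the computed value is negative, which has a natural reading as a correlation but not as a probability.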

Wright defined $ f_{ST} $ as the correlation between two random gametes from the same subpopulation and derived the relation

\[ f_{ST} = \frac{V_p}{\bar{p}(1 - \bar{p})} \]

Where $ \bar{p} $ and $ V_p $ are the mean and variance of the frequency of the gene $ A $ among the subpopulations. We can derive this as follows:

Consider a pair of alleles, $ A $ and $ A' $, and let $ p $ be the frequency of $ A $. Then the frequency of a heterozygote in the subgroup with frequency $ p $ is $ 2p(1 - p)(1 - f_{IS}) $. Thus the frequency of heterozygotes in the total population is

\[ H = E\{2p(1 - p)(1 - f_{IS})\} \]

Where $ E\{\} $ designates taking the expectation or average over all subpopulations. If the variation of gene frequencies among subgroups is independent of $ f_{IS} $, then

\[ H = 2(1 - f_{IS})E\{p(1 - p)\} \]

But

\[ \begin{aligned} E\{p(1 - p)\} &= E\{p - p^2\} \\ &= \bar{p} - \bar{p}^2 - V_p \end{aligned} \]

Thus

\[ H = 2(1 - f_{IS})\left[\bar{p}(1 - \bar{p}) - V_p\right] \]

On the other hand, from the definition of $ f_{IT} $

\[ H = 2(1 - f_{IT})\bar{p}(1 - \bar{p}) \]

Equating these two expressions gives

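\[ (1 - f_{IT}) = (1 - f_{IS}) \left( 1 - \frac{V_p}{\bar{p}(1 - \bar{p})} \right) \]

Comparing this with $ 1 - f_{IT} = (1 - f_{IS})(1 - f_{ST}) $ identifies $ f_{ST} $ with $ V_p / [\bar{p}(1 - \bar{p})] $, which is Wright's relation.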
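A numerical check may make the variance relation concrete. The Python sketch below assumes equally sized subpopulations (so that unweighted averages are appropriate) and random mating within each ($ f_{IS} = 0 $). The allele frequencies are hypothetical.

```python
from statistics import fmean, pvariance


def f_st_from_frequencies(freqs):
    """f_ST = V_p / (p_bar * (1 - p_bar)) over equally weighted subpopulations."""
    p_bar = fmean(freqs)
    v_p = pvariance(freqs, mu=p_bar)  # population variance, not the sample variance
    return v_p / (p_bar * (1.0 - p_bar))


freqs = [0.2, 0.4, 0.6, 0.8]  # hypothetical frequencies of A in four subpopulations
p_bar = fmean(freqs)
v_p = pvariance(freqs, mu=p_bar)

# Heterozygosity averaged directly over subpopulations (random mating within each).
h_direct = fmean([2 * p * (1 - p) for p in freqs])

# The same quantity from the mean and variance of the gene frequency.
h_formula = 2 * p_bar * (1 - p_bar) - 2 * v_p

f_st = f_st_from_frequencies(freqs)
print(h_direct, h_formula, f_st)
```

The two heterozygosity values agree, and both fall short of $ 2\bar{p}(1 - \bar{p}) $ by the fraction $ f_{ST} $.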

Notice that if there is no inbreeding in the subpopulation $ (f_{IS} = 0) $, then

\[ H = 2\bar{p}(1 - \bar{p}) - 2V_p \]

This, although expressed in terms of heterozygotes rather than homozygotes, is the same as what we described as Wahlund's principle. Nei (1965) extended this procedure to cover the case of multiple alleles and has shown that

\[ f_{ST} = \frac{-\text{Cov}_{ij}}{\bar{p}_i\bar{p}_j} \]

Where $ \bar{p}_i $ and $ \bar{p}_j $ are the mean frequencies of the alleles $ A_i $ and $ A_j $ and $ \text{Cov}_{ij} $ is the covariance of their frequencies.
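The variance and covariance formulae can be applied side by side. The following Python sketch is a minimal illustration, assuming equally weighted subpopulations. The helper names and the frequency table are hypothetical and are not data from any study cited here. For three alleles it yields three variance-based and three covariance-based values.

```python
from itertools import combinations
from statistics import fmean


def covariance(xs, ys):
    """Population covariance of two equally long lists of frequencies."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def f_st_estimates(freq_table):
    """freq_table maps an allele name to its frequencies across subpopulations."""
    estimates = {}
    # Variance formula, one value per allele: V_p / (p_bar * (1 - p_bar)).
    for name, freqs in freq_table.items():
        p_bar = fmean(freqs)
        estimates[(name, name)] = covariance(freqs, freqs) / (p_bar * (1 - p_bar))
    # Covariance formula, one value per pair of alleles: -Cov_ij / (p_i * p_j).
    for a, b in combinations(freq_table, 2):
        pa, pb = fmean(freq_table[a]), fmean(freq_table[b])
        estimates[(a, b)] = -covariance(freq_table[a], freq_table[b]) / (pa * pb)
    return estimates


# Hypothetical frequencies of three alleles in four subpopulations (columns sum to 1).
table = {
    "A1": [0.30, 0.20, 0.25, 0.35],
    "A2": [0.20, 0.10, 0.15, 0.25],
    "A3": [0.50, 0.70, 0.60, 0.40],
}
for pair, value in f_st_estimates(table).items():
    print(pair, round(value, 4))
```

With only two alleles the covariance of their frequencies is $ -V_p $, so the covariance formula reduces to the variance formula.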

Nei and Imaizumi (1966) applied these formulae to study the differentiation of $ ABO $ blood group gene frequencies among the prefectures of Japan. They obtained 6 values of $ f_{ST} $ that were in close agreement, suggesting that the local differentiation of gene frequencies was mainly random.

The same formula can be extended to include subdivisions of a subdivision in a hierarchy. For example,

\[ 1 - f_{IT} = (1 - f_{IR})(1 - f_{RS})(1 - f_{ST}) \]

Where $ R $ is a subpopulation of $ S $ which is a subpopulation of $ T $. Then $ f_{RS} $ is the probability that two randomly chosen homologous genes in $ R $ are identical because of common ancestry within $ R $ and $ f_{ST} $ is the probability that two randomly chosen genes in $ S $ are identical because of common ancestry within $ S $.
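Because the probabilities of nonidentity multiply level by level, any number of levels can be combined in the same way. The Python sketch below uses a hypothetical function name, `overall_f`, and hypothetical coefficient values.

```python
from math import prod


def overall_f(level_coefficients):
    """Combine coefficients across levels: 1 - f_IT = product of (1 - f) terms."""
    return 1.0 - prod(1.0 - f for f in level_coefficients)


# Hypothetical values for f_IR, f_RS, and f_ST, in that order.
f_it = overall_f([0.0625, 0.02, 0.05])
print(round(f_it, 4))
```

With two levels the function reduces to $ 1 - f_{IT} = (1 - f_{IS})(1 - f_{ST}) $.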
