Computation of conditional probabilities that appear in the PAWE paper

In this section we document how genotype frequencies at the SNP locus are computed, conditional on affection status, and we state the haplotype frequencies as functions of the marker SNP allele frequencies, the disease allele frequencies, and the proportion of maximum disequilibrium, D’. The formulas provided here are identical to those presented in Sham, chapter 4, section 6 (Sham 1998); in this appendix, we provide a framework for deriving them. All notation presented here is taken from our paper (Gordon et al. 2002). In what follows, we assume that the SNP marker locus has two alleles, labeled 1 and 2.

Notation

Probability parameters:

p1 = allele frequency of SNP marker 1 allele

p2 = allele frequency of SNP marker 2 allele = 1 - p1

pd = allele frequency of disease locus d allele

p+ = allele frequency of disease wild-type allele = 1 - pd

Pr(j | affected) = frequency of SNP marker genotype j in the case group (j = 0 for the 11 genotype, j = 1 for the 12 genotype, j = 2 for the 22 genotype)

Pr(j | unaffected) = frequency of SNP marker genotype j in the control group (j = 0 for the 11 genotype, j = 1 for the 12 genotype, j = 2 for the 22 genotype)

Disequilibrium parameters

D = disequilibrium (non-scaled, as defined in (Hartl and Clark 1989)) [Note: max(-p1 pd, -p2 p+) ≤ D ≤ min(p1 p+, p2 pd)]

Dmax = min (p1 p+, p2 pd) (we assume that disequilibrium is positive)

D’ = proportion of maximum disequilibrium (or scaled disequilibrium) = D/ Dmax (see (Lewontin 1964))
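The relation between D, Dmax and D’ can be illustrated with a short Python sketch. The function names and the example values are hypothetical and are not taken from PAWE; the sketch assumes positive disequilibrium, as stated above, so that 0 ≤ D’ ≤ 1.

```python
def d_bounds(p1, pd):
    """Return (lower, upper) bounds on the unscaled disequilibrium D."""
    p2 = 1.0 - p1        # frequency of SNP marker allele 2
    p_plus = 1.0 - pd    # frequency of the wild-type allele at the disease locus
    lower = max(-p1 * pd, -p2 * p_plus)
    upper = min(p1 * p_plus, p2 * pd)   # equals Dmax when D is positive
    return lower, upper


def d_from_d_prime(d_prime, p1, pd):
    """Convert the scaled disequilibrium D' into the unscaled D = D' * Dmax."""
    if not 0.0 <= d_prime <= 1.0:
        raise ValueError("this sketch assumes positive disequilibrium: 0 <= D' <= 1")
    _, d_max = d_bounds(p1, pd)
    return d_prime * d_max


# Arbitrary illustrative values: p1 = 0.3, pd = 0.1, D' = 0.5
lower, d_max = d_bounds(p1=0.3, pd=0.1)
d = d_from_d_prime(0.5, p1=0.3, pd=0.1)
print(lower, d_max, d)
```

With these inputs Dmax is p2 pd (about 0.07), so D’ = 0.5 corresponds to D of about 0.035.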

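Penetrances:

Pr(affected | ij at disease locus) = probability that an individual with genotype ij at the disease locus is affected, where ij ∈ {+ +, + d, d d}

Prevalence and other parameters:

f = Pr(affected) = prevalence of the disease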

(Note: We assume Hardy-Weinberg equilibrium (HWE) at the disease locus; no such assumption is made for the marker locus)

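hij = frequency of the haplotype bearing the i allele at the disease locus (i = + or d) and the j allele at the marker locus (j = 1 or 2). This frequency is a simple function of the allele frequency parameters at the respective loci (marker and disease) and D (or equivalently, D’, since D = D’ Dmax). With the sign convention implied by the bounds on D given above:

\[ h_{d1} = p_d p_1 + D, \qquad h_{d2} = p_d p_2 - D, \qquad h_{+1} = p_+ p_1 - D, \qquad h_{+2} = p_+ p_2 + D \]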
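A minimal Python sketch of the four haplotype frequencies hij follows. It assumes the standard two-locus parameterization in which positive D raises the frequency of the d-1 and +-2 haplotypes, which is the convention consistent with the bounds on D stated above. The function name, the dictionary layout and the example values are hypothetical.

```python
def haplotype_frequencies(p1, pd, d):
    """Return {(disease_allele, marker_allele): frequency} for the four haplotypes."""
    p2 = 1.0 - p1        # frequency of SNP marker allele 2
    p_plus = 1.0 - pd    # frequency of the wild-type allele at the disease locus
    h = {
        ("d", 1): pd * p1 + d,
        ("d", 2): pd * p2 - d,
        ("+", 1): p_plus * p1 - d,
        ("+", 2): p_plus * p2 + d,
    }
    # A negative frequency means D lies outside its admissible range.
    if min(h.values()) < -1e-12:
        raise ValueError("D is outside its admissible range for these allele frequencies")
    return h


# Arbitrary illustrative values: p1 = 0.3, pd = 0.1, D = 0.035
h = haplotype_frequencies(p1=0.3, pd=0.1, d=0.035)
for key, value in h.items():
    print(key, round(value, 6))
```

The four frequencies sum to 1, and summing over either locus recovers the allele frequencies at the other, which is a convenient check on the signs.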

Example derivation of one conditional probability

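We shall derive one conditional probability in detail; the other five probabilities are derived in a similar fashion. Consider the term Pr(11 | affected), the frequency of the 11 marker genotype in the case group. By the definition of conditional probability, we have:

\[ \Pr(11 \mid \text{affected}) = \frac{\Pr(11 \text{ and affected})}{\Pr(\text{affected})} \]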

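The probability of being affected, Pr(affected), is just the prevalence, f. The numerator may be rewritten as:

\[ \Pr\Big( \big[\tbinom{1}{+}\tbinom{1}{+} \text{ and affected}\big] \text{ or } \big[\tbinom{1}{+}\tbinom{1}{d} \text{ and affected}\big] \text{ or } \big[\tbinom{1}{d}\tbinom{1}{+} \text{ and affected}\big] \text{ or } \big[\tbinom{1}{d}\tbinom{1}{d} \text{ and affected}\big] \Big), \quad (1) \]

where the notation \binom{a}{x} refers to the two-locus haplotype (SNP marker on top, disease locus on bottom) with the allele a at the SNP locus and the allele x at the disease locus.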

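Using basic probability definitions (the four events are mutually exclusive), we have that numerator (1) may be rewritten as:

\[ \Pr\big(\text{affected} \mid \tbinom{1}{+}\tbinom{1}{+}\big) \times \Pr\big(\tbinom{1}{+}\tbinom{1}{+}\big) + \Pr\big(\text{affected} \mid \tbinom{1}{+}\tbinom{1}{d}\big) \times \Pr\big(\tbinom{1}{+}\tbinom{1}{d}\big) \]

\[ {} + \Pr\big(\text{affected} \mid \tbinom{1}{d}\tbinom{1}{+}\big) \times \Pr\big(\tbinom{1}{d}\tbinom{1}{+}\big) + \Pr\big(\text{affected} \mid \tbinom{1}{d}\tbinom{1}{d}\big) \times \Pr\big(\tbinom{1}{d}\tbinom{1}{d}\big) \quad (2). \]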

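Each of the conditional probabilities “Pr(affected | haplotype pair)” in equation (2) only uses genotype information of the disease locus, and thus is one of the three penetrances Pr(affected | + + at disease locus), Pr(affected | + d at disease locus), or Pr(affected | d d at disease locus) [see (Gordon et al. 2002) Methods – Notation]. For example,

\[ \Pr\big(\text{affected} \mid \tbinom{1}{+}\tbinom{1}{+}\big) = \Pr(\text{affected} \mid {+}{+} \text{ at disease locus}). \]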

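Furthermore, each of the two-locus haplotype probabilities is, by definition, a function of the terms hxa, where, as above, a is the allele at the SNP locus and x is the allele at the disease locus. Under Mendel’s Law of Independent Assortment, each haplotype is transmitted independently, so the probability of a haplotype pair is the product of the two haplotype frequencies, and it follows that numerator (2) becomes:

\[ \Pr(\text{affected} \mid {+}{+})\, h_{+1}^2 + 2 \Pr(\text{affected} \mid {+}d)\, h_{+1} h_{d1} + \Pr(\text{affected} \mid dd)\, h_{d1}^2 \]

Thus, the conditional probability Pr(11 | affected) may be written:

\[ \Pr(11 \mid \text{affected}) = \frac{\Pr(\text{affected} \mid {+}{+})\, h_{+1}^2 + 2 \Pr(\text{affected} \mid {+}d)\, h_{+1} h_{d1} + \Pr(\text{affected} \mid dd)\, h_{d1}^2}{f} \]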
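As a numerical illustration of this derivation, the sketch below evaluates Pr(11 | affected) from the two haplotype frequencies that carry marker allele 1 and the three penetrances. The function names, the dictionary keys and the example values are hypothetical. The prevalence is computed from the penetrances under the Hardy-Weinberg assumption at the disease locus stated in the Notation section; in practice the prevalence may instead be supplied directly.

```python
def prevalence_under_hwe(pd, pen):
    """Prevalence f = Pr(affected), assuming HWE at the disease locus.

    pen maps the disease-locus genotypes "++", "+d", "dd" to penetrances.
    """
    p_plus = 1.0 - pd
    return (pen["++"] * p_plus ** 2
            + 2.0 * pen["+d"] * p_plus * pd
            + pen["dd"] * pd ** 2)


def pr_11_given_affected(h_plus1, h_d1, pen, prevalence):
    """Pr(11 | affected) from haplotype frequencies h_{+1} and h_{d1}."""
    numerator = (pen["++"] * h_plus1 ** 2
                 + 2.0 * pen["+d"] * h_plus1 * h_d1   # the two phase-ordered +d pairs
                 + pen["dd"] * h_d1 ** 2)
    return numerator / prevalence


# Arbitrary illustrative values: pd = 0.1, h_{+1} = 0.235, h_{d1} = 0.065
pen = {"++": 0.01, "+d": 0.02, "dd": 0.04}
f = prevalence_under_hwe(pd=0.1, pen=pen)
print(f, pr_11_given_affected(0.235, 0.065, pen, f))
```

A useful sanity check: with no disequilibrium (h+1 = p+ p1 and hd1 = pd p1) the numerator factors as p1² times the prevalence, so Pr(11 | affected) reduces to p1².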

The other conditional probabilities are computed similarly. Note that Pr(unaffected) = 1 - f, and that the following relations hold:

Pr(unaffected | ij at disease locus) = 1 - Pr(affected | ij at disease locus),

where ij ∈ {+ +, + d, d d}.
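The same reasoning gives all six conditional probabilities at once. The Python sketch below enumerates every ordered pair of haplotypes, weights each pair by the penetrance of its disease-locus genotype (or by one minus the penetrance for the unaffected group), and normalizes by Pr(affected) or Pr(unaffected). It is an illustration under the assumptions of this appendix, not the PAWE implementation; the function name, argument layout and example values are hypothetical.

```python
from itertools import product


def conditional_genotype_frequencies(p1, pd, d, pen):
    """Return (cases, controls), each a list [Pr(11|.), Pr(12|.), Pr(22|.)].

    pen = (Pr(affected | ++), Pr(affected | +d), Pr(affected | dd)),
    indexed by the number of d alleles at the disease locus.
    """
    p2, p_plus = 1.0 - p1, 1.0 - pd
    # Haplotype frequencies keyed by (disease allele, marker allele).
    h = {("d", 1): pd * p1 + d, ("d", 2): pd * p2 - d,
         ("+", 1): p_plus * p1 - d, ("+", 2): p_plus * p2 + d}

    joint_affected = [0.0, 0.0, 0.0]
    joint_unaffected = [0.0, 0.0, 0.0]
    # All 16 ordered haplotype pairs; the pair probability is a product
    # because the two haplotypes are assumed to be transmitted independently.
    for (x1, a1), (x2, a2) in product(h, repeat=2):
        pair_prob = h[(x1, a1)] * h[(x2, a2)]
        j = (a1 - 1) + (a2 - 1)              # 0 -> 11, 1 -> 12, 2 -> 22
        n_d = (x1 == "d") + (x2 == "d")      # number of d alleles
        joint_affected[j] += pen[n_d] * pair_prob
        joint_unaffected[j] += (1.0 - pen[n_d]) * pair_prob

    prevalence = sum(joint_affected)         # Pr(affected)
    cases = [x / prevalence for x in joint_affected]
    controls = [x / (1.0 - prevalence) for x in joint_unaffected]
    return cases, controls


# Arbitrary illustrative values
cases, controls = conditional_genotype_frequencies(
    p1=0.3, pd=0.1, d=0.035, pen=(0.01, 0.02, 0.04))
print(cases, controls)
```

Each of the two returned lists sums to 1, and with D = 0 both reduce to the Hardy-Weinberg proportions p1², 2 p1 p2 and p2², since the marker is then independent of disease status.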

Acknowledgements

The authors gratefully acknowledge SJ Kang, who pointed out multiple inconsistencies in previous versions of this Help file. We have corrected all inconsistencies. 

References

Gordon D, Finch SJ, Nothnagel M, Ott J (2002) Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms. Human Heredity 54:22-33

Hartl DL, Clark AG (1989) Principles of population genetics. Sinauer Associates, Sunderland

Lewontin RC (1964) The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49-67

Sham P (1998) Statistics in Human Genetics. J. Wiley and Sons, Inc., New York