Probability parameters:
p1 = allele frequency of SNP marker 1 allele
p2 = allele frequency of SNP marker 2 allele = 1 - p1
pd = allele frequency of disease locus d allele
p+ = allele frequency of disease wild-type allele = 1 - pd
= frequency of SNP marker genotype j in the case group (j = 0 for the 11 genotype, j = 1 for the 12 genotype, j = 2 for the 22 genotype)
= frequency of SNP marker genotype j in the control group (j = 0 for the 11 genotype, j = 1 for the 12 genotype, j = 2 for the 22 genotype)
D = disequilibrium (non-scaled, as defined in Hartl and Clark 1989) [Note: max(-p1 pd, -p2 p+) ≤ D ≤ min(p1 p+, p2 pd)]
Dmax = min(p1 p+, p2 pd) (we assume that disequilibrium is positive)
D’ = proportion of maximum disequilibrium (or scaled disequilibrium) = D / Dmax (see Lewontin 1964)
Penetrances:

Prevalence and other parameters:
(Note: We assume Hardy-Weinberg equilibrium (HWE) at the disease locus; no such assumption is made for the marker locus)
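As a worked illustration of how the HWE assumption enters, the prevalence can be expressed in terms of the disease allele frequencies and the three penetrances. This is a sketch that assumes f denotes the prevalence (as in the derivation later in this file) and writes each penetrance as Pr(affected | genotype at the disease locus) rather than with a dedicated symbol:

$$f = p_+^2 \Pr(\text{affected} \mid ++) + 2\,p_+ p_d \Pr(\text{affected} \mid +d) + p_d^2 \Pr(\text{affected} \mid dd)$$

The weights p+², 2 p+ pd and pd² are the Hardy-Weinberg genotype frequencies at the disease locus.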
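hij = frequency of the haplotype bearing the i allele at the disease locus (i = + or d) and the j allele at the marker locus (j = 1 or 2). This frequency is a simple function of the allele frequencies at the two loci (marker and disease) and D (or, equivalently, D’). Consistent with the bounds on D given above:

$$h_{+1} = p_+ p_1 - D, \quad h_{d1} = p_d p_1 + D, \quad h_{+2} = p_+ p_2 + D, \quad h_{d2} = p_d p_2 - D$$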
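The mapping from allele frequencies and D’ to the four haplotype frequencies can be sketched in a few lines of Python. This is an illustrative sketch, not the program's own code: the function name and dictionary layout are hypothetical, and the sign convention (positive D associates marker allele 1 with disease allele d) is an assumption chosen to match the stated bounds on D.

```python
def haplotype_frequencies(p1, pd, d_prime):
    """Return (D, h), where h maps (disease allele, marker allele) to a frequency.

    Assumes positive disequilibrium, so 0 <= d_prime <= 1 and D = d_prime * Dmax.
    """
    if not (0.0 < p1 < 1.0 and 0.0 < pd < 1.0 and 0.0 <= d_prime <= 1.0):
        raise ValueError("p1 and pd must lie in (0, 1); d_prime must lie in [0, 1]")
    p2 = 1.0 - p1          # frequency of marker allele 2
    p_plus = 1.0 - pd      # frequency of the wild-type disease allele
    d_max = min(p1 * p_plus, p2 * pd)
    d = d_prime * d_max    # non-scaled disequilibrium
    # Assumed sign convention: +D on the (d, 1) and (+, 2) haplotypes.
    h = {
        ("+", 1): p_plus * p1 - d,
        ("d", 1): pd * p1 + d,
        ("+", 2): p_plus * p2 + d,
        ("d", 2): pd * p2 - d,
    }
    return d, h
```

The four frequencies always sum to 1, and summing over either locus recovers the allele frequencies at the other, which gives a quick sanity check on any chosen parameter values.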

Example derivation of one conditional probability
We shall derive one conditional probability in detail and list the other five, which are derived in a similar fashion. Consider the term Pr(11 | affected). By the definition of conditional probability, we have:

$$\Pr(11 \mid \text{affected}) = \frac{\Pr(11 \text{ and affected})}{\Pr(\text{affected})}$$
The probability of being affected, Pr(affected), is just the prevalence, f. The numerator may be rewritten as:

$$\Pr\left[\binom{1}{+}\binom{1}{+} \text{ and affected, or } \binom{1}{+}\binom{1}{d} \text{ and affected, or } \binom{1}{d}\binom{1}{+} \text{ and affected, or } \binom{1}{d}\binom{1}{d} \text{ and affected}\right] \qquad (1)$$

where the notation $\binom{a}{x}$ refers to the two-locus haplotype (SNP marker on top, disease locus on bottom) with the allele a at the SNP locus and the allele x at the disease locus.
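Because the four events in (1) are mutually exclusive, basic probability definitions allow numerator (1) to be rewritten as:

$$\begin{aligned}
&\Pr\left(\text{affected} \mid \binom{1}{+}\binom{1}{+}\right) \times \Pr\left(\binom{1}{+}\binom{1}{+}\right) \\
&+ \Pr\left(\text{affected} \mid \binom{1}{+}\binom{1}{d}\right) \times \Pr\left(\binom{1}{+}\binom{1}{d}\right) \\
&+ \Pr\left(\text{affected} \mid \binom{1}{d}\binom{1}{+}\right) \times \Pr\left(\binom{1}{d}\binom{1}{+}\right) \\
&+ \Pr\left(\text{affected} \mid \binom{1}{d}\binom{1}{d}\right) \times \Pr\left(\binom{1}{d}\binom{1}{d}\right) \qquad (2)
\end{aligned}$$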
Each of the conditional probabilities “Pr(affected | haplotype pair)” in equation (2) uses only the genotype at the disease locus, and thus is one of the three penetrances, one for each disease-locus genotype (+ +, +d, dd) [see Gordon et al. (2002), Methods, Notation]. For example, the first conditional probability in (2), in which both haplotypes carry the + allele, is Pr(affected | + + at disease locus), the penetrance of the + + genotype.
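Furthermore, each of the two-locus haplotype-pair probabilities is, by definition, a function of the haplotype frequencies hxa, where, as above, a is the allele at the SNP locus and x is the allele at the disease locus. Under Mendel’s Law of Independent Assortment, each haplotype is transmitted independently, so the probability of a haplotype pair is the product of the two haplotype frequencies, and it follows that numerator (2) becomes:

$$h_{+1}^2 \Pr(\text{affected} \mid ++) + 2\,h_{+1} h_{d1} \Pr(\text{affected} \mid +d) + h_{d1}^2 \Pr(\text{affected} \mid dd)$$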
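Thus, the conditional probability Pr(11 | affected) may be written:

$$\Pr(11 \mid \text{affected}) = \frac{h_{+1}^2 \Pr(\text{affected} \mid ++) + 2\,h_{+1} h_{d1} \Pr(\text{affected} \mid +d) + h_{d1}^2 \Pr(\text{affected} \mid dd)}{f}$$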
The other conditional probabilities (listed below) are computed similarly. Note that Pr(unaffected) = 1 - f, and that the following relations hold:
Pr(unaffected | ij at disease locus) = 1 - Pr(affected | ij at disease locus),
where ij ∈ {+ +, +d, dd}.

![]()
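The derivation above can be checked numerically by enumerating all ordered haplotype pairs. The following Python sketch does so under stated assumptions: the function name and the penetrance parameter names (pen_pp, pen_pd, pen_dd for the + +, +d and dd genotypes) are hypothetical, positive D is taken to associate marker allele 1 with disease allele d, and haplotype pairs are treated as independent draws as in the text.

```python
from itertools import product


def genotype_frequencies(p1, pd, d_prime, pen_pp, pen_pd, pen_dd):
    """Return (cases, controls, f): marker genotype frequencies for j = 0, 1, 2
    (genotypes 11, 12, 22) in each group, plus the prevalence f."""
    p2 = 1.0 - p1
    p_plus = 1.0 - pd
    d = d_prime * min(p1 * p_plus, p2 * pd)   # D = D' * Dmax, positive D assumed
    # Haplotype frequencies keyed by (disease allele, marker allele).
    h = {
        ("+", 1): p_plus * p1 - d,
        ("d", 1): pd * p1 + d,
        ("+", 2): p_plus * p2 + d,
        ("d", 2): pd * p2 - d,
    }
    # Penetrance depends only on the (unordered) disease-locus genotype.
    pen = {("+", "+"): pen_pp, ("+", "d"): pen_pd,
           ("d", "+"): pen_pd, ("d", "d"): pen_dd}
    affected = [0.0, 0.0, 0.0]     # Pr(genotype j and affected)
    unaffected = [0.0, 0.0, 0.0]   # Pr(genotype j and unaffected)
    for ((x, a), hx), ((y, b), hy) in product(h.items(), repeat=2):
        j = (a - 1) + (b - 1)      # number of 2 alleles at the marker
        pair = hx * hy             # independent haplotypes
        affected[j] += pen[(x, y)] * pair
        unaffected[j] += (1.0 - pen[(x, y)]) * pair
    f = sum(affected)              # prevalence
    cases = [v / f for v in affected]
    controls = [v / (1.0 - f) for v in unaffected]
    return cases, controls, f
```

With d_prime = 0 the marker carries no information about the disease locus, so the case and control frequencies both reduce to p1², 2 p1 p2 and p2², which is a convenient check on any implementation.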
Acknowledgements
The authors gratefully acknowledge SJ Kang, who pointed out multiple inconsistencies in previous versions of this Help file. We have corrected all inconsistencies.
References
Gordon D, Finch SJ, Nothnagel M, Ott J (2002) Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms. Human Heredity 54:22-33
Hartl DL, Clark AG (1989) Principles of population genetics. Sinauer Associates, Sunderland
Lewontin RC (1964) The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49-67
Sham P (1998) Statistics in Human Genetics. J. Wiley and Sons, Inc., New York