Calculating Undetected Errors (CUE)

Joshua Tauberer, Derek Gordon, Jurg Ott

CUE determines the error-detection rate when genotype data from nuclear families at a multi-allelic locus are checked for Mendelian consistency.
Assumptions:
  1. Nuclear families have been randomly selected from a population of nuclear families having the same number of children.
  2. The locus in question is in Hardy-Weinberg equilibrium.
  3. Accurate estimates of allele frequencies at the locus are known.
  4. The genotype of every member of each nuclear family is known.
  5. Errors are introduced randomly and independently at a rate $\alpha$, following the error model described below.
The Error Model
The collection of genotypes for the members of a family is called an ntuple, written with the parents first and the children following. For example, (1/2, 2/3, 1/3, 2/2) would represent the genotypes for a two-child family for a three-allele locus.
An error in an ntuple is a change of a single allele: allele 1 becomes 2, allele A becomes A - 1 (where A is the number of alleles at the locus), and an allele a with 1 < a < A becomes a + 1 or a - 1 with equal probability. Each allele in the ntuple is in error independently with probability $\alpha$, the error rate.
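A minimal Python sketch of this error model, assuming alleles are numbered 1 to A (with A >= 2) and that each of the 2n alleles in an ntuple is independently in error with probability $\alpha$. The function names `error_outcomes`, `perturb_allele` and `perturb_ntuple` are hypothetical and are not part of CUE.

```python
import random


def error_outcomes(allele: int, num_alleles: int) -> dict[int, float]:
    """Return {erroneous value: probability} for one allele under the error model.

    Alleles are numbered 1..num_alleles, and num_alleles is assumed to be >= 2.
    """
    if allele == 1:
        return {2: 1.0}
    if allele == num_alleles:
        return {num_alleles - 1: 1.0}
    # Interior allele: one step down or one step up, each with probability 1/2.
    return {allele - 1: 0.5, allele + 1: 0.5}


def perturb_allele(allele: int, num_alleles: int, alpha: float,
                   rng: random.Random) -> int:
    """Return the observed allele: unchanged with probability 1 - alpha,
    otherwise replaced by a value drawn from error_outcomes()."""
    if rng.random() >= alpha:
        return allele
    outcomes = error_outcomes(allele, num_alleles)
    return rng.choices(list(outcomes), weights=list(outcomes.values()))[0]


def perturb_ntuple(ntuple, num_alleles, alpha, rng):
    """Apply the error model independently to every allele of an ntuple,
    given as a list of (allele, allele) genotypes, parents first."""
    return [tuple(perturb_allele(a, num_alleles, alpha, rng) for a in genotype)
            for genotype in ntuple]
```

For example, `perturb_ntuple([(1, 2), (2, 3), (1, 3), (2, 2)], 3, 0.01, random.Random(1))` simulates observed genotypes for the two-child, three-allele family used as an example above.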

When downloaded and run locally, CUE can process families of up to about 15 members at a two-allele locus, or 3-member families with up to 15 alleles. Because calculations that combine large families with many alleles can take several minutes, they may not finish when run over the web.


The Calculations

To calculate $\beta$, the probability that a family containing at least one error still appears Mendelian-consistent (so that its errors go undetected), or $\alpha$, CUE uses the formulas of Gordon et al. (1999, 2000), where n is the number of members in the nuclear family and $\alpha$ is the true error rate. The formulas also use the frequency of each allele in the polymorphism. B is the probability mass function, evaluated at i, of a binomial distribution with success probability $\alpha$ in each of 2n trials.
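One way to write these quantities out, assuming (as the expression for $\alpha_o$ in this section implies) that $\beta$ is the probability that a family with at least one error is still Mendelian-consistent, is shown below. This is a reconstruction from the definitions on this page, not a transcription of the formulas printed in Gordon et al.; the denominator is $1 - B(0; 2n, \alpha)$, the probability that the family contains at least one error.

```latex
B(i; 2n, \alpha) = \binom{2n}{i}\,\alpha^{i}\,(1-\alpha)^{2n-i}

\beta = \frac{\sum_{i=1}^{2n} \Pr(\text{undetected errors} \mid i \text{ errors in family})\; B(i; 2n, \alpha)}{1 - (1-\alpha)^{2n}}
```

Substituting this $\beta$ into $\alpha_o = \left[1 - (1-\alpha)^{2n}\right](1 - \beta)$ gives $\alpha_o = \sum_{i=1}^{2n} \left[1 - \Pr(\text{undetected errors} \mid i \text{ errors in family})\right] B(i; 2n, \alpha)$.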

To calculate Pr(undetected errors | i errors in family), CUE enumerates every possible family genotype ntuple and introduces every possible combination of errors. Pr(undetected errors | i errors in family) is the proportion of ntuples with i errors that remain Mendelian-consistent.
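A brute-force Python sketch of this enumeration, for a family of two parents and `num_children` children. It assumes parents' genotypes follow Hardy-Weinberg proportions, each child inherits one allele from each parent, and, given i errors, every set of i allele positions is equally likely to be the erroneous set. It weights each ntuple by its probability rather than counting ntuples equally, which is our reading of "proportion". All function names are hypothetical; this does not reproduce CUE's own code.

```python
from itertools import combinations, product
from math import comb


def error_outcomes(allele, num_alleles):
    """{erroneous value: probability} under the error model (alleles 1..A)."""
    if allele == 1:
        return {2: 1.0}
    if allele == num_alleles:
        return {num_alleles - 1: 1.0}
    return {allele - 1: 0.5, allele + 1: 0.5}


def true_ntuples(freqs, num_children):
    """Yield (slots, probability) for every error-free ntuple.

    slots is a flat tuple of 2n alleles: parent 1, parent 2, then each child.
    Parents are drawn under Hardy-Weinberg equilibrium; each child takes one
    allele from parent 1 and one from parent 2, each chosen with probability 1/2.
    """
    alleles = range(1, len(freqs) + 1)
    child_prob = 0.25 ** num_children
    for p1 in product(alleles, repeat=2):
        for p2 in product(alleles, repeat=2):
            parent_prob = 1.0
            for a in p1 + p2:
                parent_prob *= freqs[a - 1]
            # picks[c] = (index into p1, index into p2) for child c
            for picks in product(product((0, 1), repeat=2), repeat=num_children):
                children = tuple(a for i, j in picks for a in (p1[i], p2[j]))
                yield p1 + p2 + children, parent_prob * child_prob


def is_consistent(slots, num_children):
    """True if every child can inherit one allele from each parent."""
    p1, p2 = slots[0:2], slots[2:4]
    for c in range(num_children):
        x, y = slots[4 + 2 * c], slots[5 + 2 * c]
        if not ((x in p1 and y in p2) or (y in p1 and x in p2)):
            return False
    return True


def undetected_given_errors(freqs, num_children):
    """Return {i: Pr(undetected errors | i errors in family)} for i = 1..2n."""
    num_alleles = len(freqs)
    num_slots = 2 * (num_children + 2)
    result = {}
    for i in range(1, num_slots + 1):
        total = 0.0
        for slots, prob in true_ntuples(freqs, num_children):
            if prob == 0.0:
                continue
            # Given i errors, every set of i erroneous slots is equally likely.
            for positions in combinations(range(num_slots), i):
                options = [error_outcomes(slots[k], num_alleles).items()
                           for k in positions]
                for choice in product(*options):
                    observed = list(slots)
                    weight = prob / comb(num_slots, i)
                    for k, (new_allele, p) in zip(positions, choice):
                        observed[k] = new_allele
                        weight *= p
                    if is_consistent(observed, num_children):
                        total += weight
        result[i] = total
    return result
```

This sketch's cost grows exponentially with family size, so it is only practical for small families and few alleles.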

In addition to solving for $\beta$, CUE can approximate the true error rate $\alpha$ from the observed error rate $\alpha_o$, which can be found by running the data through a program such as PEDCHECK (O'Connell and Weeks 1998). The two rates are related by $\alpha_o = \left[1 - (1-\alpha)^{2n}\right](1 - \beta)$.
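Because $\beta$ itself depends on $\alpha$, the relation between $\alpha_o$ and $\alpha$ has no convenient closed-form inverse. A minimal Python sketch of one numerical approach is shown below, using bisection (our choice, not necessarily CUE's method) and assuming that $\alpha_o$ increases with $\alpha$ on the search interval and that a table of Pr(undetected errors | i errors in family) for i = 1..2n is already available. The function names and the `hi`/`tol` parameters are hypothetical.

```python
from math import comb


def binom_pmf(i, trials, p):
    """B(i): binomial probability of exactly i successes in `trials` trials."""
    return comb(trials, i) * p ** i * (1 - p) ** (trials - i)


def beta_from_alpha(alpha, undetected, n):
    """beta = sum_i Pr(undetected | i) B(i) / [1 - (1 - alpha)^(2n)], alpha > 0.

    `undetected` maps i -> Pr(undetected errors | i errors in family), i = 1..2n.
    """
    trials = 2 * n
    numerator = sum(undetected[i] * binom_pmf(i, trials, alpha)
                    for i in range(1, trials + 1))
    return numerator / (1 - (1 - alpha) ** trials)


def observed_rate(alpha, undetected, n):
    """alpha_o = [1 - (1 - alpha)^(2n)] (1 - beta)."""
    return (1 - (1 - alpha) ** (2 * n)) * (1 - beta_from_alpha(alpha, undetected, n))


def alpha_from_observed(alpha_o, undetected, n, hi=0.5, tol=1e-12):
    """Solve observed_rate(alpha) = alpha_o for alpha in (0, hi] by bisection.

    Assumes observed_rate increases with alpha on that interval.
    """
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if observed_rate(mid, undetected, n) < alpha_o:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

The midpoint is always strictly positive, so `beta_from_alpha` never divides by zero during the search.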


Downloading CUE

CUE is available in three formats. Please read the documentation before using CUE.


CUE from the Web...

You may access the CUE program directly from the web by filling out the following form and clicking Calculate.

A web calculation may take anywhere from 1 second to 2 minutes. Calculations that run longer than 2 minutes are stopped; to obtain their results, download CUE and run it locally.

The form asks for the following inputs:
  1. n, the number of members in the family.
  2. Either the true error rate, $\alpha$, or the observed error rate, $\alpha_o$.
  3. A, the number of alleles in the polymorphism.
  4. The frequency of each allele, separated by spaces (for three alleles, for example, enter .2 .4 .4). You may omit the frequency of the last allele; it is then calculated as 1 minus the sum of the first A-1 frequencies.
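A small Python sketch of how the allele-frequency field could be parsed under the rule above, where the last frequency may be omitted. The function name `parse_frequencies` and the `1e-9` tolerance on the sum are assumptions for illustration, not CUE's documented behavior.

```python
def parse_frequencies(text: str, num_alleles: int, tol: float = 1e-9) -> list[float]:
    """Parse space-separated allele frequencies, e.g. ".2 .4 .4".

    If only num_alleles - 1 values are given, the last frequency is set to
    1 minus the sum of the others.
    """
    freqs = [float(token) for token in text.split()]
    if len(freqs) == num_alleles - 1:
        freqs.append(1.0 - sum(freqs))
    if len(freqs) != num_alleles:
        raise ValueError(
            f"expected {num_alleles - 1} or {num_alleles} frequencies, got {len(freqs)}")
    # Reject negative values and totals that do not sum to 1 (within tol).
    if any(f < 0 for f in freqs) or abs(sum(freqs) - 1.0) > tol:
        raise ValueError("frequencies must be non-negative and sum to 1")
    return freqs
```

For example, `parse_frequencies(".2 .4", 3)` fills in the third frequency as approximately 0.4.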


References

Please cite the following references when using CUE:

Gordon D, Heath SC, Ott J (1999) True pedigree errors more frequent than apparent errors for single nucleotide polymorphisms. Hum Hered 49:65-70.

Gordon D, Leal SM, Heath SC, Ott J (2000) An analytic solution to single nucleotide polymorphism error-detection rates in nuclear families: implications for study design. In Pacific Symposium on Biocomputing 5 (Altman, Dunker, Hunter, Lauderdale, Klein, eds.). Honolulu, Hawaii.

The reference for PEDCHECK is:
O'Connell JR, Weeks DE (1998) PedCheck: a program for identification of genotype incompatibilities in linkage analysis. Am J Hum Genet 63:259-266.

All comments should be directed to tauberer@for.net.