Calculating Undetected Errors (CUE)
Joshua Tauberer, Derek Gordon, Jurg Ott
CUE determines the error detection rate for genotype data of nuclear families for a multi-allelic locus when checking for Mendelian consistency.
Terminology:
- The error detection rate, 1-b, is defined as the probability that a nuclear family with errors is inconsistent with Mendel's Laws.
- The true error rate, a, is defined as the actual rate of experimental error (the probability that there is a genotyping error, see the error model below).
- The observed error rate, ao, is defined as the observed proportion of pedigrees (all of which have 2 parents and n-2 children) with errors. This is the probability that a pedigree of size n has at least one error and is inconsistent.
Calculations are based on...
- the size of the family, n,
- the number of alleles at the locus, A, and the frequency of each, Fa,
- the true error rate, a, or the observed error rate, ao.
Assumptions:
- Nuclear families have been randomly selected from a population of nuclear families having the same number of children.
- The locus in question is in Hardy-Weinberg equilibrium.
- Accurate estimates of allele frequencies at the locus are known.
- The genotype of each member of the nuclear family is known.
- Errors are introduced randomly and independently at a rate of a using the error model described below.
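Under the Hardy-Weinberg assumption, genotype frequencies follow directly from the allele frequencies. The sketch below illustrates that standard relationship. The function name `hwe_genotype_frequencies` and the list-based input are assumptions made for illustration and are not part of CUE.

```python
from itertools import combinations_with_replacement


def hwe_genotype_frequencies(allele_freqs):
    """Map each unordered genotype (i, j), i <= j, to its Hardy-Weinberg frequency.

    allele_freqs[k - 1] is the frequency of allele k (alleles are numbered from 1).
    This helper is hypothetical; it is not taken from the CUE source code.
    """
    alleles = range(1, len(allele_freqs) + 1)
    freqs = {}
    for i, j in combinations_with_replacement(alleles, 2):
        p, q = allele_freqs[i - 1], allele_freqs[j - 1]
        # Homozygotes occur with frequency p^2, heterozygotes with 2pq.
        freqs[(i, j)] = p * p if i == j else 2 * p * q
    return freqs


if __name__ == "__main__":
    for genotype, freq in hwe_genotype_frequencies([0.5, 0.3, 0.2]).items():
        print(genotype, round(freq, 4))
```

For a three-allele locus this yields six unordered genotypes whose frequencies sum to 1.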
The Error Model
The collection of genotypes for the members of a family is called an ntuple, written with the parents first and the children following. For example, (1/2, 2/3, 1/3, 2/2) would represent the genotypes for a two-child family for a three-allele locus.
An error in an ntuple is defined as a change of a single allele to a neighboring allele number: allele 1 changes to 2, allele A changes to A - 1 (where A is the number of alleles at the locus), and an allele k with 1 < k < A changes to k + 1 or k - 1 (with equal probability for each). These errors occur at the true error rate, a.
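The error model can be sketched as a small function that, writing k for the true allele number, returns the alleles an error can produce and their probabilities. The name `error_outcomes` and the dictionary return type are illustrative assumptions, not CUE's own interface.

```python
def error_outcomes(k, num_alleles):
    """Return {erroneous allele: probability} for true allele k.

    Alleles are numbered 1..num_alleles. An error moves an allele to a
    neighboring number; interior alleles move up or down with equal chance.
    This function is a hypothetical illustration of the model described above.
    """
    if num_alleles < 2:
        raise ValueError("the error model needs at least two alleles")
    if not 1 <= k <= num_alleles:
        raise ValueError("allele number out of range")
    if k == 1:
        return {2: 1.0}                    # lowest allele can only move up
    if k == num_alleles:
        return {num_alleles - 1: 1.0}      # highest allele can only move down
    return {k - 1: 0.5, k + 1: 0.5}        # interior allele: either neighbor


if __name__ == "__main__":
    for allele in (1, 2, 3):
        print(allele, error_outcomes(allele, 3))
```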
When downloaded and run locally, CUE is limited to families of up to approximately 15 members with two alleles, or 3-member families with up to 15 alleles. Calculations for large families or many alleles take several minutes and therefore may not complete when run over the web.
The Calculations
To calculate b or a, CUE uses the formulas of Gordon et al. (1999, 2000), where n is the number of members in the nuclear family and a is the true error rate. The frequencies of each allele in the polymorphism are also used. B is the probability mass function of a binomial distribution with constant success rate a in each of 2n experiments, evaluated at i.
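One way to write the relationship implied by these definitions is given below. It is a sketch derived from the terminology on this page (b as the probability that a family with at least one error is still Mendelian consistent), not a quotation of the published formulas, which may be arranged differently.

```latex
% Binomial probability of exactly i errors among the 2n alleles of the family
B(i;\,2n,\,a) = \binom{2n}{i}\, a^{i} (1-a)^{2n-i}

% Probability that errors go undetected, given at least one error in the family
b = \frac{\displaystyle\sum_{i=1}^{2n}
          \Pr(\text{undetected errors} \mid i \text{ errors in family})\; B(i;\,2n,\,a)}
         {1-(1-a)^{2n}}
```

Here the denominator is the probability that a family of size n contains at least one error, so that the error detection rate is 1 - b.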
To calculate Pr(undetected errors | i errors in family), CUE enumerates all possible genotypes of families and introduces all possible errors. Pr(undetected errors | i errors in family) is equal to the proportion of genotypes with i errors that display Mendelian consistency.
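The core test inside such an enumeration is whether an ntuple is Mendelian consistent. A minimal sketch of that check follows, assuming genotypes are stored as pairs of allele numbers with the two parents first. The function names are hypothetical, and the enumeration and weighting that CUE performs are not reproduced here.

```python
def child_is_consistent(parent1, parent2, child):
    """True if the child could inherit one allele from each parent."""
    c1, c2 = child
    return (c1 in parent1 and c2 in parent2) or (c2 in parent1 and c1 in parent2)


def ntuple_is_consistent(ntuple):
    """Check every child of a nuclear family against both parents.

    ntuple lists the two parental genotypes first, then the children,
    each genotype being a pair of allele numbers.
    """
    parent1, parent2, *children = ntuple
    return all(child_is_consistent(parent1, parent2, c) for c in children)


if __name__ == "__main__":
    # The example family from the error model section: (1/2, 2/3, 1/3, 2/2)
    family = [(1, 2), (2, 3), (1, 3), (2, 2)]
    print(ntuple_is_consistent(family))

    # Changing the second child to 3/3 makes the family inconsistent,
    # because the first parent carries no allele 3.
    with_error = [(1, 2), (2, 3), (1, 3), (3, 3)]
    print(ntuple_is_consistent(with_error))
```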
In addition to solving for b, CUE can approximate a, the true error rate, from the observed error rate, ao = (1 - (1-a)^(2n))(1 - b). The observed error rate can be found by running data through programs such as PEDCHECK (O'Connell and Weeks 1998).
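The relationship between a and ao can be illustrated with a short sketch. It assumes b is supplied as a known constant, in which case the expression for ao inverts in closed form. In CUE, b itself depends on a, which is presumably why CUE approximates a rather than solving for it directly. The function names are hypothetical.

```python
def observed_error_rate(a, b, n):
    """ao = (1 - (1 - a)^(2n)) * (1 - b) for a family of n members."""
    return (1 - (1 - a) ** (2 * n)) * (1 - b)


def true_error_rate(ao, b, n):
    """Invert the expression for ao, treating b as a fixed, known value.

    This is a simplification: it ignores the dependence of b on a.
    """
    if not 0 <= b < 1:
        raise ValueError("b must lie in [0, 1)")
    any_error = ao / (1 - b)          # probability of at least one error
    if not 0 <= any_error <= 1:
        raise ValueError("ao is not attainable for this value of b")
    return 1 - (1 - any_error) ** (1 / (2 * n))


if __name__ == "__main__":
    a, b, n = 0.01, 0.4, 4
    ao = observed_error_rate(a, b, n)
    print(ao, true_error_rate(ao, b, n))
```

Because some errors go undetected (b > 0), the observed rate understates the rate at which families contain errors.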
Downloading CUE
CUE is available in three formats. Please read the documentation before using CUE.
CUE from the Web...
You may access the CUE program directly from the web by filling out the following form and clicking Calculate.
Be advised that a calculation may take anywhere from 1 second to 2 minutes. Calculations that take longer than 2 minutes will be stopped, and you will need to download CUE and run it locally to obtain the results.
References
Please cite the following references when using CUE:
Gordon D, Heath SC, Ott J (1999) True pedigree errors more frequent than
apparent errors for single nucleotide polymorphisms. Hum Hered 49:65-70
Gordon D, Leal SM, Heath SC, Ott J (2000) An analytic solution to single
nucleotide polymorphism error-detection rates in nuclear families:
implications for study design. In Pacific Symposium on Biocomputing 5
(Altman, Dunker, Hunter, Lauderdale, Klein, eds.). Honolulu, Hawaii.
The reference for PEDCHECK is:
O'Connell JR, Weeks DE (1998) PedCheck: a program for identification of
genotype incompatibilities in linkage analysis. Am J Hum Genet 63:259-66
All comments should be directed to tauberer@for.net.