CUE Documentation
-----------------

CUE is available on the web at:

    http://linkage.rockefeller.edu/derek/cue

When using results from CUE, please cite the following:

    Gordon D, Heath SC, Ott J (1999) True pedigree errors more frequent
    than apparent errors for single nucleotide polymorphisms. Hum Hered
    49:65-70

    Gordon D, Leal SM, Heath SC, Ott J (2000) An analytic solution to
    single nucleotide polymorphism error-detection rates in nuclear
    families: implications for study design. In Pacific Symposium on
    Biocomputing 5 (Altman, Dunker, Hunter, Lauderdale, Klein, eds.).
    Honolulu, Hawaii.

The reference for PEDCHECK (mentioned later in this text) is:

    O'Connell JR, Weeks DE (1998) PedCheck: a program for identification
    of genotype incompatibilities in linkage analysis. Am J Hum Genet
    63:259-66

Any questions or comments regarding CUE should be directed to
tauberer@for.net.

CUE determines the error detection rate for genotype data of nuclear
families for a multi-allelic locus when checking for Mendelian
consistency.

TERMINOLOGY

The error detection rate, 1-beta, is defined as the probability that a
nuclear family with errors is inconsistent with Mendel's Laws.

The true error rate, alpha, is defined as the actual rate of
experimental error (the probability that there is a genotyping error,
see the error model below).

The observed error rate, alpha_o, is defined as the observed proportion
of pedigrees (all of which have 2 parents and n-2 children) with errors.
This is the probability that a pedigree of size n has at least one error
and is inconsistent.

CALCULATIONS ARE BASED ON...

    the size of the family, n
    the number of alleles in the locus, A, and the frequency of each, Fa
    the true error rate, alpha, or the observed error rate, alpha_o

ASSUMPTIONS

    Nuclear families have been randomly selected from a population of nuclear families having the same number of children.
    The locus in question is in Hardy-Weinberg equilibrium.
    Accurate estimates of allele frequencies at the locus are known.
    The genotypes of all members of the nuclear family are known.
    Errors are introduced randomly and independently at a rate of alpha using the error model described below.

THE ERROR MODEL

The collection of genotypes for the members of a family is called an
ntuple, written with the parents first and the children following. For
example, (1/2, 2/3, 1/3, 2/2) would represent the genotypes for a
two-child family for a three-allele locus.

An error in an ntuple is defined as a change of an allele from 1 to 2,
from A to A - 1 (where A is the number of alleles at the locus), or from
a to a + 1 or a - 1 (with equal probability for each) when 1 < a < A.
These errors occur at the rate of alpha.

USAGE

CUE takes all of the information that it needs from command line
arguments and outputs the results to the console. To run CUE, change to
the directory that contains CUE and type...

    in DOS:   cue DISPLAY N ERROR-TYPE ERROR-RATE A F(1)...F(A-1)
    in UNIX:  ./cue DISPLAY N ERROR-TYPE ERROR-RATE A F(1)...F(A-1)

DISPLAY is either "web", "terse", or "verbose".
    "web" mode displays no status information.
    "terse" mode displays status information.
    "verbose" mode displays every calculation made.

N is the number of members in the family (3-15).

ERROR-TYPE is either "true" or "observed".
    "true" indicates the next argument is the true error rate.
    "observed" indicates the next argument is the observed error rate.
    Note that the precision of calculations with the "observed"
    ERROR-TYPE is only four decimal places, whereas six decimal places
    are used for the "true" ERROR-TYPE.

ERROR-RATE is the rate of errors, depending on what was specified for
ERROR-TYPE.

A is the number of alleles at the locus (2-15).

F(1)...F(A-1) are the frequencies of the first A-1 alleles at the
locus. You do not need to provide the frequency of the last allele, as
it will be calculated as 1 minus the sum of the frequencies provided.
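
To illustrate the argument order described above, here is a minimal
Python sketch that assembles a CUE command line and computes the
frequency that would be inferred for the last allele. The function name
build_cue_args and its parameter names are hypothetical and are not part
of CUE; the range checks simply mirror the limits stated above.

```python
def build_cue_args(display, n, error_type, error_rate, freqs):
    """Assemble a CUE argument list (hypothetical helper, not part of CUE).

    freqs holds the frequencies of the first A-1 alleles, in the order
    they would be typed on the command line. Returns the argument list
    and the implied frequency of the last allele.
    """
    if display not in ("web", "terse", "verbose"):
        raise ValueError('DISPLAY must be "web", "terse", or "verbose"')
    if not 3 <= n <= 15:
        raise ValueError("N must be between 3 and 15")
    if error_type not in ("true", "observed"):
        raise ValueError('ERROR-TYPE must be "true" or "observed"')

    num_alleles = len(freqs) + 1  # A = supplied frequencies + last allele
    if not 2 <= num_alleles <= 15:
        raise ValueError("A must be between 2 and 15")

    # The last allele's frequency is 1 minus the sum of those provided.
    last_freq = 1.0 - sum(freqs)
    if last_freq < 0 or any(f < 0 for f in freqs):
        raise ValueError("allele frequencies must be non-negative and sum to at most 1")

    args = ["cue", display, str(n), error_type, str(error_rate), str(num_alleles)]
    args += [str(f) for f in freqs]
    return args, last_freq


args, last_freq = build_cue_args("web", 4, "true", 0.005, [0.2, 0.3])
print(" ".join(args))                     # cue web 4 true 0.005 3 0.2 0.3
print(f"last allele: {last_freq:.3f}")    # last allele: 0.500
```

The helper only builds the argument list; it does not run CUE or
reproduce any of its calculations.
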

EXAMPLES

If you have data on four-member families (2 parents, 2 children) for a
locus with 3 alleles of frequencies .2, .3, and .5, and the true error
rate is .005, run CUE like this:

    cue web 4 true .005 3 .2 .3

CUE outputs the same information you put in (for reference), along with
these results:

    error detection rate, 1-beta = 0.488891
    proportion of pedigrees with errors = 0.039307
    observed error rate, alpha_o = 0.019217

and citation information. Although 3.9% of pedigrees will have errors,
only 48.9% of those pedigrees will display Mendelian inconsistency, so
only 1.9% of the pedigrees in the data will appear to have errors.

PEDCHECK analyzes genotype data for errors and produces the following
output:

    PedCheck has found XXX pedigrees with Level 0 errors.

This number, XXX, divided by the number of pedigrees in the data is the
observed error rate. To use this number in CUE, specify "observed" as
the ERROR-TYPE. For example, for data on 150 4-child families (N=6) and
a locus with 2 alleles (frequencies .3, .7), if PEDCHECK reported 3
pedigrees with errors (alpha_o = 3/150 = .02), start CUE by typing:

    cue terse 6 observed .02 2 .3

CUE will display the progress of the calculations, and then:

    true error rate, alpha = 0.0041
    error detection rate, 1-beta = 0.4227
    proportion of pedigrees with errors = 0.0475

Based on the observed error rate (2%), CUE found that 4.8% of the
pedigrees in the data (about seven) probably had errors, of which only
42.3% were detected. The rate of genotyping errors is 0.41%.

DATA WITH FAMILIES OF DIFFERENT SIZES

To find the overall error detection rate for data that is composed of
families with different numbers of children, find the error detection
rate for each family size. Then weight each detection rate by the
proportion of families in the data with that number of children. Sum
these weighted detection rates.

Example: data on 200 families, 3 alleles (frequencies .1, .4, .5), and a
true error rate of 5%.

    50 pedigrees have 1 child.
    100 pedigrees have 2 children.
    25 pedigrees have 3 children.
    25 pedigrees have 4 children.

Run CUE for each of these four sets of families of the same size.
Hypothetically, CUE determines 1-beta is 35%, 45%, 55%, and 65% for the
four sets respectively. The overall detection rate is:

    .35(50/200) + .45(100/200) + .55(25/200) + .65(25/200) = 46.25%

The same process can be used to find the overall true error rate,
overall observed error rate, or overall proportion of pedigrees with
errors.
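
The weighting described above can be sketched in a few lines of Python.
This is only an illustration of the arithmetic; the function name
overall_rate is hypothetical, and the per-size rates are the
hypothetical values from the example, not real CUE output.

```python
def overall_rate(rates, family_counts):
    """Weight each per-size rate by that size's share of all families."""
    if len(rates) != len(family_counts):
        raise ValueError("need one rate per family size")
    total = sum(family_counts)
    if total <= 0:
        raise ValueError("need at least one family")
    return sum(rate * count / total for rate, count in zip(rates, family_counts))


# Hypothetical 1-beta values for families with 1, 2, 3, and 4 children.
rates = [0.35, 0.45, 0.55, 0.65]
counts = [50, 100, 25, 25]

print(f"{overall_rate(rates, counts):.4f}")  # 0.4625
```

The same function can be applied to any per-size quantity that CUE
reports, by passing those values in place of the detection rates.
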