Power and sample size calculations are a critical part of study design for genetic association analysis. Traditionally, statistical power for linkage or association analysis is computed by specifying genetic model parameters such as the disease allele frequency and the conditional probabilities Pr(affected | j copies of disease allele), where j = 0, 1, or 2 for a di-allelic disease locus (1-5). These conditional probabilities are often referred to as penetrances (6). Equivalently, one can specify the genotype relative risks (7) and the prevalence of the disease (3, 8). While these values can usually be estimated with a high degree of accuracy for Mendelian disorders, they are typically unknown for complex diseases (9). One statistical approach to this uncertainty is to consider a range of values for each parameter. One can then report either the "worst-case scenario" (i.e., the smallest power or largest required sample size observed over the range) or the median power and/or sample size (10). This approach has been considered in several genetic applications over the past several years (11-16). One advantage is that researchers can observe the distribution of power values over the range of parameter values considered, including the minimum, median, average, and maximum power.
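The equivalence between penetrances and the pair (genotype relative risks, prevalence) can be illustrated with a minimal sketch. It assumes Hardy-Weinberg genotype frequencies at the disease locus, which the text above does not state, and the function name `penetrances` is hypothetical rather than part of any published tool.

```python
def penetrances(r1, r2, pd, k):
    """Return (f0, f1, f2) = Pr(affected | 0, 1, 2 copies of the disease allele).

    r1, r2: genotype relative risks for 1 and 2 copies, relative to 0 copies.
    pd: disease allele frequency; k: disease prevalence.
    Assumption: Hardy-Weinberg equilibrium at the disease locus.
    """
    q = 1.0 - pd
    # Hardy-Weinberg genotype frequencies for 0, 1, 2 copies.
    g0, g1, g2 = q * q, 2.0 * pd * q, pd * pd
    # Prevalence K = f0 * (g0 + r1 * g1 + r2 * g2); solve for f0.
    f0 = k / (g0 + r1 * g1 + r2 * g2)
    f = (f0, r1 * f0, r2 * f0)
    if max(f) > 1.0:
        raise ValueError("parameters imply a penetrance above 1")
    return f


f0, f1, f2 = penetrances(r1=1.5, r2=2.25, pd=0.2, k=0.05)
```

Weighting the returned penetrances by the genotype frequencies recovers the prevalence, which is a quick consistency check on any chosen parameter set.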
Visualizing the power
We have implemented a method to visualize power and sample size for two commonly used statistical tests for genetic association, the linear trend test (17, 18) and the genotypic test of association [e.g., see (1)]. Power and/or sample size are computed through derivation of the respective test's non-centrality parameter (19, 20). Power for a fixed sample size of cases and controls, and minimal sample size for a fixed power, each at a specified significance level, are computed as functions of the genotype relative risks for the heterozygote (R1) and for the risk-allele homozygote (R2), the disease allele frequency (Pd), the frequency (P1) of the SNP marker allele in coupling with the disease allele, a measure of disequilibrium between the disease and SNP loci [D' (21) or r² (22, 23)], and the disease prevalence (K). Alternatively, if one is studying a quantitative trait locus (QTL) and specifies lower and upper cutoffs to define affected and unaffected individuals, then power and/or sample size are calculated as functions of the QTL variance, the dominance/additive ratio, the frequency of the QTL "increaser" allele, the frequency (P1) of the SNP marker allele in coupling with the increaser allele, and a measure of disequilibrium between the QTL and SNP loci [D' (21) or r², as above] [e.g., (3)].
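Once a non-centrality parameter is available, turning it into power or a minimal sample size is a standard chi-square calculation. The sketch below is not the PAWE-3D code: it treats the non-centrality parameter as an input, assumes the linear trend test has 1 degree of freedom and the genotypic test 2, and assumes the non-centrality parameter grows linearly with sample size at a fixed case/control ratio. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def power_1df(ncp, alpha):
    """Power of a 1-df chi-square test with non-centrality parameter ncp."""
    z = _Z.inv_cdf(1.0 - alpha / 2.0)
    s = math.sqrt(ncp)
    # A non-central chi-square with 1 df is the square of a N(sqrt(ncp), 1) variable.
    return (1.0 - _Z.cdf(z - s)) + _Z.cdf(-z - s)


def power_2df(ncp, alpha, terms=200):
    """Power of a 2-df chi-square test with non-centrality parameter ncp.

    The series is truncated after `terms` Poisson weights, which is ample
    for the moderate ncp values that arise in power calculations.
    """
    half_c = -math.log(alpha)      # critical value c satisfies exp(-c/2) = alpha
    # Non-central chi-square = Poisson(ncp/2) mixture of central chi-squares
    # with 2 + 2j df, whose upper tail is exp(-c/2) * sum_{k<=j} (c/2)^k / k!.
    weight = math.exp(-ncp / 2.0)  # Poisson weight for j = 0
    term = 1.0                     # (c/2)^k / k! for k = 0
    partial = 1.0                  # running sum over k <= j
    total = 0.0
    for j in range(terms):
        total += weight * alpha * partial
        weight *= (ncp / 2.0) / (j + 1)
        term *= half_c / (j + 1)
        partial += term
    return total


def min_sample_size(ncp_per_subject, alpha, target, power_fn):
    """Smallest n with power_fn(n * ncp_per_subject, alpha) >= target."""
    if ncp_per_subject <= 0.0 or not 0.0 < target < 1.0:
        raise ValueError("need ncp_per_subject > 0 and 0 < target < 1")
    hi = 1
    while power_fn(hi * ncp_per_subject, alpha) < target:
        hi *= 2
    lo = hi // 2                   # power at lo is below target (or lo == 0)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_fn(mid * ncp_per_subject, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

With a non-centrality parameter of zero both functions return the significance level, as expected under the null hypothesis.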
Furthermore, because of work documenting the effects of diagnostic misclassification on the power of the linear trend test (24) and the genotypic test of association (25, 26), we also include misclassification probabilities θ (the probability of misclassifying a true affected as an observed unaffected) and φ (the probability of misclassifying a true unaffected as an observed affected).
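To see why misclassification reduces power, one can compute the genotype frequencies in the observed affected and unaffected groups as mixtures of the true ones. The sketch below is illustrative only. It assumes that misclassification does not depend on genotype and that individuals are drawn from a population with prevalence K, and the function name is hypothetical; the cited papers (24-26) give the formulas actually used.

```python
def observed_genotype_freqs(aff, unaff, k, theta, phi):
    """Genotype frequencies among observed affecteds and observed unaffecteds.

    aff, unaff: genotype frequencies among true affecteds / true unaffecteds.
    k: prevalence; theta: Pr(observed unaffected | true affected);
    phi: Pr(observed affected | true unaffected).
    Assumption: misclassification is independent of genotype.
    """
    p_obs_aff = (1.0 - theta) * k + phi * (1.0 - k)      # Pr(observed affected)
    p_obs_unaff = theta * k + (1.0 - phi) * (1.0 - k)    # Pr(observed unaffected)
    # Bayes' rule: each observed group mixes true affecteds and true unaffecteds.
    obs_aff = [((1.0 - theta) * k * a + phi * (1.0 - k) * u) / p_obs_aff
               for a, u in zip(aff, unaff)]
    obs_unaff = [(theta * k * a + (1.0 - phi) * (1.0 - k) * u) / p_obs_unaff
                 for a, u in zip(aff, unaff)]
    return obs_aff, obs_unaff


# Illustrative values only, not taken from any study.
cases, controls = observed_genotype_freqs(
    aff=(0.50, 0.40, 0.10), unaff=(0.65, 0.30, 0.05),
    k=0.05, theta=0.05, phi=0.01)
```

Under these assumptions the observed case and control frequencies move toward each other as θ and φ grow, which shrinks the difference that the association tests detect.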
In total, there are eight disease model parameters required for the determination of power and/or sample size at a given significance level, assuming a di-allelic disease locus and a marker locus in disequilibrium with it. Our webtool, PAWE-3D, allows one to perform power calculations considering a range of values for any subset of the eight parameters (with each remaining parameter fixed at a single value). If we consider a range for only one parameter, the resulting figure is a graph. If we consider a range for exactly two parameters, the resulting figure is a contour plot. If we consider a range for three or more parameters, the resulting figure is a histogram. The figures are created by randomly sampling 100,000 data points, assuming either a Uniform or a Beta prior distribution for values in the n-dimensional cube determined by the endpoints of the n user-specified intervals. For the Beta distribution, the user specifies a mean and a variance. These values are then transformed into the parameters that determine the particular Beta distribution on the [0,1] interval. A simple linear transformation maps the Beta distribution on [0,1] to the interval for the specific parameter. For more details, see the PAWE-3D Helpfile.
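The sampling step described above can be sketched as follows. This is an illustration under stated assumptions, not the PAWE-3D implementation: the mean and variance are taken to be given on the [0,1] scale, the Beta parameters are obtained by the usual method-of-moments formulas, and each parameter is sampled independently. The function names are hypothetical.

```python
import random


def beta_shape(mean, var):
    """Method-of-moments Beta parameters (a, b) for a mean and variance on [0,1]."""
    if not 0.0 < mean < 1.0 or not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("need 0 < mean < 1 and 0 < var < mean * (1 - mean)")
    nu = mean * (1.0 - mean) / var - 1.0
    return mean * nu, (1.0 - mean) * nu


def sample_parameter(lo, hi, rng, mean=None, var=None):
    """Draw one value in [lo, hi]: Uniform if no mean is given, else rescaled Beta."""
    if mean is None:
        return rng.uniform(lo, hi)
    a, b = beta_shape(mean, var)
    # Linear map from [0,1] to the user-specified interval [lo, hi].
    return lo + (hi - lo) * rng.betavariate(a, b)


def figure_type(n_ranged):
    """Figure produced for a given number of parameters that are given a range."""
    if n_ranged < 1:
        raise ValueError("at least one parameter must be given a range")
    return {1: "graph", 2: "contour plot"}.get(n_ranged, "histogram")


rng = random.Random(0)
# Hypothetical ranges for three parameters (R1, Pd, K); 1,000 points for brevity.
ranges = {"R1": (1.0, 2.0), "Pd": (0.05, 0.5), "K": (0.01, 0.1)}
points = [
    {name: sample_parameter(lo, hi, rng, mean=0.5, var=0.05)
     for name, (lo, hi) in ranges.items()}
    for _ in range(1000)
]
kind = figure_type(len(ranges))
```

Each sampled point would then be passed to the power calculation, and the collected power values plotted in the figure type that matches the number of ranged parameters.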
Please cite the following two references when reporting results obtained
from PAWE-3D.
Gordon D, Haynes C, Blumenfeld J, Finch SJ (2005) PAWE-3D: visualizing
Power for Association With Error in case/control genetic studies of
complex traits. Bioinformatics 21:3935-3937.
Gordon D, Finch SJ, Nothnagel M, Ott J (2002) Power and sample size
calculations for case-control genetic association tests when errors are
present: application to single nucleotide polymorphisms. Hum Hered
54:22-33.
If power and/or sample size calculations are performed using phenotype
misclassification error, then please also cite the following papers
(depending upon the test statistic considered):
Linear Trend Test
Zheng G, Tian X (2005) The impact of diagnostic error on testing genetic
association in case-control studies. Stat Med 24:869-882.
Genotypic Test
Edwards BJ, Haynes C, Levenstien MA, Finch SJ, Gordon D (2005) Power and
sample size calculations in the presence of phenotype errors for
case/control genetic association studies. BMC Genet 6:18.
References
1. Gordon, D., Finch, S.J., Nothnagel, M. and Ott, J. (2002) Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms. Hum Hered, 54, 22-33.
2. Boehnke, M. (1986) Estimating the power of a proposed linkage study: a practical computer simulation approach. Am J Hum Genet, 39, 513-27.
3. Purcell, S., Cherny, S.S. and Sham, P.C. (2003) Genetic power calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics, 19, 149-150.
4. Weeks, D.E., Ott, J. and Lathrop, G.M. (1990) SLINK: a general simulation program for linkage analysis. Am J Hum Genet, 47, A204 (Supplement).
5. De La Vega, F.M., Gordon, D., Su, X., Scafe, C., Isaac, H., Gilbert, D.A. and Spier, E.G. (2005) Power and sample size calculations for genetic case/control studies using gene-centric SNP maps: application to Human Chromosomes 6, 21, and 22 in three populations. Hum Hered, 60, 43-60.
6. Ott, J. (1999) Analysis of Human Genetic Linkage. Johns Hopkins, Baltimore.
7. Schaid, D.J. and Sommer, S.S. (1993) Genotype relative risks: methods for design and analysis of candidate-gene association studies. Am J Hum Genet, 53, 1114-26.
8. Sham, P. (1998) Statistics in Human Genetics. J. Wiley and Sons, Inc., New York.
9. Ulgen, A., Yoo, Y.J., Gordon, D., Finch, S.J. and Mendell, N.R. (2004) Percentiles of the null distribution of 2 maximum lod score tests. Hum Hered, 57, 39-48.
10. Cox, D.R. and Hinkley, D.V. (1979) Theoretical Statistics. CRC Press, Boca Raton.
11. Vieland, V.J. (1998) Bayesian linkage analysis, or: how I learned to stop worrying and love the posterior probability of linkage. Am J Hum Genet, 63, 947-54.
12. Gordon, D., Finch, S.J., Jacobs, A.L., Mendell, N.R., Single, R.M. and Marr, T.G. (1997) Association of posterior p-values of S.A.G.E. SIBPAL proportion-IBD and Haseman-Elston statistics for ACTHR112. Genet Epidemiol, 14, 629-34.
13. Cousin, E., Genin, E., Mace, S., Ricard, S., Chansac, C., del Zompo, M. and Deleuze, J.F. (2003) Association studies in candidate genes: strategies to select SNPs to be tested. Hum Hered, 56, 151-9.
14. Abreu, P.C., Greenberg, D.A. and Hodge, S.E. (1999) Direct power comparisons between simple LOD scores and NPL scores for linkage analysis in complex diseases. Am J Hum Genet, 65, 847-57.
15. Zheng, G., Joo, J., Ganesh, S.K., Nabel, E.G. and Geller, N.L. (2005) On averaging power for genetic association and linkage studies. Hum Hered, 59, 14-20.
16. Gordon, D., De la Vega, F.M., Finch, S.J. and Ye, K.Q. (2005) Power for complex trait genetic association. Clin Neuroscience Res, 5, 31-35.
17. Cochran, W.G. (1954) Some methods for strengthening the common chi-squared tests. Biometrics, 10, 417-451.
18. Armitage, P. (1955) Tests for linear trends in proportions and frequencies. Biometrics, 11, 375-386.
19. Mitra, S.K. (1958) On the limiting power function of the frequency chi-square test. Ann Math Stat, 29, 1221-1233.
20. Chapman, D.G. and Nam, J.M. (1968) Asymptotic power of chi square tests for linear trends in proportions. Biometrics, 24, 315-327.
21. Lewontin, R.C. (1964) The interaction of selection and linkage. I. General considerations; heterotic models. Genetics, 49, 49-67.
22. Fisher, R.A. (1970) Statistical Methods for Research Workers. 14th ed. Hafner/MacMillan, New York.
23. Weir, B.S. (1990) Genetic Data Analysis: methods for discrete population genetic data. Sinauer Associates, Inc., Sunderland.
24. Zheng, G. and Tian, X. (2005) The impact of diagnostic error on testing genetic association in case-control studies. Stat Med, 24, 869-82.
25. Bross, I. (1954) Misclassification in 2 x 2 tables. Biometrics, 10, 478-486.
26. Edwards, B.J., Haynes, C., Levenstien, M.A., Finch, S.J. and Gordon, D. (2005) Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies. BMC Genet, 6, 18.