PAWE
version 1.2 Help File, February 2002
Written by Derek Gordon
The name of this program is PAWE, which
stands for Power of Association With Errors.
Because it has been previously documented (Mote and Anderson 1965; Gordon et
al. 2002) that genotyping errors can substantially decrease the asymptotic
power to detect association between a trait locus and a marker locus, the
purpose of the PAWE program is twofold: (i) to perform power and sample size
calculations for genetic case-control association studies in the presence of
genotyping errors, and (ii) to determine quantitatively how much genotyping
errors cost the researcher performing genetic association studies with cases
and controls, in terms of either a decrease in asymptotic power for a fixed
sample size or an increase in sample size to maintain constant asymptotic
power. Thus, results from the PAWE
program will be either asymptotic power or sample size values. We note that the
results we obtain for data without errors are identical to the results obtained
by other genetic association test power calculators, for example the Genetic Power
Calculator for case-control studies of discrete traits developed by authors
Purcell and Sham.
This file is designed to explain to you
the different values entered into the PAWE program, so that a meaningful
answer is obtained. This program is designed to perform asymptotic power and
sample size calculations for genetic case-control studies with a di-allelic
locus (for example, a SNP) in the presence of errors. The test statistics
considered are the standard chi-square statistics for allelic and genotypic
association. In what follows, it will be assumed that there is a di-allelic trait
locus for a discrete trait with two alleles: a wild-type allele or low risk
allele, denoted by +, and a trait or high-risk allele, denoted by d.
Also, it will be assumed that there is a marker locus with two alleles, denoted
by 1 and 2.
1.1 Parameter Settings
1.1.1 Asymptotic Power or Sample Size
The researcher who has a fixed sample size
and wants to know the value of asymptotic power for his/her study in the
presence of errors should choose the option
"power for a fixed sample size". The researcher who is
planning a study, and who wants to know what sample size is necessary, in the
presence of errors, to achieve a given asymptotic power level should choose
"sample size for fixed power".
1.1.2 Asymptotic Power for a fixed sample size
1.1.2.1 Number of cases
In this box, you specify the number of
case individuals you have. These individuals are assumed to be both phenotyped
and genotyped. The number typed in this box must be a positive integer.
1.1.2.2 Number of controls
In this box, you specify the number of
control individuals you have. These individuals are assumed to be both
phenotyped and genotyped. The number
typed in this box must be a positive integer.
1.1.3 Sample Size for a fixed asymptotic power
1.1.3.1 Asymptotic Power level
In this box, you specify the asymptotic
power you would like for your study. This number must be greater than 0 and
less than or equal to 1. It is usually a number close to 1.
1.1.3.2 Ratio of Controls to Cases
In this box, you specify the ratio of the
number of controls to the number of cases that you expect to have for your
study. The number entered here must be a positive real number.
1.1.4 Genotype Frequency Generation
1.1.4.1 Genetic model free method
You choose this option if you do not know
the genetic model parameters such as penetrances, disease allele frequency, or
proportion of linkage disequilibrium for your study. When choosing the genetic
model free method, you specify the frequency distribution of genotypes 11, 12,
and 22 for cases and controls. You may either assume or not assume Hardy Weinberg
equilibrium (HWE) for the case and control populations.
1.1.4.1.1 Hardy Weinberg equilibrium Assumed
If you assume HWE, then the genotype
frequency distribution is a function of a single parameter, p, that you
enter into this box. The parameter p must be a real, positive number
less than 1. The genotype frequencies of 11, 12, and 22 are then
p^2, 2p(1-p), and (1-p)^2, respectively.
1.1.4.1.2 No Assumption of Hardy Weinberg equilibrium
If you do not assume HWE, then the genotype frequency distribution is a
function of two parameters. The first parameter is the genotype frequency for
the 11 genotype, and the second parameter is the genotype frequency for the 22
genotype. The genotype frequency for the 12 genotype is given by 1 minus the
sum of these two parameters. Thus, the numbers you enter into these two boxes
must be positive real numbers whose sum is less than 1.
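The two ways of specifying genotype frequencies in the genetic model free method can be sketched as follows. This is only an illustration, not PAWE's own code: the function names are hypothetical, and the HWE version assumes that p is the frequency of the 1 allele.

```python
def genotype_freqs_hwe(p):
    """Genotype frequencies under HWE, assuming p is the 1-allele frequency."""
    if not 0 < p < 1:
        raise ValueError("p must be a real number strictly between 0 and 1")
    q = 1 - p
    return {"11": p * p, "12": 2 * p * q, "22": q * q}


def genotype_freqs_no_hwe(freq_11, freq_22):
    """Genotype frequencies without HWE: the 12 frequency is the remainder."""
    if freq_11 <= 0 or freq_22 <= 0 or freq_11 + freq_22 >= 1:
        raise ValueError("both frequencies must be positive and sum to less than 1")
    return {"11": freq_11, "12": 1 - freq_11 - freq_22, "22": freq_22}


if __name__ == "__main__":
    print(genotype_freqs_hwe(0.3))
    print(genotype_freqs_no_hwe(0.15, 0.45))
```

In both cases the three frequencies sum to 1, which is a quick check on any values typed into the program.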
1.1.4.2 Genetic model based method
You choose this option if you have estimates for the following 6 parameters:
the penetrances Pr(aff | ++), Pr(aff | +d), and Pr(aff | dd), the disease
allele frequency, the marker 1-allele frequency, and the proportion of
linkage disequilibrium (D'). In the previous sentence, the abbreviation
“aff” means “being a case”, and the three penetrances are conditional
probabilities, the conditioning being on the genotype at the trait locus. All
six of these entered parameters must be positive real numbers that are less
than 1. To see how these parameters are translated into genotype frequencies
for cases and controls, please click on the link:
PAWE1
Note: D'=1 means complete disequilibrium, the best case scenario.
D'=0 means no disequilibrium, the null scenario.
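PAWE's own translation of these six parameters into genotype frequencies is described in the PAWE1 document. As a rough illustration only, the sketch below uses one common textbook construction, under assumptions that are not taken from this help file: random mating of haplotypes, positive association between the d allele and the 1 allele, and controls defined as unaffected individuals. All function and variable names are hypothetical.

```python
def marker_genotype_freqs(pen_pp, pen_pd, pen_dd, p_d, p_1, d_prime):
    """Marker genotype frequencies in cases and controls (illustrative only).

    pen_pp, pen_pd, pen_dd: Pr(aff | ++), Pr(aff | +d), Pr(aff | dd)
    p_d: disease allele frequency; p_1: marker 1-allele frequency
    d_prime: proportion of linkage disequilibrium (assumed positive, d with 1)
    """
    # Disequilibrium coefficient D = D' * Dmax for a positive association.
    d_max = min(p_d * (1 - p_1), (1 - p_d) * p_1)
    d = d_prime * d_max
    # Haplotype frequencies keyed by (trait allele, marker allele).
    haplotypes = {
        ("d", 1): p_d * p_1 + d,
        ("d", 2): p_d * (1 - p_1) - d,
        ("+", 1): (1 - p_d) * p_1 - d,
        ("+", 2): (1 - p_d) * (1 - p_1) + d,
    }
    penetrance_by_d_count = (pen_pp, pen_pd, pen_dd)
    cases = {"11": 0.0, "12": 0.0, "22": 0.0}
    controls = {"11": 0.0, "12": 0.0, "22": 0.0}
    # Sum over ordered pairs of haplotypes (random mating assumption).
    for (trait_a, marker_a), freq_a in haplotypes.items():
        for (trait_b, marker_b), freq_b in haplotypes.items():
            n_d = (trait_a == "d") + (trait_b == "d")
            pen = penetrance_by_d_count[n_d]
            if marker_a == marker_b:
                genotype = "11" if marker_a == 1 else "22"
            else:
                genotype = "12"
            cases[genotype] += freq_a * freq_b * pen
            controls[genotype] += freq_a * freq_b * (1 - pen)
    prevalence = sum(cases.values())
    cases = {g: f / prevalence for g, f in cases.items()}
    controls = {g: f / (1 - prevalence) for g, f in controls.items()}
    return cases, controls


if __name__ == "__main__":
    case_freqs, control_freqs = marker_genotype_freqs(0.01, 0.05, 0.25, 0.1, 0.3, 0.8)
    print(case_freqs)
    print(control_freqs)
```

With D'=0 the marker is independent of the trait locus in this construction, so cases and controls have identical marker genotype frequencies, which matches the "null scenario" in the note above.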
1.1.5 Significance level
Here, you specify the significance level
of the test. This value is the probability of falsely rejecting a true null
hypothesis. This number must be positive and less than 1. Typically, it is
chosen to be less than or equal to 0.05.
1.1.6 Error models
For this option, you have the choice of
selecting one of four error models that you think best explains your data. The
choices are:
Gordon Heath Liu Ott (GHLO) error model (2001)
Douglas Skol Boehnke (DSB) error model (2002)
Sobel Papp Lange (SPL) error model (2002)
Mote and Anderson (MA) error model (1965)
Presented here is a brief, but by no means
comprehensive, list of some of the differences among the models. The GHLO model
introduces errors into alleles as opposed to genotypes. It is described by 2
parameters. The DSB model introduces errors into genotypes, and is the only
model for which it is not possible for a homozygous 11 genotype to be
incorrectly recoded as a homozygous 22 genotype, or vice versa. It is described by 2 parameters. The SPL model is, for di-allelic loci,
described by 3 parameters. It is the most general error model possible for
di-allelic loci, under the constraint that errors are independent of the
particular allele. The MA model, which
is the most general error model possible in the sense that it can describe all
other error models, is described by 6 parameters. The GHLO, SPL, and MA error models all allow for errors in which
one homozygote is miscoded as the other homozygote.
1.1.6.1 Gordon Heath Liu Ott (GHLO) error model
The parameter settings for this error
model are:
Pr(1 allele incorrectly coded as 2 allele)
Pr(2 allele incorrectly coded as 1 allele)
Both entries must be positive real numbers
less than 1.0.
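To see how two allele-level error rates turn into genotype-level error probabilities, consider the following sketch. It assumes that the two alleles of a genotype are miscoded independently of each other, which is an assumption made here for illustration; the names e1 and e2 are hypothetical labels for the two parameters above.

```python
def ghlo_error_matrix(e1, e2):
    """matrix[true][observed] for allele-level errors (illustrative sketch).

    e1: Pr(1 allele incorrectly coded as 2 allele)
    e2: Pr(2 allele incorrectly coded as 1 allele)
    Assumes the two alleles of a genotype are miscoded independently.
    """
    if not (0 < e1 < 1 and 0 < e2 < 1):
        raise ValueError("both error rates must be strictly between 0 and 1")
    return {
        "11": {"11": (1 - e1) ** 2, "12": 2 * e1 * (1 - e1), "22": e1 ** 2},
        "12": {
            "11": (1 - e1) * e2,                  # only the 2 allele flips
            "12": (1 - e1) * (1 - e2) + e1 * e2,  # neither flips, or both flip
            "22": e1 * (1 - e2),                  # only the 1 allele flips
        },
        "22": {"11": e2 ** 2, "12": 2 * e2 * (1 - e2), "22": (1 - e2) ** 2},
    }


if __name__ == "__main__":
    for true_genotype, row in ghlo_error_matrix(0.01, 0.02).items():
        print(true_genotype, row)
```

Note that a true 11 genotype can be observed as 22 (with probability e1 squared under this assumption), which is the homozygote-to-homozygote error mentioned in Section 1.1.6.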
For more information, see:
Gordon D., Heath S.C., Liu X., and Ott J.
(2001) A transmission disequilibrium test that allows for genotyping errors in
the analysis of single-nucleotide polymorphism data. American Journal of Human Genetics 69:371-380
1.1.6.2 Douglas Skol Boehnke (DSB) error model (2002)
The parameter settings for this error
model are:
Pr(homozygous 11 or 22 genotype incorrectly coded as heterozygote 12)
Pr(heterozygote 12 genotype incorrectly coded as homozygote 11 or 22)
Both entries must be positive real numbers
less than 1.0.
Note: for the second parameter, it is assumed that the 12 genotype has an equal
probability (0.5) of being incorrectly coded as 11 or 22. Also, the notation
used here comes from the Gordon et al. (2002) reference.
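The description above translates directly into a matrix of conditional probabilities Pr(observed genotype | true genotype). The sketch below is an illustration under that reading; the parameter names hom_to_het and het_to_hom are hypothetical.

```python
def dsb_error_matrix(hom_to_het, het_to_hom):
    """matrix[true][observed] for the DSB model as described in this section.

    hom_to_het: Pr(homozygote 11 or 22 incorrectly coded as heterozygote 12)
    het_to_hom: Pr(heterozygote 12 incorrectly coded as homozygote 11 or 22),
                split equally (0.5 each) between 11 and 22
    """
    if not (0 < hom_to_het < 1 and 0 < het_to_hom < 1):
        raise ValueError("both entries must be strictly between 0 and 1")
    return {
        # A homozygote is never recoded as the other homozygote in this model.
        "11": {"11": 1 - hom_to_het, "12": hom_to_het, "22": 0.0},
        "12": {"11": het_to_hom / 2, "12": 1 - het_to_hom, "22": het_to_hom / 2},
        "22": {"11": 0.0, "12": hom_to_het, "22": 1 - hom_to_het},
    }


if __name__ == "__main__":
    for true_genotype, row in dsb_error_matrix(0.02, 0.04).items():
        print(true_genotype, row)
```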
For more information, see:
Douglas J.A., Skol A.D., and Boehnke M.
(2002) Probability of detection of genotyping errors and mutations as
inheritance inconsistencies in nuclear-family data. American Journal of Human
Genetics 70:487-495
1.1.6.3 Sobel Papp Lange (SPL) error model (2002)
The parameter settings for this error
model are:
V1 = Pr(true homozygote
incorrectly coded as heterozygote)
V2 = Pr(one homozygote
incorrectly coded as another homozygote)
V3 = Pr(true heterozygote
incorrectly coded as a homozygote)
Note: This parameterization of the SPL
error model is an improvement over the parameterization previously used (Gordon
et al. 2002) in that it only requires three parameter settings. The author
gratefully acknowledges S. Seaman and P. Holmans for the improvement.
All entries must be positive real numbers
less than 1.0, subject to the following constraints:
V1 + V2 < 1.0
V3 < 0.5
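A sketch of the corresponding matrix of conditional probabilities follows. It assumes that V3 is the probability of a true heterozygote being coded as each specific homozygote (so the heterozygote is called correctly with probability 1 - 2*V3); this reading is an interpretation suggested by the constraint V3 < 0.5, not something stated explicitly above.

```python
def spl_error_matrix(v1, v2, v3):
    """matrix[true][observed] for the three-parameter SPL model (sketch).

    v1: Pr(true homozygote incorrectly coded as heterozygote)
    v2: Pr(one homozygote incorrectly coded as another homozygote)
    v3: Pr(true heterozygote incorrectly coded as a homozygote),
        assumed here to apply to each of the two homozygotes separately
    """
    if min(v1, v2, v3) <= 0 or max(v1, v2, v3) >= 1:
        raise ValueError("all entries must be strictly between 0 and 1")
    if v1 + v2 >= 1 or v3 >= 0.5:
        raise ValueError("constraints violated: need V1 + V2 < 1 and V3 < 0.5")
    return {
        "11": {"11": 1 - v1 - v2, "12": v1, "22": v2},
        "12": {"11": v3, "12": 1 - 2 * v3, "22": v3},
        "22": {"11": v2, "12": v1, "22": 1 - v1 - v2},
    }


if __name__ == "__main__":
    for true_genotype, row in spl_error_matrix(0.02, 0.01, 0.03).items():
        print(true_genotype, row)
```

The symmetry between the 11 and 22 rows reflects the constraint, noted in Section 1.1.6, that errors are independent of the particular allele.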
For more information, see:
Sobel E., Papp J.C., and Lange K. (2002)
Detection and integration of genotyping errors in statistical genetics.
American Journal of Human Genetics 70:496-508
1.1.6.4 Mote and Anderson (MA) error model (1965)
The parameter settings for this error
model are:
Pr(12 genotype observed | 11 true)
Pr(22 genotype observed | 11 true)
Pr(11 genotype observed | 12 true)
Pr(22 genotype observed | 12 true)
Pr(11 genotype observed | 22 true)
Pr(12 genotype observed | 22 true)
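The following constraints are needed for the MA error model: each parameter
must lie between 0 and 1, and for each true genotype (11, 12, or 22) the two
misclassification probabilities must sum to no more than 1, so that the
probability of a correct genotype call is not negative.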
For more information, see:
Mote V.L., and Anderson R.L. (1965) An
investigation of the effect of misclassification on the properties of
chisquare-tests in the analysis of categorical data. Biometrika 52:95-109
The PAWE program reports most of the input
parameters that the user enters, as well as the following items:
2.1.1 Non-centrality parameters
For a given test of association (allelic or genotypic), this
parameter completely determines either the asymptotic power or sample size
calculations. To see how the asymptotic power calculations are performed,
please see Gordon et al. (2002) and PAWE2. The non-centrality parameters for both
errorless data and data with errors are presented.
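The standard relation between a non-centrality parameter and asymptotic power can be sketched with the Python standard library alone. This is not PAWE's code, and it does not show how PAWE computes the non-centrality parameter itself (see Gordon et al. (2002) and PAWE2 for that). It assumes 1 degree of freedom for the allelic test and 2 degrees of freedom for the genotypic test; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_from_ncp(ncp, df, alpha):
    """Pr(non-central chi-square(df, ncp) exceeds the level-alpha critical value)."""
    if ncp < 0 or not 0 < alpha < 1:
        raise ValueError("need ncp >= 0 and 0 < alpha < 1")
    if df == 1:
        # A 1-df non-central chi-square is (Z + sqrt(ncp))^2 with Z standard normal.
        std_normal = NormalDist()
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        shift = math.sqrt(ncp)
        return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    if df == 2:
        # Poisson mixture of central chi-squares with 2, 4, 6, ... df.
        # For 2 df the critical value is c = -2 ln(alpha), so exp(-c/2) = alpha.
        half_crit = -math.log(alpha)
        half_ncp = ncp / 2
        poisson_weight = math.exp(-half_ncp)  # Poisson(j = 0) weight
        term = 1.0                            # (c/2)^i / i! for i = 0
        partial_sum = 1.0                     # sum of terms for i = 0..j
        power = 0.0
        for j in range(1000):
            power += poisson_weight * alpha * partial_sum
            poisson_weight *= half_ncp / (j + 1)
            term *= half_crit / (j + 1)
            partial_sum += term
        return power
    raise ValueError("this sketch only handles df = 1 or df = 2")


if __name__ == "__main__":
    print(power_from_ncp(7.849, 1, 0.05))
    print(power_from_ncp(7.849, 2, 0.05))
```

When the non-centrality parameter is 0 the power equals the significance level, and power increases as the parameter grows, which is why a smaller non-centrality parameter for data with errors means a loss of power.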
2.1.2 Asymptotic power for fixed sample size - power loss
Based on the value of the non-centrality
parameter for either the allelic or genotypic test of association, the
asymptotic power of the test is reported for both errorless data and data
assuming the particular error model. Also reported is the percent loss in power
due to errors in the data.
2.1.3 Sample size increase for fixed asymptotic power
Based on the value of the non-centrality
parameter for either the allelic or genotypic test of association, the minimum
sample size of cases and controls is reported for both errorless data and data
assuming the particular error model. Also reported is the percent increase in
sample size needed to maintain constant power when errors are present.
2.1.4 Genotype and allele frequencies for errorless data
Based on the parameters entered for
genotype frequency generation (Section 1.1.4), the genotype and allele
frequencies in cases and controls are computed for errorless data.
2.1.5 Matrix of Penetrances
The entries of this matrix are the
conditional probabilities Pr(observed genotype i | true genotype j)
for i and j being one of the genotypes 11, 12, or 22. These conditional probabilities, also
called penetrances, are used in calculating the genotype and allele frequencies in the presence of errors (see
Section 2.1.6).
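As an illustration of how such a matrix is assembled, the sketch below builds it from the six parameters of the MA error model (Section 1.1.6.4), with each diagonal entry taken as 1 minus the two error probabilities for that true genotype. The function name and the argument names are hypothetical, and the matrix is stored as matrix[true][observed].

```python
GENOTYPES = ("11", "12", "22")


def ma_error_matrix(p12_given_11, p22_given_11, p11_given_12,
                    p22_given_12, p11_given_22, p12_given_22):
    """matrix[true][observed] built from six Pr(observed | true) error rates."""
    errors_by_true = {
        "11": {"12": p12_given_11, "22": p22_given_11},
        "12": {"11": p11_given_12, "22": p22_given_12},
        "22": {"11": p11_given_22, "12": p12_given_22},
    }
    matrix = {}
    for true_genotype in GENOTYPES:
        errors = errors_by_true[true_genotype]
        total_error = sum(errors.values())
        if min(errors.values()) < 0 or total_error > 1:
            raise ValueError("error rates must be non-negative and sum to at most 1")
        row = dict(errors)
        row[true_genotype] = 1 - total_error  # probability of a correct call
        matrix[true_genotype] = {g: row[g] for g in GENOTYPES}
    return matrix


if __name__ == "__main__":
    example = ma_error_matrix(0.02, 0.01, 0.03, 0.03, 0.01, 0.02)
    for true_genotype in GENOTYPES:
        print(true_genotype, example[true_genotype])
```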
2.1.6 Genotype and allele frequencies for error data
Using the genotype and allele frequencies
in cases and controls for errorless data
(Section 2.1.4), and the matrix of penetrances (Section 2.1.5), genotype
and allele frequencies in cases and controls are computed for error data. These
values are used to compute the non-centrality parameters for error data
(Section 2.1.1). For more details on how this computation is performed, please
see Gordon et al. (2002).
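The computation just described amounts to weighting each true genotype frequency by the probability of each observed genotype. A minimal sketch follows, assuming the matrix is stored as matrix[true][observed]; the function names and the example numbers are hypothetical and are not taken from PAWE.

```python
GENOTYPES = ("11", "12", "22")


def observed_genotype_freqs(true_freqs, error_matrix):
    """Pr(observed i) = sum over true j of Pr(observed i | true j) * Pr(true j)."""
    return {
        observed: sum(error_matrix[true][observed] * true_freqs[true]
                      for true in GENOTYPES)
        for observed in GENOTYPES
    }


def allele_1_freq(genotype_freqs):
    """Frequency of the 1 allele: all of 11 plus half of 12."""
    return genotype_freqs["11"] + genotype_freqs["12"] / 2


if __name__ == "__main__":
    # Errorless genotype frequencies (HWE with 1-allele frequency 0.5).
    true_freqs = {"11": 0.25, "12": 0.50, "22": 0.25}
    # Example error matrix: homozygote -> heterozygote 0.1,
    # heterozygote -> each homozygote 0.1.
    error_matrix = {
        "11": {"11": 0.9, "12": 0.1, "22": 0.0},
        "12": {"11": 0.1, "12": 0.8, "22": 0.1},
        "22": {"11": 0.0, "12": 0.1, "22": 0.9},
    }
    observed = observed_genotype_freqs(true_freqs, error_matrix)
    print(observed, allele_1_freq(observed))
```

The same function would be applied separately to the case frequencies and the control frequencies, since the two groups generally have different errorless genotype distributions.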
3.1 References
Please cite the following two references
when reporting results using PAWE:
Gordon D., Finch S.J., Nothnagel M., and Ott J. (2002) Power and sample size
calculations for case-control genetic association tests when errors are
present: application to single nucleotide polymorphisms. Human Heredity
54:22-33
Gordon D., Levenstien M.A., Finch S.J., and Ott J. (2003) Errors and linkage
disequilibrium interact multiplicatively when computing sample sizes for
genetic case-control association studies. Pacific Symposium on
Biocomputing:490-501
Citations for error models:
Gordon D., Heath S.C., Liu X., and Ott J. (2001) A transmission disequilibrium
test that allows for genotyping errors in the analysis of single-nucleotide
polymorphism data. American Journal of Human Genetics 69:371-380
Douglas J.A., Skol A.D., and Boehnke M. (2002) Probability of detection of
genotyping errors and mutations as inheritance inconsistencies in
nuclear-family data. American Journal of Human Genetics 70:487-495
Sobel E., Papp J.C., and Lange K. (2002) Detection and integration of
genotyping errors in statistical genetics. American Journal of Human Genetics
70:496-508
Mote V.L., and Anderson R.L. (1965) An investigation of the effect of
misclassification on the properties of chisquare-tests in the analysis of
categorical data. Biometrika 52:95-109