PAWE-3D: Visualizing Power for Association With Error in case/control genetic studies of complex traits – Help file

In this Help file, we provide information on each of the items listed on the PAWE-3D interface. If, after using this Help file, you still have questions and/or comments, please contact us at: pawe3d@linkage.rockefeller.edu.

 

1. Important assumptions

 

For the PAWE-3D webtool, we assume that the only observed data for all case and control individuals are genotypes for a single di-allelic marker locus with alleles labeled 1 and 2. It is also assumed that there is either an unobserved di-allelic trait locus (using the genotype relative risk parameters) or that there is an unobserved di-allelic quantitative trait locus (QTL) that explains some proportion of an overall quantitative trait that is distributed in the population as a univariate normal with mean 0 and variance 1 (i.e., distributed as N(0,1)). For the quantitative trait locus, individuals are categorized into affected and unaffected status by specifying lower and upper thresholds (See Section 2.7.2 below). For example, given the quantitative measure, cases may be defined as those individuals whose measure is between 1.5 and 3.5 and controls may be defined as those individuals whose measure is between –2.5 and –1.5.   

 

2. Description of the various entries for PAWE-3D

 

In the remainder of this help file, we provide a description of all the items on the webpage, http://linkage.rockefeller.edu/pawe3d/. We provide descriptions in the order in which each item appears on the webpage.

 

2.1 Analysis Method

 

Here, the user has a choice of two types of calculations: either asymptotic power for a fixed sample size, or minimum sample size for a fixed power.

 

Power - If the user chooses this option, then the user must enter the number of cases and the number of controls for which power is to be determined. Each of these entries must be a positive integer.

 

Sample size - If the user chooses this option, then the user must enter the desired power (a number between 0 and 1; typical values are 0.8, 0.9, 0.95) and the ratio of controls to cases. This ratio must be a positive number. 

 

2.2 Statistical tests

 

Genotypic test - If this option is checked, then power or sample size calculations will be performed using the genotypic test of association (i.e., chi-squared test of association) on contingency tables. Power and/or sample size for a specified significance level (see below: Section 2.4) is determined by specification of the test’s non-centrality parameter. Mitra (1958) provided the theoretical basis for determination of this parameter. For the interested reader, the computation of this parameter is available by clicking on this link: PAWE3D01

 

Linear Trend Test – If this option is checked, then power or sample size calculations will be performed using the Linear Trend Test of association (Cochran 1954; Armitage 1955) with the weights provided in the X0, X1, and X2 boxes (Section 2.5). Power and/or sample size for a specified significance level (see below: Section 2.4) is determined by specification of the test’s non-centrality parameter. Chapman and Nam (1968) provided the theoretical basis for determination of this parameter. For the interested reader, the formula for this parameter is available by clicking on this link: PAWE3D02
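As a rough illustration of how a non-centrality parameter translates into power and minimum sample size, the following Python sketch may help. It is not the PAWE-3D implementation: the function names are hypothetical, only the standard library is used, only 1 degree of freedom (Linear Trend Test) and 2 degrees of freedom (genotypic test on a 2 x 3 table) are covered, and the sample-size helper assumes that the non-centrality parameter grows in proportion to the number of cases when the ratio of controls to cases is held fixed.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def critical_value(alpha, df):
    """Upper-alpha critical value of the central chi-squared distribution (df 1 or 2)."""
    if df == 1:
        return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) ** 2
    if df == 2:
        return -2.0 * math.log(alpha)  # the survival function is exp(-c / 2)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def power_from_ncp(ncp, alpha, df):
    """P(non-central chi-squared(df, ncp) exceeds the level-alpha critical value)."""
    c = critical_value(alpha, df)
    if df == 1:
        # A chi-squared(1, ncp) variable is (Z + sqrt(ncp))^2 with Z ~ N(0, 1).
        root_c, root_ncp = math.sqrt(c), math.sqrt(ncp)
        return (1.0 - STD_NORMAL.cdf(root_c - root_ncp)) + STD_NORMAL.cdf(-root_c - root_ncp)
    # df == 2: Poisson(ncp / 2) mixture of central chi-squared(2 + 2j) tail areas.
    half_c, half_ncp = c / 2.0, ncp / 2.0
    weight = math.exp(-half_ncp)  # Poisson probability of j = 0
    term = math.exp(-half_c)      # exp(-c/2) * (c/2)^k / k! at k = 0
    tail = term                   # tail area of central chi-squared(2 + 2j) at j = 0
    total = 0.0
    for j in range(500):          # enough terms for ncp up to a few hundred
        total += weight * tail
        weight *= half_ncp / (j + 1)
        term *= half_c / (j + 1)
        tail += term
    return total


def min_cases(ncp_per_case, target_power, alpha, df):
    """Smallest number of cases reaching target_power.

    Assumption (not taken from the help file): ncp = n_cases * ncp_per_case
    for a fixed ratio of controls to cases.
    """
    if ncp_per_case <= 0.0 or not 0.0 < target_power < 1.0:
        raise ValueError("need ncp_per_case > 0 and 0 < target_power < 1")
    n = 1
    while power_from_ncp(n * ncp_per_case, alpha, df) < target_power:
        n += 1
    return n
```

For example, power_from_ncp(7.85, 0.05, 1) is approximately 0.80, and setting the non-centrality parameter to 0 returns the significance level itself, as expected under the null hypothesis.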

 

2.3 General parameters

 

2.3.1 Disequilibrium measure

 

The user has the choice of two different measures of disequilibrium between trait and marker locus: (r^2) or D'. A fuller explanation for each follows below (Section 2.6).
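For readers who want to see how the two measures relate numerically, the sketch below uses the standard textbook definitions of the disequilibrium coefficient D, D' and r^2 for two di-allelic loci. It is an illustration under those definitions, not necessarily the exact computation that PAWE-3D performs. The function names are hypothetical, and the code assumes that marker allele 1 is in coupling with the disease allele, so that D is non-negative.

```python
import math


def d_max(pd, p1):
    """Largest non-negative D compatible with allele frequencies pd and p1."""
    return min(pd * (1.0 - p1), (1.0 - pd) * p1)


def d_from_dprime(dprime, pd, p1):
    """Disequilibrium coefficient D implied by D' (0 <= D' <= 1)."""
    return dprime * d_max(pd, p1)


def d_from_r2(r2, pd, p1):
    """Disequilibrium coefficient D implied by r^2; fails if r^2 is unattainable."""
    d = math.sqrt(r2 * pd * (1.0 - pd) * p1 * (1.0 - p1))
    if d > d_max(pd, p1) + 1e-12:
        raise ValueError("this r^2 is not attainable for the given allele frequencies")
    return d


def r2_from_d(d, pd, p1):
    """Squared correlation between alleles at the two loci."""
    return d * d / (pd * (1.0 - pd) * p1 * (1.0 - p1))


def haplotype_frequencies(d, pd, p1):
    """Frequencies of the four haplotypes; 'd' is the disease allele, '+' the other."""
    return {
        ("d", "1"): pd * p1 + d,
        ("d", "2"): pd * (1.0 - p1) - d,
        ("+", "1"): (1.0 - pd) * p1 - d,
        ("+", "2"): (1.0 - pd) * (1.0 - p1) + d,
    }
```

One design point worth noting: D' = 1 does not imply r^2 = 1 unless the two allele frequencies are equal, so the same numeric setting can describe quite different amounts of disequilibrium under the two measures.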

 

2.3.2 Sampling Distribution

 

The user has the choice of two different prior distributions: Uniform or Beta.

 

Uniform prior distribution: If this option is selected, then 105 vectors of parameter values will be randomly drawn from the n-dimensional cube formed by the selected parameter ranges, where n is the number of parameters for which a range of values has been selected. Each vector is created by drawing uniformly from the selected range of the i-th parameter, for all i. For example, suppose that we choose to compute power using the genotype relative risk model (Section 2.6) and that we are maximizing power over disease allele frequency (Pd) and marker allele frequency (P1), with all other parameters set to single values. Furthermore, suppose that the range of values for Pd is [0.01, 0.10] and the range of values for P1 is [0.05, 0.45]. Then n = 2, and 105 vectors of parameter values will be created by randomly and uniformly selecting Pd from the interval [0.01, 0.10] and P1 from the interval [0.05, 0.45], keeping all other parameter settings fixed at their specified values.
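The uniform sampling scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary layout, the fixed settings for R1 and R2, and the number of draws (kept small here) are assumptions made for the example and are not taken from PAWE-3D.

```python
import random


def draw_uniform_vectors(ranges, n_draws, seed=None):
    """Draw n_draws parameter vectors, each coordinate uniform on its own range.

    ranges maps a parameter name to its (lower, upper) interval.
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n_draws)
    ]


# Ranges from the example in the text: Pd in [0.01, 0.10], P1 in [0.05, 0.45].
ranges = {"Pd": (0.01, 0.10), "P1": (0.05, 0.45)}
# Hypothetical single-value settings for parameters that are not varied.
fixed = {"R1": 1.5, "R2": 2.25}

draws = draw_uniform_vectors(ranges, n_draws=1000, seed=1)
# Each full setting combines the fixed values with one sampled vector.
settings = [{**fixed, **vector} for vector in draws]
```

Power (or sample size) would then be evaluated once per entry of settings, with only Pd and P1 changing from one entry to the next.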

 

Beta prior distribution: If this option is selected, then 105 vectors of parameter values will be randomly drawn from the n-dimensional cube formed by the selected parameter ranges, where n is the number of parameters for which a range of values has been selected. Each vector is created by randomly drawing from the selected range of the i-th parameter, for all i, assuming a Beta distribution on [0,1] with user-specified mean (between 0 and 1) and variance (greater than 0), suitably transformed to that range. Means chosen closer to 0 produce more sampling towards the lower endpoint of each range, while means chosen closer to 1 produce more sampling towards the upper endpoint of each range. We recommend a mean of 0.5 and a variance of 0.15 as initial settings. To see how we obtain a Beta distribution with user-specified mean and variance, please click on the following link: PAWE3D03
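One common way to turn a mean and variance into Beta shape parameters is the method of moments, sketched below in Python. Whether PAWE-3D uses exactly this conversion is an assumption (the details are behind the PAWE3D03 link); the function names and the number of draws are hypothetical. Note that a Beta distribution on [0,1] with a given mean only exists when the variance is smaller than mean * (1 - mean).

```python
import random


def beta_shape_parameters(mean, variance):
    """Method-of-moments shape parameters (a, b) of a Beta distribution on [0, 1]."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    common = mean * (1.0 - mean) / variance - 1.0
    return mean * common, (1.0 - mean) * common


def draw_beta_vectors(ranges, mean, variance, n_draws, seed=None):
    """Draw parameter vectors; each coordinate is a Beta draw rescaled to its range."""
    a, b = beta_shape_parameters(mean, variance)
    rng = random.Random(seed)
    return [
        {name: lo + (hi - lo) * rng.betavariate(a, b) for name, (lo, hi) in ranges.items()}
        for _ in range(n_draws)
    ]


# The recommended initial settings (mean 0.5, variance 0.15) give a = b = 1/3,
# a U-shaped density that places extra weight near both ends of each range.
a, b = beta_shape_parameters(0.5, 0.15)
draws = draw_beta_vectors({"Pd": (0.01, 0.10), "P1": (0.05, 0.45)}, 0.5, 0.15, 1000, seed=1)
```

Choosing a smaller variance (for example 0.01 with mean 0.5) would instead concentrate the draws around the middle of each range.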

 

2.3.3 Genetic model

 

Here, the user has the choice of two different methods for specifying the underlying genetic model: Genotype Relative Risk (GRR) or Quantitative Trait Locus (QTL). If GRR is chosen, then the user is queried to specify a list of settings for parameters including genotype relative risks R1 and R2 (defined below; see Section 2.6 for a full listing of parameters). If QTL is chosen, the user is queried to specify a list of settings for parameters including QTL variance and Dominance/Additivity Ratio (defined below; see Section 2.7.1 for a full description of parameters).

 

2.3.4 Include Phenotype Errors?

Selecting “Yes” for Include Phenotype error – If this option is chosen, then power or sample size calculations will be performed allowing for phenotype misclassification error. In other words, the parameters Theta (the probability that an affected individual is misclassified as unaffected) and Phi (the probability that an unaffected individual is misclassified as affected) (see below; Section 2.6) will be used in the calculations along with the other genetic model parameters. Note that, for the QTL Genetic Model, there is no phenotype error.

 

Selecting “No” for Include Phenotype error – If this option is chosen, then power or sample size calculations will be performed assuming no phenotype misclassification error. That is, every person who is classified as a case is truly affected and every person classified as a control is truly unaffected.
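To give a feel for why phenotype misclassification matters, the Python sketch below mixes the genotype frequencies of truly affected and truly unaffected individuals according to the two error probabilities and the disease prevalence, using Bayes' rule. It is a simplified illustration, not the PAWE-3D code: the function name and the example frequencies are hypothetical, and the sketch assumes that misclassification does not depend on genotype and that classified cases and controls are drawn at random from the population.

```python
def observed_genotype_frequencies(freq_affected, freq_unaffected, theta, phi, prevalence):
    """Genotype frequencies among individuals classified as cases and as controls.

    theta: probability that an affected individual is labeled unaffected.
    phi:   probability that an unaffected individual is labeled affected.
    """
    k = prevalence
    p_case = (1.0 - theta) * k + phi * (1.0 - k)      # P(classified as case)
    p_control = theta * k + (1.0 - phi) * (1.0 - k)   # P(classified as control)
    cases = [
        ((1.0 - theta) * k * fa + phi * (1.0 - k) * fu) / p_case
        for fa, fu in zip(freq_affected, freq_unaffected)
    ]
    controls = [
        (theta * k * fa + (1.0 - phi) * (1.0 - k) * fu) / p_control
        for fa, fu in zip(freq_affected, freq_unaffected)
    ]
    return cases, controls


# Hypothetical genotype frequencies (11, 12, 22) in truly affected and unaffected people.
true_affected = [0.30, 0.50, 0.20]
true_unaffected = [0.49, 0.42, 0.09]
cases, controls = observed_genotype_frequencies(
    true_affected, true_unaffected, theta=0.0, phi=0.05, prevalence=0.01
)
```

With a prevalence of 1% and Phi = 0.05 in this example, most individuals labeled as cases are in fact unaffected, so the case genotype frequencies move most of the way towards the control frequencies.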

 

2.4 Significance level

 

Here, the user specifies the significance level of the test. This value is the probability of falsely rejecting a true null hypothesis. This number must be positive and less than 1. Typically, it is chosen to be less than or equal to 0.05.

 

2.5 Linear Trend Test weights (if Linear Trend Test option (Section 2.2) is chosen)

 

In these boxes, weights are provided for the cells corresponding to the 11 marker genotype (X0), the heterozygote 12 marker genotype (X1), and the 22 marker genotype (X2). Radio buttons are provided for three commonly used weights. They are:

 

Recessive:  X0 = 1, X1 = 0, X2 = 0

Dominant:  X0 = 1, X1 = 1, X2 = 0

Additive:    X0 = 2, X1 = 1, X2 = 0

 

One can also manually select the choice of weights by selecting the “Custom” radio button. Weights are assumed to be non-negative numbers. It should be noted that resultant power or sample size values are only valid for the Linear Trend Test with the user-specified weights.
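For orientation, here is a small Python sketch of the two test statistics on an observed 2 x 3 table, using the standard Cochran-Armitage form of the trend statistic. It computes statistics from counts and does not reproduce the asymptotic power calculation of PAWE-3D; the function names and the example counts are hypothetical, genotype counts are given in the order 11, 12, 22 (matching X0, X1, X2), and every genotype column is assumed to be non-empty.

```python
PRESET_WEIGHTS = {
    "Recessive": (1, 0, 0),
    "Dominant": (1, 1, 0),
    "Additive": (2, 1, 0),
}


def pearson_chi2(cases, controls):
    """Pearson chi-squared statistic for a 2 x k table of genotype counts."""
    total_cases, total_controls = sum(cases), sum(controls)
    n = total_cases + total_controls
    stat = 0.0
    for r, s in zip(cases, controls):
        column = r + s
        for observed, row_total in ((r, total_cases), (s, total_controls)):
            expected = row_total * column / n
            stat += (observed - expected) ** 2 / expected
    return stat


def trend_chi2(cases, controls, weights):
    """Cochran-Armitage linear trend statistic (1 degree of freedom)."""
    total_cases, total_controls = sum(cases), sum(controls)
    n = total_cases + total_controls
    columns = [r + s for r, s in zip(cases, controls)]
    sum_xr = sum(x * r for x, r in zip(weights, cases))
    sum_xn = sum(x * c for x, c in zip(weights, columns))
    sum_x2n = sum(x * x * c for x, c in zip(weights, columns))
    spread = n * sum_x2n - sum_xn ** 2
    if spread == 0:
        raise ValueError("weights must differ across the observed genotypes")
    return n * (n * sum_xr - total_cases * sum_xn) ** 2 / (total_cases * total_controls * spread)


# Hypothetical counts of genotypes 11, 12, 22.
cases = [60, 100, 40]
controls = [40, 100, 60]
results = {name: trend_chi2(cases, controls, w) for name, w in PRESET_WEIGHTS.items()}
```

Two properties are useful when choosing custom weights: the statistic is unchanged if a constant is added to all weights or if all weights are multiplied by a non-zero constant, and with 0/1 weights it coincides with the Pearson statistic of the corresponding collapsed 2 x 2 table.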

 

2.5.1 Beta Distribution Parameters (if Beta prior distribution (Section 2.3.2) is chosen)

 

Here, the user specifies the mean (between 0 and 1) and variance (greater than 0) for the Beta distribution on the interval [0,1]. These values are then transformed into the necessary parameters for the Beta distribution. For more information, see Section 2.3.2 above.

 

2.6 Genotype Relative Risk Model parameters

 

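R1: This parameter is the ratio f1/f0, where fi = Pr(affected | i copies of the disease allele) (Schaid and Sommer 1993). This value must be greater than 0.

R2: This parameter is the ratio f2/f0, where fi = Pr(affected | i copies of the disease allele) (Schaid and Sommer 1993). This value must be greater than 1.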

Pd: This parameter is the disease allele frequency. It ranges between 0 and 1. Typically, smaller values (< 0.5) are assumed.

P1: This parameter is the allele frequency of the SNP allele in coupling with the disease allele. It ranges between 0 and 1, not including the endpoints.

D': This parameter, first introduced by Lewontin (1964), is a measure of disequilibrium between the SNP marker and disease locus. It ranges between 0 and 1, where 0 means linkage equilibrium (worst-case scenario) and 1 means complete disequilibrium (best-case scenario). An example where D' equals 1 occurs when the SNP marker locus is the disease locus. For more information on how this measure is used in the power and sample size calculations, please go to the following link: PAWE3D04

 

r^2: This parameter, first introduced by Fisher (1970), is another measure of disequilibrium between the SNP marker and disease locus. It ranges between 0 and 1, where 0 means linkage equilibrium (worst-case scenario) and 1 means complete disequilibrium (best-case scenario). An example where r^2 equals 1 occurs when the SNP marker locus is the disease locus. For more information on how this measure is used in the power and sample size calculations, please go to the following link: PAWE3D05

Theta (θ): This parameter is the probability that an affected person is misclassified as unaffected. It ranges between 0 and 1.

Phi (φ): This parameter is the probability that an unaffected person is misclassified as affected. It ranges between 0 and 1. For diseases of low prevalence, this parameter can significantly affect power to detect association (Edwards et al., 2005).

K: This parameter is the prevalence of the disease, or the probability that a randomly selected person from the population being studied is affected.     
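The following Python sketch shows one way in which R1, R2, Pd and K jointly determine the penetrances and the genotype frequencies at the disease locus among affected and unaffected individuals. It takes R1 and R2 to be penetrance ratios relative to the genotype carrying no disease allele (Schaid and Sommer 1993) and assumes Hardy-Weinberg proportions in the population; that assumption, the function names, and the example values are ours and are not stated in this Help file.

```python
def penetrances(r1, r2, pd, prevalence):
    """Penetrances (f0, f1, f2) for 0, 1 and 2 copies of the disease allele."""
    q = 1.0 - pd
    genotype_freqs = (q * q, 2.0 * pd * q, pd * pd)  # Hardy-Weinberg assumption
    # The prevalence is the genotype-frequency-weighted average of the penetrances.
    f0 = prevalence / (genotype_freqs[0] + r1 * genotype_freqs[1] + r2 * genotype_freqs[2])
    result = (f0, r1 * f0, r2 * f0)
    if max(result) > 1.0:
        raise ValueError("these settings imply a penetrance greater than 1")
    return result


def disease_genotype_frequencies(r1, r2, pd, prevalence):
    """Disease-locus genotype frequencies among affected and unaffected individuals."""
    q = 1.0 - pd
    genotype_freqs = (q * q, 2.0 * pd * q, pd * pd)
    f = penetrances(r1, r2, pd, prevalence)
    affected = [g * fi / prevalence for g, fi in zip(genotype_freqs, f)]
    unaffected = [g * (1.0 - fi) / (1.0 - prevalence) for g, fi in zip(genotype_freqs, f)]
    return affected, unaffected


# Hypothetical settings: R1 = 1.5, R2 = 2.25, Pd = 0.1, K = 0.05.
f0, f1, f2 = penetrances(1.5, 2.25, 0.1, 0.05)
affected, unaffected = disease_genotype_frequencies(1.5, 2.25, 0.1, 0.05)
```

The check on penetrances greater than 1 is a reminder that not every combination of relative risks, allele frequency and prevalence describes a valid model.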

 

The user will note that there are check boxes next to each of these parameters. If the box next to a parameter is checked, then the user specifies a range of values over which power and/or sample size is to be computed. If the box is not checked, then the user enters a single value. At least one parameter must be checked in order for PAWE-3D to run.

 

2.6.1 Relative Risk Constraints 
 

With these radio buttons, we allow for constraints to be placed on the genotype relative risks (GRR) and/or differences between the disease and SNP allele frequencies. More information is provided below.

 

2.6.2 Constraints on GRR

 

If the "Yes" radio button is selected here, then constraints are placed on the GRR values R1 and R2. There are three possible constraint equations, corresponding to different modes of inheritance for the trait. They are:

 

a) Recessive: R1 = 1 and RR = R2;

b) Multiplicative: R1^2 = R2 and RR = R1;

c) Dominant: R1 = R2 = RR.
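The three constraint equations amount to deriving R1 and R2 from a single relative risk RR. A minimal Python sketch (the function name and the lowercase mode labels are hypothetical):

```python
def constrained_grr(mode, rr):
    """Return (R1, R2) implied by a single relative risk RR under a mode of inheritance."""
    if mode == "recessive":
        return 1.0, rr        # R1 = 1 and RR = R2
    if mode == "multiplicative":
        return rr, rr * rr    # RR = R1 and R1^2 = R2
    if mode == "dominant":
        return rr, rr         # R1 = R2 = RR
    raise ValueError("mode must be 'recessive', 'multiplicative' or 'dominant'")


pairs = {mode: constrained_grr(mode, 2.0) for mode in ("recessive", "multiplicative", "dominant")}
```

Under a constraint, only RR needs to be given as a single value or a range, which reduces the number of free parameters by one.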

 

If the "No" radio button is selected, then no constraints are placed on the relationship between R1 and R2. This situation is the one that was implemented in the original version of PAWE-3D.

 

2.6.3 Constraints on DAF/P1  

 

If the "Yes" radio button is selected here, then the disease allele frequency (DAF) and the frequency of the SNP marker allele in coupling with the disease allele are constrained to be a fixed distance (Delta) apart. Mathematically, we write Pd = P1 + Delta. The default setting for Delta is 0; that is, Pd = P1.

 

If the "No" radio button is selected, then no constraints are placed on the relationship between P1 and Pd. This situation is the one that was implemented in the original version of PAWE-3D.

 

2.7 QTL liability threshold model

 

For this model, we assume that there is a QTL locus that contributes some proportion of variance to the overall variance of a quantitative outcome measure that is normally distributed with mean 0 and variance 1 in the population. We define affected and unaffected status by providing lower and upper thresholds (or cutoffs). For example, we may define as affected those individuals whose quantitative measure is greater than 2 and less than 5, and define as unaffected those individuals whose quantitative measure is between –2 and 0. Lower and upper thresholds for affected/unaffected definition are user-provided. Note that with this model, there is no phenotype misclassification.  

 

The authors gratefully acknowledge Dr. Shaun Purcell, who provided the expert guidance regarding computation of the conditional genotype frequencies for the QTL liability threshold model. Having these frequencies enables specification of each statistical method’s non-centrality parameter, which in turn determines power and/or sample size.

 

2.7.1 QTL liability threshold model parameters

 

QTLVar: This parameter is the proportion of the variance of the quantitative trait that is explained by this locus. The value must be between 0 and 1.

Dominance/Additive Ratio: This parameter reflects the ratio of dominance to additive effects of the QTL locus. It must be a non-negative number. Note that:

 

Dominance/Additive Ratio = 0 means no dominance (i.e., all effects are due to additive effects of the alleles)

Dominance/Additive Ratio  = 1 means complete dominance

Dominance/Additive Ratio  > 1 means over-dominance

 

Pd: This parameter is the frequency of the trait increasing allele. Its value ranges between 0 and 1.

P1: This parameter is the allele frequency of the SNP allele in coupling with the QTL increaser allele. It ranges between 0 and 1, not including the endpoints.

D': (See above – Section 2.6)

r^2: (See above – Section 2.6)

 

The user will note that there are check boxes next to each of these parameters. If the box next to a parameter is checked, then the user specifies a range of values over which power and/or sample size is to be computed. If the box is not checked, then the user enters a single value. At least one parameter must be checked in order for PAWE-3D to run.

 

2.7.2 QTL Model Thresholds

 

Affected Lower: This parameter is the lower cutoff for definition of an affected individual, assuming a quantitative phenotype that is normally distributed with mean 0 and variance 1 in the population studied. Note that this parameter must be greater than or equal to the Unaffected Upper threshold (see below).

Affected Upper: This parameter is the upper cutoff for definition of an affected individual, assuming a quantitative phenotype that is normally distributed with mean 0 and variance 1 in the population studied. Note that this parameter must be greater than the Affected Lower threshold.

Unaffected Lower: This parameter is the lower cutoff for definition of an unaffected individual, assuming a quantitative phenotype that is normally distributed with mean 0 and variance 1 in the population studied.

Unaffected Upper: This parameter is the upper cutoff for definition of an unaffected individual, assuming a quantitative phenotype that is normally distributed with mean 0 and variance 1 in the population studied. Note that this parameter must be greater than the Unaffected Lower threshold and less than or equal to the Affected Lower threshold.
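To make the role of the thresholds concrete, the Python sketch below computes genotype frequencies at the QTL itself, conditional on the trait value falling between a lower and an upper threshold. It uses the standard biometrical model (genotype values a, d, -a, with d equal to the Dominance/Additive Ratio times a), Hardy-Weinberg proportions, and a normally distributed residual. These modeling choices and all function names are assumptions made for illustration; the computation actually used by PAWE-3D is described at the PAWE3D06 link and may differ.

```python
import math
from statistics import NormalDist


def check_thresholds(aff_lower, aff_upper, unaff_lower, unaff_upper):
    """Ordering required by the text: Unaffected Lower < Unaffected Upper <= Affected Lower < Affected Upper."""
    return unaff_lower < unaff_upper <= aff_lower < aff_upper


def qtl_genotype_means(qtl_var, dom_ratio, p):
    """Centered trait means for 2, 1 and 0 copies of the increaser allele (frequency p)."""
    q = 1.0 - p
    # Additive plus dominance variance when a = 1 and d = dom_ratio.
    unit_var = 2.0 * p * q * (1.0 + dom_ratio * (q - p)) ** 2 + (2.0 * p * q * dom_ratio) ** 2
    a = math.sqrt(qtl_var / unit_var)
    d = dom_ratio * a
    mean = a * (p - q) + 2.0 * p * q * d
    return a - mean, d - mean, -a - mean


def conditional_genotype_frequencies(qtl_var, dom_ratio, p, lower, upper):
    """QTL genotype frequencies given lower < trait < upper (requires 0 <= qtl_var < 1)."""
    q = 1.0 - p
    freqs = (p * p, 2.0 * p * q, q * q)  # Hardy-Weinberg assumption
    means = qtl_genotype_means(qtl_var, dom_ratio, p)
    # The residual carries the rest of the variance, so the trait has variance 1 overall.
    residual = NormalDist(0.0, math.sqrt(1.0 - qtl_var))
    in_range = [residual.cdf(upper - m) - residual.cdf(lower - m) for m in means]
    total = sum(f * pr for f, pr in zip(freqs, in_range))
    return [f * pr / total for f, pr in zip(freqs, in_range)]


# Thresholds from the example in Section 1; the other settings are hypothetical.
assert check_thresholds(1.5, 3.5, -2.5, -1.5)
affected = conditional_genotype_frequencies(0.05, 0.0, 0.2, 1.5, 3.5)
unaffected = conditional_genotype_frequencies(0.05, 0.0, 0.2, -2.5, -1.5)
```

Selecting cases and controls from opposite tails, as in this example, tends to enlarge the genotype frequency difference between the two groups compared with a single cutoff near the mean.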

 

For fuller details of how these parameters are used to compute genotype frequencies conditional on affection status, see the link: PAWE3D06

 

3. Questions? Comments?

 

If you have any questions or comments on PAWE-3D or any of the Sections provided in this Help File, please contact us at: pawe3d@linkage.rockefeller.edu.

 

References

1. Mitra, S.K. (1958) On the limiting power function of the frequency chi-square test. Ann Math Stat. 29, 1221-1233.

2. Cochran, W.G. (1954) Some methods for strengthening the common chi-squared tests. Biometrics. 10, 417-451.

3. Armitage, P. (1955) Tests for linear trends in proportions and frequencies. Biometrics. 11, 375-386.

4. Chapman, D.G. and Nam, J.M. (1968) Asymptotic power of chi square tests for linear trends in proportions. Biometrics. 24, 315-327.

5. Schaid, D.J. and Sommer, S.S. (1993) Genotype relative risks: methods for design and analysis of candidate-gene association studies. Am J Hum Genet. 53, 1114-1126.

6. Lewontin, R.C. (1964) The interaction of selection and linkage. I. General considerations; heterotic models. Genetics. 49, 49-67.

7. Fisher, R.A. (1970) Statistical Methods for Research Workers. 14th ed. Hafner/MacMillan, New York.

8. Edwards, B.J., Haynes, C., Levenstien, M.A., Finch, S.J. and Gordon, D. (2005) Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies. BMC Genet. 6, 18.