Traditionally, large pedigrees with many affected individuals have been studied for the purposes of detecting linkage. This may not be a productive strategy for traits determined by multiple loci with one or more of them having common susceptibility alleles. It has been stated that more distantly related affected relative pairs (e.g., 2nd cousins) give more information for linkage than affected sib pairs because they are less likely to share an allele at a marker identical by descent (IBD) by chance alone but are very likely to be IBD if there is linkage to the susceptibility allele. However, if the susceptibility allele is common, then distantly related affected relative pairs may not be as likely to be IBD at the susceptibility locus and hence will give less evidence for linkage. Risch (1990) demonstrated that affected sib pairs are more powerful for multiplicative models when the relative risk is low.
In our study, we looked at three models of inheritance for a susceptibility locus with a Risch's lambda of 1.75 and susceptibility allele frequencies (p) of 0.0025, 0.025, and 0.25 respectively. The trait was linked to a marker with 8 equifrequent alleles at recombination fraction theta = 0. The penetrances for the homozygotes and heterozygotes for the susceptibility allele were 0.10 and 0.05 respectively. The genetic linkage analysis software programs used were SIBPAL, ASPEX (both of which require breaking pedigrees down into nuclear families), and GENEHUNTER (which analyzes large pedigrees). We simulated 1000 replicate series in which there were either 50 affected-sib-pair families, 25 pedigrees with two sib pairs that were first cousins of each other, or 25 pedigrees with two sib pairs that were second cousins of each other. We found that when p = 0.0025, using GENEHUNTER to analyze the large pedigrees with 1st or 2nd cousin sib pairs had the most power to detect linkage. However, when p = 0.25, the most power came from analyzing nuclear families with affected sib pairs or from breaking the pedigrees down into nuclear families. In all of the analyses, SIBPAL and ASPEX gave very similar results. We conclude that for traits with complex inheritance and possibly common susceptibility alleles, analyzing nuclear families with affected sibs may provide the best strategy to detect linkage.
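The claim that distant relatives rarely share a marker allele IBD by chance can be made concrete with a small sketch. It assumes outbred pedigrees and standard Mendelian segregation; the names `SIB_SHARE_PROB` and `cousin_share_prob` are illustrative and not taken from the study.

```python
from fractions import Fraction

# Null (no-linkage) probability that full sibs share at least one allele
# IBD at a marker: P(IBD=1) + P(IBD=2) = 1/2 + 1/4.
SIB_SHARE_PROB = Fraction(3, 4)


def cousin_share_prob(degree):
    """Null probability that cousins of the given degree share an allele IBD.

    Assumes an outbred pedigree: first cousins share with probability 1/4,
    and each further degree adds one meiosis on each side (a factor of 1/4).
    """
    if degree < 1:
        raise ValueError("degree must be 1 (first cousins) or higher")
    return Fraction(1, 4) ** degree


for label, prob in [
    ("full sibs", SIB_SHARE_PROB),
    ("first cousins", cousin_share_prob(1)),
    ("second cousins", cousin_share_prob(2)),
]:
    print(f"{label}: {prob}")
```

Under these assumptions the chance sharing drops from 3/4 for sibs to 1/16 for second cousins, which is why observed sharing in distant pairs is so informative when the susceptibility allele is rare.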
Several weighting schemes have been proposed for affected-sib-pair (ASP) analysis when sibships containing more than two affected individuals are included in a sample. Under the null hypothesis of no linkage to disease, any weighting scheme will give a valid test of the null hypothesis and weights are often chosen to maximize power. However, if linkage between a marker and the disease is established, the estimated locus-specific relative risk may be compared to the overall population sibling relative risk to determine how much of the disease effect has been accounted for and whether or not to continue the search for additional susceptibility genes. The Risch likelihood model assumes that the sibships have been ascertained through random sampling of ASPs. We show here that relative risk estimates obtained using ASP data are asymptotically unbiased only when each pair is given a weight proportional to the inverse of the true sibship ascertainment probability. In practice, sibships of different sizes may have been sampled with unknown probabilities. It is therefore important to determine the possible extent of the bias when analyzing data assuming an incorrect ascertainment scheme. We simulated data under various genetic models and sampled the data according to three ascertainment schemes: single ascertainment of pairs (SAP), of individuals (SAI), and complete ascertainment (CA). Each data set was then analyzed using the three weighting schemes corresponding to SAP, SAI and CA, and the extent of the bias resulting from an incorrect ascertainment assumption was examined. In many cases the bias of the relative risk estimates was small; for some models, notably single-locus recessive models and some two-locus heterogeneity models, large biases were found. We observed large-sample expected values of locus-specific sibling relative risks as small as 0.3 times and as large as 4.0 times the true value. 
If unbiased estimation is the goal, we recommend analysis of the sensitivity of the parameter estimates to different weighting schemes. If the values of the estimates depend heavily on the weights, knowledge of the true ascertainment scheme and of the correct weights is necessary to ensure accuracy of the relative risk estimates.
Sib-pair is a simple command-driven program that performs a number of "non-parametric" analyses on family data. These are mainly based on (MCMC) simulations of the null-hypothesis distributions for the test statistic given the pedigree structure and estimated allele frequencies. Included are tests for allelic association, transmission-disequilibrium, and linkage. The latter include IBS and IBD-based affected-pedigree-member methods, some using information from both affected and unaffected relatives after Ward (Am J Hum Genet 1993; 52:1200). These methods are fast in large pedigrees. Examples of use will be presented. A beta-test version (DOS binary or Fortran 77 source) is available at http://www.qimr.edu.au/davidD/davidd.html. A related program, Gconvert, reads Linkage-format pedigree files and outputs files for APM, CRI-MAP, FISHER, MENDEL, PAP, SAGE etc.
Discordant sib pairs (DSPs) can be used to efficiently map qualitative traits under certain conditions. For example, DSPs are more powerful than affected sib pairs (ASPs) when sibling recurrence risk is high. On this basis, DSPs may be potentially useful for diseases such as diabetic nephropathy (DN), where sibling recurrence risk exceeds 70%. To analyze DSP data, statistical methods are used to test for the diminished allele sharing that is characteristic of linkage. Here, we present a different setting in which linkage is characterized by diminished allele sharing, graft-versus-host disease (GVHD) resulting from bone marrow transplant, and demonstrate that GVHD siblings (i.e., a bone marrow donor and an HLA-identical recipient sibling who develops GVHD) may be considered as a special type of DSP. Thus, DSP methodology can be used to map putative minor histocompatibility loci, which may contribute to GVHD in HLA-compatible siblings.
In an effort to identify the most efficient statistical test for DSPs, both traditional and GVHD, we derive a "possible triangle" (PT) test analogous to the test of Holmans (1993) for ASPs. We also derive the asymptotic distributions of the DSP PT test and of the restricted (R) test used by Rogus and Krolewski (1996) for DN. Using asymptotic estimates of power, we find the R test to have greater power than the PT test over much of the DSP possible triangle and to be significantly inferior in only a very small area. Although asymptotic power estimates tend to be slightly lower in general, we also obtain similar results via simulation. Finally, we investigate the effects of heterogeneity on power and discuss how stratification on major HLA antigens may improve power to detect GVHD loci.
We conclude that DSP methodology may be valuable for mapping GVHD, as well as diseases with high sibling recurrence risk, that the R test is typically the most efficient DSP test, and that stratification by natural subgroups (such as different background HLA genotypes) may substantially improve power to detect linkage using DSPs.
To detect quantitative traits with reasonable power, the Haseman-Elston regression approach using randomly sampled sib pairs generally requires high trait heritability and/or large numbers of sib pairs. As illustrated by various authors (Eaves and Meyer, 1994; Risch and Zhang, 1995), using sib pairs with extremely discordant (ED) trait values greatly reduces the sample size needed to detect linkage. On the other hand, simulations (Gu et al, 1996) have shown that a combined strategy of extremely concordant (EC) and ED pairs is optimal under most modes of transmission. To date, all investigations into the efficiency of using selected samples have been based on sib pair data. Even assuming that the same conclusion would hold for sibship data, it is not clear what statistic one should use to measure the information from a sibship of a given size using sibship-oriented variance components approaches. We are searching for the optimal statistic(s) and comparing the merits of the ED and EC approaches with those of the combined approach using sibship data. We have compared sampling criteria based on the sibship variance as well as higher moments for power to detect linkage. Our simulations under a two-locus quantitative model with a trait heritability of 50% (at an allele frequency of 0.1) indicate that the combination of the top and bottom 10% of the sibships ranked by intra-sibship variance is able to recover 63.4% of the linkage information, and appears to have better power than choosing the top or the bottom 20%. When the data were simulated under a one-locus model with 75% heritability, however, using the top 20% of sibships with the highest variances can extract roughly 90% of the linkage information, and is therefore more powerful than the alternatives.
The human species has been steadily growing during the last several thousand generations. The growth rate has been modest, on the order of a 1% increase per generation for most of the time, but such modest growth causes strongly inhomogeneous allelic distributions compared to stable populations. An exact recursive solution was obtained for the branching process that takes place in the reproduction of alleles through population growth and genetic drift. The model estimates the number of meioses present in the descent tree leading from one ancestral allele to all copies of this allele in the present population. The number of meioses divided by the present number of copies in a generation (called the meiotic count) can be estimated to be about 3.5 on the basis of the historical growth rate of the world population. It can be estimated that after 300-1000 generations millions of alleles will have descended from one original allele, which implies inversely that coalescence of any allele can be expected to have occurred between 300 and 1000 generations ago. The mean number of copies per mutant allele can be shown to be independent of the mutation rate and equal to (number of generations to coalescence)/(meiotic count), or about 85-280. The standard deviation of the number of copies per mutant allele is 2-4 orders of magnitude higher, even when moderate (1% or less) selection against a mutant allele is present. For most loci that mutate at a rate of about 10^-6 per meiosis, likely no major mutations or only one will be present in founder populations. Which of many possible genes causes most disease therefore depends on random factors.
The expected haplotype sharing surrounding major mutations is, depending on assumptions on locus heterogeneity and phenocopies, detectable with 5-10 cM genome screens in phase-known samples of 30 or more patients. Such haplotype sharing tests are statistically independent of other tests and are, if applicable, an additional gene mapping tool with power characteristics that compare favourably to association and transmission distortion tests.
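The ratio quoted above for the mean number of copies per mutant allele is easy to check numerically. The sketch below uses only the figures given in the abstract (a meiotic count of about 3.5 and coalescence 300 to 1000 generations ago); the function name is illustrative.

```python
MEIOTIC_COUNT = 3.5  # approximate value quoted in the abstract


def mean_copies_per_mutant_allele(generations_to_coalescence,
                                  meiotic_count=MEIOTIC_COUNT):
    """Mean copies per mutant allele = generations to coalescence / meiotic count."""
    return generations_to_coalescence / meiotic_count


low = mean_copies_per_mutant_allele(300)
high = mean_copies_per_mutant_allele(1000)
print(f"{low:.1f} to {high:.1f} copies")
```

The two endpoints come out close to the "about 85-280" range stated in the text, which appears to be a rounded version of the same ratio.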
A nonparametric method of linkage analysis, the Weighted Pairwise Correlation (WPC) method, was introduced by Daniel Commenges [Genet. Epidemiol. 1994, 11: 189-200]; it can analyze any kind of phenotype (quantitative, binary, binary with age of onset) and considers all pairs of relatives in large pedigrees. The principle of WPC is to test whether two relatives with similar phenotypes also resemble each other at the marker locus more than expected under the null hypothesis of no linkage. The marker resemblance between two relatives in the WPC method is presently estimated by the proportion of alleles shared identical by state (IBS). However, the use of identical-by-descent (IBD) information is expected to increase the power of the WPC approach, as has been shown for other nonparametric methods of linkage. Here we propose a method to incorporate the IBD information into the WPC approach. For any kind of relative pair, the computation of the proportion of alleles shared IBD is based on the identification of the closest couple of ancestors, denoted as the reference couple. IBD information is obtained for pairs of relatives having the same reference couple using individual genotypic vectors derived from this couple. This reconstruction of the IBD information is performed rapidly even in large pedigrees. Simulation studies conducted under various genetic models confirmed that use of IBD instead of IBS information leads to a large increase of power, especially in the situation of poorly informative markers.
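For readers unfamiliar with the IBS measure that the current WPC method relies on, a minimal sketch follows. It only illustrates the proportion of alleles shared identical by state between two unordered genotypes; it is not the WPC statistic or the proposed IBD reconstruction, and the function name is hypothetical.

```python
def ibs_proportion(genotype_a, genotype_b):
    """Proportion (0, 0.5 or 1) of alleles shared identical by state.

    Each genotype is an unordered pair of allele labels, e.g. (1, 3).
    """
    remaining = list(genotype_b)
    shared = 0
    for allele in genotype_a:
        if allele in remaining:
            remaining.remove(allele)  # each allele copy can be matched once
            shared += 1
    return shared / 2


print(ibs_proportion((1, 2), (2, 1)))  # same genotype, different order
print(ibs_proportion((1, 1), (1, 2)))  # one allele in common
print(ibs_proportion((1, 2), (3, 4)))  # nothing in common
```

Two relatives can score highly on this measure simply because an allele is common, which is the reason IBD information is expected to be more powerful with poorly informative markers.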
Sib-pair linkage studies are widely used to investigate the genetic factors involved in quantitative traits. The classical Haseman-Elston (HE) method (Behav Genet 1972, 2:3-19), which regresses the squared difference of the sib-pair phenotypes on the expected number of alleles shared identical by descent, needs to decompose the sibship into sib pairs, and assumes normally distributed and uncorrelated errors. We propose a maximum likelihood binomial (MLB) method which considers the sibship as a whole, and is based on the introduction of a latent binary variable Y capturing the linkage information between the observed quantitative trait Z and the marker M. The first part of the likelihood depends on P(M|Y), which is expressed using a binomial distribution of the marker parental alleles among sibs. The second part of the likelihood is P(Y|Z), which can be specified assuming a parametric distribution for Z (e.g. normal), but also without any assumption on Z (fully nonparametric method). This method provides a simple likelihood-ratio test for linkage involving a single parameter. In the case of various sibship sizes, simulation studies showed that the MLB approach (fully nonparametric or specifying a normal distribution for Z) provides very consistent results in terms of type I errors, whereas the HE method shows large inflation of type I errors at the 0.001 level; the MLB approach also yields power levels generally higher than those of the HE method. In the case of selected (discordant and/or concordant) sib pairs, the MLB approach is at least as powerful as the methods developed in this context (Risch and Zhang, Science 1995, 268:1584-1589; Gu et al, Genet Epidemiol 1996, 13:513-533). The MLB method appears to be a promising alternative for mapping quantitative trait loci in humans.
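As a point of comparison, the classical HE regression described above can be sketched in a few lines. This is an ordinary least-squares fit of squared sib-pair trait differences on IBD sharing, written for illustration only; it is not the MLB method, and the toy data are invented.

```python
def haseman_elston_fit(trait_pairs, ibd_sharing):
    """Least-squares fit of squared sib-pair differences on IBD sharing.

    trait_pairs: list of (trait_sib1, trait_sib2) tuples.
    ibd_sharing: estimated IBD sharing at the marker, one value per pair.
    Returns (slope, intercept); a negative slope is the HE signal of linkage.
    """
    y = [(a - b) ** 2 for a, b in trait_pairs]
    x = list(ibd_sharing)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented toy data: pairs sharing more alleles IBD have more similar traits.
pairs = [(1.0, 3.0), (2.0, 2.5), (1.5, 1.6)]
sharing = [0.0, 0.5, 1.0]
slope, intercept = haseman_elston_fit(pairs, sharing)
print(slope < 0)
```

Because every pair is treated as an independent observation, this fit illustrates why sibships larger than two must be decomposed into pairs and why correlated errors are a concern for the HE approach.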
The software package, SimIBD, is a powerful, robust method for detecting linkage between a marker and a disease gene by measuring identity-by-descent (IBD) sharing in all affected relative pairs in general pedigrees. It employs conditional simulation to arrive at an empirical p-value that is very robust to marker allele frequency misspecification. However, when only partial typing information is present, SimIBD uses an approximation to the actual IBD sharing that can lead to a substantial decrease in power to detect linkage when genotyping is sparse. We describe a new approximation for partially-typed pedigrees that uses simulation to fill the genotypes of untyped individuals conditional on the genotypes in typed individuals in the pedigree. A comparison using multiple sets of 100 nuclear families with four children each (at least two of whom were affected) simulated under several two-locus disease models shows that the new approximation improves average power to detect linkage over the old SimIBD approximation and performs nearly as well as ASPEX [Hauser ER et al. (1996) Genet Epidemiol 13:117-137; Hinds D, Risch N (1996) ftp://lahmed.stanford.edu/pub/aspex], a popular sib-pair analysis package which performs full maximum likelihood calculations (Table 1). Although we show results here for nuclear families, SimIBD is not limited to nuclear family data and can analyze extended families, including loops. We also describe how bootstrapping can improve the precision of the empirical p-value without major increases in computation time.
Table 1. Average power to detect linkage at p<0.05 in 100 nuclear families.

Program               Both parents typed   One parent typed   Both parents untyped
SimIBD: Old approx.   74%                  63%                41%
SimIBD: New approx.   74%                  70%                64%
ASPEX                 74%                  73%                69%
This work was supported by NIH grants HG00719 and EY09859, the University of Pittsburgh, the Association Française Contre Les Myopathies (AFM), and the Wellcome Trust Centre for Human Genetics.
In general, lodscores offer a greater power than affected sibpair (ASP) methods. However, there might be circumstances under which ASP will out-perform lod score methods. Recently, Dizier et al. [1996] reported that in 3 out of 14 complex genetic models, ASP methods appeared to have greater power than lodscore methods using parameters estimated by segregation analysis. Provided that the mode of inheritance (MOI) is correctly specified at the linked locus, it has been shown for complex models, that a single-locus (SL) analysis with reduced penetrance provides a close approximation to the analysis under the correct model.
We investigated the 3 situations where ASP methods appeared to yield more information for linkage than lodscores by performing an SL analysis twice - assuming a dominant and a recessive MOI at an arbitrary penetrance of 0.5. Data were simulated according to the parameters specified by Dizier et al. (2-locus dominant-recessive models, the recessive locus is linked). The higher of the two lodscores was compared to the ASP results given by Dizier et al.:
Model   chi2 (2df)   chi2 (1df)
B2      14.4         26.7
E2      21.2         44.2
F2      15.9         24.9

Our results show that even for complex genetic models, a lodscore analysis performed twice - under a simple dominant and recessive MOI - has substantially greater power than ASP methods. This holds true even when the significance level of the lodscores is corrected for multiple testing (i.e., twice). The results by Dizier et al. question the usefulness of using parameters from a complex segregation analysis in linkage analysis, rather than demonstrate the superiority of ASP methods over lodscores. The important factor in linkage analysis is the MOI at the locus under investigation, whereas a segregation analysis yields information about the mode of inheritance of the trait.
In linkage analyses based on testing for excess allele sharing in affected sib pairs, the expected proportions of pairs sharing alleles identical by descent (IBD) are usually constrained to fall within the plausible triangle (Holmans P., Am J Hum Genet 52:362-374,1993). When the sib-pair sharing can vary with covariates, situations can arise where it may not seem appropriate to constrain the proportions within each covariate subgroup, or where the order in which constraints are applied affects the results.
We simulated 400 sets of 100 affected sib pairs under 2 models where genes interacted with an environmental factor (EV), assuming IBD status known. Model 1 is a complex model where EV affects two genes. The relative risk (RR) for gene 1 was 5 when EV was present, and the RR for gene 2 was 5 when EV was absent. All other RR were 1. Model 2 is a single gene model with an additional risk associated with EV. The RR for gene 1 was 2, with an additional RR of 5 for gene 1 when EV was present.
Models with none (Nc) or 2 binary covariates (Cov) (EV present both sibs, EV discordant) were analyzed without constraints (NCn), with constraints in each of the 3 subgroups (SCn), or with constraints applied to the average over the 3 subgroups (ACn). Heterogeneity tests (Het) compare models with covariates to those without. Power estimates are based on the appropriate cutoffs from null simulations, and are as follows:
TEST-MODEL   Nc-NCn   Nc-ACn   Cov-NCn   Cov-ACn   Cov-SCn
Linkage-1    0.07     0.13     0.13      0.17      0.20
Linkage-2    0.18     0.41     0.15      0.21      0.33
Het-1        -        -        0.12      0.12      0.17
Het-2        -        -        0.07      0.09      0.02

Power is low for these complex models with a sample size of only 100, but all constraints improve power. Constraints within subgroups give the best results even though they may be inappropriate for model 1. Power is exceptionally low for the heterogeneity tests in model 2, which are essentially trying to detect a qualitative interaction.
It is well known that one particular ASP analysis is statistically equivalent to Lod scores calculated under a simple recessive model (Knapp et al., Hum Hered 44, 1994). Thus it is of interest to examine the correlation coefficient (rho) between Lod scores calculated under dominant (D) or recessive (R) models (both with reduced penetrance and nonzero phenocopy rates) and the log-likelihood statistic from a new ASP method (SIBPAIR, Satsangi et al., Nature Genet 14, 1996). We did this in a real dataset of 23 PD pedigrees, with the rho values calculated across markers. We then simulated datasets under 4 models (D vs. R; recombination fraction theta = .05 vs. .50), 100 replicates of each model. The simulated datasets matched the real data in pedigree structure, individual diagnoses, and DNA availability, and were analyzed in the same way as the real data; rho values were calculated across replicates. There was a moderately high rho between R-based Lod scores and SIBPAIR, and rho values were generally lower between D-based Lod scores and SIBPAIR. All rho values were significantly different from 0 (P < .02) except for rho = .13 (P > .2). We found that parental diagnosis is a factor in the observed differences between Lod score and SIBPAIR results. Future work should undertake more extensive simulations and analytic studies to further characterize and quantify the factors (pedigree structure, parental diagnoses, etc.) contributing to the similarities and differences in information used by Lod scores and by ASP methods.

Pearson's rho calculated between R- or D-based Lod scores and the SIBPAIR statistic.

Analysis   Real    Simulated data
model      data    theta=0.05        theta=0.5
                   R       D         R       D
R          0.45    0.60    0.61      0.52    0.51
D          0.20    0.25    0.50      0.46    0.13
Genome scans for complex genetic traits can be analyzed efficiently using multipoint affected-sib-pair (ASP) methodology. Because of the nature of complex traits, replication of linkage findings in additional families or in different populations is seen as a crucial step in verifying the identification of a locus for a complex trait. There has been much discussion about the magnitude of the lod score required to confirm a linkage result but little discussion about localization in confirmation studies. We investigated linkage confirmation in terms of the estimates of disease gene location in the context of a genome scan for a complex trait using the program SIBLINK (Hauser et al. 1996), which applies Risch's (1990) affected-sib-pair model to a complete multipoint map. We used simulation to examine the variability in the maximum likelihood estimate of disease locus location (\hat{d}) for different map densities, locus-specific risk ratios (lambda), lod score critical values, and sample sizes. We created 2, 4, 10, 20, and 40 cM intervals around \hat{d} and calculated the proportion of intervals which included the true disease locus location, D. In the no-linkage case (lambda = 1) we observed a tendency for the maximum lod score to occur at either end of the map, especially for the 10 and 20 cM map densities. As expected, larger values of lambda, larger sample size, or higher map density resulted in a larger percentage of replicates containing D in the interval. In the best case scenario (lod > 3, 2 cM map density, 400 ASPs, and lambda = 2), 100% of 40 cM intervals, 99% of 20 cM intervals, 94% of 10 cM intervals, 61% of 4 cM intervals and 17% of 2 cM intervals included D. For a more typical 10 cM map density the percentages including D were 99, 97, 81, 36, and 8 for 40, 20, 10, 4, and 2 cM intervals, respectively.
These results have important implications for placement of follow-up markers after an initial suggestion of linkage and for the interpretation of findings in replication studies of complex genetic traits.
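The coverage proportions reported above can be computed from simulation output along the following lines. The sketch assumes each interval is centered on the estimated location, which the abstract does not state explicitly, and the replicate estimates shown are invented for illustration.

```python
def interval_coverage(estimates, true_location, width):
    """Fraction of replicates whose interval contains the true location.

    Assumes an interval of the given width (cM) centered on each estimate.
    """
    half_width = width / 2
    hits = sum(1 for d_hat in estimates
               if abs(d_hat - true_location) <= half_width)
    return hits / len(estimates)


# Invented location estimates (cM) from five hypothetical replicates.
d_hat_values = [48.0, 50.0, 53.0, 61.0, 70.0]
true_d = 50.0
for width in (2, 4, 10, 20, 40):
    print(width, interval_coverage(d_hat_values, true_d, width))
```

Applied to real replicates, the same tabulation shows how wide a region follow-up markers would need to span to capture the locus with a chosen probability.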
Affected sib pairs (ASPs) are a powerful tool for linkage analysis. Discordant sib pairs (DSPs) are also powerful in some situations (Blackwelder & Elston 1985, Risch 1990, Rogus and Krolewski 1996). Non-amplifying or null marker alleles are a common source of genotyping error in sib pair linkage data. A null allele is a marker allele that cannot be seen and is therefore incorrectly scored if codominance of alleles is assumed. Null allele heterozygotes generally look like and are scored as homozygotes for the non-null allele. When parental genotype information is unavailable, it is not possible to identify the presence of a null allele by apparent non-inheritance.
By examining the probabilities of sib-pair identity by state arrangements, I show that under most conditions an undetected null allele will decrease the observed allele sharing for a marker. Using simulation, I quantify the effects of an undetected null allele on the power of ASP and DSP linkage analyses using a lod score test (Risch 1990).
In summary, the impact of an undetected null allele can be considerable, significantly reducing power to detect a linked marker for ASPs, or significantly increasing the risk of false positive linkage with an unlinked marker for DSPs. For ASPs, a linked marker with a sibling relative risk lambda_s = 2, null allele frequency p0 = .05, and 4, 8, or 16 equally frequent non-null alleles, power is reduced an average of 32% over what would be present if the null allele were correctly typed. For p0 = .10 and p0 = .025, power is reduced by 54% and 18%, respectively. For DSPs, an unlinked marker, null allele frequency p0 = .05, and 4, 8, or 16 non-null alleles, the false positive rate is increased on average by a factor of 2.2; for p0 = .10 and p0 = .025, the average factors are 5.6 and 1.7, respectively.
As researchers seek to map more complex genetic traits, the ability to extract as much power as possible from the available data will become ever more important. Given the strong effects on power of undetected low frequency null alleles in sib pair data, I advocate the testing of all markers for null alleles prior to performing linkage analyses. Statistical methods for detecting null alleles in sib pair data are discussed.
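The scoring error described above, in which a null-allele heterozygote is read as a homozygote, can be illustrated with a short sketch. The allele labels and the function name are hypothetical; the rule simply encodes codominant scoring when one allele fails to amplify.

```python
NULL = "null"  # illustrative label for a non-amplifying allele


def observed_genotype(allele_1, allele_2, null_label=NULL):
    """Genotype as it would be scored when null alleles are invisible."""
    visible = [a for a in (allele_1, allele_2) if a != null_label]
    if len(visible) == 2:
        return tuple(sorted(visible))      # scored correctly
    if len(visible) == 1:
        return (visible[0], visible[0])    # heterozygote read as homozygote
    return None                            # null homozygote: no product seen


# Two sibs who both carry the same null allele.
sib_1 = observed_genotype("A", NULL)
sib_2 = observed_genotype("B", NULL)
print(sib_1, sib_2)
```

In this example the sibs truly share the null allele, yet their scored genotypes have no allele in common, which is the loss of apparent sharing that the abstract quantifies.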
Power to detect linkage by the affected sib-pair (ASP) test and transmission/disequilibrium test (TDT) critically depends upon the magnitudes of (Ps-.5) and (Pt-.5), respectively. Ps denotes the probability of ASP allele "sharing" or the probability that a randomly ascertained parent of an ASP transmitted the same marker allele (identical-by-descent) to both affected sibs. Pt denotes the probability of parental allele "transmission" or the probability that a particular marker allele (e.g. allele A) was transmitted to an individual affected child by a randomly ascertained and informative (A/non-A) parent. Assuming linkage between a biallelic marker and biallelic disease locus, I have demonstrated that Ps=.5+(Ls)(Ms)(Rs) and Pt=.5+(Lt)(Mt)(Rt). In these expressions, the factors Ls and Lt depend only on the recombination fraction (theta) between marker and disease locus, the factors Ms and Mt depend on marker allele frequency (m) and disequilibrium (delta) between marker and disease locus, and the factors Rs and Rt depend only on the frequency of disease-causing allele D and the three penetrances of the disease locus genotypes. Based on analysis of the expressions for Ps and Pt, I will present several major findings including: (1) Disequilibrium (delta) increases the magnitudes of both (Ps-.5) and (Pt-.5); (2) The value of Ps for a completely polymorphic marker equals Ps for a biallelic marker in equilibrium with the disease locus; (3) Previous analytic investigations of TDT power such as the analysis by Risch and Merikangas (Science 273:1516-17, 1996) are special cases of a more general framework provided by expressions for Ps, Pt, and the proportion (H/F) of ascertained parents who are informative at the marker. This general framework can be used to compare the power of the TDT and ASP test for genome scans or for tests of a single candidate gene.
The choice of allele-sharing statistic can have a great impact on the power of robust affected pedigree member methods. Similarly, when allele-sharing statistics from several pedigrees are combined, the relative weight applied to each pedigree's statistic can affect power. Here we show that under a single gene model, the optimal allele-sharing statistic will have the property that if the genotypes of affected pedigree members are permuted, the value of the statistic is unchanged. That is, within a given pedigree, the value of the statistic is not affected by whether observed sharing is between more closely or more distantly related individuals. For specific classes of two-allele models, we give the most powerful statistics and optimal weights for inbred and outbred affected pedigree members and for outbred discordant sib pairs. We also consider the inverse problem: for which two-allele models are certain commonly-used statistics (Spairs and Sall) optimal? We find that Sall is not the optimal statistic for any two-allele model in general, although it can be in certain small pedigrees. We also find that for two-allele models for which the deviation from null sharing is large, the correspondence between allele-sharing statistics and the models for which they are optimal may also depend on which method is used to test for linkage.
It is customary to maximize the affected-sib-pair (ASP) lod score under the genetic constraints (Holmans, 1993, AJHG 52:362). If instead one has sampled discordant sib pairs, one may maximize the discordant-sib-pair (DSP) lod score subject to the appropriate genetic constraints. Here, I point out that there exist two triangular regions of the unconstrained parameter space that give positive lod scores under both the ASP and the DSP genetic constraints. As an example, suppose for a sample of 100 sib pairs and fully informative marker data, one obtains unconstrained MLEs of (z0, z1, z2) equal to (.05, .80, .15), where zi is the probability that a sib pair shares i alleles IBD conditional on the disease phenotype of the sib pair. If the sample consists of ASPs, the constrained MLEs equal (.125, .500, .375) and the maximum lod score (MLS) is 2.84. If the sample consists of DSPs, the constrained MLEs equal (.28, .57, .15) and the MLS is 1.29. That is, the underlying configuration of marker data gives evidence in favor of linkage regardless of the disease phenotype of the sib pairs. Clearly, it is not sensible to consider a set of marker data to be in support of linkage using ASPs if the same marker data would also provide evidence in favor of linkage if the pairs were DSPs, without careful consideration. This situation should arise rarely in practice, although it may arise more frequently than expected if the ASP model assumptions are also violated. As a general policy, I recommend that the unconstrained parameter estimates always be reported. If a set of parameter estimates show this inconsistency, one way to resolve the dilemma would be to determine if the unconstrained estimate of mean allele-sharing z1/2 + z2 is greater than or less than 0.5. I will use graphical methods to illustrate the portions of the parameter space that give rise to this type of contradictory result, and show that the line z1/2 + z2 = 1/2 provides a reasonable resolution of the problem. 
I will further show that the constraints that apply to extreme discordant sib pairs (EDSPs) make the problem even worse, but that the same solution can be applied.
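The proposed resolution, comparing unconstrained mean allele sharing with 0.5, can be sketched as below together with a check of the ASP constraints. The constraint form used here (z1 <= 1/2 and 2*z0 <= z1) is the usual statement of the Holmans possible triangle and is an assumption of this sketch; the function names are illustrative.

```python
def mean_allele_sharing(z0, z1, z2):
    """Mean proportion of alleles shared IBD: z1/2 + z2."""
    return z1 / 2 + z2


def in_asp_possible_triangle(z0, z1, z2, tol=1e-9):
    """True if (z0, z1, z2) satisfies z1 <= 1/2 and 2*z0 <= z1 (assumed form)."""
    sums_to_one = abs(z0 + z1 + z2 - 1.0) <= tol
    return sums_to_one and z1 <= 0.5 + tol and 2 * z0 <= z1 + tol


unconstrained = (0.05, 0.80, 0.15)   # unconstrained MLEs from the example
asp_constrained = (0.125, 0.500, 0.375)

print(mean_allele_sharing(*unconstrained) > 0.5)
print(in_asp_possible_triangle(*unconstrained))
print(in_asp_possible_triangle(*asp_constrained))
```

For the example in the text, the unconstrained estimates lie outside the ASP triangle but have mean sharing above 0.5, so under the suggested rule they would be read as excess rather than diminished sharing.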
With the advance of the human genome project, attention is being paid to the possibility of large scale, genome-wide association studies. Because of the large volume of genotyping required by such studies, it is important to consider efficiencies that can be brought to bear on this problem, both in terms of study design and analysis. The greatest efficiency can be obtained by DNA pooling, whereby only a small number of DNA pools are genetically characterized rather than the large number of individuals underlying these pools.
Here we consider study designs based on nuclear families - affecteds with parents, or affected and unaffected sibs without parents. We consider statistics based on either pooled DNA (e.g. affected children forming one pool, parents another; or affected sibs forming one pool, unaffected sibs another) or individual genotyping. For sibships without parents and individual genotyping, we introduce a novel disequilibrium statistic (the sibship-based disequilibrium or SD test) which is both powerful and robust.
For sibships without parents, the power of affected-unaffected sib pairs is about half that of singletons with parents, and increases proportionately with the number of unaffected sibs. Designs with two affected sibs are generally superior to those with single affecteds, especially when the disease allele frequency is low.
Pooled studies require statistics based on overall allele frequencies; these tests may be sensitive to population stratification and have inflated type-1 errors, although the power is typically comparable for pooled and unpooled data. Therefore, we recommend a two-stage procedure. First, tests should be performed based on DNA pooling to identify interesting loci for follow-up by individual genotyping (second stage). However, even for this second stage, we show that genotyping efficiency can still be enhanced by some sample pooling with no loss of power or robustness.