CRI-MAP can only handle disease loci to the extent that genotypes are known: this means that with autosomal dominant loci, affected individuals are to be scored as heterozygotes at the disease locus, while unaffecteds are scored either as homozygotes for the normal allele (if you are willing to assume complete penetrance for that individual), or as genotype missing (alleles both 0) otherwise. For recessive loci, only affecteds and obligate carriers should be scored for the disease locus.
With general pedigrees, CRI-MAP deduces missing genotypes where possible, and computes a likelihood based only on analysis of the known or deduced genotypes. (Partial genotypes, i.e. those in which only one allele is known or deducible, are also utilized in the likelihood calculation to the extent possible). This likelihood is the usual likelihood (as defined for example by [ Ott (1985)] when the pedigree data are "complete" (i.e. genotypes are known or can be deduced at every locus, for every individual having descendants in the pedigree).
If data are missing for an individual at a particular locus, and the possible genotypes at that locus (i.e. those genotypes compatible with the genotypes of ancestors and descendants) include a homozygous genotype, then all meioses in that individual are treated by CRI-MAP as uninformative for that locus. A full likelihood analysis (as provided for example by LIPED or LINKAGE) would instead consider in turn each possible genotype at the locus, assign relative probabilities to each based on the genotypes of ancestors and descendants, and compute a likelihood which is the weighted sum of the likelihood expressions for each particular choice of genotype.
CRI-MAP thus ignores some of the available information. In cases that we have examined, however, the information loss appears to be small. If the missing locus genotype is in an "original parent" (i.e. an individual with no ancestors in the pedigree), then in a full likelihood analysis the population allele frequencies are used to assign probabilities to various possible genotypes. These enter into the likelihood by influencing the probability that the allele in any child of the original parent is derived from that parent.
CRI-MAP does not make use of allele frequencies; for any allele in a child of an original parent, CRI-MAP determines the parental origin when this can be deduced from the other genotype information, but otherwise assigns equal probability to the two possible parental origins for the allele. In fact, little information is lost by this procedure, except when the allele is rare. For a rare dominant disease, if one parent is known to be affected, the other should therefore be scored as unaffected (rather than as missing, for example). It should be noted in any case that the frequencies of RFLP alleles have usually been estimated in a different population (e.g. the CEPH family parents). From the population from which the disease family is drawn. It is our experience that allele frequencies may vary dramatically between populations. Therefore it may be inappropriate to perform a full likelihood analysis of disease linkage, if the results of that study depend in an essential way on parameters estimated from another population.
A somewhat more limited analysis which makes no assumptions concerning allele frequencies, such as that given by CRI-MAP, may be preferable. If (for reasons of power) a full likelihood analysis is necessary, that analysis should be performed using a range of allele frequencies at each locus, in order to ensure that the results do not depend in an essential way on these frequencies.
Finally, it should be noted that whenever some available information is arbitrarily excluded, as is the case with CRI-MAP's analysis of general pedigrees, there is a potential that artifactual biases in parameter estimates can be created. In a preliminary examination of this possibility [cf. Goldgar et al, (1989) ], we have compared the results of CRI-MAP with those of a full likelihood analysis program (LINKAGE) for all pairs of a set of 20 linked loci, in a data set having 136 pedigrees, most with missing individuals. The information loss (as estimated by comparing the effective number of informative meioses for the two programs) averaged about 15%, and the estimated recombination fractions were almost exactly the same.
This suggests that bias is not significant and information loss is small, at least in this particular data set. However, the possibility of bias clearly needs to be considered more thoroughly by means of simulation studies involving multiple loci, and we are in the process of doing this. I would recommend that you check the extent of information loss and/or possible bias in your data set by comparing the results of CRI-MAP to those of a full likelihood program for small sets of loci.
previous section: 1. introduction
next section: 3. getting started