The running time of LINKAGE and FASTLINK grows rapidly with the number of alleles specified for each locus used in a run. Therefore, it is important to specify no more alleles than are actually needed for the analysis. Various partial solutions to the "extra allele" problem have been implemented by:
Ellen Wijsman (in the context of LIPED)
Jathine Wong and Cathryn Lewis (in the context of LINKAGE/FASTLINK)
Scott Diehl, Bettie Duke, and Lynn Ploughman (in the context of MENDEL)
Alan Young (in the context of GAS)
At the end of this essay we briefly describe the partial solution implemented by Wijsman and Diehl-Duke-Ploughman. In the context of FASTLINK, their solution is applicable only to the LINKMAP and MLINK programs. We have not yet implemented an extension of their solution in FASTLINK 3.0P.
A concrete FASTLINK example:
Suppose that, at some locus, the general population has the following alleles and frequencies:
Allele      1    2    3    4    5    6
Frequency  .3   .2   .15  .1   .22  .03
and this is encoded in the locus file (datain.dat).
Suppose that the pedigree(s) encoded in the pedigree file (pedin.dat) contain only the alleles 2, 4, and 5. LINKAGE and FASTLINK require that the alleles be numbered consecutively starting at 1. Therefore, in reducing from 6 alleles to 4 (the three observed alleles plus one catch-all allele), it is necessary to renumber the alleles as follows:
Renumber old allele 2 to be new allele 1 with frequency .2
Renumber old allele 4 to be new allele 2 with frequency .1
Renumber old allele 5 to be new allele 3 with frequency .22
Create catch-all allele 4 with frequency .48 (sum of frequencies of old 1, old 3, old 6)
No person should have the catch-all allele, but it is absolutely wrong to omit the catch-all allele.
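The renumbering in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name renumber_alleles and the dictionary representation of the frequencies are hypothetical, they are not part of LINKAGE or FASTLINK, and the sketch does not read or write datain.dat or pedin.dat.

```python
def renumber_alleles(freqs, observed):
    """Renumber the observed alleles consecutively from 1 and pool the rest.

    freqs    -- dict mapping old allele number to population frequency
    observed -- set of old allele numbers that occur in the pedigree(s)
    Returns (mapping, new_freqs); both are plain dicts (hypothetical layout).
    """
    unknown = set(observed) - set(freqs)
    if unknown:
        raise ValueError(f"observed alleles without a frequency: {sorted(unknown)}")

    kept = sorted(observed)
    # Old allele number -> new allele number, in increasing order of old number.
    mapping = {old: new for new, old in enumerate(kept, start=1)}
    new_freqs = {mapping[old]: freqs[old] for old in kept}

    # Pool every unobserved allele into one catch-all allele, so the new
    # frequencies have the same total as the old ones.
    pooled = sum(f for old, f in freqs.items() if old not in mapping)
    if pooled > 0:
        new_freqs[len(kept) + 1] = pooled
    return mapping, new_freqs


# The six-allele example from the text, with alleles 2, 4, and 5 observed.
freqs = {1: .3, 2: .2, 3: .15, 4: .1, 5: .22, 6: .03}
mapping, new_freqs = renumber_alleles(freqs, {2, 4, 5})
print(mapping)  # {2: 1, 4: 2, 5: 3}
for allele, freq in new_freqs.items():
    print(allele, round(freq, 2))
```

The catch-all allele is created whenever some population allele is unobserved, even though no person is assigned that allele.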
Important technical note: the process of renumbering alleles to reduce their number loses no information in a statistical sense, unless one is estimating allele frequencies. Renumbering is distinct from "downcoding", in which multiple alleles that are distinct and do occur in the population are given the same number, in the interest of reducing running time. In general, downcoding loses information, although there are some special situations in which it does not because the frequencies of some different alleles happen to be identical.
The MLINK and LINKMAP programs analyze the pedigrees one at a time and sum the values of -2*(log(likelihood)) over the pedigrees. Since allele renumbering makes sense on a per-pedigree basis, it is valid to renumber the alleles for each pedigree in an optimal manner. This requires using a different locus file for each pedigree, because the renumbering may assign the same new allele number to different old alleles in different pedigrees. One annoyance of doing the analysis for each pedigree separately is that the output values must be summed. The separation of input pedigrees and the combination of output results were automated for LIPED by Ellen Wijsman and for MENDEL by Scott Diehl, Bettie Duke, and Lynn Ploughman.
The above solution does not work for ILINK or LODSCORE.
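The per-pedigree renumbering described for MLINK and LINKMAP can be sketched as follows. This is a rough illustration, not the Wijsman or Diehl-Duke-Ploughman implementation: the function names, the in-memory layout of a pedigree as a list of genotype pairs, and the genotypes themselves are all made up for the example. The sketch assumes that allele 0 marks an untyped allele and leaves it unchanged.

```python
def renumber(freqs, observed):
    """Map observed old alleles to 1..k and pool the others into allele k+1."""
    kept = sorted(observed)
    mapping = {old: new for new, old in enumerate(kept, start=1)}
    new_freqs = {mapping[old]: freqs[old] for old in kept}
    pooled = sum(f for old, f in freqs.items() if old not in mapping)
    if pooled > 0:
        new_freqs[len(kept) + 1] = pooled
    return mapping, new_freqs


def recode_per_pedigree(freqs, pedigrees):
    """Renumber alleles separately for each pedigree.

    pedigrees -- dict: pedigree id -> list of (allele, allele) genotypes,
                 with 0 standing for an untyped allele (assumption).
    Returns dict: pedigree id -> (new_freqs, recoded genotypes), i.e. the
    contents of one per-pedigree locus description and pedigree recoding.
    """
    result = {}
    for ped_id, genotypes in pedigrees.items():
        observed = {a for pair in genotypes for a in pair if a != 0}
        mapping, new_freqs = renumber(freqs, observed)
        mapping[0] = 0  # untyped stays untyped
        recoded = [(mapping[a], mapping[b]) for a, b in genotypes]
        result[ped_id] = (new_freqs, recoded)
    return result


def total_score(score_by_pedigree):
    """Sum per-pedigree output values, e.g. -2*(log(likelihood)) from separate runs."""
    return sum(score_by_pedigree.values())


freqs = {1: .3, 2: .2, 3: .15, 4: .1, 5: .22, 6: .03}
pedigrees = {
    "A": [(2, 4), (4, 5), (0, 0)],  # made-up genotypes
    "B": [(1, 1), (1, 6)],          # made-up genotypes
}
recoded = recode_per_pedigree(freqs, pedigrees)
for ped_id, (new_freqs, genotypes) in recoded.items():
    print(ped_id, len(new_freqs), "alleles", genotypes)
```

In this example new allele 1 stands for old allele 2 in pedigree A but for old allele 1 in pedigree B, which is why each pedigree needs its own locus file.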