As described in the papers:
R. W. Cottingham Jr., R. M. Idury, and A. A. Schaffer, Faster Sequential Genetic Linkage Computations, American Journal of Human Genetics, 53(1993), pp. 252-263.
and
A. A. Schaffer, S. K. Gupta, K. Shriram, and R. W. Cottingham, Jr., Avoiding Recomputation in Linkage Analysis, Human Heredity, 44(1994), pp. 225-237.
and
A. A. Schaffer, Faster Linkage Analysis Computations for Pedigrees with Loops or Unused Alleles, Human Heredity, to appear.
this directory and its subdirectories contain version 2.1 of faster versions of the general pedigree programs of LINKAGE 5.1. Several users of the earlier versions 1.0 and 1.1 have dubbed the new programs FASTLINK. PostScript versions of the papers can be found in the files paper1.ps, paper2.ps, and paper5.ps. Please cite the first two papers if you use these programs in a published experiment. paper1.ps describes the algorithmic changes introduced in version 1.0; paper2.ps describes the algorithmic changes introduced in version 2.0; paper5.ps describes the algorithmic changes introduced in version 3.0P. Papers describing the parallel implementation are paper3.ps and paper4.ps.
There are several significant changes in each version from the user's perspective.
2. SEPARATE COMPILATION.
All three programs have been split up into multiple definition and code files. The code is still similar but it is organized differently. The main advantages of separate compilation are:
Version 2.0 has the code split up even further (than 1.1) into separate files.
4. MLINK. Version 2.0 includes MLINK, while 1.0 and 1.1 did not.
5. CHECKPOINTING. If the computation of LODSCORE or ILINK crashes, it can be restarted. The checkpointing process is described in a separate README file called README.checkpoint. The granularity of checkpointing that we are doing (roughly every two likelihood function evaluations) is not applicable to LINKMAP or MLINK, where all the likelihood function evaluations are essentially independent. Checkpointing is new with version 2.0.
6. BETTER HANDLING of LOOPS. On some looped pedigrees version 2.0 will be much faster than version 1.1. Read paper2.ps for details.
Fixed various compilation problems that impeded portability to different systems and compilers.
Fixed a bug that arose when the first pedigree was not informative for the loci chosen. Bug was fixed and posted to the server on March 2, 1994. Thanks to Tim Magnus (U. Calgary) for the bug report.
Fixed a bug in MLINK that caused it to give bad results when the number of pedigrees was greater than 127. Bug was fixed and posted to the server on March 2, 1994. Thanks to Tim Magnus (U. Calgary) for the bug report.
8. Documentation. Wrote two documents, traverse.ps, and loops.ps, to explain how pedigree traversal is done in LINKAGE and FASTLINK. These documents are primarily intended for users who wish to actually modify the code or are generally annoyed that they do not know what is inside the LINKAGE/FASTLINK "black box". These documents tell you "Things I wish I had known when starting on FASTLINK development in 1992". Thanks to Dan Weeks (U. Pittsburgh), Brian Nichols (U. Iowa), Meg Gelder (Rice) and Sandeep Gupta (Rice) for asking me to help them look inside the code and understand what is going on.
9. Removed p2c. Earlier versions relied on the p2c library. Dependence on the p2c library was created when translating the LINKAGE programs from PASCAL to C originally. Various users have indicated that the need to get the p2c library and install it was a menace to easy installation of FASTLINK. The problem is now solved.
10. Sanity checks for constants. Cathryn Lewis (Utah) and Gerard Tromp (Thomas Jefferson U.) suggested that FASTLINK print, at the beginning of each run, a diagnostic message indicating in genetic terms what the characteristics of the run are, and whether the constants (see README.constants) have been set correctly. This is now done. The diagnostic message is printed on the screen and also appended to the file FASTLINK.err. If you wish to suppress this message, set DIAGNOSTIC to 0 in commondefs.h.
11. Dynamic memory allocation. Some of the big data structures are now allocated dynamically at runtime. This enables FASTLINK to be less conservative about their size. This makes it possible to do runs whose memory requirements are close to the virtual memory resources of your system. Thanks to Carol Haynes (Duke) for daring me to fix this.
12. More polite exits. FASTLINK should now exit more politely when you do not have enough memory to do the run you want to do and the error occurs at runtime. Thanks to Gerard Tromp (Thomas Jefferson U.) for suggesting this change.
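Items 11 and 12 can be pictured together with a small sketch. This is not the FASTLINK code; the helper name alloc_or_exit and the wording of the message are made up for illustration. It shows the general idea of sizing a structure at runtime and stopping with a readable message when the memory is not available.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a FASTLINK routine): allocate count zeroed
   items of the given size, or stop the run with a readable message. */
void *alloc_or_exit(size_t count, size_t size, const char *what)
{
    void *p = calloc(count, size);

    /* calloc may legitimately return NULL for a zero-sized request,
       so only treat NULL as a failure when something was asked for. */
    if (p == NULL && count != 0 && size != 0) {
        fprintf(stderr, "Unable to allocate memory for %s (%lu items).\n",
                what, (unsigned long) count);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

A caller would size the request from the data actually read in, for example alloc_or_exit(n, sizeof(double), "genotype array"), and release it with free() when done.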
13. Flushing output. LINKMAP and MLINK now flush their output to disk after each candidate theta vector. This is a crude form of checkpointing. Thanks to Luc Krols (U. Antwerp) and other users for suggesting this change.
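As a rough illustration of what flushing after each candidate theta vector means, here is a minimal sketch in standard C. The function name write_result and the output format are assumptions for the example and do not come from the LINKMAP or MLINK sources.

```c
#include <stdio.h>

/* Hypothetical helper (not a FASTLINK routine): write one result line
   and flush it right away, so the line is handed to the operating
   system even if the program crashes on a later theta vector. */
int write_result(FILE *out, double theta, double lod)
{
    if (fprintf(out, "theta = %.4f  lod = %.6f\n", theta, lod) < 0)
        return -1;              /* the write itself failed */

    /* fflush empties the stdio buffer for this stream. */
    return fflush(out) == 0 ? 0 : -1;
}
```

Without the flush, results for many theta vectors can sit in the stdio buffer and be lost together in a crash.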
One bug caused nonsensical results to be reported in LODSCORE. This bug was inherited by FASTLINK from LINKAGE 5.1. It was fixed in LINKAGE 5.2. The same fix now appears in FASTLINK's LODSCORE.
The other bug caused the wrong value to be printed to one output file for the ratio of thetas. There was no problem with the likelihood computations.
Thanks to Jerry Halpern (Stanford) for the bug report.
Thanks to Ellen Wijsman (U. Washington) for the bug report.
The bug manifests itself as follows. At least one of the recombination fractions must be 0.0. Say that it is the one between loci a and b. The data must have a forced recombination between a and b. Because of this, the likelihood should be -infinity, and FASTLINK 2.2 reports this correctly (as the negative of a very large number). In LINKAGE and earlier versions of FASTLINK, the 0.0 might get misrepresented in the computer as a very small, but strictly positive, number. This leads to a very negative, but far from infinite (and incorrect), likelihood.
This bug is discussed at length, albeit in the context of CMAP rather than LINKMAP on page 122 of Handbook of Human Genetic Linkage by Joseph Douglas Terwilliger and Jurg Ott.
NB: If you compare LODSCORE or ILINK from LINKAGE to FASTLINK and get *different* answers in a case where one of the theta components is .001 *and* somewhere in outfile.dat you see the message: "A VARIABLE WAS SET TO A BOUND", then you have almost certainly hit either this bug in LINKAGE or the bug fixed in item 3. above, or both. Thanks to Sandeep Gupta for detecting this bug.
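The effect of a 0.0 turning into a tiny positive number can be seen in a toy model. The sketch below is not the FASTLINK likelihood computation; it assumes a simple phase-known situation in which r of n meioses are recombinant, so the likelihood is theta^r * (1 - theta)^(n - r). The function name toy_likelihood is made up for the example.

```c
/* Toy phase-known model (an assumption for illustration only):
   likelihood of r recombinants among n meioses at recombination
   fraction theta, i.e. theta^r * (1 - theta)^(n - r). */
double toy_likelihood(double theta, int n, int r)
{
    double like = 1.0;
    int i;

    for (i = 0; i < r; i++)
        like *= theta;          /* one factor per recombinant */
    for (i = 0; i < n - r; i++)
        like *= 1.0 - theta;    /* one factor per nonrecombinant */
    return like;
}
```

With one forced recombinant, toy_likelihood(0.0, 10, 1) is exactly 0.0, whose logarithm is -infinity. If the 0.0 is stored as, say, 1e-10, the likelihood is small but strictly positive, and its base-10 logarithm is roughly -10, which is very negative but far from infinite.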
dostream (default value true)
fitmodel (default value false)
score (default value true)
approximate (default value false)
If you tried to use values opposite from the defaults, FASTLINK
would behave incorrectly.
Thanks to Martin Farrall (Wellcome Centre), Victoria Haghihi (Columbia), and Ken Morgan (McGill) who reported different instances of the problem. Unfortunately, it took me 3 bug reports to grasp that the problem occurred in the initial translation and not in the modifications that converted LINKAGE to FASTLINK.
15. More dynamic memory allocation. Removed the need for the constants maxprobclass, maxclasssize, and maxneed. The constant maxhap remains. Adjusted the diagnostics (introduced in FASTLINK 2.1) and README.constants accordingly.
Also, most of the constants can now be set at compile time, by just editing the Makefile. See README.constants for an explanation.
16. Diagnostics for more constants.
FASTLINK is distributed with maxloop set to 2 and maxchild set to 16. These may need to be raised for some inputs. There is little to be gained from lowering them.
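One common C idiom for constants that can be set from the Makefile is a guarded default, sketched below. Whether FASTLINK uses exactly this pattern is an assumption; README.constants is the authority. The sketch also assumes that maxloop bounds the number of loops in a pedigree and maxchild the number of children, and the function fits_constants is hypothetical.

```c
/* Defaults that apply only when the compiler command line does not
   already supply a value, e.g. -Dmaxloop=4 added to CFLAGS
   (assumed mechanism, see README.constants for the real one). */
#ifndef maxloop
#define maxloop 2
#endif
#ifndef maxchild
#define maxchild 16
#endif

/* Hypothetical check: returns 1 if a pedigree with the given number
   of loops and children fits the compiled-in limits, 0 otherwise. */
int fits_constants(int nloops, int nchildren)
{
    return nloops <= maxloop && nchildren <= maxchild;
}
```

With the distributed defaults, a pedigree with 3 loops would fail such a check until maxloop is raised and the programs are recompiled.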
17. More checkpointing. Added checkpointing to LINKMAP and MLINK. Now you can recover completely from a crash in those programs. If you have not used the checkpointing facility before (e.g., because you only use LINKMAP and MLINK) you are STRONGLY encouraged to read the file README.checkpoint. We have tried to make the checkpointing and crash-recovery process as transparent as possible, but there are some unavoidable subtleties of which the user ought to be aware.
18. Minor Makefile changes. For the clean target added the flag -f, so that the user is *not* asked before any files are deleted (suggested by Shriram Krishnamurthi). Added targets installfast and installslow, which make all 4 programs at once in either "fast" or "slow" versions (suggested by Kimmo Kallio). Fixed problems with target unknown (suggested by Bob Cottingham).
19. Portability information. The file README.portability has information about running FASTLINK on the following operating systems: SunOS, Solaris, Ultrix, OSF/1, AIX, IRIX, Linux, VMS, and DOS. Until shown otherwise, I am very naively assuming that the version number of the operating system does not matter. The good news is that FASTLINK is quite portable. The bad news is that:
Special thanks to Ramana Idury for figuring out how to run FASTLINK on DOS.
Thanks to Alan Cox, David Featherstone, Kimmo Kallio, Shriram Krishnamurthi, Joe Terwilliger, Ellen Wijsman, and Xiaoli Xie for sending me portability information and trying things out on various systems.
20. Code clean-up. The four code files *modified.c have been significantly cleaned up and more comments have been added in them.
21. Better UNKNOWN.
The C version of UNKNOWN is distributed strictly as a courtesy to FASTLINK users who want to avoid the need for a PASCAL compiler. In particular, some of the algorithmic improvements in FASTLINK are applicable to UNKNOWN, but have not been implemented in UNKNOWN.
22. Added -DDOS flag such that adding -DDOS to CFLAGS in the Makefile eliminates checkpointing. Used this to produce DOS versions. See README.DOS for more details.
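For readers unfamiliar with -D flags, the sketch below shows how a symbol defined on the compiler command line can switch code in or out. It is only an illustration of the mechanism; the function checkpointing_enabled is hypothetical and is not taken from the FASTLINK sources.

```c
/* Hypothetical illustration: the preprocessor keeps exactly one of
   the two return statements, depending on whether the file was
   compiled with -DDOS in CFLAGS. */
int checkpointing_enabled(void)
{
#if defined(DOS)
    return 0;   /* built with -DDOS: checkpointing code left out */
#else
    return 1;   /* default build: checkpointing code compiled in */
#endif
}
```

Because the choice is made at compile time, changing the flag requires recompiling the programs.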
23. Made some changes in filename conventions and input/output format commands, so that FASTLINK could be ported to VAX/VMS using the VAX C compiler. Previously FASTLINK would run on VAX/VMS only with the DEC C compiler. See README.VAX for more details.
24. Fixed a bug caused by the fact that the bug fix reported above in item 3 was not done exactly correctly. In particular, if one was using ILINK or LODSCORE to estimate allele frequencies and the frequency of the highest numbered allele was very small, the incorrect values for all frequencies might get printed out. Thanks to Gerard Tromp (Thomas Jefferson U.) for the bug report.
25. Added README.djgpp to explain how to install the djgpp compiler for DOS and how to compile FASTLINK with djgpp. Users may find it preferable to compile FASTLINK for DOS themselves rather than use the distributed executables. One reason is that the distributed executables are the "slow" version, and it is possible to compile the "fast" version for many runs.
26. Fixed a performance bug in checkpointing. The fix increases the chances that if the crash occurred as the checkpoint file was being written, the presence of the file will be detected. This bug caused only wasted recomputation, not incorrect results. Thanks to Margaret Gelder Ehm (Rice) for the bug report.
27.
28. Version 2.3P includes a new auxiliary program called OFM (Optimize for Maxhap) which computes the optimal value of maxhap for any given run and recompiles automatically. Refer to README.ofm for details.
29. Version 2.3P has a much more robust Makefile for use on UNIX and DOS. Added README.makefile to describe how the new Makefile is organized.
30. The major change in version 2.3P is that ILINK, LINKMAP, and MLINK can now be run on parallel computers for autosomal data. This explains the P in the new version number 2.3P. The parallel code can run either on shared-memory UNIX machines or on networks of UNIX workstations. See README.parallel, README.p4, and README.TreadMarks for more details.
31. In response to significant clamoring from users, we removed initial `.'s from the filenames of files used in checkpointing. See README.checkpoint for details.
32. Improved the UNKNOWN preprocessor program so that it pinpoints which nuclear family has the error when Mendelian rules are violated in the pedigree file. See README.unknown for details. Thanks to Carol Haynes (Duke) for the suggestion.
33. We modified LINKMAP to avoid some recomputation when multiple LINKMAP runs are done from the same lcp-produced script. This works only on UNIX for the moment. To turn off this feature, set MULTI_LINKMAP to 0 in lidefs.h. This feature is already de-activated for VMS and DOS. Thanks to Patricia Kramer (Oregon Health Sciences Institute) for this suggestion.
34. We have included a new diagnostic to report when the locus file lists more alleles for a locus than necessitated by the pedigree file. See README.allele for details.
35. Added README.time, a short essay on estimating the running time of sequential FASTLINK runs. Thanks to Frank Visser at the HGMP Centre in Hinxton, U.K. for suggesting this.
36. Added -i option (for info) for ILINK, MLINK, LINKMAP, and LODSCORE that summarizes how the various compilation options/variables are set for a given executable. For example, if you run:
linkmap -i
you get a description of how the program is configured, but nothing
interesting is computed. Flagless runs now also print out "(slow)"
with the version number if the given executable is a "slow" version.
Thanks to Tara Cox Matise at Columbia University for the suggestion.
38. Fixed several bugs regarding the printing of values in stream.out when running the parallel version of LINKMAP or MLINK. Thanks to Franz Rueschendorf in Berlin and Lucien Bachner in Paris for bug reports.
39. Fixed a bug in ILINK and LODSCORE that would cause completely nonsensical estimates of theta and/or gene frequencies. The problem was due to an array being too small, and would be likely to arise only when using ILINK to estimate gene frequencies. Thanks to Reynir Arngrimsson in Glasgow.
40. Fixed a bug that occurred in some pedigrees that have both a loop and a multiple marriage. Thanks to Rita Kruse in Bonn for reporting the bug.
41. Corrected a bug in the declarations for the checkrisk routine that caused the program to crash when doing a risk calculation. Thanks to Lucien Bachner in Paris for the bug report.
42. Implemented faster algorithms for handling looped pedigrees. See paper5.ps and the updated versions of unknown.ps and loops.ps for details. As a result, it is now obligatory to use the UNKNOWN that comes with FASTLINK rather than LINKAGE's UNKNOWN.
43. Fixed incompatibility error checking in UNKNOWN, so that violations of Mendelian rules in looped pedigrees are reported. Thanks to Lucien Bachner in Paris for alerting me to the fact that all previous versions of UNKNOWN did no error checking for looped pedigrees.
44. Implemented allele amalgamation, which speeds up the computation when not all alleles at a locus are used in a pedigree. See paper5.ps and the updated unknown.ps for details. Caution: If you have a looped pedigree to which allele amalgamation applies, the printed values of -2 *ln(likelihood) may be different from previous versions of FASTLINK, but lod scores should be the same.
45. Wrote README.trouble, which is a troubleshooting guide for LINKAGE and FASTLINK. It explains almost all error messages.
46. Fixed the parallel implementation so that it could handle an arbitrary number of loops in the input pedigrees.
47. Eliminated the constant MAXWORKINGSET, which was used in the parallel code (see README.p4 and README.TreadMarks).
48. Corrected various problems that occurred when the number of alleles at numbered allele locus or binary factors locus was > 31. Now having more than 31 alleles at a binary factors locus is forbidden and caught by a proper error message. Now having more than 31 alleles at a numbered alleles locus works fine. Thanks to Jeff O'Connell in Pittsburgh and Joe Terwilliger in Oxford for bringing these problems to my attention and encouraging me to fix them.
49. Eliminated maxhap and maxfem. This makes ofm (cf. item 28) obsolete. All data structures whose sizes depend on the number of haplotypes are now allocated and freed dynamically during the run. Some consequences:
50. Improved the diagnostic (see item 37) to detect unbroken loops in UNKNOWN. Thanks to Ken Morgan (Montreal) for showing me a data set for which the previous implementation did not detect an unbroken loop that was there.
51. Added a diagnostic to detect if a person is assigned an allele that is larger than the number of alleles specified for that locus. Thanks to David Stockton (Baylor College of Medicine) for the suggestion.
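The kind of check described in item 51 can be sketched in a few lines. This is not the code in FASTLINK; the function name allele_in_range is made up, and the sketch assumes numbered alleles run from 1 to the count given for the locus, with 0 standing for an unknown allele.

```c
/* Hypothetical range check (not a FASTLINK routine). Assumes alleles
   are numbered 1..nalleles and that 0 marks an unknown allele.
   Returns 1 if the allele is acceptable for the locus, 0 otherwise. */
int allele_in_range(int allele, int nalleles)
{
    return allele >= 0 && allele <= nalleles;
}
```

For a locus declared with 4 alleles, an individual recorded with allele 5 would fail this check and could be reported before any likelihood is computed.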
52. Fixed a bug in unknown.c, which was inferring genotypes incorrectly when a child was untyped and had both parents homozygous at a numbered allele locus. Thanks to Ken Morgan (Montreal) for the bug report.
53. Fixed a memory allocation error that occurred if ALLELE_SPEED was set to 0. Thanks to Ken Morgan (Montreal) for the bug report.
54. Added a diagnostic to warn about extremely low allele frequencies and fixed a problem inherited from LINKAGE UNKNOWN that occurred if the first allele frequency of a locus is 0.0. Thanks to Les Biesecker (Bethesda) and Jeff O'Connell (Pittsburgh) for useful guidance on how to handle this situation: having a frequency of 0.0 is legitimate mathematically, but probably a typo in practice.
55. Fixed a problem with checkpointing. We were implicitly assuming that the files recoveryFoundText and recoveryNotFoundText were accessible from the directory where the run was being done. Thanks to Robert Williams (Tempe, Arizona) for reporting this problem.
56. Fixed a problem with the mutation model. It would require a substantial rewrite of unknown.c to make the improvements for loops that are done in unknown.c (item 42) compatible with the mutation model. Furthermore, FASTLINK confers no speed advantage over LINKAGE when using the mutation model. Therefore, I recoded so that if you try to use the mutation model with LOOPSPEED set to 1, UNKNOWN will now complain. In my experience, most users who use the mutation model don't intend to do so; thus this warning may save some users lots of time and may prevent them from getting unintended results.
57. Fixed some memory management problems specific to LODSCORE. Thanks to Mihales Polymeropoulos (Bethesda) for the bug report.
58. Fixed a bug in the way UNKNOWN was reporting incompatibility errors that caused it to omit some errors when there was at least one. The bug could not arise if there were 0 errors. Thanks to Sayuko Kobes for the bug report.
59. Modified UNKNOWN to make it almost backwards compatible (except for mutation model). This way the new UNKNOWN can be safely used with LINKAGE, FASTSLINK, GenoCheck, etc. Thanks to Ramana Idury for a suggestion on the easiest way to achieve backwards compatibility.
60. Significantly reduced the shared-memory usage in parallel FASTLINK for some runs of LINKMAP and MLINK. Changed the memory printing convention so that if you use -m with k processors, you find out how much memory would be needed for every number less than or equal to k. See README.p4 (shared-memory) or README.TreadMarks (network) depending on which library you use to make parallel FASTLINK. Thanks to John Powell for pointing out a memory-usage anomaly that led to improving the memory usage.
61. Fixed a bug in comlike.c that could arise on looped pedigrees. If you hit the bug, you would get a crash. Thanks to Tara Cox Matise (Columbia) for reporting the bug.
62. Fixed 3 problems with unknown.c. One was a bug that would unjustifiably cause UNKNOWN to complain about incompatibilities on a few sex-linked pedigrees. The second was a missing diagnostic for the disequilibrium model. The third was a compilation problem on SunOS using cc. Thanks to Ken Morgan and Lucien Bachner for reporting the problems.
63. Fixed 3 more problems with unknown.c. The first was an inefficiency on some pedigrees with many loops. The second was a problem with the backwards compatibility of unknown.c (see item 59) for sexlinked pedigrees. The third was a bug in unknown.c for some looped pedigrees that would cause a 0 likelihood and -infinity lodscore to be reported. Thanks to David Stockton and Suzanne Leal for reporting the problems.
64. Fixed a bug on some sex-linked looped pedigrees. The file that changed is comlike.c. The bug would show up with the message: Error in translate_loop_vector. Thanks to Carla Bock for reporting the problem.
65. Fixed a bug on some pedigrees with many loops, for which unknown would complain unjustifiably about inconsistent genotypes. The corrected code file is unknown.c. Thanks to David Stockton and Ken Morgan for reporting the bug.
66. Fixed a bug on looped pedigrees with multiple marriages for which the code would produce wrong results if the proband is a descendant of a loop breaker. This should occur only in risk calculations with MLINK. The corrected code file is comlike.c. Thanks to Tom Dyer for the bug report.