Version 4.1, April 16, 1990
Lynn M. Ploughman and Michael Boehnke
Department of Biostatistics, School of Public Health
University of Michigan, Ann Arbor, Michigan 48109
Phone: (313) 936-1001, FAX: (313) 763-2215, Bitnet: USERLEQ1@UMICHUB
1. Introduction
2. Definitions
3. Assumptions of the Power Calculation
4. Options
5. Outline of the Power Calculation
6. Input for SIMLINK
7. Compiling and Running SIMLINK
8. Output from SIMLINK
9. Four Sample Problems
10. Array Sizes, File Management, and Other Practical Hints
11. Error Conditions
12. References
This document describes a computer program to estimate the probability, or power, of detecting linkage given family history information on a set of identified pedigrees. It is assumed that the pedigrees are of known structure and that some data may be available for the genetic trait that is to be mapped. The analysis described here can be applied to autosomal or X-linked traits determined by a single major locus. The trait may be dichotomous with complete or reduced penetrance, or may be quantitative. This power calculation is most usefully undertaken after family history data are gathered, but prior to examination and testing of pedigree members to obtain marker information. The result of this power calculation is an objective answer to the question: Will my families be sufficient to demonstrate linkage? The theoretical basis for this program is given by Ploughman and Boehnke (1989) and Boehnke (1986).
The program SIMLINK (LODSTAT is now incorporated as part of SIMLINK) required for this power calculation has three major components:
To estimate the power of a proposed linkage study, multiple replicates of each pedigree for each of several true recombination fractions or map distances between the trait and marker loci are simulated. After a replicate pedigree has been simulated for each pedigree type and each true recombination fraction or map distance, MENDEL calculates lod or location scores. The resulting scores are used to estimate the maximum lod/location score for each pedigree and for the set of pedigrees and to update the linkage information statistics. Once this process has been completed for the desired number of replicates, estimates of the linkage information provided by the pedigrees, including expected maximum lod/location scores and the probabilities of maximum lod/location scores greater than particular constants, are calculated and output to a series of tables. The probability of a maximum lod/location score greater than 3.0 gives the probability that the pedigree or set of pedigrees will be sufficient to demonstrate linkage.
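The final step described above, turning simulated maximum lod scores into a power estimate, can be sketched as follows. This is only an illustration and not SIMLINK's own code: the function name `estimate_power`, the sample scores, and the use of a binomial standard error are assumptions made for the example.

```python
import math

def estimate_power(max_lods, threshold=3.0):
    """Return (estimate, standard_error) of P(max lod > threshold).

    max_lods holds one maximum lod/location score per simulated replicate.
    The binomial standard error is an assumption of this sketch.
    """
    n = len(max_lods)
    if n == 0:
        raise ValueError("need at least one replicate")
    hits = sum(1 for z in max_lods if z > threshold)
    p = hits / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, se

# Hypothetical maximum lod scores for ten replicates of a pedigree set.
example_max_lods = [3.4, 2.1, 4.0, 0.7, 3.1, 2.9, 5.2, 1.8, 3.6, 0.2]
power, se = estimate_power(example_max_lods)
print(f"estimated power = {power:.2f} (SE {se:.3f})")
```

With only a handful of replicates the standard error is large, which is one reason to simulate more replicates when a precise estimate is needed.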
SIMLINK version 4.1 is provided on this diskette. A file editor will be required to create the input files. If you will be using a microcomputer, it should have a hard disk and a math coprocessor if more than a few replicates of each pedigree are to be simulated. A FORTRAN compiler will also be necessary to allow changes in array dimensions.
We thank Kenneth Lange and Daniel Weeks for their work in developing MENDEL and for generously allowing us to incorporate portions of it into SIMLINK. Any problems that arise through the use of the modified version of MENDEL as a component of SIMLINK are the responsibilities of Ploughman and Boehnke, and questions should be directed to us.
Several terms are used in this document that are of key importance. These include:
3. Assumptions of the Power Calculation
This power calculation for a linkage study assumes:
The power calculation outlined here can be carried out in several different ways depending on the trait of interest and the interests and preferences of the investigator. Options available include:
5. Outline of the Power Calculation
The power calculation is a four-step process, involving
SIMLINK creates pedigree files appropriate for MENDEL containing a single replicate of each pedigree type. In each replicate pedigree, members with known trait phenotype are assigned their correct trait phenotype. Pedigree members of currently unknown trait phenotype may be assigned a trait phenotype if desired; marker phenotypes can also be simulated and assigned. When simulating one marker locus, one marker phenotype will be listed for each true recombination fraction under which pedigrees were simulated; when simulating two flanking marker loci, two marker phenotypes, one per locus, will be listed for each pair of true recombination fractions under which pedigrees were simulated.
These information criteria may be used to estimate:
In addition, SIMLINK can optionally calculate the expected maximum lod score for each pedigree conditional on the heterozygosity/homozygosity status of each pedigree member. This provides a means of identifying the pedigree member(s) whose marker status has a strong impact on the linkage information provided by the pedigree.
Three input files are required:
100 1 1 1 4 1 1 0 1.00
0.00 0.05 0.10 0.50
0.00 60.0 0.00 0.00
0.00 60.0 0.80 0.80
0.00 60.0 0.80 0.80
0.00 60.0 0.00 0.00
0.00 60.0 0.80 0.80
0.00 60.0 0.80 0.80
M F
TRAIT
LOCUS.DAT
PEDIG.DAT
31171 2413 19771
The following records in the given order and with variables and formats as described below are required in the control file (see Examples):
Note: This record and its format have been substantially altered since version 4.0. The definition of NTHETA has also been changed to include free recombination.
Here, alleles 1 and 2 correspond to the first and second trait alleles entered in the locus file, respectively.
For a dichotomous trait with a piecewise linear penetrance function (PENOPT=1):
Note: If a constant penetrance of 80% is desired, independent of age, a line with the values 0. 60. .80 .80 could be entered.
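A piecewise linear penetrance function of this kind can be sketched as below. We assume here, based on the sample control files, that the four values on the line are a lower age, an upper age, and the penetrances at those two ages, and that the penetrance is held constant outside the age interval. The function name and argument names are ours, not SIMLINK's.

```python
def piecewise_linear_penetrance(age, age1, age2, pen1, pen2):
    """Penetrance rising linearly from pen1 at age1 to pen2 at age2.

    Assumption of this sketch: penetrance equals pen1 below age1
    and pen2 above age2.
    """
    if age2 <= age1:
        raise ValueError("age2 must exceed age1")
    if age <= age1:
        return pen1
    if age >= age2:
        return pen2
    fraction = (age - age1) / (age2 - age1)
    return pen1 + fraction * (pen2 - pen1)

# The constant 80% penetrance from the note: 0. 60. .80 .80
print(piecewise_linear_penetrance(30.0, 0.0, 60.0, 0.80, 0.80))

# A function rising from 0 at age 0 to 1.0 at age 40, evaluated at age 30.
print(piecewise_linear_penetrance(30.0, 0.0, 40.0, 0.0, 1.0))
```

Because both endpoint penetrances are .80 in the first call, the result does not depend on age, which is why that line gives a constant penetrance.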
For a dichotomous trait with a cumulative normal penetrance function (PENOPT=2):
If dealing with a quantitative trait due to a mixture of normal distributions (PENOPT=3):
Note: The control file should end with an end-of-file symbol.
TRAIT AUTOSOME 2 3
d .99
D .01
1. 1
d/d
2. 2
d/d
D/d
3. 1
D/d
MARKER1 AUTOSOME 2 3
1 .50
2 .50
11 1
1/1
12 1
1/2
22 1
2/2
ABO AUTOSOME 3 4
A .26
B .06
O .68
A 2
A/A
A/O
B 2
B/B
B/O
AB 1
A/B
O 1
O/O
The trait locus has autosomal dominant inheritance with reduced penetrance; the specific penetrance functions are described in the control file. Because the D allele is relatively rare, the D/D genotype is assumed impossible, and unaffected spouses in the pedigree file (see below) will be assumed not at risk (phenotype 1.). While these assumptions are not exactly true, they are reasonably accurate, and they result in a much simplified power calculation. We strongly recommend the use of such assumptions whenever possible. It is important to remember that this is a power calculation; approximate answers should be quite satisfactory. Note: excluding either homozygous genotype is not appropriate for an X-linked trait, since hemizygous males are assumed by MENDEL to be homozygous for their allele.
The first marker in the locus file is a two allele codominant marker with equal allele frequencies (note, allele names can be characters, including numbers). Given no prior interest in a particular marker, we generally use such a codominant marker as a compromise along the broad continuum between infinitely polymorphic "magic markers" at one extreme and two allele polymorphisms with one rare allele at the other extreme. The second marker is the ABO locus, and demonstrates how dominance relationships are dealt with when all genotypes are allowed for.
Inspection of this example shows that data on the loci are provided one locus at a time with the following records (also see Examples and Lange et al., 1988):
Note: Allele frequencies should sum to 1.0.
For each trait phenotype, enter record 3 below once and record 4 below once for each trait genotype that corresponds to the particular trait phenotype.
For dichotomous traits, three trait phenotypes are possible: 1.=normal and not at risk of becoming affected; 2.=normal and at risk of becoming affected; 3.=affected. Using the not-at-risk phenotype 1. when possible (for example, for spouses who marry into the pedigree for a relatively rare trait) can result in substantial computational savings, since it will usually correspond to fewer possible trait genotypes than the at-risk phenotype 2.
For quantitative traits, by convention, the number of trait phenotypes is entered as zero.
Note: The dichotomous trait phenotypes must be 1., 2., or 3. in that order, and the trailing decimal points are required.
Note: For an X-linked trait, no special symbols are required for males. If a listed phenotype is appropriate for both females and males, only the associated homozygous genotypes will be assigned to a male with the phenotype. Internally, the program identifies hemizygous genotypes with the corresponding homozygous genotypes.
Data on the marker loci are provided one locus at a time with the following records 5-8 required for each marker locus.
Note: Lod/location score calculation time can increase rapidly as a function of the number of marker alleles. Given more alleles, attendant array sizes may also become too large, particularly on microcomputers.
For each phenotype for the current marker, enter record 7 below once and record 8 below once for each marker genotype that corresponds to the particular marker phenotype.
Note: For an X-linked trait, no special symbols are required for males. If a listed phenotype is appropriate for both females and males, only the associated homozygous genotypes will be assigned to a male with the phenotype. Internally, the program identifies hemizygous genotypes with the corresponding homozygous genotypes.
(I3,1X,A8)
(3(A3,1X),2A1,A2,T15,A2,A3,A4)
10 FAMILY1
1 M 3. 1. 80.
2 F 1. 1. 70.
3 1 2 F 3. 1. 80.
4 1 2 M 1. 1. 80.
5 8 9 F 3. 1. 80.
6 4 5 M 1. 1. 80.
7 4 5 M 1. 1. 85.
8 M 3. 1. 80.
9 F 1. 1. 75.
10 8 9 F 3. 1. 50.
6 FAMILY2
1 5 6 M 3. 1. 80.
2 F 1. 1. 70.
3 1 2 F 3. 1. 80.
4 1 2 M 3. 1. 80.
5 M 3. 1. 80.
6 F 1. 1. 80.
In the pedigree file, two format statements are followed by information on each pedigree, one pedigree at a time. Pedigree information includes a pedigree description record, followed by a record for each pedigree member. The following records in the given order and with variables and formats as described below are required in the pedigree file (see Examples and Lange et al., 1988):
7. Compiling and Running SIMLINK
There are several approaches to compiling and running SIMLINK. If you are using an IBM or compatible microcomputer running DOS, and if the default array dimensions with the shipped version (see below) happen to be appropriate for your problem, you can skip compiling and just do the following:
SIMLINK <CR>
where <CR> represents a carriage return. SIMLINK will say hello and ask you for the control input file name (see above) and the output file name (the name of the file to contain the results of the power calculation). SIMLINK will then simulate and compute for a while.
The SIMLINK.EXE included on the diskette was compiled and linked using MICROSOFT FORTRAN version 5.00 with the commands:
These commands may be found in the file MS.BAT.
The advantages of MICROSOFT FORTRAN are that it is widely used, generates fast executable code, and, with an academic discount, is inexpensive. However, for DOS-based microcomputers, I much prefer the Lahey F77L-EM/32 FORTRAN. This compiler generates executable code that is nearly as fast as that generated by MICROSOFT FORTRAN. In addition, F77L-EM/32 breaks the 640K barrier imposed by DOS. This makes it possible to carry out essentially any linkage power calculation on a microcomputer, given sufficient time. Using MICROSOFT FORTRAN, many linkage power calculations simply cannot be carried out due to lack of array space.
Recompiling with MICROSOFT FORTRAN or with some other FORTRAN compiler will require compiling each of the .FOR files on the floppy, and linking them together with SIMLINK.FOR as the main program. The three commands listed above will accomplish this for MICROSOFT FORTRAN. For F77L-EM/32 the corresponding commands are:
F77L3 SIMLINK /B/I/L
F77L3 SIM1 /B/I/L
F77L3 SIM2 /B/I/L
F77L3 MEN1 /B/I/L
F77L3 MEN2 /B/I/L
UP L32 SIMLINK+SIM1+SIM2+MEN1+MEN2;
These commands may be found in the file F77L.BAT.
SIMLINK has also been successfully compiled and run on SUN workstations and DEC VAX minicomputers.
To run SIMLINK on a VAX, edit SIMLINK and change all occurrences of "WRITE(0" to "WRITE(*". If you are running VMS, use the G_FLOATING option rather than the F_FLOATING option when you compile SIMLINK.
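If a file editor with global search and replace is not at hand, the substitution above can be scripted. The following is a minimal sketch; the helper name is ours, and the commented-out file name SIMLINK.FOR is an assumption about which source file is meant. Work on a copy of the source.

```python
def retarget_write_unit(source_text):
    """Replace every occurrence of 'WRITE(0' with 'WRITE(*'."""
    return source_text.replace("WRITE(0", "WRITE(*")

# Hypothetical FORTRAN lines, used only to show the effect.
sample = "      WRITE(0,100) NREP\n      WRITE(6,200) THETA\n"
print(retarget_write_unit(sample))

# To apply it to a source file (file name assumed), one could write:
# with open("SIMLINK.FOR") as f:
#     text = f.read()
# with open("SIMLINK.FOR", "w") as f:
#     f.write(retarget_write_unit(text))
```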
For Examples 1-4 described below, SIMLINK required about 3:45, 17:50, 17:40, and 12:00 minutes:seconds elapsed time on my EVEREX 386 33MHz IBM compatible when compiled using MICROSOFT FORTRAN as above, and about 4:15, 20:30, 20:30, and 12:30 minutes:seconds elapsed time when compiled using F77L-EM/32 as above.
After running SIMLINK, two scratch files will exist: SIMDOC.SCR and SIMERR.SCR. SIMDOC.SCR contains each pedigree member's ID assigned by the user and the corresponding ID used by MENDEL. SIMERR.SCR contains the MENDEL batch file and error messages produced by MENDEL. These error messages will also be output to the screen. In addition, some error messages will be printed to the output file. The ID numbers given in these error messages correspond to those being used by MENDEL. To determine which individual(s) MENDEL is referring to, check the file SIMDOC.SCR. After a successful SIMLINK run, these two scratch files can be deleted.
The output from SIMLINK takes the form of up to seven tables, depending on the analyses carried out. Maximum lod/location scores for each replicate of each pedigree are estimated by quadratic interpolation over the lod/location score values calculated at the test recombination fractions/map distances.
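To illustrate the idea of quadratic interpolation, the sketch below fits a parabola through the best test point and its two neighbors and returns the vertex. It is not taken from SIMLINK; the function name, the choice of the three points, and the handling of a maximum at the edge of the grid are assumptions of this example.

```python
def interpolate_max(xs, ys):
    """Return (x_at_max, max_value) from a parabola through three grid points.

    xs must be in increasing order. If the largest grid value lies at either
    end of the grid, the grid value itself is returned (an assumption here).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    i = max(range(len(ys)), key=lambda k: ys[k])
    if i == 0 or i == len(ys) - 1:
        return xs[i], ys[i]
    x0, x1, x2 = xs[i - 1], xs[i], xs[i + 1]
    y0, y1, y2 = ys[i - 1], ys[i], ys[i + 1]
    d1 = (y1 - y0) / (x1 - x0)      # slope of the left segment
    d2 = (y2 - y1) / (x2 - x1)      # slope of the right segment
    a = (d2 - d1) / (x2 - x0)       # curvature (second divided difference)
    if a >= 0.0:                    # flat: no interior peak to refine
        return x1, y1
    xv = 0.5 * (x0 + x1) - d1 / (2.0 * a)
    yv = y0 + d1 * (xv - x0) + a * (xv - x0) * (xv - x1)
    return xv, yv

# Hypothetical lod scores at four test recombination fractions.
thetas = [0.0, 0.1, 0.2, 0.3]
lods = [2.0 - 10.0 * (t - 0.12) ** 2 for t in thetas]
theta_hat, max_lod = interpolate_max(thetas, lods)
print(f"interpolated maximum {max_lod:.3f} at theta = {theta_hat:.3f}")
```

The interpolated maximum can exceed the largest score on the grid, so a coarse grid of test values need not understate the maximum badly.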
Table 1 summarizes the information used in the simulation. This includes the trait locus name, the number of pedigree replicates simulated, true recombination fractions/map distances, and the test recombination fractions/map distances used.
Tables 2 and 3 give estimates of the mean maximum lod/location score and the probabilities of maximum lod/location scores greater than specified constants for each of the true recombination fractions/map distances. These estimates are given for each pedigree separately (listed under 1, 2, and so forth), for the pedigrees combined assuming genetic homogeneity (under SUMMED), for the pedigrees combined allowing for between-pedigree heterogeneity (under SUMMEDH) (optional), and for any one pedigree over all the available pedigrees (under ANY).
The values for a specific pedigree give estimates of the expected information provided by that pedigree. The values for the summed pedigrees estimate the expected information provided by pooling the data. Pooling the data in this way assumes that the trait is caused by a single genetic locus, that is, there is no heterogeneity. The values for the summed pedigrees allowing for heterogeneity estimate the expected information provided by pooling the data while explicitly allowing for heterogeneity. The values under ANY correspond to the information provided when an analysis is carried out under the assumption of genetic heterogeneity, and information from different pedigrees is not pooled, but the trait is actually homogeneous.
This table lists the estimated mean maximum lod/location score, its standard error, and the maximum maximum-lod/location-score among all replicates for each pedigree, for the summed pedigrees assuming homogeneity, for the summed pedigrees allowing for between-pedigree heterogeneity (optional), and for any of the pedigrees. These estimates are reported for each of the true recombination fractions/map distances.
Note: Since the maximum of the sum is usually less than the sum of the maxima, the expected maximum summed lod/location score (for all pedigrees combined) will usually be less than the sum of the expected maximum lod/location scores for the individual pedigrees.
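A small numerical illustration of this note follows. The lod scores are hypothetical and the maxima are taken over the grid of test values only, without interpolation. The maximum of the summed scores can never exceed the sum of the individual maxima, and it is strictly smaller whenever the pedigrees peak at different test values.

```python
# Hypothetical lod scores for two pedigrees at three test recombination fractions.
test_thetas = [0.0, 0.1, 0.2]
lods_ped1 = [-1.0, 0.8, 0.5]   # largest at theta = 0.1
lods_ped2 = [1.2, 0.9, 0.3]    # largest at theta = 0.0

# Summed lod score at each test value, then its maximum.
summed = [a + b for a, b in zip(lods_ped1, lods_ped2)]
max_of_sum = max(summed)

# Sum of the maxima of the individual pedigrees.
sum_of_max = max(lods_ped1) + max(lods_ped2)

print(f"maximum of summed lod scores: {max_of_sum:.2f}")
print(f"sum of individual maxima:     {sum_of_max:.2f}")
```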
This table lists the estimates and standard errors of probabilities of maximum lod/location scores greater than 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0 for each pedigree, for the summed pedigrees assuming homogeneity, for the summed pedigrees allowing for heterogeneity (optional), and for any of the pedigrees. These values are reported for each of the true recombination fractions/map distances. For linked loci, estimates of the probabilities of maximum lod/location scores greater than 3.0 give estimates of the power of a proposed linkage study based on the corresponding data and the assumption of a linked marker or a pair of flanking markers at the given recombination fraction/map distance. For unlinked loci, these same estimates give estimates of the probability of incorrectly inferring linkage to an unlinked marker or pair of markers. In statistical terms, this estimates the probability "a" of making a type I error for a single analysis. Since many markers will often be considered, the overall probability of making a type I error is greater. Assuming that the linkage calculations for the different marker (pairs) are independent, the overall probability of making a type I error becomes 1-(1-a)**n, where n is the number of marker (pairs) and "**" represents exponentiation.
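The overall type I error formula above is easy to evaluate directly. The sketch below does so for a hypothetical per-analysis error probability and number of markers; both values are chosen for illustration only, and the independence assumption stated above is carried over.

```python
def overall_type1_error(a, n):
    """Overall type I error 1-(1-a)**n for n independent analyses."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be a probability")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - a) ** n

# Hypothetical values: per-marker error probability 0.001, 100 markers.
overall = overall_type1_error(0.001, 100)
print(f"overall type I error: {overall:.4f}")
```

For small a, the result is close to n times a, so testing many markers inflates the chance of a false positive roughly in proportion to their number.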
Greater Than Specified Constants, Averaged Over the Interval Between the Two Marker Loci.
This table lists estimates of the average probability, when the trait locus is located somewhere between the two marker loci, of a maximum location score greater than the constants 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0 for each pedigree, for the summed pedigrees assuming homogeneity, for the summed pedigrees allowing for heterogeneity, and for any of the pedigrees. Table 4 is omitted when simulating only one marker locus or if only a single location for the disease locus was chosen in the control file (see above). See Boehnke (1986) for a method using two-point lod scores to calculate a lower bound on the information provided by flanking markers and location scores.
Tables 5 and 6 provide estimates of the expected lod/location score and probability of a lod/location score greater than specified constants when the marker (pair) is unlinked. These tables differ from Tables 2 and 3 by reporting values for each test recombination fraction/map distance, rather than maximizing over all test recombination fractions/map distances. Tables 5 and 6 can be used to estimate the distance to each side of an unlinked marker (pair) that is likely to be excluded using the available pedigrees. Tables 5 and 6 are included only if free recombination is simulated (that is, IFREE=1).
For each test recombination fraction/map distance, this table gives the estimate of the mean lod/location score, its standard error, and the sample maximum and minimum lod/location scores for each pedigree and for the summed pedigrees assuming homogeneity. In addition, an estimate of the test recombination fraction/map distance at which the mean lod/location score equals -2.0 is printed. This estimate is based on quadratic interpolation of the lod/location score. This recombination fraction/map distance gives an estimate of the expected exclusion distance when testing for linkage to an unlinked marker (pair). If interpolation is not possible, asterisks are printed.
For each test recombination fraction/map distance, estimates and standard errors for the probabilities of lod/location scores greater than -2.0, -1.5, -1.0, ... , 2.5, and 3.0 are given. For each test recombination fraction/map distance, one minus the probability of a lod/location score greater than -2.0 gives an estimate of the probability that linkage will be excluded for at least that distance from an unlinked marker (pair).
Input files for these examples are EXAMPLE*.CON, EXAMPLE*.LOC, and EXAMPLE*.PED; output files are EXAMPLE*.OUT (*=1,2,3,4). These files are all included on the diskette. Before using SIMLINK for your own data, we strongly recommend running the test problems to verify that you are obtaining the same results. The example input files should be helpful when you go to prepare input files for your own analyses.
Each of the eight pedigrees in this example is identical to that described by Ploughman and Boehnke (1989). Eight copies are used to achieve a moderate-sized power estimate for demonstration purposes.
Pedigrees 1 through 8 are segregating an autosomal dominant trait with complete penetrance by age 40. Three pedigree members, numbered 4, 6, and 7, in each of the pedigrees, are unaffected, at risk, and below the age of 40. The penetrance for these pedigree members is described by a piecewise linear function (PENOPT=1) which increases from 0 at age 0 to 1.0 at age 40 for trait genotypes DD and Dd, and is 0 at all ages for trait genotype dd. The remaining pedigree members are either affected or unaffected and assumed not to be at risk. The ages listed for these pedigree members are not needed by the penetrance function, and, hence, need not be correct (see pedigree file).
Only 20 replicates are simulated in this example, so that it can be used to quickly check that the program is producing the same results as are given in EXAMPLE1.OUT.
Control file: EXAMPLE1.CON
Column numbers are provided for easy reference; they are not part of the input file.
1 2 3 4 5 6
1234567890123456789012345678901234567890123456789012345678901234
20 1 1 1 4 1 0 0
0.00 0.10 0.20 0.50 2. True rec. frac.
0.0 40.0 0.0 1.0 3. for males, DD
0.0 40.0 0.0 1.0 for males, Dd
0.0 40.0 0.0 0.0 for males, dd
0.0 40.0 0.0 1.0 for females, DD
0.0 40.0 0.0 1.0 for females, Dd
0.0 40.0 0.0 0.0 for females, dd
M F 4. male and female symbols
AUTODOM 5. trait locus name
EXAMPLE1.LOC 6. locus file name
EXAMPLE1.PED 7. pedigree file name
3791 3271 313 8. seeds for random number generator
Locus file: EXAMPLE1.LOC
Column numbers are provided for easy reference; they are not part of the input file.
1 2
12345678901234567890123456789 Comments:
AUTODOM AUTOSOME 2 3 1. Trait locus information
D .01 2. Trait allele information
d .99
1. 1 3. Trait phenotype information
d/d 4. Pheno/geno correspondence
2. 2 3. Trait phenotype information
D/d 4. Pheno/geno correspondence
d/d 4. Pheno/geno correspondence
3. 1 3. Trait phenotype information
D/d 4. Pheno/geno correspondence
MARKER1 AUTOSOME 2 3 5. Marker locus information
A .50 6. Marker allele information
B .50
AA 1 7. Marker phenotype information
A/A 8. Pheno/geno correspondence
AB 1 7. Marker phenotype information
A/B 8. Pheno/geno correspondence
BB 1 7. Marker phenotype information
B/B 8. Pheno/geno correspondence
Pedigree file: EXAMPLE1.PED
Column numbers are provided for easy reference; they are not part of the input file.
1 2
12345678901234567890123456789 Comments:
(I3,1X,A8) 1. Pedigree record format
(3(A3,1X),2A1,A2,T15,A2,A3,A4) 2. Individual record format
10 FAMILY 1 3. Pedigree information
1 M 3. 1. 80. 4. Individual data
2 F 1. 1. 70.
3 1 2 F 3. 1. 80.
4 1 2 M 2. 1. 30.
5 8 9 F 3. 1. 80.
6 4 5 M 2. 1. 10.
7 4 5 M 2. 1. 5.
8 M 3. 1. 80.
9 F 1. 1. 75.
10 8 9 F 1. 1. 50.
10 FAMILY 2 3. Pedigree information
1 M 3. 1. 80. 4. Individual data
2 F 1. 1. 70.
3 1 2 F 3. 1. 80.
4 1 2 M 2. 1. 30.
5 8 9 F 3. 1. 80.
6 4 5 M 2. 1. 10.
7 4 5 M 2. 1. 5.
8 M 3. 1. 80.
9 F 1. 1. 75.
10 8 9 F 1. 1. 50.
.
10 FAMILY 8 3. Pedigree information
1 M 3. 1. 80. 4. Individual data
2 F 1. 1. 70.
3 1 2 F 3. 1. 80.
4 1 2 M 2. 1. 30.
5 8 9 F 3. 1. 80.
6 4 5 M 2. 1. 10.
7 4 5 M 2. 1. 5.
8 M 3. 1. 80.
9 F 1. 1. 75.
10 8 9 F 1. 1. 50.
Pedigrees 1 and 2 are segregating a heterogeneous autosomal dominant trait with complete penetrance by age 40. In pedigree 1, individuals 32, 35, 39, and 40 are unaffected, at risk, and below the age of 40; likewise, in pedigree 2, individuals 30, 33, 36, and 38 are unaffected, at risk, and below the age of 40. The penetrance for these individuals is described by a cumulative normal function (PENOPT=2) with a mean age of 10.0, a standard deviation of 4.0, a minimum penetrance of 0.0, and a maximum penetrance of 1.0 for trait genotypes DD and Dd. The penetrance is 0.0 at all ages for trait genotype dd. The remaining pedigree members are either affected or unaffected and not at risk. The linked fraction of pedigrees is assumed to be .80. A related example is described by Boehnke (1986).
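The cumulative normal penetrance function used in this example can be sketched as follows. We assume here that the normal distribution function is rescaled to run from the minimum to the maximum penetrance; that scaling, and the function and argument names, are assumptions of this sketch rather than a statement of SIMLINK's internals.

```python
import math

def cumulative_normal_penetrance(age, mean, sd, pen_min, pen_max):
    """Penetrance at a given age under a cumulative normal age-of-onset curve.

    Assumed form: pen_min + (pen_max - pen_min) * Phi((age - mean) / sd).
    """
    if sd <= 0.0:
        raise ValueError("sd must be positive")
    z = (age - mean) / sd
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pen_min + (pen_max - pen_min) * phi

# Parameters from this example for genotypes DD and Dd:
# mean age 10.0, standard deviation 4.0, minimum 0.0, maximum 1.0.
for age in (5.0, 8.0, 10.0, 12.0):
    pen = cumulative_normal_penetrance(age, 10.0, 4.0, 0.0, 1.0)
    print(f"age {age:4.1f}: penetrance {pen:.3f}")
```

Under this form, half of the eventual penetrance has been reached at the mean age, so an unaffected at-risk individual aged 10 still carries substantial uncertainty about trait genotype.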
Control file: EXAMPLE2.CON
250 1 2 1 1 1 0 1 0.80
0.05 0.50 2. True rec. frac.
10.0 4.0 0.0 1.0 3. for males, DD
10.0 4.0 0.0 1.0 for males, Dd
0.0 4.0 0.0 0.0 for males, dd
10.0 4.0 0.0 1.0 for females, DD
10.0 4.0 0.0 1.0 for females, Dd
0.0 4.0 0.0 0.0 for females, dd
1 2 4. male and female symbols
AUTODOM 5. trait locus name
EXAMPLE2.LOC 6. locus file name
EXAMPLE2.PED 7. pedigree file name
3191 371 21713 8. seeds for random number generator
Locus file: EXAMPLE2.LOC
AUTODOM AUTOSOME 2 3 1. Trait locus information
D .01 2. Trait allele information
d .99
1. 1 3. Trait phenotype information
d/d 4. Pheno/geno correspondence
2. 2 3. Trait phenotype information
D/d 4. Pheno/geno correspondence
d/d 4. Pheno/geno correspondence
3. 1 3. Trait phenotype information
D/d 4. Pheno/geno correspondence
MARKER1 AUTOSOME 2 3 5. Marker locus information
A .50 6. Marker allele information
B .50
AA 1 7. Marker phenotype information
A/A 8. Pheno/geno correspondence
AB 1 7. Marker phenotype information
A/B 8. Pheno/geno correspondence
BB 1 7. Marker phenotype information
B/B 8. Pheno/geno correspondence
Pedigree file: EXAMPLE2.PED
(I2,1X,A8) 1. Pedigree record format
(3(A3,1X),2A1,A3,T15,3A3) 2. Individual record format
40 FAMILY 1 3. Pedigree information
1 1 3. 0. 80. 4. Individual data
2 2 1. 0. 80.
3 2 1. 0. 80.
4 1 2 1 3. 0. 80.
5 1 2 1 3. 0. 80.
6 2 1. 1. 80.
7 2 1. 1. 80.
8 3 4 1 3. 1. 80.
9 3 4 2 1. 1. 80.
10 3 4 1 3. 1. 80.
11 2 1. 1. 80.
12 2 1. 1. 80.
13 5 6 1 3. 1. 80.
14 5 6 1 3. 1. 80.
15 2 1. 1. 80.
16 5 6 2 1. 1. 80.
17 5 6 2 3. 1. 80.
18 1 1. 1. 80.
19 5 6 1 1. 1. 80.
20 1 1. 1. 80.
21 7 8 2 3. 1. 80.
22 7 8 1 1. 1. 80.
23 7 8 1 1. 1. 80.
24 7 8 1 3. 1. 80.
25 10 11 1 1. 1. 80.
26 10 11 2 1. 1. 80.
27 10 11 1 3. 1. 80.
28 12 13 2 1. 1. 80.
29 12 13 2 3. 1. 80.
30 12 13 2 1. 1. 80.
31 14 15 2 1. 1. 80.
32 14 15 2 2. 1. 10.
33 14 15 1 3. 1. 80.
34 17 18 2 1. 1. 80.
35 17 18 2 2. 1. 5.
36 17 18 2 3. 1. 80.
37 17 18 1 1. 1. 80.
38 20 21 2 1. 1. 80.
39 20 21 2 2. 1. 12.
40 20 21 2 2. 1. 8.
38 FAMILY 2 3. Pedigree information
1 1 3. 0. 80. 4. Individual data
2 2 1. 0. 80.
3 1 1. 1. 80.
4 1 2 2 3. 0. 80.
5 1 2 2 3. 1. 80.
6 1 2 2 1. 1. 80.
7 1 1. 1. 80.
8 3 4 2 3. 1. 80.
9 3 4 2 1. 1. 80.
10 3 4 1 3. 1. 80.
11 2 1. 1. 80.
12 1 1. 1. 80.
13 7 8 2 3. 1. 80.
14 1 1. 1. 80.
15 7 8 2 3. 1. 80.
16 7 8 2 3. 1. 80.
17 1 1. 1. 80.
18 10 11 2 1. 1. 80.
19 10 11 1 3. 1. 80.
20 2 1. 1. 80.
21 12 13 1 1. 1. 80.
22 12 13 1 1. 1. 80.
23 14 15 2 1. 1. 80.
24 2 1. 1. 80.
25 16 17 1 3. 1. 80.
26 16 17 2 3. 1. 80.
27 1 1. 1. 80.
28 16 17 1 3. 1. 80.
29 16 17 1 3. 1. 80.
30 16 17 1 2. 1. 17.
31 19 20 1 3. 1. 80.
32 19 20 2 3. 1. 80.
33 19 20 1 2. 1. 13.
34 24 25 1 1. 1. 80.
35 24 25 1 3. 1. 80.
36 26 27 2 2. 1. 8.
37 26 27 1 1. 1. 80.
38 26 27 2 2. 1. 10.
The rare, X-linked recessive trait segregating in these pedigrees is Becker Muscular Dystrophy. The pedigrees BD28, BD78, and BD9 were taken from Brown et al. (1985) with some modification of ages. Although this trait has age-dependent penetrance, usually appearing in the 20s, since all unaffecteds in the line of descent of the trait are beyond the typical range of onset ages, assuming complete penetrance is reasonable for a power calculation and will save computation time. Therefore, the piecewise linear penetrance function used in the analysis has complete penetrance for individuals with trait genotype dd and 0.0 penetrance for individuals with trait genotype DD or Dd. Two flanking marker loci with a true map distance of 10 cM between them were used in the simulation.
Control file: EXAMPLE3.CON
250 2 1 1 1 1 1 0
0.10 1 2. True map dist., dist. option
0.0 40.0 1.0 1.0 3. for males, dd
0.0 40.0 0.0 0.0 for males, Dd
0.0 40.0 0.0 0.0 for males, DD
0.0 40.0 1.0 1.0 for females, dd
0.0 40.0 0.0 0.0 for females, Dd
0.0 40.0 0.0 0.0 for females, DD
M F 4. male and female symbols
XREC 5. trait locus name
EXAMPLE3.LOC 6. locus file name
EXAMPLE3.PED 7. pedigree file name
2791 3903 1313 8. seeds for random numbers
Locus file: EXAMPLE3.LOC
XREC X-LINKED 2 3 1. Trait locus information
d .0001 2. Trait allele information
D .9999
1. 2 3. Trait phenotype information
D/D 4. Pheno/geno correspondence
D/d
2. 3 3. Trait phenotype information
D/D 4. Pheno/geno correspondence
D/d
d/d
3. 1 3. Trait phenotype information
d/d 4. Pheno/geno correspondence
MARKER1 X-LINKED 2 3 5. Marker locus information
A .50 6. Marker allele information
B .50
AA 1 7. Marker phenotype information
A/A 8. Pheno/geno correspondence
AB 1 7. Marker phenotype information
A/B 8. Pheno/geno correspondence
BB 1 7. Marker phenotype information
B/B 8. Pheno/geno correspondence
MARKER2 X-LINKED 2 3 5. Marker locus information
Y .50 6. Marker allele information
Z .50
YY 1 7. Marker phenotype information
Y/Y 8. Pheno/geno correspondence
YZ 1 7. Marker phenotype information
Y/Z 8. Pheno/geno correspondence
ZZ 1 7. Marker phenotype information
Z/Z 8. Pheno/geno correspondence
Note: The genotypes DD and dd must be included in this X-linked example so that the male hemizygous genotypes will be allowed for by MENDEL.
Pedigree file: EXAMPLE3.PED
(I3,1X,A8) 1. Pedigree record format
(3(A3,1X),2A1,A2,T15,A2,A3,A4) 2. Individual record format
10 BD28 3. Pedigree information
1 M 1. 0. 80. 4. Individual data
2 F 1. 0. 80.
3 M 1. 1. 80.
4 1 2 F 1. 1. 80.
5 1 2 M 3. 0. 80.
6 F 1. 1. 80.
7 1 2 M 3. 1. 80.
8 3 4 M 3. 1. 80.
9 5 6 M 1. 1. 80.
10 5 6 F 1. 1. 80.
7 BD78 3. Pedigree information
1 M 1. 1. 90. 4. Individual data
2 F 1. 1. 85.
3 M 1. 1. 65.
4 1 2 F 1. 1. 60.
5 1 2 M 3. 0. 60.
6 1 2 M 1. 1. 60.
7 3 4 M 3. 1. 33.
12 BD9 3. Pedigree information
1 M 1. 0. 90. 4. Individual data
2 F 1. 0. 90.
3 M 1. 1. 90.
4 1 2 F 1. 1. 90.
5 1 2 M 1. 1. 90.
6 3 4 M 3. 1. 62.
7 3 4 M 3. 1. 64.
8 3 4 M 3. 1. 66.
9 3 4 F 1. 1. 63.
10 M 1. 1. 66.
11 9 10 M 3. 1. 36.
12 9 10 M 3. 1. 40.
The large nuclear family in this example is segregating an autosomal major locus for a quantitative trait. The mean trait value for an individual with the DD or Dd trait genotype is 10.0 plus 0.10 times the age of the individual; the standard deviation is 1.0. The mean trait value for an individual with the dd trait genotype is 5.0 and is not a function of age; the standard deviation is also 1.0.
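The age-dependent means described above can be made concrete with a short sketch. The helper names are ours, the individual evaluated is hypothetical, and the normal density is written out directly; none of this is taken from SIMLINK's source.

```python
import math

def genotype_mean(age, intercept, slope):
    """Mean trait value as a linear function of age."""
    return intercept + slope * age

def normal_density(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical individual: age 50 with trait value 15.0.
age, value = 50.0, 15.0
mean_carrier = genotype_mean(age, 10.0, 0.10)   # DD or Dd: 10.0 + 0.10 * age
mean_dd = genotype_mean(age, 5.0, 0.0)          # dd: 5.0 at every age
print(f"mean for DD/Dd at age {age:.0f}: {mean_carrier:.1f}")
print(f"mean for dd at age {age:.0f}:    {mean_dd:.1f}")
print(f"density under DD/Dd: {normal_density(value, mean_carrier, 1.0):.4f}")
print(f"density under dd:    {normal_density(value, mean_dd, 1.0):.3e}")
```

With means this far apart relative to the standard deviation of 1.0, a single trait value nearly determines whether the individual carries the D allele.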
Control file: EXAMPLE4.CON
250 1 3 1 3 1 1 0
0.00 0.10 0.50 2. True rec. frac.
10.0 0.10 1.0 0.0 3. for males, DD
10.0 0.10 1.0 0.0 for males, Dd
5.0 0.0 1.0 0.0 for males, dd
10.0 0.10 1.0 0.0 for females, DD
10.0 0.10 1.0 0.0 for females, Dd
5.0 0.0 1.0 0.0 for females, dd
M F 4. male and female symbols
QUANT 5. trait locus name
EXAMPLE4.LOC 6. locus file name
EXAMPLE4.PED 7. pedigree file name
3191 371 21713 8. seeds for random number generator
Locus file: EXAMPLE4.LOC
QUANT AUTOSOME 2 0 1. Trait locus information
D .01 2. Trait allele information
d .99
MARKER1 AUTOSOME 2 3 5. Marker locus information
A .50 6. Marker allele information
B .50
AA 1 7. Marker phenotype information
A/A 8. Pheno/geno correspondence
AB 1 7. Marker phenotype information
A/B 8. Pheno/geno correspondence
BB 1 7. Marker phenotype information
B/B 8. Pheno/geno correspondence
Pedigree file: EXAMPLE4.PED
(I2,1X,A8) 1. Pedigree record format
(3(A3,1X),3A1,A4,A3,A4) 2. Individual record format
15 QUANT 3. Pedigree information
1 M 20. 1. 80. 4. Individual data
2 F 5. 1. 70.
3 1 2 M 19. 1. 55.
4 1 2 F 16. 1. 52.
5 1 2 M 16. 1. 50.
6 1 2 M 14. 1. 48.
7 1 2 M 15. 1. 46.
8 1 2 F 6. 1. 44.
9 1 2 M 4. 1. 41.
10 1 2 F 17. 1. 39.
11 1 2 F 16. 1. 36.
12 1 2 M 5. 1. 35.
13 1 2 F 12. 1. 33.
14 1 2 F 6. 1. 31.
15 1 2 M 5. 1. 29.
Note: A blank must be present in the first trait phenotype field for a quantitative trait.
10. Array Sizes, File Management, and Other Practical Hints
The maximum sizes of the variables and arrays in SIMLINK are initially set according to the values of the following variables:
| Variable | Description | Initial Value |
|---|---|---|
| MAXALL | maximum number of marker alleles | 4 |
| MAXCON | maximum number of constants for comparing to lod/location scores | 9 |
| MAXGEN | maximum number of marker genotypes | 10 |
| MAXP | maximum number of people on whom a person's conditional probabilities can depend | 4 |
| MAXPED | maximum number of pedigrees | 20 |
| MAXPEO | maximum number of people per pedigree | 100 |
| MAXPHN | maximum number of marker phenotypes | 10 |
| MAXTH | maximum number of true recombination fractions/map distances | 8 |
| MAXTOT | maximum number of people in entire data set | 200 |
| MAXTST | maximum number of test recombination fractions/map distances | 8 |
| MXGLST | maximum size of GLIST array | 1200 |
| MXMG | maximum size of MARGEN array | 6400 |
| MXMP | maximum size of MKPHEN array | 3200 |
| MXTM | maximum size of the heterozygous/homozygous arrays | 1600 |
| MXPLST | maximum size of PLIST array | 800 |
| MXPROB | maximum size of CONDPR array | 16200 |
| MXTEMP | maximum size of TEMPPR array (maximum number of conditional probabilities per person) | 81 |
| LENC | maximum size of CARRAY array for MENDEL | 200 |
| LENI | maximum size of IARRAY array for MENDEL | 5000 |
| LENL | maximum size of LARRAY array for MENDEL | 100 |
| LENR | maximum size of RARRAY array for MENDEL | 5000 |
To change these dimensions, as you will almost certainly need to do, edit the PARAMETER statement for the variable in question in SIMLINK.FOR using any file editor. Then recompile SIMLINK.FOR and link the .OBJ files.
Note: Many of the maximum sizes listed above are interrelated, so that if one is altered, others may need to be as well. The relationships are given below:
Note: MAXTST must be greater than or equal to the number of test recombination fractions/map distances (NTEST).
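Before running SIMLINK on a new data set, the input sizes can be compared against the initial limits in the table above. The following Python sketch does this for a few of the limits. It is a hypothetical helper, not part of SIMLINK; the function name check_limits is illustrative. The genotype count n(n+1)/2 for n marker alleles is an assumption about how genotypes are counted (it gives 10 for 4 alleles, which matches the initial MAXGEN), and the sketch does not model the interrelationships among the limits.

```python
# Initial values copied from the table of array sizes.
INITIAL_LIMITS = {
    "MAXALL": 4,    # marker alleles
    "MAXGEN": 10,   # marker genotypes
    "MAXPED": 20,   # pedigrees
    "MAXPEO": 100,  # people per pedigree
    "MAXPHN": 10,   # marker phenotypes
    "MAXTH": 8,     # true recombination fractions/map distances
    "MAXTOT": 200,  # people in entire data set
}


def check_limits(pedigree_sizes, n_true_theta, n_marker_alleles,
                 n_marker_phenotypes, limits=INITIAL_LIMITS):
    """Return {limit name: (needed, allowed)} for every limit exceeded."""
    n = n_marker_alleles
    needed = {
        "MAXALL": n,
        "MAXGEN": n * (n + 1) // 2,  # assumption: unordered autosomal genotypes
        "MAXPED": len(pedigree_sizes),
        "MAXPEO": max(pedigree_sizes, default=0),
        "MAXPHN": n_marker_phenotypes,
        "MAXTH": n_true_theta,
        "MAXTOT": sum(pedigree_sizes),
    }
    return {name: (need, limits[name])
            for name, need in needed.items() if need > limits[name]}


if __name__ == "__main__":
    # Example 4: one pedigree of 15 people, three true recombination
    # fractions, two marker alleles, three marker phenotypes.
    print(check_limits([15], 3, 2, 3))
    # A single pedigree of 150 people exceeds the initial MAXPEO.
    print(check_limits([150], 3, 2, 3))
```

An empty result means none of the checked limits is exceeded. Any name that is reported identifies a PARAMETER value in SIMLINK.FOR that would need to be increased.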
11. Error Conditions
When SIMLINK stops without completing the desired analysis, error messages may be found (1) on the screen, (2) in the output file, or (3) in the file SIMERR.SCR. SIMDOC.SCR can be consulted to determine the correspondence between input IDs and MENDEL IDs.
The most frequent error encountered when using SIMLINK is an array that is too small; any of a large number of arrays may be affected. To correct it, edit SIMLINK.FOR, find the PARAMETER statement for the array dimension that is too small, increase its value, then recompile SIMLINK.FOR and link the program. NOTE: On a microcomputer using Microsoft FORTRAN, it may not be possible to make all arrays sufficiently large because of the 640K limitation of DOS. In such cases, possible solutions include:
As you encounter other errors that are not clearly explained by the error message(s) provided, I would appreciate knowing about them so that I can add them to this documentation and/or add better error messages to the program.
12. References
Boehnke M (1986) Estimating the power of a proposed linkage study: a practical computer simulation approach. American Journal of Human Genetics 39:513-527.
Brown CS, Thomas NST, Sarfarazi M, Davies KE, Kunkel L, Pearson PL, Kingston HM, Shaw DJ, Harper PS (1985) Genetic linkage relationships of seven DNA probes with Duchenne and Becker muscular dystrophy. Human Genetics 71:62-74.
Haldane JBS (1919) The combination of linkage values and the calculation of distances between the loci of linked factors. Journal of Genetics 8:299-309.
Lange K, Boehnke M, Weeks D (1988) Documentation for MENDEL, Version 2.3, November, 1988.
Lange K, Weeks D, Boehnke M (1988) Programs for pedigree analysis: MENDEL, FISHER, and dGENE. Genetic Epidemiology 5:471-472.
Morton NE (1955) Sequential tests for the detection of linkage. American Journal of Human Genetics 7:277-318.
Ploughman LM, Boehnke M (1989) Estimating the power of a proposed linkage study for a complex genetic trait. American Journal of Human Genetics 44:543-551.
Risch N (1989) Linkage detection tests under heterogeneity. Genetic Epidemiology 6:473-480.
Smith CAB (1963) Testing for heterogeneity of recombination fraction values in human genetics. Annals of Human Genetics 27:175-182.
Wichmann BA, Hill ID (1982) An efficient and portable pseudo-random number generator. Applied Statistics 31:188-192.
This document was converted to HTML by Frank Visser <fvisser@hgmp.mrc.ac.uk>
and further modified by