Jurg Ott / 20 Apr 2007
Rockefeller University, New York
ott@rockefeller.edu / ott@big.ac.cn
http://www.genemapping.cn
TLINKAGE programs for 2-locus traits
INTRODUCTION
The TLINKAGE programs described here are an extension of the general
LINKAGE programs for genetic linkage analysis. The extension consists of
allowing for a disease phenotype to be under the control of two loci. The
current version (20 Apr 2007) corrects a bug that had an effect only with
relatively large numbers of alleles at the marker loci.
The two postulated disease loci are typically unlinked (on two
different chromosomes), each with two alleles (one normal, one being
the disease allele), although there is no such restriction in this
implementation. Below, the two disease loci are implemented as "null
loci". Each of the disease loci may be linked with a marker or map of
markers. Typically, two recombination fractions will be estimated, that
between disease locus 1 and marker 1, and that between disease locus 2
and marker 2.
If each of the two disease loci has two alleles and, thus, three
genotypes, there is a total of 9 possible genotype patterns at the two
loci jointly. Which of these confer susceptibility (are associated with
positive penetrance) is often unknown but specific patterns of
interaction between the two disease loci have been described (Risch
1990). Below, technicalities of implementation of the TLINKAGE programs
are provided. The programs are written in Free Pascal and have been
compiled so that they do not check whether program constants are
exceeded. In case of problems, check the first few lines of program
output -- they will tell you the maximum parameter values allowed for in
the programs.
If research papers refer to these programs, the appropriate reference
is Lathrop and Ott (1990) (see below).
IMPLEMENTATION
------------------------------
Program name    Corresponds to
------------------------------
TUNK            UNKNOWN
TMLINK          MLINK
TLINKM          LINKMAP
TILINK          ILINK
------------------------------
The TLINKAGE programs implement a new locus type #4, called the null
type because it may have no associated phenotype in the pedigree file.
Each null locus is associated with the same phenotype. The number of
null loci corresponds to the number of loci jointly responsible for a
disease phenotype. That number is indicated as the last entry on the
first line of the datafile (after the program number). In this version
of TLINKAGE, you must have two null loci, or zero null loci. If two
null loci are defined, there are thus two consecutive locus
descriptions for these null loci in the datafile, but they correspond
to only one phenotype in the pedfile. That phenotype must be an
affection status phenotype, and it may be at any position among the
phenotypes (i.e., the null loci are not restricted to be the first two
loci in the datafile). NOTE, however, the following restriction: If the
two null loci are loci 1 and 2, their order must be 1 2 and not 2 1. For
example, with loci 3 and 4 being markers, orders 1 3 2 4 and 4 1 3 2
are all right but orders 2 3 1 4 and 4 2 3 1 are not. An analogous
restriction on order applies when the null loci are numbered other than
1 and 2.
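The ordering restriction can be checked mechanically before a run. The following Python sketch is not part of TLINKAGE; the function name null_order_ok is hypothetical, and it assumes that the "analogous restriction" means the lower-numbered null locus must always precede the higher-numbered one in the locus order.

```python
def null_order_ok(order, null_loci=(1, 2)):
    """Return True if the lower-numbered null locus comes first in `order`.

    order     -- locus numbers as given on the locus-order line of the datafile
    null_loci -- the two locus numbers that are null loci (assumed input)
    """
    first, second = sorted(null_loci)
    return order.index(first) < order.index(second)


if __name__ == "__main__":
    # The four orders discussed in the text (loci 3 and 4 are markers).
    for order in ([1, 3, 2, 4], [4, 1, 3, 2], [2, 3, 1, 4], [4, 2, 3, 1]):
        verdict = "all right" if null_order_ok(order) else "not allowed"
        print(" ".join(map(str, order)), "->", verdict)
```

Markers may appear anywhere in the order; only the relative position of the two null loci is tested.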
In the datafile, the description of a null locus contains only the
number of alleles and gene frequencies, e.g.:
4 2        << null locus, number of alleles
0.9 0.1    << allele frequencies
except that after the last null locus, a line specifying the number of
liability classes must be present, followed by one or more tables of
penetrances (as many tables as there are liability classes). Each such
table has a single entry for each genotype combination at the two loci
and is arranged as shown in the example below. The numbers to be entered
in each table are only the
penetrances, in this case the 3 × 3 = 9 numbers in the body of
the table (do not enter any of the genotypes such as 1/2 or 2/2).
Repeat this table with different entries for different liability
classes. Remember that only the last null locus has an associated
phenotype in the pedfile. An example may look as follows:
-----------------------------
First     Second null locus
null      -----------------
locus     1/1    1/2    2/2
-----------------------------
1/1       0      0      0
1/2       0      0.8    0.8
2/2       0      0.8    1
-----------------------------
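To make the row and column meaning of such a table concrete, the following Python sketch computes the population frequency of affected individuals implied by a 3 x 3 penetrance table. It is an illustration only, not something the TLINKAGE programs report: it assumes Hardy-Weinberg proportions at each disease locus, independence of the two (unlinked) loci, allele 2 as the disease allele, and a disease allele frequency of 0.1 at both loci. The function names are hypothetical.

```python
def genotype_freqs(q):
    """Hardy-Weinberg frequencies of genotypes 1/1, 1/2, 2/2 (allele 2 has frequency q)."""
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]


def prevalence(pen, q1, q2):
    """Population frequency of affecteds for a 3 x 3 penetrance table.

    pen[i][j] is the penetrance for genotype i at the first null locus and
    genotype j at the second (row/column order 1/1, 1/2, 2/2, as in the table).
    """
    f1 = genotype_freqs(q1)
    f2 = genotype_freqs(q2)
    return sum(f1[i] * f2[j] * pen[i][j] for i in range(3) for j in range(3))


# Penetrances from the example table above, row by row.
PEN = [
    [0.0, 0.0, 0.0],
    [0.0, 0.8, 0.8],
    [0.0, 0.8, 1.0],
]

if __name__ == "__main__":
    # Disease allele frequency 0.1 at both loci is an assumption for illustration.
    print(round(prevalence(PEN, 0.1, 0.1), 6))
```

The nine entries of PEN are exactly the nine numbers that would be typed into the datafile for one liability class, row by row.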
The current 2-locus version of LINKAGE allows analysis of autosomal
loci only. Please make sure you do not use these programs for
X-chromosomal loci -- there is no check in the programs to ensure that
you are adhering to this restriction.
The different steps for running the TLINKAGE programs are analogous to
those for the LINKAGE programs:
- Prepare the datafile using the PREPLINK program and modify it
  according to the rules given above. Copy this file to a file called
  DATAFILE.DAT
- Prepare the pedigree input file
- Run this pedigree file through the MAKEPED program (the file
  TEST.PED, mentioned below, is an example of a file resulting from
  MAKEPED). Copy this file to a file called PEDFILE.DAT
- Run the TUNK program
- Run the TMLINK, TLINKM, or TILINK program
Instead of the steps indicated above, after preparing input files you
may invoke the RUN command, which will copy the relevant input files and
invoke the analysis programs. Input file names must be given on the
command line. Example:
RUN TESTML.DAT TEST.PED TMLINK
KNOWN BUGS
In releases prior to 13 Feb 1991, two bugs were present in these
programs: The programs did not work correctly when individuals with
unknown disease status were present, or when more than one liability
class was used. Both bugs were fixed by Joseph Terwilliger.
SAMPLE INPUT FILES
The files TESTML.DAT, TESTLM.DAT, and TESTIL.DAT are sample datafiles
for TMLINK, TLINKM, and TILINK, respectively. A test pedigree file
(after processing by MAKEPED) is included as TEST.PED. To run the test
example for TMLINK, copy the test files to the directory in which you
want to carry out the TLINKAGE runs. Then give the following commands
at the Windows command prompt:
copy testml.dat datafile.dat
copy test.ped pedfile.dat
tunk
tmlink
These commands may all be executed by one single command (RUN batch
file), in this case:
run testml.dat test.ped tmlink
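For users who prefer a script to the batch file, the same sequence can be sketched in Python. This is a hypothetical helper, not the RUN batch file shipped with the programs; it assumes that the tunk executable and the analysis program can be started by name from the current directory. Planning and execution are kept separate so the plan can be inspected without the executables being present.

```python
import shutil
import subprocess


def plan_run(datafile, pedfile, program):
    """List the steps of one analysis as (action, arguments) tuples."""
    return [
        ("copy", (datafile, "datafile.dat")),
        ("copy", (pedfile, "pedfile.dat")),
        ("run", ("tunk",)),
        ("run", (program,)),
    ]


def execute(steps):
    """Carry out the planned steps in the current directory."""
    for action, args in steps:
        if action == "copy":
            shutil.copyfile(*args)
        else:
            # check=True stops the sequence if a program exits with an error.
            subprocess.run(list(args), check=True)


if __name__ == "__main__":
    for step in plan_run("testml.dat", "test.ped", "tmlink"):
        print(step)
    # Uncomment once the TLINKAGE executables are available:
    # execute(plan_run("testml.dat", "test.ped", "tmlink"))
```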
In the regular LINKAGE programs, various utility programs such as LCP
and PREPLINK may be used for the creation of the input files. These
programs have not been adapted for the TLINKAGE programs.
The sample files mentioned above refer to the situation of two disease
loci (one trait) and a single marker locus that is linked with disease
locus 2. The other disease locus is assumed somewhere else in the
genome, unlinked with any tested marker locus. The input datafile for
the TMLINK program looks as follows:
3 0 0 5 2                          <<< Number of loci, risk locus, sexlinked (if 1), program (5=TMLINK), # null loci
0 0.0 0.0 0                        << Mut Locus, Mut Rates (male, female), Haplotype Frequencies (if 1)
1 2 3
4 2                                <<- null locus, number of alleles
0.97000 0.03000
4 2                                <<- null locus, number of alleles
0.99000 0.01000
1                                  <<- number of liability classes
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 1.000
3 4                                << Allele numbers, Number of alleles
0.25000 0.25000 0.25000 0.25000
0 0                                << Sex Difference, Interference (If 1 or 2)
0.50000 0.00000                    << Recombination Values
2 0.10000 0.40000                  << Rec. varied, Increment, Finishing value
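Because the program does not verify that the datafile structure fits, it can help to read the file back and inspect what was understood. The Python sketch below is a hypothetical reader, not part of TLINKAGE. It assumes that everything after "<<" on a line is an annotation, that only locus types 3 and 4 occur, and that there are exactly two null loci when any are present. It stops after the locus descriptions and does not interpret the closing lines.

```python
DATAFILE = """\
3 0 0 5 2          <<< Number of loci, risk locus, sexlinked (if 1), program, # null loci
0 0.0 0.0 0        << Mut Locus, Mut Rates (male, female), Haplotype Frequencies (if 1)
1 2 3
4 2                <<- null locus, number of alleles
0.97000 0.03000
4 2                <<- null locus, number of alleles
0.99000 0.01000
1                  <<- number of liability classes
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 1.000
3 4                << Allele numbers, Number of alleles
0.25000 0.25000 0.25000 0.25000
0 0                << Sex Difference, Interference (If 1 or 2)
0.50000 0.00000    << Recombination Values
2 0.10000 0.40000  << Rec. varied, Increment, Finishing value
"""


def tokens(text):
    """Yield whitespace-separated fields, dropping the '<<' annotations."""
    for line in text.splitlines():
        yield from line.split("<<")[0].split()


def parse_datafile(text):
    """Read header, locus order, locus descriptions and penetrance tables."""
    tok = tokens(text)
    nloci, risk, sexlinked, program, n_null = [int(next(tok)) for _ in range(5)]
    mutation = [next(tok) for _ in range(4)]      # kept as text, not interpreted
    order = [int(next(tok)) for _ in range(nloci)]
    loci, null_alleles, penetrances = [], [], []
    for _ in range(nloci):
        locus_type, n_alleles = int(next(tok)), int(next(tok))
        if locus_type not in (3, 4):
            raise ValueError("this sketch only reads locus types 3 and 4")
        freqs = [float(next(tok)) for _ in range(n_alleles)]
        loci.append({"type": locus_type, "freqs": freqs})
        if locus_type == 4:
            null_alleles.append(n_alleles)
            if len(null_alleles) == n_null:       # tables follow the last null locus
                n_classes = int(next(tok))
                # n alleles give n(n+1)/2 genotypes; rows belong to the first null locus
                rows, cols = [n * (n + 1) // 2 for n in null_alleles]
                for _ in range(n_classes):
                    penetrances.append(
                        [[float(next(tok)) for _ in range(cols)] for _ in range(rows)]
                    )
    return {"nloci": nloci, "n_null": n_null, "order": order,
            "loci": loci, "penetrances": penetrances}


if __name__ == "__main__":
    parsed = parse_datafile(DATAFILE)
    print(parsed["order"], parsed["n_null"], parsed["penetrances"][0])
```

Reading by tokens rather than by lines means the reader does not depend on how the numbers are wrapped across lines.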
On line 1, the program code (5 for TMLINK) is not used by the program
(it was used in previous versions of LINKAGE). So, the user must be
careful that the structure of the datafile is appropriate for the
program he or she is using. The input pedfile (after processing by the
MAKEPED program) corresponding to the above datafile looks as shown
below. It refers to two parents, one unaffected, the other affected,
and two affected children.
1 1 0 0 3 0 0 1 1 1 1 2 Ped: 1 Per: 1
1 2 0 0 3 0 0 2 0 2 3 3 Ped: 1 Per: 2
1 3 1 2 0 4 4 2 0 2 1 3 Ped: 1 Per: 3
1 4 1 2 0 0 0 2 0 2 1 3 Ped: 1 Per: 4
In the file holding the pedigree data, the phenotypes are listed in the
order "disease - marker 1 - marker 2" as this is the order in which
these loci are given in the datafile above (input order).
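The columns of the sample pedfile can be labeled with a short Python sketch. The field names are hypothetical, and the layout is an assumption based on the usual post-MAKEPED LINKAGE format: pedigree, person, father, mother, first offspring, next paternal sib, next maternal sib, sex, proband, then the phenotypes. Here the phenotypes are a single affection status (1 = unaffected, 2 = affected, as implied by the family described above) followed by the two alleles of the one marker.

```python
PEDFILE = """\
1 1 0 0 3 0 0 1 1 1 1 2 Ped: 1 Per: 1
1 2 0 0 3 0 0 2 0 2 3 3 Ped: 1 Per: 2
1 3 1 2 0 4 4 2 0 2 1 3 Ped: 1 Per: 3
1 4 1 2 0 0 0 2 0 2 1 3 Ped: 1 Per: 4
"""

# Assumed names for the nine leading columns written by MAKEPED.
POINTER_FIELDS = ("pedigree", "person", "father", "mother", "first_offspring",
                  "next_paternal_sib", "next_maternal_sib", "sex", "proband")


def parse_ped_line(line):
    """Split one pedfile line into labeled fields, ignoring the 'Ped: ... Per: ...' tail."""
    fields = [int(x) for x in line.split("Ped:")[0].split()]
    record = dict(zip(POINTER_FIELDS, fields))
    phenotypes = fields[len(POINTER_FIELDS):]
    record["affection"] = phenotypes[0]          # one phenotype for both null loci
    record["marker"] = tuple(phenotypes[1:3])    # two alleles of the single marker
    return record


if __name__ == "__main__":
    for line in PEDFILE.splitlines():
        rec = parse_ped_line(line)
        print(rec["person"], rec["affection"], rec["marker"])
```

Note that the two null loci contribute only one phenotype column, so each line carries three phenotype fields rather than five.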
If the two disease loci are taken to be linked with a marker locus
each, the corresponding datafile (for TMLINK) may look as follows,
where locus order is "marker 1 - disease 1 - disease 2 - marker 2"
(chromosome order):
4 0 0 5 2                          <<< Number of loci, risk locus, sexlinked (if 1), program code, # null loci
0 0.0 0.0 0                        << Mut Locus, Mut Rates (male, female), Haplotype Frequencies (if 1)
3 1 2 4
4 2                                <<- null locus, number of alleles
0.97000 0.03000
4 2                                <<- null locus, number of alleles
0.99000 0.01000
1                                  <<- number of liability classes
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 1.000
3 4                                << Allele numbers, Number of alleles (marker 1)
0.25000 0.25000 0.25000 0.25000
3 3                                << Allele numbers, Number of alleles (marker 2)
0.25000 0.25000 0.50000
0 0                                << Sex Difference, Interference (If 1 or 2)
0.0001 0.50 0.0                    << Recombination Values
1 0.10 0.45                        << Rec. varied, Increment, Finishing value
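The last two lines of each datafile define which recombination fraction is varied and over what range. The Python sketch below lists the values such a specification would visit. It rests on an assumption about the stepping rule, namely that the program starts at the value given on the "Recombination Values" line and adds the increment for as long as the finishing value is not exceeded; the exact stopping rule used by TMLINK is not stated in this document. The function name theta_grid is hypothetical.

```python
import math


def theta_grid(start, increment, finish):
    """Recombination fractions from `start` in steps of `increment` up to `finish`."""
    # A small tolerance guards against floating-point error in the division.
    n_steps = math.floor((finish - start) / increment + 1e-9)
    return [round(start + k * increment, 6) for k in range(n_steps + 1)]


if __name__ == "__main__":
    # First sample datafile: second fraction starts at 0.0, line "2 0.10000 0.40000".
    print(theta_grid(0.0, 0.10, 0.40))
    # Second sample datafile: first fraction starts at 0.0001, line "1 0.10 0.45".
    print(theta_grid(0.0001, 0.10, 0.45))
```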
LITERATURE
Lathrop GM, Ott J (1990) Analysis of complex diseases under oligogenic
models and intrafamilial heterogeneity by the LINKAGE programs. Am J
Hum Genet 47, A188 (abstr)
Risch N (1990) Linkage strategies for genetically complex traits. I.
Multilocus models. Am J Hum Genet 46, 222-228
Schork NJ, Boehnke M, Terwilliger JD, Ott J (1993) Two trait locus
linkage analysis: a powerful strategy for mapping complex genetic
traits. Am J Hum Genet 53, 1127-1136