Jurg Ott / 14 August 2006
ott@rockefeller.edu
http://www.genemapping.cn
LIPED program for linkage analysis
Contents
1. INTRODUCTION
2. WORKING WITH SEVERAL LOCI
3. PENETRANCES
4. NON-AUTOSOMAL LINKAGE
5. INPUT FILE
6. DIMENSIONS
7. COMPLEX PEDIGREES
8. MUTATION
9. QUANTITATIVE PHENOTYPES
10. AGE-DEPENDENT PENETRANCE
11. CALCULATION OF GENETIC RISKS
12. LIKELIHOOD AT A SINGLE LOCUS
13. HELPFUL HINTS
14. EXAMPLES
15. LITERATURE
1. INTRODUCTION
The LIPED program (for LIkelihoods in PEDigrees) estimates the
recombination fraction by calculating pedigree likelihoods for various
assumed values of the recombination fraction. The algorithm is based on
Elston and Stewart (1971) with some extensions. Its first application
(to the large Alaska pedigree, Schrott et al. 1972) resulted in mild
evidence for linkage of familial hypercholesterolemia to the C3
polymorphism (Ott et al. 1974), which was later confirmed by various
authors. This disease locus (LDLR, previously FH and FHC) is now
located on chromosome 19p13.3. The program contained one error (in the
likelihood calculation for quantitative traits), which was pointed out
to me by Dr. Robert Elston.
This manual describes the PC version (June 1995) of the LIPED computer
program for genetic linkage analysis. Only two loci can be handled at a
time, for example, a disease locus and a marker locus. Originally
written in Fortran IV (Ott 1974), LIPED requires input in fixed format
(numbers must be provided in a fixed number of spaces or columns). The
code is essentially as originally written, with some additions such as
proper treatment of age-of-onset data. Only minor modifications were
made in the latest revision. The program has been compiled with
Microsoft Fortran PowerStation 4.0 for Windows.
KNOWN BUGS: For a pedigree consisting of a single individual, LIPED
does not calculate a likelihood. This "problem" may be avoided by
including two parents with unknown phenotypes. In practice, this bug is
irrelevant for linkage analyses.
Files included:
- LIPED.FOR   -- Source code of the LIPED program.
- LIPED.EXE   -- Executable code. Max. 12 alleles per locus, 21
                 phenotypes per locus.
- LIPED16.EXE -- Older program version. Allows for 16 alleles per
                 locus.
- LOGNORM.EXE -- Program to convert means and standard deviations from
                 normal to lognormal distributions and vice versa, and
                 to compute penetrances and age classes.
- LIPED.DAT   -- Input file holding example data.
- EXi.DAT     -- Individual example input files, i = 1..5,
                 corresponding to the examples in section 14.
- LIPED.OUT   -- Output file resulting from running LIPED.DAT.
To initiate the program, type LIPED. It will assume that input is
furnished in the file, LIPED.DAT.
When LIPED is used for research, the appropriate literature reference
is Ott (1974) or Ott (1976) or Ott (1999).
2. WORKING WITH SEVERAL LOCI
Two kinds of loci are distinguished in the LIPED program: the main
locus (internal number zero) and marker loci (numbered from 1 to
NMARK). Lod scores can be computed for any combination main locus vs.
marker locus.
With input item 16 (see section 5, Input File, below), any one of the
marker loci can be declared to represent a new main locus so that lod
scores may also be computed among marker loci. If more than one marker
locus is present, the program creates two temporary disk files that
will be deleted on program termination. Note the following restriction:
with a single marker locus, any number of pedigrees may be analyzed in
a single run. However, with more than one marker locus, only a single
pedigree may be analyzed in a run. One way to overcome this restriction
is as follows. If several independent families are presented to the
program as one single pedigree, LIPED will recognize this and carry out
the proper calculations, the resulting lod score being the sum over the
individual families; however, individual lods for the families will not
be recognizable by the user. Analyzing several independent families as
a single large pedigree requires a substantial amount of memory. If an
error occurs (MNP or MLIST too small), you may have to analyze the
families in the usual manner as separate pedigrees.
Generally, to analyze several pedigrees with phenotypes at more than
two loci, one might proceed as follows. First, one decides on the two
loci to be analyzed. If one of these is the main locus, then the
comparison main locus vs. marker is identified on one line of input
item 9. Otherwise, all numbers in input item 9 are set equal to 0, and
a comparison among markers is defined on a new line of input item 9
that appears after a line containing 5000 which immediately follows the
pedigree data. Thus far, one has decided on the 2 loci to be compared.
Now, one must tell LIPED where on the line (in which columns) to read
the phenotypes of these loci, which is done with FORTRAN Format
expressions (see beginning of section 5) furnished in input item 4.
If you interrupt LIPED while it is still running and if more than one
marker locus has been defined, scratch files named "ZZ..." or "for..."
will remain on the disk. These files would be deleted when the program
terminates normally. You may simply delete them.
3. PENETRANCES
Penetrance is defined as the probability of occurrence of a particular
phenotype given the presence of a certain genotype. Accordingly, with
respect to a disease, penetrance is the probability of being affected
given a certain genotype.
Penetrances are needed to describe the relation between genotypes and
phenotypes. In the following three simple and common cases, only full
penetrance (values 0 or 1 only) is assumed to occur. Assume a locus
with 2 alleles, T and t. When this is a disease locus, let T be the
dominant disease allele and consider the phenotypes AFF for affected
and NA for unaffected.
---------------------------------------------------------
                        Phenotypes
          -----------------------------------------------
Geno-     Dominant      Recessive     Codominant
type      disease       disease       case
          AFF   NA      AFF   NA      TT    Tt    tt
---------------------------------------------------------
T T        1     0       1     0       1     0     0
T t        1     0       0     1       0     1     0
t t        0     1       0     1       0     0     1
---------------------------------------------------------
4. NON-AUTOSOMAL LINKAGE
To code for X-linked inheritance in LIPED, tables such as the one above
are used to represent the relation between genotypes and phenotypes.
They apply directly for females. For males, for example, the genotype
T/T is interpreted as T/y (hemizygote), and all lines corresponding to
heterozygote genotypes are disregarded. Therefore, it is not necessary
to distinguish male and female phenotypes. For example, TT can serve as
a phenotype for either sex.
To analyze loci on the Y-chromosome, it is easiest to code Y-linkage as
a special case of autosomal linkage, but precautions must be taken. For
details, see Ott (1986); note that the methods described in that
reference apply only to full penetrance.
5. INPUT FILE
For most input quantities, the location (column numbers) on an input
line is fixed and must be adhered to strictly. For some input
quantities, the user can specify with so-called Format expressions
where on a line the program will find that quantity. For each of the
Format expressions below, a recommendation is given that will
accommodate most situations occurring in practice. It provides four
spaces (columns) for each input quantity. A short explanation of
FORTRAN Formats is as follows.
An input quantity is read either with an A-Format (alphanumeric) or an
F-Format (floating-point quantities); which of the two must be used for
each input quantity is determined by the program and is described in
this section. For example, (A4) means that an input quantity such as a
phenotype symbol should occupy 4 spaces (columns).
If several input quantities must be read by the program, each requires
its own Format, for example, (A4, A4, A4) or, equivalently, (3A4). To
skip reading over a number of spaces, the X-Format is used. For
example, the Format expression (2A4, 8X, A4) means that the program
will read input quantity 1 in columns 1-4, quantity 2 in columns 5-8,
and quantity 3 in columns 17-20. For alphanumeric quantities (A
format), the position within the space provided is critical; preferably
the same number of spaces is always used for the same quantity and it
is right-justified within the allotted space.
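The column arithmetic described above can be sketched in a few lines of
Python. This is only an illustration and not part of LIPED: the
function name column_ranges is hypothetical, and the sketch handles
only simple comma-separated items of the form nAw, nFw.d and nX (no
nested groups).

```python
import re


def column_ranges(fmt):
    """Return (kind, first_col, last_col) for each A or F field in a
    simple Fortran Format expression such as "(2A4, 8X, A4)".

    Assumption of this sketch: items are comma-separated and have the
    form nAw, nFw.d or nX, where the repeat count n is optional for A
    and F. Nested groups are not supported.
    """
    items = fmt.strip().strip("()").split(",")
    col = 1  # next column to be read (1-based)
    fields = []
    for item in items:
        item = item.strip().upper()
        m = re.fullmatch(r"(\d*)([AF])(\d+)(?:\.\d+)?", item)
        if m:
            repeat = int(m.group(1) or 1)
            width = int(m.group(3))
            for _ in range(repeat):
                fields.append((m.group(2), col, col + width - 1))
                col += width
            continue
        m = re.fullmatch(r"(\d+)X", item)
        if m:
            col += int(m.group(1))  # skip columns without reading
            continue
        raise ValueError(f"unsupported format item: {item!r}")
    return fields


print(column_ranges("(2A4, 8X, A4)"))
```

For the Format (2A4, 8X, A4) the sketch reports fields in columns 1-4,
5-8 and 17-20, in agreement with the explanation above.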
The input file must consist of the following "lines", here being
numbered as input item 5.1, item 5.2, etc.
5.1. Problem description
Col. 1-2    NMARK, number of marker loci in addition to the main locus.
            LIPED stops when NMARK<1 is encountered.
Col. 4      = 0 (a value of 1 prints internal information not
            generally interpretable)
Col. 5      = 0 usual setting
            = 1 to prevent underflows as much as possible. This should
            be used only when an underflow has occurred, since
            underflows are very unlikely with double precision
            calculations.
Col. 6      = 0 for autosomal loci
            = 1 for loci on the X-chromosome
Col. 7      = 0 if gene frequencies (rather than haplotype frequencies)
            will be read
            = 1 if haplotype frequencies are to be read (input item
            5.14b). Note that in this case, dummy gene frequencies must
            still be provided (input item 5.11).
Col. 8-20   Mutation rate at main locus (see MUTATION below).
Col. 21-80  Text
5.2. Format for locus symbols, allele symbols and phenotype symbols
Col. 1-80
Format to read input item 5.10 (symbols for loci, alleles and
phenotypes). Only A-format is allowed; the maximum length is A4 for
locus and allele symbols, and A8 for phenotype symbols. EXAMPLE:
(20A4). The format for the phenotype symbols must correspond in length
to the one used to read the actual phenotypes of the pedigree data
(input item 5.14).
5.3. Format to read input item 5.12 (symbols for a pair of alleles and
values of penetrances)
Col. 1-80 Two A-formats and then F-formats. EXAMPLE: (2A4,21F4.0)
5.4. Format for pedigree data
Col. 1-80
Format to read input items 5.5, 5.14 and 5.14a. A-format only. EXAMPLE:
(25A4). The first four items must not contain more than 4 characters
each, whereas phenotypes can be up to 8 characters long, i.e., they may
be read with an A8 Format, for example.
5.5. Symbols, to be read with Format as on input item 5.4
At place of
item no.     Provide symbol for
----------------------------------------------------------------
3            no parent (e.g.: blank)
4            male sex (e.g.: m). The first different sex symbol
             encountered in input item 5.14 will be considered the
             symbol for female sex.
5            unknown phenotype at main locus (e.g.: blank)
6, etc.      unknown phenotype at marker locus 1, etc.
----------------------------------------------------------------
5.6. Number of alleles per locus
Col. 1-2 Number of alleles at main locus (locus 0)
Col. 3-4 Number of alleles at marker locus 1, etc.
5.7. Number of phenotypes per locus
Col. 1-2 Number of phenotypes at main locus
Col 3-4 Number of phenotypes at marker locus 1, etc.
For loci of type KONT = 1, 2, or 3, the number of phenotypes is
predetermined and will be set by the program. Here you may simply use 1
for number of phenotypes when KONT > 0.
5.8. Locus type, indicated by the variable KONT
Col. 1-2 KONT for main locus
Col. 3-4 KONT for marker locus 1, etc., where the following values
for KONT apply:
KONT = -1  for a locus with discrete phenotypes but penetrances
           possibly other than 0 and 1. This is more general but runs
           slower than KONT = 0.
KONT = 0   for a locus with discrete phenotypes and penetrances of 0
           and 1 only.
KONT = 1   for a locus with quantitative phenotypes following a
           conditional normal distribution (see section on
           quantitative phenotypes).
KONT = 2   for a locus with age-dependent penetrances following a
           lognormal distribution (see section on age-dependent
           penetrance).
KONT = 3   for a locus with straight-line age-dependent penetrances.
5.9. Output options as indicated by the variable IAU(i)
Col. 2 IAU(1) for main locus versus marker locus 1
Col. 4 IAU(2) for main locus versus marker locus 2, etc., where
the values of IAU(i) have the following effect (below, rm and rf = male
and female recombination fractions, respectively):
- to do checks only and compute likelihood at rm = rf = 0.5
- IAU(i) = 0: when no computation is desired for main locus versus
marker locus i
- IAU(i) = 1: if the values of the recombination fractions are to be
read by input item 5.15 below. In this case, provide one set (input
line) of values rm and rf after each pedigree, for each likelihood to
be computed. Note that after input item 5.16, additional lines of input
of this type may be given to allow for comparisons among marker loci.
- IAU(i) = 2: to compute lods at rm = rf = 0, .001, .05, .10, .20, .30,
.40
- IAU(i) = 3: to compute lods at rm = rf = 0, .001, .05, .10, .15, ...
- IAU(i) = 4: to compute lods at values of rm and rf shown below
- IAU(i) = 5: to compute lods at values of rm and rf shown below
- IAU(i) = 6: to compute lods at values of rm and rf shown below
- IAU(i) = 7: to compute lods at values of rm and rf shown below
- IAU(i) = 8: to compute lods at values of rm and rf shown below
- IAU(i) = 9: to read recombination fraction values from input item
5.9a once for all pedigrees and to sum the lod scores over pedigrees.
Allowed only with NMARK = 1 on input item 5.1, i.e., when no more than
one marker locus is specified.
- to compute lods at rm = 0, .001, ... (as with option 2) whereas
rf = 2rm(1 – rm) [Ott, 1991, equation (8.7), p. 175], that is, the
female map distance (Haldane) is assumed to be twice the male map
distance.
- to compute lods at rm = 0, ... (as with option 3) whereas
rf = 2rm(1 – rm)
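The relation rf = 2rm(1 – rm) follows from the Haldane map function
when the female map distance is twice the male map distance. A short
derivation, with x_m and x_f denoting male and female map distances in
Morgans (symbols introduced here for illustration only):

```latex
% Haldane map function: r = (1 - e^{-2x}) / 2, i.e. 1 - 2r = e^{-2x}.
\begin{align*}
1 - 2 r_f &= e^{-2 x_f} = e^{-4 x_m} = \left(e^{-2 x_m}\right)^2
           = (1 - 2 r_m)^2 , \qquad x_f = 2 x_m \\
r_f &= \tfrac{1}{2}\left[\,1 - (1 - 2 r_m)^2\right]
     = \tfrac{1}{2}\left(4 r_m - 4 r_m^2\right)
     = 2 r_m (1 - r_m)
\end{align*}
```

For example, rm = 0.1 gives rf = 0.18, and rm = 0.5 gives rf = 0.5.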
The values of rm and rf at which lods will be computed depend on the
value of IAU(i) as follows:

IAU(i) = 4 (22 points)
   Values 0, .001, .05, .1, .2, .3, .4, .5 on each axis. Lods are
   computed on the diagonal rm = rf, along the line rm = .5 (all values
   of rf), and along the line rf = .5 (all values of rm).

IAU(i) = 5 (34 points)
   As for IAU(i) = 4, but with the values 0, .001, .05, .1, .15, .2,
   .25, .3, .35, .4, .45, .5 on each axis.

IAU(i) = 6 (9 points)
   All combinations of rm and rf = .1, .3, .5.

IAU(i) = 7 (16 points)
   All combinations of rm and rf = .05, .20, .35, .50.

IAU(i) = 8 (64 points)
   All combinations of rm and rf = 0, .001, .05, .1, .2, .3, .4, .5.
Option 8 is useful for approximate factorization of joint male and
female lods into sex-specific lods.
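As an illustration of the (rm, rf) grids for IAU(i) = 4 to 8, the
following Python sketch generates the points and confirms the stated
point counts. It is not taken from the LIPED source. The function name
iau_grid is hypothetical, and the placement of points for options 4
and 5 (diagonal plus the lines rm = .5 and rf = .5) is an
interpretation of the diagrams that happens to reproduce the counts of
22 and 34.

```python
def iau_grid(option):
    """Return the list of (rm, rf) pairs evaluated for IAU options 4 to 8."""
    coarse = [0.0, 0.001, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
    fine = [0.0, 0.001] + [round(0.05 * k, 2) for k in range(1, 11)]
    if option in (4, 5):
        values = coarse if option == 4 else fine
        # Assumed layout: diagonal rm = rf, plus the row rm = 0.5 and
        # the column rf = 0.5.
        return [(rm, rf) for rm in values for rf in values
                if rm == rf or rm == 0.5 or rf == 0.5]
    if option == 6:
        values = [0.1, 0.3, 0.5]
    elif option == 7:
        values = [0.05, 0.2, 0.35, 0.5]
    elif option == 8:
        values = coarse
    else:
        raise ValueError("only options 4 to 8 define a fixed (rm, rf) grid")
    # Options 6 to 8 use every combination of the listed values.
    return [(rm, rf) for rm in values for rf in values]


for opt in (4, 5, 6, 7, 8):
    print(opt, len(iau_grid(opt)))
```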
5.9a (optional) Recombination fractions when IAU(1) = 9 in input item
5.9
Each pedigree will be analyzed at the rm,rf-values provided here.
Col. 1- 5 Value for male recombination fraction
Col. 6-10 Value for female recombination fraction
These two values are read with Format 2F5.4. For example, an input
line for rm = 0.1 and rf = 0.45 may look like this: b1000b4500, or
bbb.1bb.45, where b stands for blank (space). For each likelihood to be
computed, one such line of values must be provided. To terminate the
set of rm,rf-values, enter 60000 as the last line. The maximum number
of lines, including the terminating line, is equal to MT.
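The way a Format such as 2F5.4 interprets a line can be mimicked with
the short Python sketch below. It is an illustration only: the
function names are hypothetical, and blanks are simply ignored, which
is an assumption (some older Fortran compilers treat embedded or
trailing blanks as zeros, so right-justifying values remains the safe
choice).

```python
def read_f_field(field, decimals):
    """Interpret one fixed-width field like a Fortran Fw.d descriptor.

    An explicit decimal point is used as written; otherwise the last
    `decimals` digits form the fractional part. Blanks are ignored
    (assumption of this sketch).
    """
    text = field.replace(" ", "")
    if not text:
        return 0.0
    if "." in text:
        return float(text)
    return int(text) / 10 ** decimals


def read_theta_line(line):
    """Split a 2F5.4 line into (rm, rf); None marks the 60000 terminator."""
    line = line.ljust(10)
    if line[0:5].strip() == "60000":
        return None
    return read_f_field(line[0:5], 4), read_f_field(line[5:10], 4)


print(read_theta_line(" 1000 4500"))
print(read_theta_line("   .1  .45"))
print(read_theta_line("60000"))
```

Both example lines yield rm = 0.1 and rf = 0.45. The same helper with a
width of 8 reads a gene frequency written as 9500 in columns 5-8 as
0.95.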
The following input items, no. 5.10 through no. 5.12, must be repeated
for each locus in the order main locus, marker locus 1, marker locus 2,
etc.
5.10. Locus description
To be read with Format as provided on input item 5.2. The following
items are expected:
- name of locus (at most 4 characters)
- symbol for allele 1 (at most 4 characters)
- symbol for allele 2, etc. (at most 4 characters)
- symbol for phenotype 1 (at most 8 characters)
- symbol for phenotype 2, etc. (at most 8 characters)
5.11. Gene frequencies (dummy values required with a value of 1 in
col. 7 of input item 5.1)
Col. 1- 8 Population frequency of allele 1
Col. 9-16 Population frequency of allele 2, etc.
These values are read with format 10F8.4, that is, every ten numbers
must be on a single line. Each number occupies at most 8 spaces with an
implied decimal point between the first four and last four spaces. For
example, bbbb9500 is equivalent to bbbbb.95 and represents 0.95.
5.12. Mode of inheritance
To be read with the Format as provided on input item 5.3. As many lines
of input item 5.12 are expected as there are genotypes at the given
locus. In the case of X-linkage, this refers to the female genotypes.
For males, with X-linkage, a genotype A/A is interpreted as A/y while
heterozygote genotypes such as A/a are disregarded. On each line (for
each genotype), the following items are expected:
- symbol for first allele
- symbol for second allele; these two define a genotype
- probability of observing phenotype 1 under the given genotype
- probability of observing phenotype 2 under the given genotype,
etc.
The above applies to loci with KONT = 0 or KONT = -1 on input item 5.8.
For quantitative phenotypes (KONT = 1), four items are expected for
each genotype: two alleles (defining the genotype) plus a mean and a
standard deviation. For age-dependent penetrances (KONT = 2 or KONT =
3), each line must contain two allele symbols plus six parameters,
i.e., three parameters for females and three parameters for males (the
sex-specific three parameters are defined in section 10). Note that in
this microcomputer implementation of LIPED, all phenotypes must be read
with A-Formats even though they may be quantitative measurements or age
values.
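For loci with discrete phenotypes, the lines of input item 5.12 can be
generated rather than typed by hand. The Python sketch below is one
possible way to do so, assuming the recommended Format (2A4,21F4.0);
the function name inheritance_line is hypothetical. Each penetrance is
written with an explicit decimal point in exactly 4 columns so that
its value does not depend on an implied decimal position.

```python
def inheritance_line(allele1, allele2, penetrances):
    """Build one line of input item 5.12 for the Format (2A4,21F4.0).

    Allele symbols are right-justified in 4 columns each; every
    penetrance occupies exactly 4 columns (e.g. 1.00, 0.88).
    """
    for symbol in (allele1, allele2):
        if len(symbol) > 4:
            raise ValueError("allele symbols may have at most 4 characters")
    for p in penetrances:
        if not 0.0 <= p <= 1.0:
            raise ValueError("a penetrance must lie between 0 and 1")
    fields = "".join(f"{p:4.2f}" for p in penetrances)
    return f"{allele1:>4}{allele2:>4}{fields}"


# Dominant disease with full penetrance, phenotypes in the order AFF, NA
# (the first case in the table of section 3).
dominant = [
    inheritance_line("T", "T", [1, 0]),
    inheritance_line("T", "t", [1, 0]),
    inheritance_line("t", "t", [0, 1]),
]
print("\n".join(dominant))
```

Writing two decimals limits the precision of the penetrances; a wider
F-format on input item 5.3 would be needed for more digits.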
The following input items, no. 5.13 through no. 5.16, are to be
repeated for each pedigree except that input item 5.14b is needed only
once, after the first pedigree:
5.13. Pedigree information
Col. 1- 4 Number of individuals in pedigree. Count a "doubled"
          individual as 2 persons (see section on complex pedigrees).
Col. 5- 8 Number of (pairs of) doubled individuals; = 0 for simple
          pedigrees
Col. 9-68 optional remarks
5.14. Pedigree data
To be read with Format as given on input item 5.4. For each individual,
the following items must be given:
                                                    max. length
- symbol identifying the individual (ID)                 4
- ID for one of the parents *)                           4
- ID for the other of the parents *)                     4
- symbol for individual's sex                            4
- phenotype at main locus                                8
- phenotype at marker locus 1, etc.                      8
*) Note that each individual must either have two parents in the
pedigree, or both parents' IDs may be replaced by the symbol for no
parent. If you have information on only one parent, you must provide an
ID for the other parent, who will then have unknown phenotypes.
5.14a (optional) Identification of "doubled" individuals
Applies only to complex pedigrees (number greater than zero in col. 5-8
of input item 5.13). For simple pedigrees, no input item 5.14a is
expected. To be read with Format as given on input item 5.4. For each
pair of doubled individuals, the following two items are required:
- ID of first member of pair of doubled individuals
- ID of second member of pair of doubled individuals
5.14b (optional) Haplotype frequencies
This information is needed only once, after the first pedigree, and
only with a value of 1 in col. 7 of input item 5.1. The haplotype
frequencies are read with format 10F8.4 (as are the gene frequencies).
These values must be given in the following order. Consider a main
locus and a marker locus where n is the number of alleles at the marker
locus. Then, the order of that haplotype corresponding to the i-th
allele at the main locus and the j-th allele at the marker locus is
given by n(i – 1) + j. As an example with 2 alleles at the main locus
and 3 alleles at the marker locus, the haplotypes are numbered as
follows:
         j=1  j=2  j=3
       ------------------
 i=1      1    2    3
 i=2      4    5    6
Note: there is no check that the haplotype frequencies sum to 1.
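The ordering rule n(i – 1) + j, together with a sum check that can be
run before handing the frequencies to LIPED, may be sketched in Python
as follows. The function names and the tolerance are choices made for
this illustration only.

```python
def haplotype_order(i, j, n_marker_alleles):
    """1-based position of the haplotype formed by main-locus allele i
    and marker-locus allele j, following n(i - 1) + j."""
    if i < 1 or not 1 <= j <= n_marker_alleles:
        raise ValueError("allele indices are 1-based; j must not exceed n")
    return n_marker_alleles * (i - 1) + j


def frequencies_sum_to_one(freqs, tol=1e-4):
    """True if the haplotype frequencies sum to 1 within tol.

    The tolerance is an arbitrary choice of this sketch; it roughly
    matches the four decimals available in format 10F8.4.
    """
    return abs(sum(freqs) - 1.0) <= tol


# 2 alleles at the main locus, 3 alleles at the marker locus.
table = [[haplotype_order(i, j, 3) for j in (1, 2, 3)] for i in (1, 2)]
print(table)
```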
5.15. (optional) Recombination fractions
This information is needed only when IAU(i) = 1 on input item 5.9.
Then, as many likelihood calculations will be carried out as there are
lines of input item 5.15. Each line is read with format 2F5.4 (cf.
input item 5.9a):
Col. 1- 5 value for male recombination fraction
Col. 6-10 value for female recombination fraction.
As the terminating line, enter 60000. For IAU(i) = 1 and i > 1,
multiple sets of recombination fractions (i of them), separated by
60000, must be entered.
5.16. Direction of further analysis
Col. 1-4 Value to determine what action to take next.
= 5000 if new lines of input item 5.9 are to be read (allowed only
if no more than one pedigree is present in this problem) thus allowing
for linkage analyses between marker loci. The new lines are expected
immediately after input item 5.16. On each line, there must be as many
values as there are marker loci. The program will scan these values,
IAU(i),i = 1,2,.., and the first marker locus with associated value IAU
different from zero will be considered the new main locus. From then
on, on that line, the values of IAU have the same meaning as on input
item 5.9.
Multiple lines of input item 5.9 may follow a single 5000 value on
input item 5.16. For example, consider a total of 5 loci, that is, one
main locus (locus no. 0) and 4 marker loci (numbered 1 through 4):
    _
    _2_2_2_2   ← original input item 5.9
    _
    5000
    _1_2_2_2   |
    _0_1_2_2   |  extra lines of input item 5.9
    _0_0_1_3   |
    9000
Each of these extra lines of input item 5.9 has one field for each of
the original marker loci. For instance, the following extra line,
_0_1_2_2, would mean: "now take marker locus 2 as the new main locus,
and pair it with marker locus 3 (using option 2), then with marker
locus 4 (using option 2)", which could be extended to all marker loci.
Note that whenever option 1 is specified, a corresponding set of lines
of input item 5.15 is expected immediately after the line containing
option 1. To terminate the set of extra lines of input item 5.9, enter
a line with 8000 or 9000 (same meaning as below) in col. 1-4.
= 7000 if a new pedigree is to be read. Then, new lines of input
item 5.13 etc. are expected. Note that this is allowed only when no
more than one marker locus is present.
= 8000 if a new problem is to be analyzed. Then, new lines of
input item 5.1 etc. are expected.
= 9000 to terminate this run.
6. DIMENSIONS
In the CONSTANT.INC (CONSTANT.FOR in Win version) file, constants are
given which are used for dimensioning arrays. Example values are as
follows.
MLIST = 50    headsibs (nuclear families)
MMARK = 30    marker loci in addition to the main locus
MNAL  = 5     alleles at any locus
MNDI  = 5     pairs of doubled individuals
MNFE  = 21    phenotypes at any locus
MNP   = 25    genotype vectors stored in memory
MNPT  = 200   individuals in a pedigree
MT    = 20    pairs of theta values after item 5.9, including the
              terminating 60000 line.
To change these, simply adjust the values of the constants in the
parameter statements and recompile the program.
The following information is for programmers only and is not needed for
general program use. With the abbreviations,
KK = MNAL*MNAL
KK1 = MNAL*(MNAL+1)/2,
KK2 = KK*(KK+1)/2,
the array dimensions are given as follows (the arrays not listed below
have fixed dimensions):
FENO1(MNPT,KK1)   IAD(KK,KK)     LIST(MLIST)     PHI(KK2)
FENO2(MNPT,KK1)   IAU(MMARK)     NAL(MMARK)      PHIS(KK)
GEN(MNAL)         ID(MNPT)       NC(MNPT)        PHPROB(MNFE)
GENO(MNP,KK2)     IGENO(MNP)     NF(MMARK)       THV1(MT)
GF1(MNAL)         ISEX(MNPT)     NM(MNPT)        THV2(MT)
GF2(MNAL)         KONT(MMARK)    NS(MNPT)        THVS(MT)
GVX1(MNFE,KK1)    LDI(2,MNDI)    PHE1(MNFE)      UNK(MMARK)
GVX2(MNFE,KK1)    LGC(MNDI)      PHE2(MNFE)
HOLD(MNDI,KK2)    LGENO(MNPT)    PHEPED(MMARK)
Note: MNFE must have a value of at least 8.
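To see how quickly the array sizes grow with the number of alleles, the
derived quantities KK, KK1 and KK2 can be evaluated with a small Python
sketch. The function names are hypothetical, and the element count
refers only to the GENO(MNP,KK2) array listed above.

```python
def derived_dimensions(mnal):
    """Return (KK, KK1, KK2) as defined above for a given MNAL."""
    kk = mnal * mnal              # ordered pairs of alleles (haplotypes at two loci)
    kk1 = mnal * (mnal + 1) // 2  # unordered genotypes at one locus
    kk2 = kk * (kk + 1) // 2      # unordered joint genotypes at two loci
    return kk, kk1, kk2


def geno_elements(mnp, mnal):
    """Number of elements in the array GENO(MNP,KK2)."""
    return mnp * derived_dimensions(mnal)[2]


print(derived_dimensions(5), geno_elements(25, 5))
print(derived_dimensions(12), geno_elements(25, 12))
```

With the example values MNAL = 5 and MNP = 25, GENO holds 8125
elements; with MNAL = 12 it would hold 261000.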
7. COMPLEX PEDIGREES
In a so-called simple pedigree, tracing the inheritance of genes by
going backwards through the generations (upwards in the pedigree)
always leads to the same pair of founder parents. Pedigrees for which
this is not the case are called complex pedigrees. In particular,
pedigrees with the following features are examples of complex
pedigrees: (1) both members of a pair of parents have themselves
parents in the pedigree; (2) consanguinity loop, i.e., parents are
related; (3) marriage loop, e.g., two brothers are married to two
sisters, or an individual has been married twice, the two spouses being
related with each other but not with the individual who married twice.
An example of the last kind is pedigree 1, below, where [] refers to a
female and () refers to a male:
Pedigree 1: marriage loop
          [1]--.--(2)
               |
        .--------------.
        |              |
       (3)--.--[4]--.--(5)
            |       |
           (6)     (7)
Without special measures, LIPED analyzes only simple pedigrees. The
analysis of complex pedigrees is possible by manipulating the pedigree
in a certain way so that it "appears" to LIPED as a simple pedigree.
This manipulation consists of replacing a particular individual by two
individuals as shown in the example below (pedigree 2), and by
identifying the two individuals actually corresponding to the same
individual in the original pedigree (input item 5.14a). Note that such
a "doubling" of individuals is necessary for breaking up loops, and
also whenever more than one of the two parents has parents in the
pedigree. When an individual has been "doubled", the number of
individuals in the pedigree must be increased by 1, thus counting a
pair of doubled individuals as two persons. Up to MNDI individuals may
be "doubled" so that, e.g., multiple consanguineous loops can be
accommodated. At the end of a pedigree with doubled individuals, the
two individuals corresponding to the one original person must be
identified (input item 5.14a).
Pedigree 2: Example of a pedigree, manipulated for processing by LIPED

Original pedigree              Manipulated pedigree, acceptable to LIPED

[1.1]--.--(1.2)--.--[1.3]      [1.1]--.------(1.2)----.--[1.3]
       |         |                    |               |
       |         |                    |               |
     (2.1)--.--[2.2]               (2.1a)   (2.1b)-.-[2.2]
            |                                      |
            |                                      |
          [3.1]                                  [3.1]
Here are some important notes regarding "doubling" of individuals:
- Individuals can only be doubled, not tripled. For example, an
individual who is an offspring and is also married twice with children
from each marriage cannot be manipulated as described.
- Computation time generally increases drastically with the
number of pairs of doubled individuals. When one has a choice among
several candidates to be doubled, it is recommended to take an
individual with as much phenotypic information as possible in order to
exclude as many genotypes as possible. For example, in pedigree 1,
above, any one of individuals 3, 4 or 5 could be chosen for doubling.
In the presence of one doubled individual, the QLIK routine for
calculating the likelihood is executed for each genotype of that
individual, except for those genotypes known to be incompatible with
the individual's phenotypes or the phenotypes of his or her offspring.
Analogously, for several pairs of doubled individuals, QLIK is called a
maximum of m times, where m is calculated as follows. Let n be the
number of haplotypes at the two loci jointly, i.e., n is the product of
the number of alleles at the two loci under consideration. The number
of joint genotypes is then given by g = n(n + 1)/2, so that
m = g^NDI (g raised to the power NDI), where NDI is the number of pairs
of doubled individuals. For example, with 2 and 3 alleles at the
respective two loci, one has n = 6 haplotypes and g = 21 genotypes.
With NDI = 3 pairs of doubled individuals, QLIK may be called up to
m = 21^3 = 9261 times. The present version of PC-LIPED counts these
calls and displays them on the screen.
- Whenever the genotype of an individual can unequivocally be
inferred with certainty (including phase), such an individual may be
represented as multiple individuals in the pedigree if necessary, and
this individual must not be counted as a so-called doubled individual
(treat it as separate multiple individuals). The likelihood will then
not be correct but the lod score will be unaffected by such a
manipulation. For example, if an individual is known to be A/A at locus
1 and B/b at locus 2, the joint genotype is known to be AB/Ab. Note
that for doubly heterozygous individuals it will not generally be
possible to make use of this feature even though phase may be known, as
there is no easy way to identify phases in LIPED on the basis of
phenotypes.
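The upper bound on the number of QLIK calls given in the notes above
(n haplotypes, g = n(n + 1)/2 joint genotypes, g to the power NDI
calls) is easy to evaluate before committing to a long run. A minimal
Python sketch, with a hypothetical function name:

```python
def max_qlik_calls(alleles_locus1, alleles_locus2, ndi):
    """Upper bound on QLIK calls for ndi pairs of doubled individuals."""
    n = alleles_locus1 * alleles_locus2  # haplotypes at the two loci jointly
    g = n * (n + 1) // 2                 # joint genotypes
    return g ** ndi


print(max_qlik_calls(2, 3, 3))
```

The actual number of calls is usually smaller, because genotypes
incompatible with the observed phenotypes are skipped.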
8. MUTATION
Mutation is allowed for at the current main locus only and is assumed
to occur with a constant rate from any of the alleles no. 2, 3,...
towards the first allele, with the mutation rate being specified in
col. 8-20 of input item 5.1. Backmutation is assumed to be negligible.
Also, in the computation of the likelihood, it is assumed that a
mutation occurs only in one or the other of two parents, but not
simultaneously in both parents.
WARNING: when processing a disease locus with mutation and
subsequently, in the same run, testing marker versus marker, then the
mutation rate keeps applying to the current main locus unless a new run
is carried out with the mutation rate set equal to zero.
Note that only simple pedigrees can be processed by LIPED unless
special steps are taken to code for a complex pedigree (see that
section above).
9. QUANTITATIVE PHENOTYPES
At any locus, quantitative rather than qualitative phenotypes can
be
read. For a locus with quantitative phenotypes, the following special
rules must be observed.
Input
item     Explanation
----------------------------------------------------------------
5.3      With two F-formats, read one mean and one standard
         deviation for each genotype.
5.7      Set the number of phenotypes equal to 2. The program will
         correct wrong numbers.
5.8      Set KONT equal to 1.
5.10     Two phenotype symbols will be read by the program but they
         are not used in any way.
5.12     After the symbol for the second allele, two items are
         expected, the mean and the standard deviation of the
         phenotype distribution given the particular genotype
         specified by the two alleles.
5.14     The phenotype values must not occupy more than 8 spaces
         each.
----------------------------------------------------------------
10. AGE-DEPENDENT PENETRANCE
10.1 Age classes with different penetrances
Age-dependent penetrance refers to the fact that a carrier of a disease
gene may not exhibit the disease at birth but only later in life, that
is, the penetrance (= probability of showing a certain phenotype given
a genotype) depends on the age of an individual.
The easiest way of implementing age-dependent penetrance is by forming
age classes and having different penetrances in these classes. For
affecteds, irrespective of their age, only one class is required
(provided that no phenocopies are allowed for), but unaffecteds must be
grouped into age classes. For example, in a given disease, if all gene
carriers beyond 10 years of age have expressed the disease, a suitable
assumption is that penetrance rises linearly from 0 at age 0 to 100% at
age 10, as pictured below:
Penetrance
    |
  1 |         -------------
    |       /
    |     /
    |   /
    | /
  0 +--------------------- age
    0        10         20
One might then form 6 classes as follows, where AFF stands for the
'affected' phenotype, and NA1, NA2, etc. stand for unaffected in age
class 1, 2, etc.; NA5 denotes unaffected individuals older than 10
years, who are taken to be known not to carry the disease gene. The
disease is assumed dominant, the disease allele being T. Note that the
probability of being unaffected is 1 minus the probability of being
affected.
-------------------------------------------------
                      Phenotypes
            -------------------------------------
Genotype    AFF   NA1   NA2   NA3   NA4   NA5
-------------------------------------------------
T T          1    .88   .63   .38   .13    0
T t          1    .88   .63   .38   .13    0
t t          0     1     1     1     1     1
-------------------------------------------------
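The entries for gene carriers in the unaffected classes can be
reproduced with the Python sketch below. It rests on an assumption not
stated in the text: the first four classes are taken to be equal-width
intervals of 2.5 years between ages 0 and 10, and the penetrance is
evaluated at each class midpoint. Under that assumption the values
0.875, 0.625, 0.375 and 0.125 result, which the table shows rounded to
.88, .63, .38 and .13. The function names are hypothetical.

```python
def penetrance(age, full_age=10.0):
    """Straight-line penetrance: 0 at age 0, rising to 1 at full_age."""
    return min(max(age / full_age, 0.0), 1.0)


def unaffected_probabilities(n_classes=4, full_age=10.0):
    """P(unaffected | carrier) for each age class.

    Assumption of this sketch: n_classes equal-width classes between
    age 0 and full_age, each represented by its midpoint, followed by
    one class for individuals older than full_age.
    """
    width = full_age / n_classes
    midpoints = [(k + 0.5) * width for k in range(n_classes)]
    probs = [1.0 - penetrance(age, full_age) for age in midpoints]
    return probs + [0.0]  # carriers older than full_age are all affected


print(unaffected_probabilities())
```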
10.2 Age-of-onset distributions
Rather than forming age classes, the distribution of the age at disease
onset may be assumed to follow a certain distribution. In LIPED, two
such distributions are implemented, the lognormal and a straight-line
distribution. Below, F denotes the distribution (cumulative sum) of age
at onset whereas f denotes the corresponding density (histogram).
Whatever the age of onset distribution used, to represent in a single
number the various pieces of phenotypic information (age at onset,
present age, affection status) at a disease locus, the following
conventions must be observed in LIPED. In principle, the phenotype to
be provided in the input to LIPED is an individual's present age (or
age last seen) or the age at onset, taken with a minus sign for
unaffecteds, and taken to be positive for affecteds. Present age and
age at onset are distinguished as outlined below.
UNAFFECTED INDIVIDUALS
The phenotype is an individual's present age (or age last seen), taken
with a minus sign (the sign distinguishes affecteds from unaffecteds).
Example: unaffected, present age is 56, phenotype given in program is
-56. If present age is unknown, a guess must be used, for example,
based on ages of sibs or parents.
AFFECTED INDIVIDUALS
If actual age at disease onset is unknown, the phenotype is a person's
present age. Example: 56. If present age is unknown, a guess must be
used, based on ages of relatives.
If age at disease onset is known, it is entered into LIPED by the
following coding scheme: The phenotype to be provided is obtained by
adding 500 to the age at onset. Example: age at disease onset is 23
years; phenotype to be provided is 523.
NOTE: actual age at disease onset is relevant only when disease can
occur under different genotypes with different penetrances. If this is
not so (it usually is not), then present age may be given for all
affecteds.
UNKNOWN DISEASE STATUS
The phenotype is given as 0. Alternatively, on input item 5.5, one may
define any other code for unknown phenotype, for example, blank.
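The coding conventions above can be summarized in a short Python sketch. The function name, the status labels and the argument names are hypothetical and are not part of LIPED. The sketch only illustrates how one number encodes affection status, present age and age at onset.

```python
def code_phenotype(status, present_age=None, onset_age=None, unknown_code=0):
    """Return the single number coding a disease phenotype (section 10.2).

    status: 'affected', 'unaffected' or 'unknown' (hypothetical labels).
    present_age: present age or age last seen; a guess if not known.
    onset_age: age at disease onset, if known (affecteds only).
    """
    if status == "unknown":
        return unknown_code
    if status == "unaffected":
        if present_age is None:
            raise ValueError("unaffecteds need a present age (or a guess)")
        return -present_age          # minus sign marks unaffecteds
    if status == "affected":
        if onset_age is not None:
            return 500 + onset_age   # known onset: add 500
        if present_age is None:
            raise ValueError("affecteds need an onset age or a present age")
        return present_age           # onset unknown: present age
    raise ValueError("unrecognized status: %r" % (status,))


print(code_phenotype("unaffected", present_age=56))
print(code_phenotype("affected", present_age=56))
print(code_phenotype("affected", onset_age=23))
```

The three calls reproduce the examples given in the text: -56, 56 and 523.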
10.3 Lognormal distribution of age of onset
It is often meaningful to assume that age of onset is lognormally
distributed, that is, that LN(age of onset) follows a normal
distribution where LN denotes natural logarithm (a simpler assumption
for age-of-onset distribution is covered in section 10.4, below). Mean
and standard deviation for the lognormal and normal distributions are
defined and connected with each other as follows:
              Age (orig. values)     LN(age)
              (lognormal distr.)     (normal distr.)
--------------------------------------------------
mean                  μ                  u
std. dev.             σ                  s
--------------------------------------------------
For ease of presentation, define m = exp(u) and w = exp(s²). Then one
has:
μ = m √w
σ = m √[w(w – 1)] = μ √(w – 1)
u = 2 LN(μ) – 0.5 LN(μ² + σ²)
s = √[LN(μ² + σ²) – 2 LN(μ)] = √[2{LN(μ) – u}],
where LN denotes natural logarithm. Also, with given mean, μ, of the
raw data, and standard deviation, s, of the transformed data, one
obtains the mean of the transformed data as u = LN(μ) – 0.5s².
Some example values are given in the following table:
--------------------------------------------
Original scale     LN scale (normal distr.)
--------------     ------------------------
  μ       σ            u          s
--------------------------------------------
 20       5           2.97       0.25
 20      10           2.88       0.47
 20      15           2.77       0.67
 40       5           3.68       0.12
 40      10           3.66       0.25
 40      15           3.62       0.36
--------------------------------------------
The LOGNORM program (included) transforms values u and s into the
corresponding values of μ and σ, and vice versa.
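For readers who want to check such conversions by hand, the relations above can be sketched in Python as follows. This is not the LOGNORM program itself, and the function names are hypothetical. It only applies the formulas for μ, σ, u and s given in this section.

```python
import math


def to_log_scale(mu, sigma):
    """Convert mean and std. dev. of age (original scale) to (u, s)."""
    s2 = math.log(mu ** 2 + sigma ** 2) - 2.0 * math.log(mu)
    u = math.log(mu) - 0.5 * s2
    return u, math.sqrt(s2)


def to_original_scale(u, s):
    """Convert mean and std. dev. of LN(age) to (mu, sigma)."""
    m = math.exp(u)
    w = math.exp(s ** 2)
    mu = m * math.sqrt(w)
    sigma = mu * math.sqrt(w - 1.0)
    return mu, sigma


u, s = to_log_scale(20.0, 5.0)
print(round(u, 2), round(s, 2))
print(to_original_scale(u, s))
```

For μ = 20 and σ = 5 the first function gives values that round to u = 2.97 and s = 0.25, the first row of the table. Converting back recovers μ and σ up to floating-point error.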
If age at onset for an (affected) individual is known, the
corresponding likelihood is simply f(age at onset), where f is the
lognormal density. If age at onset is unknown, then the likelihood is
F(age) where 'age' denotes current age, or age last seen, and F is the
lognormal distribution function. For unaffecteds, the likelihood is
equal to 1 – F(age). If the final penetrance, t, is less than 100% then
f and F above are multiplied by t.
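The three likelihood cases just described can be written out as a small Python sketch. It is an illustration under the stated definitions, not LIPED's Fortran code. The function names are hypothetical, and s > 0 and age > 0 are assumed.

```python
import math


def lognormal_cdf(age, u, s):
    """F(age): probability that onset has occurred by the given age."""
    z = (math.log(age) - u) / s
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_pdf(age, u, s):
    """f(age): lognormal density of age at onset."""
    z = (math.log(age) - u) / s
    return math.exp(-0.5 * z * z) / (age * s * math.sqrt(2.0 * math.pi))


def penetrance_likelihood(affected, age, u, s, t=1.0, onset_known=False):
    """Likelihood contribution of one individual for one genotype.

    affected, onset known:   t * f(age at onset)
    affected, onset unknown: t * F(present age)
    unaffected:              1 - t * F(present age)
    """
    if affected:
        if onset_known:
            return t * lognormal_pdf(age, u, s)
        return t * lognormal_cdf(age, u, s)
    return 1.0 - t * lognormal_cdf(age, u, s)


median_age = math.exp(3.35)  # age at which F = 0.5 when u = 3.35
print(penetrance_likelihood(False, median_age, 3.35, 0.17, t=0.6))
```

At the median age of onset, F = 0.5, so an unaffected individual with final penetrance t = 0.6 has likelihood 1 – 0.6 × 0.5 = 0.7.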
Lognormal age-dependent penetrance is modeled in analogy to
quantitative phenotypes (see previous section) except that here, 6
parameters must be specified (3 for females and 3 for males). For each
genotype (input item 5.3), these are, for females,
- the mean, u, of LN(age of onset)
- the standard deviation, s, of LN(age of onset)
- the limiting penetrance, t, when age is very high,
followed by the analogous three parameters for males.
Depending on the values of the 6 parameters given (input item 5.3) for
each genotype, the following 2 situations can be distinguished. Assume
a disease locus with two alleles, a dominant disease allele, D, and a
normal allele, d.
1. Age of onset follows a lognormal distribution with parameters u and
s, where the final penetrance attained (at high age) is equal to t. For
example (parameters taken to be the same for males and females), one
may have on input item 5.3:
D D  3.35 0.17 1.0  3.35 0.17 1.0  → u = 3.35, s = 0.17, final
penetrance 100%
D d  3.35 0.17 0.6  3.35 0.17 0.6  → susceptible individuals express
disease with max. penetrance of 60% when they are very old
d d  3.0  0.1  0.0  3.0  0.1  0.0  → genotype d/d not
susceptible to disease; values of u and s are irrelevant (likelihood is
zero for affecteds and 1 for unaffecteds)
2. Penetrance does not depend on age but is a fixed value (t for
affecteds, 1-t for unaffecteds). To accommodate this situation, set
s = 0.0. The value of u is then irrelevant. For example, one may have
d d  0.0 0.0 0.01  0.0 0.0 0.01  → t = 0.01, that is, d/d
genotypes express the disease with probability 1%, irrespective of age
(likelihood is 0.01 for affecteds and 0.99 for unaffecteds); the value
of u is irrelevant. This case should be used with great care since it
does not differentiate between age of onset known or unknown.
In summary, for a locus with age-dependent (lognormal) penetrance, the
following special rules must be observed.
Input
item   Explanation
-------------------------------------------------------------
  3    Provide six F-formats to read on each line (genotype) one
       mean, one standard deviation and one final penetrance for
       each sex (note that these formats will apply to all loci).
  7    Set the number of phenotypes equal to 1 (the program will
       set the correct number of phenotypes).
  8    Set KONT equal to 2.
 10    Six phenotype symbols will be read by the program, but they
       are not used in any way.
 12    After the symbol for the second allele, six items are
       expected: mean, standard deviation and final penetrance for
       females, and the analogous three parameters for males.
 14    The value for the phenotype (age) must not occupy more than
       8 spaces (4 recommended), the actual number of spaces used
       being determined by the format statement given in input
       item 5.4. A positive age value refers to an affected
       individual, a negative age figure identifies an unaffected
       individual. Phenotypes are coded following the rules given
       in section 10.2, above.
-------------------------------------------------------------
10.4 Straight-line curves for age of onset (locus type 3)
 F
 |
t|            --------
 |           /.
 |          / .
 |         /  .
 |        /   .
 |       /    .
0+------------------ age
        A1    A2
F is the probability of being affected, that is, the penetrance (or
likelihood) is equal to F for an affected individual (age at onset
unknown) and equal to 1 – F for an unaffected individual. According to
the figure, above, the age-of-onset curve is defined as
     /  0                      if a ≤ A1
F = |   t(a - A1)/(A2 - A1)    if A1 < a < A2
     \  t                      if a ≥ A2
where "a" is an individual's present age, or age last seen. If age at
onset is known (for an affected individual) then the likelihood
(density) is equal to f = t/(A2 – A1) if the age of onset is between A1
and A2, and equal to zero otherwise. If age at onset is considered a
random variable, according to the present definition and with t = 1, it
follows a uniform distribution with mean (A1 + A2)/2 and standard
deviation (A2 – A1)/3.464.
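The straight-line curve and its density can be sketched in Python as follows, taking the plateau value t to apply at ages of A2 and above and assuming A1 < A2. The function names are hypothetical. This is an illustration of the definitions in this section, not LIPED's own code.

```python
def straight_line_cdf(age, a1, a2, t=1.0):
    """F: probability of being affected by the given age."""
    if age <= a1:
        return 0.0
    if age >= a2:
        return t
    return t * (age - a1) / (a2 - a1)


def straight_line_density(onset, a1, a2, t=1.0):
    """f: density for a known age at onset, t/(A2 - A1) inside (A1, A2)."""
    return t / (a2 - a1) if a1 < onset < a2 else 0.0


def straight_line_likelihood(affected, age, a1, a2, t=1.0, onset_known=False):
    """Likelihood contribution of one individual for one genotype."""
    if affected:
        if onset_known:
            return straight_line_density(age, a1, a2, t)
        return straight_line_cdf(age, a1, a2, t)
    return 1.0 - straight_line_cdf(age, a1, a2, t)


# A1 = 10, A2 = 60, t = 0.6: a 35-year-old is halfway up the line
print(straight_line_likelihood(True, 35, 10, 60, t=0.6))
print(straight_line_likelihood(False, 35, 10, 60, t=0.6))
```

With A1 = 10, A2 = 60 and t = 0.6, an affected 35-year-old with unknown onset has likelihood 0.3 and an unaffected 35-year-old has likelihood 0.7.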
For a locus of type 3 (straight line age of onset), coding is very
similar to the conventions used for lognormal age of onset (specific
instructions are given below). The phenotypes are the ages of each
individual, taken to be positive for affected individuals and taken
with a minus sign for unaffected individuals. Zero will be interpreted
as unknown, but any other symbol may also be designated to represent
unknown phenotype. For affected individuals with known age at onset,
enter a number equal to 500 plus age at onset as the phenotype (see
section 10.2, above).
As in lognormal age-dependent penetrance, 6 parameters must be
specified but here, they have the following meaning. For each genotype
(input item 5.3), they are, for females (see graph, above),
- the age, A1, at which penetrance becomes positive
- the age, A2, at which penetrance reaches its final value
- the limiting penetrance, t, when age is very high,
followed by the analogous three quantities for males.
Depending on the values of the 6 parameters given (input item 5.3) for
each genotype, the following 2 situations can be distinguished. Assume
a disease locus with two alleles, a dominant disease allele, D, and a
normal allele, d.
1. Age of onset follows a straight-line distribution with parameters A1
and A2, where the final penetrance attained (at high age) is equal to
t. For example (parameter values taken to be the same for females and
for males), one may have on input item 5.3:
D D  10 60 1.0  10 60 1.0  → for individuals with D/D
genotype, susceptibility to disease starts at age 10 and penetrance
reaches its maximum of 100% at age 60.
D d  10 60 0.6  10 60 0.6  → susceptible individuals
express disease with max. penetrance of 60% when they are 60 years or
older.
d d  10 11 0.0  10 11 0.0  → genotype d/d not
susceptible to disease; values of A1 and A2 are irrelevant (given
genotype d/d, likelihood is zero for affecteds and 1 for unaffecteds).
2. Penetrance does not depend on age but is a fixed value (t for
affecteds, 1 – t for unaffecteds). To accommodate this situation, set
A2 = 0.0. The value of A1 is then irrelevant. For example (same
parameter values for males and females), one may have
d d  0.0 0.0 .01  0.0 0.0 .01  → t = 0.01, that is, d/d
genotypes express the disease with probability 1%, irrespective of age
(likelihood is 0.01 for affecteds and 0.99 for unaffecteds); the value
of A1 is irrelevant. This case should be used with great care since it
does not differentiate between age of onset known or unknown.
In summary, for a locus with straight-line age-dependent penetrance,
the following special rules must be observed.
Input
item   Explanation
-----------------------------------------------------------------
  3    On each line (genotype) provide six F-formats to read one
       starting age (A1), one finishing age (A2) and one final
       penetrance (t) for each sex. Note that these formats will
       apply to all loci.
  7    Set the number of phenotypes equal to 1 (the program will
       set the correct numbers).
  8    Set KONT equal to 3.
 10    Six phenotype symbols will be read by the program, but they
       are not used in any way.
 12    After the symbol for the second allele, six items are
       expected: starting age (A1), finishing age (A2) and final
       penetrance (t) for females, and the analogous three
       parameters for males.
 14    The value for the phenotype (age) must not occupy more than
       8 spaces (4 recommended), the actual number of spaces used
       being determined by the format statement given in input
       item 5.4. A positive age value refers to an affected
       individual, a negative age figure identifies an unaffected
       individual. Phenotypes are coded following the rules given
       in section 10.2, above.
-----------------------------------------------------------------
11. CALCULATION OF GENETIC RISKS
To calculate conditional genotype probabilities for a specific
individual, given all the family data, one must carry out several
likelihood computations and combine their results as follows. For
example, consider an individual with phenotype 'unaffected' and
penetrances as given in the table below, where D is the disease allele
and d is the normal allele at the main locus.
-------------------------------------
            Penetrance for phenotypes
           ---------------------------
Genotype   affected  unaffected   XDd
-------------------------------------
D  D         0.9        0.1        0
D  d         0.6        0.4       0.4
d  d         0          1          0
-------------------------------------
For this unaffected individual, one wants to compute the risk that he
or she has genotype D/d. To obtain this risk, one runs LIPED twice,
each time with a different phenotype assigned to this individual, that
is, in run 1, the individual has phenotype unaffected, and in run 2,
the individual has phenotype XDd. Denote the resulting likelihoods (not
lod scores) by L(ua) and L(XDd). Then, the risk to this individual of
having genotype D/d is given by L(XDd)/L(ua). Note that other programs,
such as the MLINK program of Dr. Mark Lathrop, can compute genetic
risks directly.
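Combining the two runs is a single division. The Python sketch below assumes that the two likelihoods are available as base-10 logarithms, as in the log likelihood values quoted for example 1 in section 14. The function name and the numbers passed to it are hypothetical and serve only to show the arithmetic.

```python
def genotype_risk(log10_lik_xdd, log10_lik_ua):
    """Risk of genotype D/d = L(XDd) / L(ua), from log10 likelihoods.

    log10_lik_xdd: log10 likelihood from the run with phenotype XDd
    log10_lik_ua:  log10 likelihood from the run with phenotype unaffected
    """
    return 10.0 ** (log10_lik_xdd - log10_lik_ua)


# Hypothetical log10 likelihoods from the two runs
risk = genotype_risk(-5.2, -4.9)
print(risk)
```

Both likelihoods must come from runs that differ only in the phenotype assigned to the individual of interest. The ratio then lies between 0 and 1 because the XDd penetrances never exceed those for 'unaffected'.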
With X-linked recessive deleterious traits, for a female founder
individual (no parents in pedigree), the prior probability, q, of being
a carrier of the disease gene is a multiple of the mutation rate, u.
For example, in Duchenne muscular dystrophy (DMD), q = 4u (Murphy and
Chase, "Principles of Genetic Counseling"). In the likelihood
calculation of pedigree data, on the other hand, the prior probability
of a founder's genotype is determined solely by the gene frequency, p.
For example, the prior probability that a founder is heterozygous is
given by 2p(1 – p). Therefore, to implement the prior probability, q,
that a woman is heterozygous for an X-linked recessive deleterious
gene, in the likelihood calculation, one must choose the gene frequency
of the deleterious gene, p, such that q = 2p(1 – p) or, approximately,
p = q/2 (in DMD, thus, p = 2u).
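The exact gene frequency follows from solving the quadratic q = 2p(1 – p) for its smaller root, p = [1 – √(1 – 2q)]/2. A short Python sketch shows how close this is to the approximation p = q/2. The mutation rate used is a hypothetical value chosen for illustration, and the function name is not part of LIPED.

```python
import math


def gene_frequency_from_carrier_prior(q):
    """Solve q = 2p(1 - p) for the smaller root p."""
    if not 0.0 <= q <= 0.5:
        raise ValueError("q must lie between 0 and 0.5")
    return (1.0 - math.sqrt(1.0 - 2.0 * q)) / 2.0


mutation_rate = 1e-4          # hypothetical value of u
q = 4.0 * mutation_rate       # prior carrier probability, as for DMD
p = gene_frequency_from_carrier_prior(q)
print(p, q / 2.0)
```

For small q the exact root and q/2 differ only in the higher decimal places, which is why the approximation p = 2u is adequate in practice.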
12. LIKELIHOOD AT A SINGLE LOCUS
In some applications, only the likelihood at a single (disease) locus
is needed. For example, one may want to estimate from family data gene
frequencies or age-of-onset parameters at a single locus. In LIPED,
single-locus calculations are most easily accommodated by defining a
dummy second locus with a single allele of frequency 1.
13. HELPFUL HINTS
In a pedigree to be processed by LIPED, every individual must either
have both parents in the pedigree or be a founder individual, that is,
have both parents unknown (not in pedigree). Note that siblings cannot
be recognized as such unless their parents are also in the pedigree. If
such parents are not actually known, they must still be present in
the pedigree, possibly with all phenotypes coded as unknown.
When there is at least one known recombination in a pedigree but the
value of the recombination fraction is set equal to zero, then the
likelihood will be equal to zero, and the log likelihood equal to
-∞. On output, -∞ is represented as -99.99.
When the likelihood is equal to zero, either because recombinants are
present while the recombination fraction (θ, theta) is set to zero or
because of a genetic inconsistency (incompatible genotypes of some
individuals), LIPED will report this with the message, "L(rm, rf = 0 at
rm, rf =...", and will print the male and female recombination
fractions at which the likelihood is zero, and sequential number and ID
code of the individual at which this was first detected. An
incompatibility then exists among the indicated individual and his or
her spouse(s) and descendants. Note that additional incompatibilities
not yet detected may exist in the given pedigree. Using θ = 0, this
helps to find recombinations in a pedigree. In families with loops (eg,
inbreeding), this scheme does not work, and LIPED will report a zero
likelihood only at θ = 0.5 but cannot pinpoint where this was first
detected.
In the locus descriptions, when there are several alleles, the number
of possible phenotypes may become quite large. For the analysis,
however, it is not necessary to list all phenotypes that might possibly
occur. One only needs to identify those phenotypes that are actually
present in at least one family member.
When you request output on a disk file, the input used will be appended
to the output file. If you do not want this, it is easiest to proceed
as follows. Pretend running an additional problem, that is, your last
input line is 8000 rather than 9000, and add an additional input line
containing 0 (= number of marker loci) in column 2. LIPED will then
stop without appending the input file to the output file. This method
is used in example 5 (file EX5.DAT), below.
14. EXAMPLES
Example 1
The EX1.DAT file contains input corresponding to a family pedigree with
the structure as shown in section 7, "Complex pedigrees" (pedigree 2),
above. Two dominant loci are used with two alleles each, where A > a
and B > b. The gene frequencies are p = P(A) = 0.4 and q = P(B) =
0.3. Calculation of the pedigree likelihood by first principles yields
L(θ) = (1 – p)⁵ p (1 – q)³ q² (1 – θ)[1 + q(1 – θ)² + qθ²]/8,
where θ denotes the recombination fraction. With this, one obtains, for
example, log[L(0.5)] = –4.16106927 and log[L(0.2)] = –3.93702064,
which agrees with the output given by LIPED. The lod score at θ = 0.2
is thus 0.224.
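The closed-form likelihood of example 1, with exponents 5, 3 and 2 on the gene-frequency terms and base-10 logarithms, can be checked numerically. The Python sketch below only evaluates that formula and does not reproduce LIPED's pedigree algorithm. The function names are hypothetical.

```python
import math


def example1_likelihood(theta, p=0.4, q=0.3):
    """Pedigree likelihood of example 1 as a function of theta."""
    freq_part = (1 - p) ** 5 * p * (1 - q) ** 3 * q ** 2
    theta_part = (1 - theta) * (1 + q * (1 - theta) ** 2 + q * theta ** 2)
    return freq_part * theta_part / 8


def example1_lod(theta):
    """Lod score: log10 L(theta) minus log10 L(0.5)."""
    return (math.log10(example1_likelihood(theta))
            - math.log10(example1_likelihood(0.5)))


print(math.log10(example1_likelihood(0.5)))
print(math.log10(example1_likelihood(0.2)))
print(example1_lod(0.2))
```

The two log likelihoods agree with the values quoted above, and their difference gives the lod score of 0.224 at θ = 0.2.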
Example 2
The EX2.DAT file shows an example with 3 pedigrees and the use of output
option 9, ie, summation of lod scores over pedigrees. The second
pedigree in this set of 3 pedigrees requires much more computer time
per lod score than either pedigree 1 or 3.
Example 3
Compare linkage relationships among 6 gene markers, here labelled main
locus and marker loci 1 through 5. The first comparison made takes more
computer time per lod score than the other comparisons. This example
shows the use of various output options in combination with locus
comparisons.
Example 4
The EX4.DAT file shows a published pedigree with Norrie's disease
(X-linked recessive) and 2 marker loci. Analysis is disease versus each
marker and marker versus marker.
Example 5
The mode of inheritance of the disease locus in example 1 is changed
such that penetrance rises linearly from age 0 to 10. As no individual
is in that age range, the output is the same as in example 1.
15. LITERATURE
Ban Y, Davies TF, Greenberg DA, Concepcion ES, Tomer Y (2002) The
influence of human leucocyte antigen (HLA) genes on autoimmune thyroid
disease (AITD): results of studies in HLA-DR3 positive AITD families.
Clin Endocrinol (Oxf) 57:81-88 [a recent example for the use of the
LIPED program]
Cheung KH, Nadkarni P, Silverstein S, Kidd JR, Pakstis AJ, Miller P,
Kidd KK (1996) PhenoDB: an integrated client/server database for
linkage and population genetics. Comput Biomed Res 29:327-337 [an
example of the use of the LIPED program]
Elston RC, Stewart J (1971) A general model for the analysis of
pedigree data. Hum Hered 21:523-542
Ott J (1974) Estimation of the recombination fraction in human
pedigrees: efficient computation of the likelihood for human linkage
studies. Am J Hum Genet 26:588-597
Ott J, Schrott HG, Goldstein JL, Hazzard WR, Allen FH Jr, Falk CT,
Motulsky AG (1974) Linkage studies in a large kindred with familial
hypercholesterolemia. Am J Hum Genet 26:598-603
Ott J (1976) A computer program for linkage analysis of general human
pedigrees. Am J Hum Genet 28:528-529
Ott J (1986) Y-linkage and pseudoautosomal linkage. Am J Hum Genet
38:891-897
Ott J (1999) Analysis of Human Genetic Linkage, 3rd edition. Johns
Hopkins University Press, Baltimore
Schrott HG, Goldstein JL, Hazzard WR, McGoodwin MM, Motulsky AG (1972)
Familial hypercholesterolemia in a large kindred. Evidence for
monogenic mechanism. Annals of Internal Medicine 76:711-720
[description of the Alaska pedigree]
Terwilliger JD, Ott J (1994) Handbook of Human Genetic Linkage. Johns
Hopkins University Press, Baltimore