The Ising Model for Detecting Epistatic Interactions in Genetic Sib-pair Analysis

 

 

This is a brief manual for the program "Ising", which calculates a likelihood-based statistic for an epistatic genetic model. Its primary use is to scan the genome for genetic interactions conditional on a given locus (usually a locus for which linkage has already been demonstrated).

 

The Ising model is described in detail in:

Majewski J, Li H, Ott J. The Ising Model in Physics and Statistical Genetics. Am J Hum Genet. 2001 Oct;69(4):853-62.

 

The program requires two input files: pedfile.dat and param.dat.

 

param.dat describes the dataset. It contains four values, in this order: the number of markers on chromosome1, the number of markers on chromosome2, the position of the locus on chromosome1 for which the epistatic interaction is to be tested, and the version of the calculation to use (1 = fast, 0 = slow). For example:

 

7

3

4

1

 

This specifies 7 markers on chromosome1 and 3 markers on chromosome2, and selects the 4th marker on chromosome1 to be tested for possible epistatic interaction with all markers on chromosome2. If 0 is entered for the number of markers on chromosome2, all markers are assumed to be on chromosome1, and the conditional locus is tested for interaction with all other markers on chromosome1.

The fast/slow option takes the value 1 for fast or 0 for slow; the slow version is the default. The fast option produces approximate results by estimating the coupling parameters between markers only under the null hypothesis, and not under the alternative hypotheses (linkage, interaction). The slow version is exact and estimates linkage between markers separately under each hypothesis. In most cases the two options give almost identical results, but any promising result obtained with the fast version should then be checked with the exact, slow option. NOTE: the fast version should not be used when interactions between markers on the same chromosome (i.e. with chromosome2 = 0) are tested, as the results are likely to be very inaccurate.

The current version of the program runs with a maximum of 500 sib-pairs and a total of 50 markers on the two chromosomes combined.
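The rules above can be checked before a run. The following Python sketch is not part of the Ising distribution; the names IsingParams and parse_params are hypothetical, and it assumes only that param.dat holds four whitespace-separated integers in the order described above.

```python
from dataclasses import dataclass

MAX_MARKERS = 50  # limit on the total number of markers stated in this manual


@dataclass
class IsingParams:
    n_chr1: int      # number of markers on chromosome1
    n_chr2: int      # number of markers on chromosome2 (0 = single chromosome)
    cond_locus: int  # 1-based position of the conditional locus on chromosome1
    fast: int        # 1 = fast (approximate), 0 = slow (exact)


def parse_params(text: str) -> IsingParams:
    """Parse and sanity-check the four integers of a param.dat file."""
    values = [int(tok) for tok in text.split()]
    if len(values) != 4:
        raise ValueError(f"expected 4 values, found {len(values)}")
    p = IsingParams(*values)
    if p.n_chr1 < 1 or p.n_chr2 < 0:
        raise ValueError("chromosome1 needs at least 1 marker; chromosome2 may be 0")
    if p.n_chr1 + p.n_chr2 > MAX_MARKERS:
        raise ValueError(f"more than {MAX_MARKERS} markers in total")
    if not 1 <= p.cond_locus <= p.n_chr1:
        raise ValueError("conditional locus must be a marker on chromosome1")
    if p.fast not in (0, 1):
        raise ValueError("fast/slow flag must be 1 or 0")
    if p.fast == 1 and p.n_chr2 == 0:
        raise ValueError("fast version should not be used with chromosome2 = 0")
    return p


# The example param.dat from this manual.
params = parse_params("7\n3\n4\n1\n")
print(params)
```

Rejecting the fast option for single-chromosome runs is a choice made for this sketch; the manual only advises against that combination.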

 

pedfile.dat is the pedigree file in LINKAGE pre-MAKEPED format:

 

[pedigree id] [person id] [father id] [mother id] [sex] [affection] [marker 1a] [marker 1b] ...

 

The markers should be arranged in the order in which they occur on the chromosomes, with the markers for chromosome2 directly following the markers for chromosome1 in the pedigree file.
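As an illustration of this layout, the Python sketch below splits one pedigree line into its six leading fields and the per-marker allele pairs, then separates the chromosome1 markers from the chromosome2 markers. The function name parse_ped_line and the sample record are hypothetical; the sketch assumes whitespace-separated fields and exactly two alleles per marker.

```python
def parse_ped_line(line, n_chr1, n_chr2):
    """Split one pre-MAKEPED line into ID fields and per-marker allele pairs."""
    fields = line.split()
    n_markers = n_chr1 + n_chr2
    expected = 6 + 2 * n_markers  # six ID/status fields, two alleles per marker
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, found {len(fields)}")
    alleles = fields[6:]
    genotypes = [(alleles[2 * i], alleles[2 * i + 1]) for i in range(n_markers)]
    return {
        "pedigree": fields[0],
        "person": fields[1],
        "father": fields[2],
        "mother": fields[3],
        "sex": fields[4],
        "affection": fields[5],
        "chr1": genotypes[:n_chr1],   # chromosome1 markers come first
        "chr2": genotypes[n_chr1:],   # chromosome2 markers follow directly
    }


# Hypothetical record: 2 markers on chromosome1, 1 marker on chromosome2.
record = parse_ped_line("1 3 1 2 1 2  1 2  3 3  2 4", n_chr1=2, n_chr2=1)
print(record["chr1"], record["chr2"])
```

A field-count check of this kind catches the most common mistake, a marker count in param.dat that does not match the pedigree file.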

 

Ising uses subroutines modified from the sib_ibd module of the ASPEX package (Copyright (C) 1998 David A. Hinds and Neil Risch -- All Rights Reserved) to produce an intermediate file, "data.dat", which is then processed using the Ising model. "data.dat" contains the IBD sharing states for each parent and each marker. The data are arranged in rows corresponding to: father1, mother1, father2, mother2, etc. Each row consists of the IBD sharing states at each marker, e.g.:

 

 

<         chromosome1          >   <chromosome2>
 1    1    1   -1   -1    0   -1    1    1    1   (father1)
-1   -1   -1   -1   -1   -1   -1   -1   -1   -1   (mother1)
 1    1   -1   -1   -1   -1   -1    1    0    1   (father2)
 0    1    1   -1   -1   -1   -1   -1    0   -1   (mother2)
...........................................................

 

The coding for IBD sharing states is as follows:

1, both sibs share the same allele from this parent

-1, the sibs inherited different alleles from this parent

0, unknown (undetermined) IBD sharing

3, the special case in which it can be determined that one allele is shared and the other is not, but not which IBD state corresponds to which parent.
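The coding can be illustrated with a short Python sketch that reads rows in the layout shown above and tallies the states at one marker. The function names are hypothetical, and the sketch reports codes 0 and 3 separately; how Ising itself treats code 3 when counting shared and unshared alleles is not stated in this manual.

```python
from collections import Counter

VALID_STATES = {1, -1, 0, 3}  # IBD codes listed in this manual


def read_ibd_rows(text, n_markers):
    """Read IBD rows (father1, mother1, father2, mother2, ...) as integer lists."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        states = [int(tok) for tok in tokens[:n_markers]]
        if len(states) != n_markers or not set(states) <= VALID_STATES:
            raise ValueError(f"bad IBD row: {line!r}")
        rows.append(states)
    if len(rows) % 2:
        raise ValueError("expected one father row and one mother row per sib-pair")
    return rows


def tally_marker(rows, marker_index):
    """Count each IBD code at one marker (0-based index) over all parents."""
    counts = Counter(row[marker_index] for row in rows)
    return {
        "shared": counts[1],
        "unshared": counts[-1],
        "unknown": counts[0],
        "ambiguous": counts[3],
    }


# The four example rows from this manual: 7 + 3 = 10 markers.
example = """\
1 1 1 -1 -1 0 -1 1 1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1
1 1 -1 -1 -1 -1 -1 1 0 1
0 1 1 -1 -1 -1 -1 -1 0 -1
"""
rows = read_ibd_rows(example, n_markers=10)
print(tally_marker(rows, 0))
```

For the first marker of the example, the four parental rows hold the states 1, -1, 1 and 0.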

 

If multiplex families, containing more than two affected siblings, are used, all possible affected sibling pairs are used in the analysis. No weighting scheme is employed. At this time, the Ising model does not use information on population allele frequencies to estimate IBD sharing probabilities in cases where some parents are not genotyped.
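Because every pair is used, a family with n affected siblings contributes n(n-1)/2 sib-pairs. The Python sketch below (hypothetical function name, hypothetical person ids) enumerates the pairs for one family, which can help when counting sib-pairs against the program's limit.

```python
from itertools import combinations


def affected_sib_pairs(sib_ids):
    """Return all unordered pairs of affected siblings, without weighting."""
    return list(combinations(sib_ids, 2))


# A family with three affected siblings (hypothetical person ids).
pairs = affected_sib_pairs(["3", "4", "5"])
print(pairs)       # [('3', '4'), ('3', '5'), ('4', '5')]
print(len(pairs))  # 3
```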

 

The tar archive for the Ising program includes the following files:

 

pedfile.dat is a pedigree file in LINKAGE pre-MAKEPED format.

data.dat is the IBD-sharing file.

param.dat is the description of the data required by Ising.

ising.out is the result of an Ising run.

Readme.txt is this manual in text format.

Ising is the executable binary for the Ising program.

 

The output file, ising.out, has the following format:

 

marker  shared/unshared  LOD-EPI  h_gene1  h_gene2  j_int   LOD-MPT  h_gene2  h_gene1  LOD-CHI2
-----------------------------------------------------------------------------------------------
1       232/238          0.7787   0.5046   0.0000   0.0503  0.0003   0.5077   0.0017   0.0166
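For downstream processing, the output can be read into records with a few lines of Python. This is a sketch, not part of the program: the function and key names are hypothetical, and it assumes that each data line holds ten whitespace-separated fields beginning with an integer marker number. Columns 8 and 9 are stored by position only, leaving their interpretation to the column descriptions.

```python
def parse_ising_out(text):
    """Parse the data lines of ising.out into a list of dictionaries."""
    records = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != 10 or not tokens[0].isdigit():
            continue  # skip the header, the rule of dashes, and blank lines
        shared, unshared = (int(x) for x in tokens[1].split("/"))
        values = [float(x) for x in tokens[2:]]
        records.append({
            "marker": int(tokens[0]),
            "shared": shared,
            "unshared": unshared,
            "lod_epi": values[0],
            "epi_h_gene1": values[1],
            "epi_h_gene2": values[2],
            "j_int": values[3],
            "lod_mpt": values[4],
            "mpt_h_col8": values[5],  # column 8, stored by position
            "mpt_h_col9": values[6],  # column 9, stored by position
            "lod_chi2": values[7],
        })
    return records


sample = """\
marker shared/unshared LOD-EPI h_gene1 h_gene2 j_int LOD-MPT h_gene2 h_gene1 LOD-CHI2
-------------------------------------------------------------------------------------
1      232/238         0.7787  0.5046  0.0000  0.0503 0.0003 0.5077  0.0017  0.0166
"""
records = parse_ising_out(sample)
print(records[0]["marker"], records[0]["shared"], records[0]["lod_epi"])
```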

 

Column:

 

1) Marker number on chromosome2

 

2) Shared vs. unshared number of alleles at this marker

 

3) LOD-EPI, this is the LOD score in support of the model with two effect loci and an epistatic interaction over a model with only one effect at locus1. This statistic is distributed as a 50:50 mixture of chi square with one degree of freedom and chi square with two degrees of freedom.

 

4, 5, 6) The maximum likelihood estimates of the Ising parameters for the effects at locus1, locus2, and the epistatic interaction.

 

7) LOD-MPT is the multipoint LOD score in support of a two locus model with effect genes at locus1 and locus2 (but no interaction) over a model with only one effect at locus1 (this should be roughly equivalent to a multipoint Zlr statistic computed by Allegro for locus2). This statistic is distributed as a 50:50 mixture of chi square with one degree of freedom and a point mass at the origin.

 

8, 9) The maximum likelihood estimates of the Ising parameters for the effects at locus1, locus2 (with no interaction).

 

10) A LOD score equivalent of a single point CHI2 statistic obtained by counting shared vs. unshared alleles at the locus.
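The statistics in columns 3, 7 and 10 can be related to chi square values with the Python sketch below. It rests on two assumptions that are not stated explicitly in this manual: that a LOD score is converted to the chi square scale by multiplying by 2 ln(10), and that the single point statistic in column 10 is the usual (shared - unshared)^2 / (shared + unshared). The second assumption reproduces the value 0.0166 from the example line for 232 shared and 238 unshared alleles. The function names are hypothetical.

```python
import math

TWO_LN10 = 2.0 * math.log(10.0)  # converts a LOD score to the chi square scale


def lod_chi2(shared, unshared):
    """LOD equivalent of the single point chi square on allele-sharing counts."""
    chi2 = (shared - unshared) ** 2 / (shared + unshared)
    return chi2 / TWO_LN10


def chi2_sf_1df(x):
    """P(X > x) for chi square with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_2df(x):
    """P(X > x) for chi square with two degrees of freedom."""
    return math.exp(-x / 2.0)


def p_value_epi(lod):
    """p-value under a 50:50 mixture of chi square (1 df) and chi square (2 df)."""
    x = TWO_LN10 * lod
    if x <= 0.0:
        return 1.0
    return 0.5 * chi2_sf_1df(x) + 0.5 * chi2_sf_2df(x)


def p_value_mpt(lod):
    """p-value under a 50:50 mixture of chi square (1 df) and a point mass at 0."""
    x = TWO_LN10 * lod
    if x <= 0.0:
        return 1.0
    return 0.5 * chi2_sf_1df(x)


print(round(lod_chi2(232, 238), 4))  # 0.0166, as in the example output line
print(p_value_epi(0.7787), p_value_mpt(0.0003))
```

The closed forms for one and two degrees of freedom keep the sketch within the standard library; a statistics package would serve equally well.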

 

 

Speed and Memory considerations

 

The program does not require much memory, but it is fairly slow, since all the relevant parameters are maximized simultaneously. I have been testing it on a PIII 800 MHz machine running Linux. Typical times for a dataset of 356 sib-pairs with 14 markers on chromosome1 and 9 markers on chromosome2 are 15 minutes for the full slow version and 2 minutes 30 seconds for the approximate fast version. The running time increases roughly linearly with the number of markers on chromosome1 and quadratically with the number of markers on chromosome2.
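The scaling rule can be turned into a rough planning estimate. The Python sketch below simply rescales the reference timings quoted above; the function name is hypothetical, and the estimate ignores the number of sib-pairs and any difference in hardware, so it should be read as an order-of-magnitude guide only.

```python
# Reference run quoted in this manual: 14 and 9 markers on a PIII 800 MHz.
REF_CHR1 = 14
REF_CHR2 = 9
REF_MINUTES = {"slow": 15.0, "fast": 2.5}


def estimate_minutes(n_chr1, n_chr2, version="slow"):
    """Scale the reference time linearly in chromosome1 markers and
    quadratically in chromosome2 markers (rough estimate only)."""
    base = REF_MINUTES[version]
    return base * (n_chr1 / REF_CHR1) * (n_chr2 / REF_CHR2) ** 2


print(estimate_minutes(14, 9))           # 15.0, the reference run itself
print(estimate_minutes(28, 18))          # doubling both counts: about 8 times longer
print(estimate_minutes(14, 9, "fast"))   # 2.5
```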

 

I have compiled the program to run on Linux and SunOS:

 

Ising_linux.tar

 

Ising_sun.tar

 

This is an early release version. I expect it still has some bugs. Please contact me with any problems/comments:

 

majewski@complex.rockefeller.edu