crimap documentation (version 2.4)


4.1 .gen file

The data items in this file are as follows:
{# of families} {# of loci} {locus name1} {locus name2} . . . (each name consisting of at most 15 characters)
For each family:
{family ID} (consisting of an arbitrary character string)
{# of members}
For each member in the family:
{ID #} {mother's ID #} {father's ID #} {sex: 0 if female, 1 if male, 3 if unknown}
{locus1 allele1} {locus1 allele2} {locus2 allele1} {locus2 allele2} . . .
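
The layout above can be read as a simple stream of items. The following Python sketch is illustrative only and is not part of crimap. It assumes that items are separated by arbitrary whitespace and that line breaks carry no meaning. The names parse_gen and Member, and the sample family FAM1, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Member:
    id: int
    mother: int                       # 0 if no mother is assigned
    father: int                       # 0 if no father is assigned
    sex: int                          # 0 female, 1 male, 3 unknown
    genotypes: List[Tuple[int, int]]  # one (allele1, allele2) pair per locus


def parse_gen(text: str):
    """Parse .gen content; returns (locus_names, {family_id: [Member, ...]})."""
    tokens = iter(text.split())       # assumption: any whitespace separates items
    n_families = int(next(tokens))
    n_loci = int(next(tokens))
    loci = [next(tokens) for _ in range(n_loci)]
    families: Dict[str, List[Member]] = {}
    for _ in range(n_families):
        family_id = next(tokens)      # arbitrary string without embedded blanks
        n_members = int(next(tokens))
        members = []
        for _ in range(n_members):
            ident, mother, father, sex = [int(next(tokens)) for _ in range(4)]
            genotypes = [(int(next(tokens)), int(next(tokens)))
                         for _ in range(n_loci)]
            members.append(Member(ident, mother, father, sex, genotypes))
        families[family_id] = members
    return loci, families


# Hypothetical sample: one family, two loci, three members.
sample = """
1 2 LOCA LOCB
FAM1 3
1 0 0 0   1 2 2 2
2 0 0 1   1 1 1 2
3 1 2 0   1 2 1 1
"""
loci, families = parse_gen(sample)
print(loci)
print(families["FAM1"][2].genotypes)
```

Reading the file as a flat token stream keeps the sketch short; a stricter reader could also check name lengths and report the position of a malformed item.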

The pedigree structure must be completely specified: for any individual with an ancestor in the pedigree, both parents must be assigned ID #'s and must appear in the file. The family ID may be any string of characters (without embedded blanks), but individual IDs must be numbers. For "original parents" (individuals without ancestors in the pedigree), the mother and father IDs are coded as 0. Alleles must be represented by integers, with missing alleles scored as 0.

To handle new mutations in a dominant disease gene correctly, ancestors of the mutant individual should have their disease locus genotypes coded as missing. To handle non-pseudoautosomal X-linked loci, a dummy allele number (e.g. 9) must be created and assigned as the second allele for all males in the pedigree.

Example: For a data set with two loci, LOCA and LOCB, scored on the single pedigree depicted on the next page, the corresponding .gen file would be:

1 
2 
LOCA LOCB
P100 
6 
1001 0 0 0
1 2 2 2
1002 1001 1009 0
1 2 1 2
1003 0 0 1
1 1 1 2
1004 1002 1003 0
1 2 1 1
1005 1002 1003 0
1 1 2 2
1009 0 0 1
0 0 0 0

Note that the missing maternal grandfather has been assigned an ID of 1009 and included in the file, with all of his alleles scored as 0.
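
The requirement that both parents be assigned IDs and appear in the file can be checked mechanically before running crimap. The Python sketch below is illustrative only and is not part of crimap. The function name pedigree_problems is hypothetical, and the parent IDs are transcribed by hand from the P100 example above.

```python
from typing import Dict, List, Tuple

# (mother ID, father ID) for each individual ID in the example pedigree P100.
P100: Dict[int, Tuple[int, int]] = {
    1001: (0, 0),
    1002: (1001, 1009),
    1003: (0, 0),
    1004: (1002, 1003),
    1005: (1002, 1003),
    1009: (0, 0),
}


def pedigree_problems(parents: Dict[int, Tuple[int, int]]) -> List[str]:
    """Return a list of readable rule violations (empty if none are found)."""
    problems = []
    for ident, (mother, father) in parents.items():
        if mother == 0 and father == 0:
            continue  # "original parent": no ancestors in the pedigree
        if mother == 0 or father == 0:
            problems.append(f"{ident}: only one parent is assigned an ID")
        for parent in (mother, father):
            if parent != 0 and parent not in parents:
                problems.append(
                    f"{ident}: parent {parent} does not appear in the family")
    return problems


print(pedigree_problems(P100))  # []

# Dropping the placeholder grandfather 1009 makes the pedigree incomplete.
without_1009 = {k: v for k, v in P100.items() if k != 1009}
print(pedigree_problems(without_1009))
# ['1002: parent 1009 does not appear in the family']
```

The second call shows why individual 1009 must be present even though no genotypes are available for him.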

