crimap documentation (version 2.4)


5.2 build

Builds a map by sequential incorporation of loci.

This is the main option used for RFLP map construction. To describe the method used in greater detail, define an "orders object" to be a collection of loci, together with a set of permissible orders for those loci. The "orders database" contains a collection of orders objects. At any given stage during the map construction "the map" (designated "current_orders" in the program source code, and output) consists of one such orders object.

At the beginning of the run the orders database is empty, and the map consists of the "ordered loci" specified in the .par file. (In the absence of any prior information concerning locus order, such as physical localization data, one usually chooses the "ordered loci" to be a pair of linked and highly informative loci, in order to accelerate the map construction process.) The first locus in the list of inserted loci (specified in the .par file) is placed in each possible interval in the map. The resulting locus orders are then tested for compatibility with the database; each order not excluded is subjected to a full maximum likelihood estimation. The order having the highest log10 likelihood is found, and any order whose log10 likelihood falls below that maximum by more than a specified tolerance (PUK_LIKE_TOL or PK_LIKE_TOL) is eliminated.
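
The placement and elimination steps described above can be illustrated with a minimal Python sketch. It is not taken from the program source: the function names candidate_orders and filter_by_tolerance are hypothetical, the log10 likelihoods are assumed to be supplied by the caller, and the compatibility test against the orders database is omitted.

```python
from typing import List, Sequence, Tuple

Order = Tuple[str, ...]


def candidate_orders(order: Sequence[str], locus: str) -> List[Order]:
    """Place `locus` in every possible interval of `order`, including both ends."""
    return [
        tuple(order[:i]) + (locus,) + tuple(order[i:])
        for i in range(len(order) + 1)
    ]


def filter_by_tolerance(
    scored: Sequence[Tuple[Order, float]], tol: float
) -> List[Tuple[Order, float]]:
    """Keep orders whose log10 likelihood is within `tol` of the best one.

    `scored` pairs each order with its log10 likelihood (assumed to come
    from a separate maximum likelihood step not shown here). `tol` plays
    the role of PK_LIKE_TOL or PUK_LIKE_TOL.
    """
    best = max(loglike for _, loglike in scored)
    return [(o, loglike) for o, loglike in scored if best - loglike <= tol]


if __name__ == "__main__":
    # Hypothetical locus names and likelihood values, for illustration only.
    orders = candidate_orders(("A", "B"), "X")
    scored = list(zip(orders, [-10.0, -12.5, -13.5]))
    for order, loglike in filter_by_tolerance(scored, tol=3.0):
        print(order, loglike)
```

A map of n loci in a single order yields n + 1 candidate orders for each inserted locus; when the map holds several permissible orders, the same insertion would be applied to each of them.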

The resulting collection of non-excluded orders forms an orders object (called orders_temp in the program output). The database is then updated by

If orders_temp contains no more orders than the map does, then it becomes the new map and the added locus is deleted from the list of inserted loci. Otherwise the next locus in the list is tried in the same manner. If no locus in the list meets this criterion, the one with the smallest orders_temp is added to the map.
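
The selection rule in the preceding paragraph can be sketched as follows. This is an illustration under stated assumptions, not the program's own code: the function name choose_locus is hypothetical, and the size of orders_temp for every remaining locus is assumed to be known in advance, whereas the program tries the loci one at a time.

```python
from typing import Dict, List


def choose_locus(
    map_size: int, inserted: List[str], temp_sizes: Dict[str, int]
) -> str:
    """Pick the next locus to add to the map.

    map_size   -- number of orders in the current map
    inserted   -- remaining inserted loci, in .par file order
    temp_sizes -- hypothetical lookup: locus -> number of orders in its orders_temp
    """
    # First locus, in list order, whose orders_temp is no larger than the map.
    for locus in inserted:
        if temp_sizes[locus] <= map_size:
            return locus
    # No locus met the criterion: fall back to the smallest orders_temp.
    return min(inserted, key=lambda locus: temp_sizes[locus])


if __name__ == "__main__":
    # Hypothetical locus names and sizes, for illustration only.
    print(choose_locus(2, ["L1", "L2", "L3"], {"L1": 4, "L2": 2, "L3": 1}))
    print(choose_locus(1, ["L1", "L2"], {"L1": 4, "L2": 3}))
```

In the first call L2 is chosen even though L3 has a smaller orders_temp, because the list is scanned in order and the scan stops at the first locus that meets the criterion.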

Each time a locus is added to the map, the program returns to the beginning of the (revised) list of inserted loci. The orders database is kept in memory during the program run; a copy of it is written to the .ord file every time a new locus is added to the map, so that the program can be stopped and restarted later without loss of the information in the database.

Build first analyzes the "phase known" data (see description of .dat file structure), for which maximum likelihood computation is much more rapid than for the full data set. When the map gets too large (i.e., the number of orders in the map exceeds PK_NUM_ORDS_TOL), build proceeds to find a set of uniquely ordered loci, starting with the original ordered loci and adding additional loci which now have a unique placement (during this latter step, the program only makes use of information in the database; it performs no likelihood calculations). Using this uniquely ordered set of loci as the new "map", the program then proceeds to analyze the full data set, again adding loci sequentially according to the procedure described above. This portion of the program stops when the number of orders in the map exceeds PUK_NUM_ORDS_TOL.

Build then again extracts a set of uniquely ordered loci on the basis of information in the orders database, and prints out sex-specific and sex-averaged recombination fractions and the corresponding Kosambi centiMorgans for these loci. For each remaining locus, it prints out the possible placements with respect to the uniquely ordered loci, along with the log10 likelihoods for each placement.
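
For reference, the standard Kosambi map function converts a recombination fraction theta (0 <= theta < 0.5) to a map distance of 25 * ln((1 + 2*theta) / (1 - 2*theta)) centiMorgans. The Python sketch below implements that textbook formula; the function name kosambi_cm is hypothetical, and it is an assumption that the program's printed values follow this formula exactly.

```python
import math


def kosambi_cm(theta: float) -> float:
    """Convert a recombination fraction to Kosambi centiMorgans.

    Uses the standard Kosambi map function:
        d (Morgans) = 0.25 * ln((1 + 2*theta) / (1 - 2*theta))
    multiplied by 100 to give centiMorgans.
    """
    if not 0.0 <= theta < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= theta < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


if __name__ == "__main__":
    for theta in (0.0, 0.01, 0.10, 0.25):
        print(f"theta = {theta:.2f}  ->  {kosambi_cm(theta):.2f} cM")
```

For small recombination fractions the distance in centiMorgans is close to 100 * theta; the two diverge as theta approaches 0.5, where the Kosambi distance grows without bound.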

If one starts the map with a pair of unlinked loci, then construction of the map is slower: in the initial stages the map will consist of two unlinked linkage groups which can be in either orientation with respect to each other, so that there are more orders to keep track of (and to place remaining loci in). Conversely, if the initial loci are at a recombination fraction of 0 (for example, two RFLPs detected by the same probe), then in all subsequent maps the loci will appear in both possible orientations, thereby doubling the number of orders. What is worse, the parts of the program which extract the maximum set of uniquely ordered loci will never get past these two (since no other locus is uniquely placed with respect to them). To avoid this, it is advisable to put these loci later in the list, or (better) to haplotype them using hap_sys or hap_sys0 in the .par file.

It is a good idea to exclude relatively uninformative loci from the initial map construction process: they tend to have multiple possible placements, resulting in a large number of orders to test. Their positions can be determined later using the all or instant options. With large numbers of loci, it may be necessary to perform several map runs using build until one arrives at "the best" set of uniquely ordered loci. Information in the orders database is cumulative between runs, provided it is not reinitialized using prepare; it is not necessary to reinitialize the orders file when using different sets of loci from the same file.


NOTE: The goal of multilocus linkage analysis is to find the locus order having the highest likelihood, and identify alternative orders with comparable likelihoods. Because it is impossible to consider all possible orders for a large set of loci, it is necessary to adopt a strategy which makes decisions on the basis of subsets of the loci.

Multiple statistical tests are performed, each with a nonzero probability for Type I error. Because incorrect orders will sometimes have significantly higher likelihoods than the correct one, and the chance of this happening increases with the number of sets of loci which are examined during the map construction process, it is possible that the maximum likelihood order would in fact be rejected by build (or any other published method for map construction, for that matter) because for some subset of the loci, an alternative order has a significantly higher likelihood.

In our experience this happens only very rarely (probably because the tests are not independent), and when it does is usually due to errors in the data itself. Nevertheless, three precautions should be taken to guard against the possibility of an erroneous order being adopted by build.

