From: softlib.cs.rice.edu
Last mod: December 23, 1995

Running FASTLINK 3.0P in parallel on networks of workstations


Introduction

TreadMarks is a parallel programming system developed at Rice University that allows programs written in a shared-memory style to run on networks of Unix workstations. The TreadMarks project is separate from the FASTLINK project, but we collaborate on making FASTLINK run in parallel on top of TreadMarks. References for TreadMarks can be found in README.parallel.

Retrieving and Installing TreadMarks

For the moment, we are not making TreadMarks available by ftp. Write to treadmarks@ece.rice.edu and to schaffer@cs.rice.edu if you are interested and we will arrange with you to do the installation.

You can obtain a free 30-day demo copy of TreadMarks or you can get a normal license. Licensing and pricing are available by writing to treadmarks@ece.rice.edu.

Hooking TreadMarks and FASTLINK together

The file called Makefile has two variables that need to be set:
  TmkDIR should be set to the root directory of your TreadMarks installation.
  ARCH should be set for your specific architecture.
In addition, the PARINCLPATH line that reads
    -I$(TmkDIR)/include -I/usr/local/include
needs to be uncommented. See README.Makefile for more details.

To compile parallel ilink, mlink, or linkmap, do

  make ilink.udp
  make mlink.udp
  make linkmap.udp
respectively.

These commands put the corresponding executables in the directory that the BINDIR flag points to.

Running TreadMarks FASTLINK

Typically you will want to create, in your data directory, a soft link from the regular program name (without the .udp extension) to the actual executable. E.g.
  ln -s ../bin/ilink.udp ilink
There are some compilation flags you may want to set to prepare for a run. See README.Makefile or the Makefile itself for instructions.

Running TreadMarks FASTLINK, Command line flags

Specific command line flags are discussed below. First, though, it is important to note how TreadMarks and FASTLINK distinguish which flags are meant for which program. Every command line *must* contain the string `--' (without the quotes). This string delimits FASTLINK flags from TreadMarks flags. All arguments *before* the `--' are read by FASTLINK, while those appearing *after* the `--' are read by TreadMarks.

For example, in the command:

  linkmap -w 40 -- -f machines
the "-w 40" is seen by FASTLINK, while the "-f machines" is seen by TreadMarks.
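The splitting rule can be illustrated with a small shell sketch. This is only an illustration of the convention described above, not the code FASTLINK or TreadMarks actually uses; the helper name split_args and its variables are made up for the example.

```shell
# Illustration only: split an argument list at the first "--".
# split_args is a hypothetical helper, not part of FASTLINK or TreadMarks.
split_args() {
  fastlink_args=""
  treadmarks_args=""
  seen_delim=0
  for arg in "$@"; do
    if [ "$seen_delim" -eq 0 ] && [ "$arg" = "--" ]; then
      seen_delim=1
      continue
    fi
    if [ "$seen_delim" -eq 0 ]; then
      fastlink_args="$fastlink_args $arg"      # before "--": FASTLINK
    else
      treadmarks_args="$treadmarks_args $arg"  # after "--": TreadMarks
    fi
  done
  # Drop the leading space left by the concatenation above.
  fastlink_args=${fastlink_args# }
  treadmarks_args=${treadmarks_args# }
  # Succeed only if the mandatory "--" was present.
  [ "$seen_delim" -eq 1 ]
}

split_args -w 40 -- -f machines
echo "FASTLINK sees:   $fastlink_args"
echo "TreadMarks sees: $treadmarks_args"
```

The helper returns a failure status when no `--' is found, mirroring the rule that the delimiter must always be present on a parallel run.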

Running TreadMarks FASTLINK, Specifying number of processors

One small modification is needed either in the command line (if you call ilink, linkmap, or mlink directly) or in the lcp-produced shell script. At the line where the main program is invoked, the string
  -- -f machines 
is appended.

E.g.

  linkmap
becomes
  linkmap -- -f machines
When modifying lcp-produced scripts, note that the first occurrence of the string ilink, linkmap, or mlink is a parameter to lsp. The second occurrence is the actual call to the program, and that is the line to which the string should be appended.

In the data directory you need to have a file called

   machines
that specifies which machines should be used. The format is one machine name per line. This is the only thing you need to change if you want to change which machines the program runs on.
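As a minimal sketch, the machines file can be created from the shell in the data directory. The host names below are placeholders and must be replaced with the names of your own workstations.

```shell
# Create the machines file in the current (data) directory.
# The three host names are hypothetical; substitute your own.
cat > machines <<'EOF'
alpha.example.edu
beta.example.edu
gamma.example.edu
EOF

# Sanity check: count the non-empty lines, one machine name per line.
nmachines=$(grep -c . machines)
echo "machines lists $nmachines machine(s)"
```

To run on a different set of machines, edit this file; the command line itself does not change.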

Running TreadMarks FASTLINK, Specifying maxworkingset

You may also specify a value for the variable maxworkingset, which represents the maximum number of people active during the analysis (sometimes known as maximum cutset). In FASTLINK 3.0P, maxworkingset is estimated automatically at runtime. In some pedigrees with loops, the estimate is unnecessarily high, so you may wish to override the estimate with a different value.

If you use the automatic estimate of maxworkingset, and the code complains that this estimate is too low, you have hit a bug and should report it (see README.bugreport). However, you can still use the -w flag to work around the bug, while I fix it.

For example, to run ILINK with maxworkingset defined to 40, you would type:

  ilink -w 40 -- -f machines
The error message you would encounter if maxworkingset is too low will report what the current value is. You may try incrementally larger values until the run succeeds.
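The trial-and-error search can be scripted. The sketch below makes an assumption that is not stated in this README: that the program exits with a nonzero status when maxworkingset is too low. If your build does not behave that way, check the program's output for the error message instead. The function name try_w and the variable chosen_w are made up for the example.

```shell
# try_w START STEP MAX CMD...
# Run CMD with "-w N -- -f machines" for N = START, START+STEP, ...
# and stop at the first N for which CMD exits successfully.
# ASSUMPTION: CMD returns a nonzero exit status when N is too low.
try_w() {
  w=$1
  step=$2
  max=$3
  shift 3
  chosen_w=""
  while [ "$w" -le "$max" ]; do
    if "$@" -w "$w" -- -f machines; then
      chosen_w=$w
      return 0
    fi
    w=$((w + step))
  done
  return 1   # no value up to MAX worked
}

# Example use (needs a parallel ilink in the current directory):
#   try_w 40 5 80 ./ilink && echo "run succeeded with -w $chosen_w"
```

Start from the value reported in the error message, so that the search does not waste runs on values already known to be too low.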

Running TreadMarks FASTLINK, Analyzing memory usage

Depending on your system configuration, you may not be able to complete arbitrarily large (in terms of memory usage) FASTLINK runs. As described in README.makefile, there are different levels of memory usage for FASTLINK.

The -m option for parallel FASTLINK provides a simple way to determine shared memory requirements for a specific run. When you run with this option, FASTLINK will do some brief I/O and computation, and then exit (before starting the actual linkage analysis) with a diagnostic message. Because TreadMarks itself is never actually started up for this kind of run, you must also specify the number of processors you plan to run on, so FASTLINK can properly calculate memory requirements. You can specify the number of processors with

  -n N
where N is the number of processors. Also note that in this case, the `--' is not necessary.

For example, to calculate memory usage for a LINKMAP run on 2 processors, you would type:

  linkmap -n 2 -m
A sample run of LINKMAP with this command yielded:
  LINKMAP is currently compiled with PRECOMPUTE=1.
  This run will require at least:

          7004071 bytes of shared memory on 2 processor(s)
          6421943 bytes of shared memory on 1 processor(s)

  Recompiling with PRECOMPUTE=0 would yield:

          6733607 bytes of shared memory on 2 processor(s)
          6094621 bytes of shared memory on 1 processor(s)

  Please refer to the README.makefile and README.TreadMarks for details.
You can see from this message the difference in memory usage between compiling with PRECOMPUTE=0 and PRECOMPUTE=1. This can be useful when determining whether or not you can expect to compute a given run on your system.
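The difference can be computed directly from the diagnostic message. The sketch below works on a copy of the sample message shown above so that it runs on its own; in practice you would capture the real message in a file, assuming it is written to standard output, which this README does not state.

```shell
# In practice the report might be captured with something like
#   linkmap -n 2 -m > memreport.txt
# (assumes the message goes to standard output). Here the sample
# message from this README is used instead.
report=$(mktemp)
cat > "$report" <<'EOF'
  LINKMAP is currently compiled with PRECOMPUTE=1.
  This run will require at least:

          7004071 bytes of shared memory on 2 processor(s)
          6421943 bytes of shared memory on 1 processor(s)

  Recompiling with PRECOMPUTE=0 would yield:

          6733607 bytes of shared memory on 2 processor(s)
          6094621 bytes of shared memory on 1 processor(s)
EOF

# The first "2 processor(s)" line is the current build, the second
# is the alternative build; print the difference in bytes.
saving=$(awk '/bytes of shared memory on 2 processor/ { v[++n] = $1 }
              END { print v[1] - v[2] }' "$report")
echo "Recompiling would change shared memory use by $saving bytes on 2 processors"
rm -f "$report"
```

For the sample message, the two builds differ by 270464 bytes on 2 processors.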

As with the -f flag above, -w, -n, and -m would need to be added to your lcp-produced script if you use them in conjunction with a script.

Parallel FASTLINK, Running times

Because of the way theta evaluations are done in parallel, output does not appear on the screen after each theta is complete, as it does in the sequential version. Instead, you will see periodic reports of execution times for each group of thetas as they are evaluated. A sample run might show:
  Execution time (!parallelThetas) =   0.099
  Execution time (parallelThetas) for 1 =   0.052
  Execution time (parallelThetas) for 2 =   0.896
  Execution time (parallelThetas) for 3 =   0.096
  Execution time (!parallelThetas) =   0.086
  Elapsed time: 1.28 seconds
Each of the "Execution time" statements will appear one at a time as computation progresses. The difference between parallelThetas and !parallelThetas has to do with whether all processors are working together on a single theta, or whether they are working independently on different thetas. The "Elapsed time" statement shows the total execution time for the entire run.
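If you save the screen output of a run, the per-group times can be totalled with awk. The sketch below uses the sample output above as its input; the temporary log file and the variable names are made up for the example.

```shell
# Sample timing output from this README, saved to a temporary log.
log=$(mktemp)
cat > "$log" <<'EOF'
  Execution time (!parallelThetas) =   0.099
  Execution time (parallelThetas) for 1 =   0.052
  Execution time (parallelThetas) for 2 =   0.896
  Execution time (parallelThetas) for 3 =   0.096
  Execution time (!parallelThetas) =   0.086
  Elapsed time: 1.28 seconds
EOF

# Sum the last field of each "Execution time" line, keeping the
# parallelThetas and !parallelThetas phases separate.
summary=$(awk '
  /Execution time \(!parallelThetas\)/ { nonpar += $NF }
  /Execution time \(parallelThetas\)/  { par += $NF }
  END { printf "parallel=%.3f nonparallel=%.3f", par, nonpar }
' "$log")
echo "$summary"
rm -f "$log"
```

For the sample run, the two sums add up to 1.229 seconds, which is less than the 1.28 seconds reported as elapsed time.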